Linux Enterprise Cluster (linux企业集群_bin)
Undergraduate Course Design

Course: Linux/Cluster Computing Practice    Course number:    Student ID:    Student name:    Major:    Class:    Advisor:    Grade:    Instructor's signature:

Contents

1. Introduction to Computer Clusters
 1.1 What Is a Cluster
 1.2 Why Clusters Are Needed
  1.2.1 Classifying Cluster Systems
2. High-Availability Clusters
 2.1 What Is High Availability
 2.2 High-Availability Clusters
3. High-Performance Computing Clusters
 3.1 What Is a High-Performance Computing Cluster
 3.2 Categories of High-Performance Computing
  3.2.1 High-throughput Computing
  3.2.2 Distributed Computing
 3.3 Linux High-Performance Cluster Systems
4. Translation of Chapter 13 of The Linux Enterprise Cluster

1. Introduction to Computer Clusters

1.1 What Is a Cluster

Simply put, a cluster is a group of computers that provides a set of network resources to users as a single whole. The individual computer systems are the cluster's nodes. In an ideal cluster, users are never aware of the nodes underlying the system; to them the cluster is one system, not a collection of computers, and the cluster administrator can add and remove nodes at will.

1.2 Why Clusters Are Needed

Clusters are not a new idea: computer vendors and research institutions began studying and building cluster systems as early as the 1970s. Because those systems were used mainly for scientific and engineering computation, they were not widely known; only with the arrival of Linux clusters did the concept spread.

Interest in clusters grew out of their good performance scalability. Raising CPU clock rates and bus bandwidth was the original way to improve computer performance, but the gains from that approach are limited. Performance was then improved by adding CPUs and memory, which led to vector machines and symmetric multiprocessors (SMP). Once the number of CPUs passes a certain threshold, however, the scalability of multiprocessor systems such as SMP becomes very poor, the main bottleneck being that the memory bandwidth available to the CPUs does not grow effectively as CPUs are added. Cluster systems, by contrast, scale almost linearly with the number of CPUs; Figure 1 illustrates this.

The advantages of clusters go beyond scalability:

- High availability: if one node fails, its work can be handed over to other nodes, effectively preventing a single point of failure.
- High performance: a load-balancing cluster lets the system serve more users at once.
- Good price/performance: a high-performance system can be built from inexpensive, industry-standard hardware.

1.2.1 Classifying Cluster Systems

Although clusters can be classified in many ways by their characteristics, they generally fall into two categories:

- High Availability (HA) clusters, which aim to provide highly reliable services.
- High Performance Computing (HPC) clusters, which aim to provide computing power that no single computer can deliver.

2. High-Availability Clusters

2.1 What Is High Availability

The availability of a computer system is measured through its reliability and its maintainability. In engineering practice, reliability is usually measured by the mean time to failure (MTTF) and maintainability by the mean time to repair (MTTR). Availability is therefore defined as:

Availability = MTTF / (MTTF + MTTR) × 100%

For example, a system with an MTTF of 5,000 hours and an MTTR of 5 hours has an availability of 5000/5005, or roughly 99.9%.

2.2 High-Availability Clusters

A high-availability cluster applies clustering techniques to achieve high availability. Such clusters generally work in one of two ways:

- Fault-tolerant systems: usually a primary/standby pair. The standby server monitors the state of the primary and provides no service while the primary is healthy; as soon as the primary fails, the standby takes over and serves clients in its place.
- Load-balancing systems: all nodes in the cluster are active and share the workload. Web server clusters, database clusters, and application server clusters are usually of this type.

High-availability clusters are discussed at length elsewhere, so we will not go deeper into them here.

3. High-Performance Computing Clusters

3.1 What Is a High-Performance Computing Cluster

Simply put, high-performance computing (HPC) is a branch of computer science devoted to building supercomputers, studying parallel algorithms, and developing the related software. It mainly addresses two kinds of problems: large-scale scientific problems, such as weather forecasting, terrain analysis, and drug development; and the storage and processing of massive data sets, as in data mining, image processing, and gene sequencing. As the name suggests, a high-performance cluster applies clustering techniques to high-performance computing.

3.2 Categories of High-Performance Computing

High-performance computing can be classified in many ways; here we classify it by the relationship between the parallel tasks.

3.2.1 High-throughput Computing

One kind of high-performance computation can be split into a number of parallel subtasks that have no relationship to one another. SETI@home (Search for Extraterrestrial Intelligence at Home) is an application of this type: the project uses idle computing resources on the Internet to search for extraterrestrial intelligence. The SETI server sends a block of data and a data pattern to each participating compute node on the Internet; the node searches the given data for the given pattern and sends its result back to the server, which assembles the results returned by all the nodes into a complete data set. Because the common trait of such applications is searching for patterns in huge amounts of data, this kind of computation is called high-throughput computing; so-called Internet computing belongs to this category. In Flynn's taxonomy, high-throughput computing falls under SIMD (Single Instruction/Multiple Data).

3.2.2 Distributed Computing

The other kind of computation is the opposite of high-throughput computing: although it too can be divided into parallel subtasks, the subtasks are tightly coupled and must exchange large amounts of data. In Flynn's taxonomy, this kind of distributed high-performance computing falls under MIMD (Multiple Instruction/Multiple Data).

3.3 Linux High-Performance Cluster Systems

When Linux high-performance clusters come up, many people's first reaction is Beowulf. Originally Beowulf was just one well-known scientific computing cluster; many later clusters adopted a similar architecture, so in practice Beowulf has become a widely accepted type of high-performance cluster, and many cluster systems with different names are Beowulf derivatives. Cluster systems unlike Beowulf also exist; COW and MOSIX are two other well-known families.

4. Translation of Chapter 13 of The Linux Enterprise Cluster

Original text: Chapter 13: The LVS-DR Cluster

Overview

The Linux Virtual Server Direct Routing (LVS-DR) cluster is made possible by configuring all nodes in the cluster and the Director with the same VIP address; despite having this common address, though, client computers will only send their packets to the Director. The Director can, therefore, balance the incoming workload from the client computers by using one of the LVS scheduling methods we looked at in Chapter 12. The LVS-DR cluster configuration that we use in this chapter assumes that the Director is a computer dedicated to this task. In Chapter 14 we'll take a closer look at what is going on inside this computer, and in Chapter 15 we'll see how the load-balancing resource[1] can be placed on a real server and made highly available using the Heartbeat package.
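The scheduling methods just mentioned (round-robin, weighted round-robin, least-connection, and so on) live inside the kernel's IPVS code. As a rough illustration of the idea only, here is a toy Python sketch of weighted round-robin; the server names and weights are invented for the example, and this is neither the book's code nor the kernel implementation.

```python
import itertools

# Hypothetical real servers and weights -- invented for this sketch.
real_servers = {"rs1": 3, "rs2": 1}  # rs1 should receive about 3x the connections

def weighted_round_robin(servers):
    """Toy weighted round-robin: expand each server by its weight and
    cycle through the pool forever."""
    pool = [name for name, weight in servers.items() for _ in range(weight)]
    return itertools.cycle(pool)

scheduler = weighted_round_robin(real_servers)
for conn_id in range(8):  # pretend eight client connections arrive
    print(f"connection {conn_id} -> {next(scheduler)}")
```

The kernel's wrr scheduler interleaves the servers more evenly than this naive pool does, but the proportion of connections each server receives under a given set of weights is the same.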
But before we build an enterprise-class, highly available LVS-DR cluster (in Chapter 15), let's examine how the LVS-DR forwarding method works in more detail.

[1] Recall from our discussion of Heartbeat in Chapter 6 that a service, along with its associated IP address, is known as a resource. Thus, the virtual services offered by a Linux Virtual Server and their associated VIPs can also be called resources in high-availability terminology.

How Client Computers Access LVS-DR Cluster Services

Let's examine the TCP network communication that takes place between a client computer and the cluster. As with the LVS-NAT cluster network communication described in Chapter 12, the LVS-DR TCP communication starts when the client computer sends a request for a service running on the cluster, as shown in Figure 13-1.[2]

Figure 13-1: In packet 1 the client sends a request to the LVS-DR cluster

The first packet, shown in Figure 13-1, is sent from the client computer to the VIP address. Its data payload is an HTTP request for a web page.

Note: An LVS-DR cluster, like an LVS-NAT cluster, can use multiple virtual IP (VIP) addresses, so we'll number them for the sake of clarity.

Before we focus on the packet exchange, there are a couple of things to note about Figure 13-1. The first is that the network interface card (NIC) the Director uses for network communication (the box labeled VIP1 connected to the Director in Figure 13-1) is connected to the same physical network that is used by the cluster node and the client computer. The VIP, the RIP, and the CIP, in other words, are all on the same physical network (the same network segment or VLAN).

Note: You can have multiple NICs on the Director to connect the Director to multiple VLANs.

The second is that VIP1 is shown in two places in Figure 13-1: it is in a box representing the NIC that connects the Director to the network, and it is in a box inside of real server 1. The box inside of real server 1 represents an IP address that has been placed on the loopback device on real server 1. (Recall that a loopback device is a logical network device used by all networked computers to deliver packets locally.) Network packets[3] that are routed inside the kernel on real server 1 with a destination address of VIP1 will be sent to the loopback device on real server 1; in other words, any packets found inside the kernel on real server 1 with a destination address of VIP1 will be delivered to the daemons running locally on real server 1. (We'll see shortly how packets that have a destination address of VIP1 end up inside the kernel on real server 1.)

Now let's look at the packet depicted in Figure 13-1. This packet was created by the client computer and sent to the Director. A technical detail not shown in the figure is the lower-level destination MAC address inside this packet: it is set to the MAC address of the Director's NIC that has VIP1 associated with it, and the client computer discovered this MAC address using the Address Resolution Protocol (ARP). An ARP broadcast from the client computer asked, "Who owns VIP1?" and the Director replied to the broadcast using its MAC address, saying that it was the owner. The client computer then constructed the first packet of the network conversation and inserted the proper destination MAC address to send the packet to the Director. (We'll examine a broadcast ARP request and see how it can create problems in an LVS-DR cluster environment later in this chapter.)
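To make the ARP exchange concrete, the following toy Python model shows a broadcast that only the Director answers. Everything here is invented for illustration (real ARP is a link-layer broadcast, not Python objects), and it also anticipates the point made later in this chapter: a real server carrying the VIP on its loopback device must stay silent.

```python
VIP1 = "10.0.0.100"  # invented VIP address for the sketch

class Host:
    def __init__(self, name, mac, owned_ips):
        self.name, self.mac, self.owned_ips = name, mac, owned_ips

    def answer_arp(self, queried_ip):
        # Reply with our MAC only if we claim the queried IP address.
        return self.mac if queried_ip in self.owned_ips else None

# Only the Director claims VIP1; the real server hides the VIP it has
# on its loopback device and so answers nothing.
director = Host("director", "00:00:5e:00:00:01", {VIP1})
real_server1 = Host("realserver1", "00:00:5e:00:00:02", set())

def arp_broadcast(hosts_on_segment, queried_ip):
    """Every host on the segment sees the broadcast; owners reply."""
    return [(h.name, h.mac) for h in hosts_on_segment
            if h.answer_arp(queried_ip)]

print(arp_broadcast([director, real_server1], VIP1))
# -> [('director', '00:00:5e:00:00:01')]
```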
Note: When the cluster is connected to the Internet, and the client computer is connected to the cluster over the Internet, the client computer will not send an ARP broadcast to locate the MAC address of the VIP. Instead, when the client computer wants to connect to the cluster, it sends packet 1 over the Internet, and when the packet arrives at the router that connects the cluster to the Internet, the router sends the ARP broadcast to find the correct MAC address to use.

When packet 1 arrives at the Director, the Director forwards the packet to the real server, leaving the source and destination addresses unchanged, as shown in Figure 13-2. Only the MAC address is changed, from the Director's MAC address to the real server's (RIP) MAC address.

Figure 13-2: In packet 2 the Director forwards the client computer's request to a cluster node

Notice in Figure 13-2 that the source and destination IP addresses have not changed in packet 2: CIP1 is still the source address, and VIP1 is still the destination address. The Director, however, has changed the destination MAC address of the packet to that of the NIC on real server 1 in order to send the packet into the kernel on real server 1 (though the MAC addresses aren't shown in the figure). When the packet reaches real server 1, the packet is routed to the loopback device, because that's where the routing table inside the kernel on real server 1 is configured to send it. (In Figure 13-2, the box inside of real server 1 with the VIP1 address in it depicts the VIP1 address on the loopback device.) The packet is then received by a daemon running locally on real server 1 listening on VIP1, and that daemon knows what to do with the packet; the daemon is the Apache HTTPd web server in this case.

The HTTPd daemon then prepares a reply packet and sends it back out through the RIP1 interface with the source address set to VIP1, as shown in Figure 13-3.

Figure 13-3: In packet 3 the cluster node sends a reply directly back to the client computer

The packet shown in Figure 13-3 does not go back through the Director, because the real servers do not use the Director as their default gateway in an LVS-DR cluster. Packet 3 is sent directly back to the client computer (hence the name direct routing). Also notice that the source address is VIP1, which real server 1 took from the destination address it found in the inbound packet (packet 2).

Notice the following points about this exchange of packets (a toy sketch of the forwarding step follows the list):

- The Director must receive all the inbound packets destined for the cluster.
- The Director only receives inbound cluster communications (requests for services from client computers).
- Real servers, the Director, and client computers can all share the same network segment.
- The real servers use the router on the production network as their default gateway (unless you are using the LVS martian patch on your Director).
- If client computers will always be on the same network segment as the cluster nodes, you do not need to configure a default gateway for the real servers.[4]
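To summarize the forwarding step in code, here is a hypothetical Python sketch of what the Director does to packet 1: only the destination MAC address changes, and both IP addresses pass through untouched. Field names and addresses are invented for the example; inside a real Director this happens to sk_buffs in the kernel, as footnote [3] below notes.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    src_ip: str   # CIP: stays the client's address end to end
    dst_ip: str   # VIP: stays the cluster's address end to end
    payload: str

DIRECTOR_MAC = "00:00:5e:00:00:01"  # invented addresses for the sketch
RS1_MAC = "00:00:5e:00:00:02"

def lvs_dr_forward(frame: Frame, real_server_mac: str) -> Frame:
    """LVS-DR forwarding: rewrite only the destination MAC address
    (LVS-NAT, by contrast, rewrites the destination IP as well)."""
    return replace(frame, dst_mac=real_server_mac)

packet1 = Frame("00:00:5e:00:00:aa", DIRECTOR_MAC,
                "192.168.1.50", "10.0.0.100", "GET / HTTP/1.0")
packet2 = lvs_dr_forward(packet1, RS1_MAC)
assert (packet2.src_ip, packet2.dst_ip) == (packet1.src_ip, packet1.dst_ip)
print(packet2.dst_mac)  # now RS1_MAC; both IP addresses are untouched
```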
[2] We are ignoring the lower-level TCP connection request (the TCP handshake) in this discussion for the sake of simplicity.

[3] When the kernel holds a packet in memory, it places the packet into an area of memory that is referenced by a pointer called a socket buffer, or sk_buff; so, to be completely accurate in this discussion, I should use the term sk_buff instead of packet every time I mention a packet inside the Director.

[4] This, however, would be an unusual configuration, because real servers will likely need to access both an email server and a DNS server residing on a different network segment.

ARP Broadcasts and the LVS-DR Cluster

As we've just seen, placing VIP addresses on the loopback (lo) device on each cluster node allows the cluster nodes in an LVS-DR cluster to accept packets that are destined for the VIP address. However, this has one dangerous side effect: the real servers inside the cluster will try to reply to ARP broadcasts from client computers that are looking for the VIP. Unless special precautions are taken, the real servers will claim to own the VIP address, and client computers will send their packets directly to real servers, thus circumventing the cluster load-balancing method and destroying the integrity of network communication with the Director (where packets that use the VIP as their destination address are supposed to go).

To understand this problem (called "The ARP Problem" in the LVS-HOWTO), let's look at how a client computer uses the VIP address to find the correct MAC address by using ARP.

Client Computers and ARP Broadcasts

Figure 13-4 shows a client computer sending an ARP broadcast to an LVS-DR cluster. Notice that because the Director and the cluster node (real server 1) are connected to the same network, they will both receive the ARP broadcast asking, "Who owns VIP1?"

Figure 13-4: An ARP broadcast to an LVS-DR cluster

In Figure 13-4, gray arrows represent the path taken by an ARP broadcast sent from the client computer. The ARP broadcast packet is sent to all nodes connected to the local network (the VLAN or physical network segment), so a gray arrow is shown on the physical wires that connect the Director and real server 1 to the network switch. This is normal network behavior. However, we want real server 1 to ignore this ARP request and only the LVS-DR Director to respond to it, as shown in Figure 13-5. In that figure, a gray arrow depicts the path of the ARP reply; it should come only from the Director, not from real server 1.

Figure 13-5: An ARP response from the LVS-DR Director

To prevent real servers from replying to ARP broadcasts for the LVS-DR cluster VIP, we need to hide the loopback interface on all of the real servers. Several techniques are available to accomplish this, and they are described in the LVS-HOWTO.

Note: Starting with kernel version 2.4.26, the stock Linux kernel contains the code necessary to prevent real servers from replying to ARP broadcasts. This is discussed in Chapter 15.
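The stock-kernel mechanism the note refers to is normally enabled through the arp_ignore and arp_announce sysctls on each real server; the book defers the details to Chapter 15. As a minimal sketch, assuming a kernel of 2.4.26 or later, the VIP already configured on lo, and root privileges, the values could be written straight to /proc:

```python
import pathlib

# Sketch only: these /proc entries exist on kernels >= 2.4.26, and
# writing them requires root.
SETTINGS = {
    "net/ipv4/conf/all/arp_ignore": "1",    # answer only for addresses on the receiving interface
    "net/ipv4/conf/all/arp_announce": "2",  # use the best local source address in ARP requests
}

for key, value in SETTINGS.items():
    proc_entry = pathlib.Path("/proc/sys") / key
    proc_entry.write_text(value)
    print(f"{key} = {proc_entry.read_text().strip()}")
```

In practice these settings usually go into /etc/sysctl.conf so they survive a reboot.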
In Conclusion

We've examined the LVS-DR forwarding method in detail in this chapter, and we examined a sample LVS-DR network conversation between a client computer and a cluster node. We've also briefly described a potential problem with ARP broadcasts when you build an LVS-DR cluster. In Chapter 15, we will see how to build a high-availability, enterprise-class LVS-DR cluster. Before we do so, though, Chapter 14 will look inside the load balancer.