
SGE Installation and Usage Guide
崔再续
2011-8-21

Contents
1 Setting up NFS
1.1 Introduction to NFS
1.2 Our requirements
2 Setting up SGE
2.1 Introduction to SGE
2.2 Our requirements
2.3 SGE software and documentation
2.4 Installing the archive sge62u5_linux24-i586_rpm.zip
2.5 SGE cluster layout
2.6 Installing the master daemon on the master host
2.7 Problems when installing the master node
2.8 Installing the execution daemon
2.9 Problems when installing the execution nodes
2.10 Starting the sge daemons
2.11 Installing gridengine-client
2.12 Using sge
2.13 Managing jobs and queues with commands
2.14 Host states
2.15 Job states
3 SGE and NFS user management issues
4 Adding another execution node to the cluster
5 Notes

1 Setting up NFS

NFS stands for Network File System.

1.1 Introduction to NFS

NFS lets a system share directories and files with others over a network. Through NFS, users and programs can access files on a remote system as if they were local files. An NFS setup has at least two main parts: one server and one or more clients. The clients remotely access data stored on the server.

1.2 Our requirements

We need to share disk capacity: each machine sets aside a few hundred GB and shares it so that everyone can use it. NFS is a simple way to achieve this.

Every machine creates a /data directory under its root directory, with four subdirectories: /data/master, /data/node1, /data/node2 and /data/node3. master mounts its 500+ GB of space at its own /data/master and shares it over NFS with the 172.16.192.0 network. node1, node2 and node3 mount master's /data/master over NFS at their own /data/master. Likewise, node1 shares its /data/node1, node2 shares its /data/node2, and node3 shares its /data/node3.

The concrete settings follow, using the sharing of master's /data/master as the example.

On master:

1) Install the NFS server:
    sudo aptitude install nfs-kernel-server
2) Edit /etc/exports and append this line to the end of the file:
    /data/master 172.16.192.0/24(rw,no_root_squash)
3) Restart NFS:
    /etc/init.d/portmap stop
    /etc/init.d/portmap start
    /etc/init.d/nfs-kernel-server restart

On the clients (node1, node2, node3):

1) Install the NFS client packages:
    sudo apt-get install portmap nfs-common
2) Edit /etc/fstab and append this line to the end of the file:
    172.16.192.204:/data/master /data/master nfs defaults 0 0
3) Mount the share:
    sudo mount 172.16.192.204:/data/master

Note: the steps above describe a one-way server/client installation. In our setup every machine contributes part of its own disk and also uses the disks of the others, so every machine is both a server and a client. Installing nfs-kernel-server also installs portmap and nfs-common, because nfs-kernel-server depends on them. It is therefore enough to install nfs-kernel-server on every machine by running the following on each of them:
    sudo aptitude install nfs-kernel-server

2 Setting up SGE

SGE stands for Sun Grid Engine. It has since been renamed Oracle Grid Engine.

2.1 Introduction to SGE

Oracle Grid Engine is a distributed resource management system that distributes user workloads across the available compute resources. In a typical data center, average utilization of compute resources is only 10%-25%; Oracle Grid Engine can raise it to 80%, 90% or even 95%. This marked improvement comes from intelligently sending each workload to the most suitable compute resource. When users submit their tasks to Oracle Grid Engine as a batch of jobs, the software monitors the current state of every resource in the cluster and can assign each job to the most suitable resource.

2.2 Our requirements

We want to build a cluster whose only task is to schedule jobs sensibly. Suppose there are 5 machines with 8 cores each, 40 cores in total. If I submit 1000 jobs from one of the machines, the system automatically distributes those 1000 jobs across the 40 cores, and as soon as a core becomes idle the system gives it another job. Each individual job still runs serially. Oracle Grid Engine can provide such a cluster.

2.3 SGE software and documentation

1) SGE download address:
    http://www.oracle.com/us/products/tools/oracle-grid-engine-075549.html
The site offers the various SGE versions, in 32-bit and 64-bit builds for different Linux platforms. My system is the 32-bit Ubuntu 11.04 desktop edition, and the package I use is sge62u5_linux24-i586_rpm.zip. The package can be placed in any directory.

2) Three books on installing and using Oracle Grid Engine are available online: "N1 Grid Engine 6 User's Guide", "N1 Grid Engine 6 Installation Guide" and "N1 Grid Engine 6 Administration Guide". The Chinese editions can be found among the CSDN resources or downloaded elsewhere. Follow these three books when installing and using SGE.

2.4 Installing the archive sge62u5_linux24-i586_rpm.zip

Unpack the archive:
    unzip sge62u5_linux24-i586_rpm.zip
This creates the folder sge6_2u5, which contains two rpm files. Convert them with alien:
    cd sge6_2u5
    alien --scripts sun-sge-bin-linux24-i586-6.2-5.i386.rpm
    alien --scripts sun-sge-common-6.2-5.noarch.rpm
This produces two .deb files:
    sun-sge-bin-linux24-i586_6.2-6_i386.deb
    sun-sge-common_6.2-6_all.deb
Install both files:
    dpkg -i sun-sge-bin-linux24-i586_6.2-6_i386.deb
    dpkg -i sun-sge-common_6.2-6_all.deb
Once both commands have finished, /gridware exists under the root directory. It contains all the configuration files needed to install the SGE master daemon, the execution daemon and so on.

2.5 SGE cluster layout

1) The cluster currently has four hosts: master, node1, node2 and node3.
    master: master host, execution host, submit host, administrative host
    node1: execution host, submit host, administrative host
    node2: execution host, submit host, administrative host
    node3: execution host, submit host, administrative host

2) Edit /etc/hosts on every host. The file has several 127.0.0.1 lines at the top; delete all of them and add the following at the top instead:
    172.16.192.204 master.bnu.edu.cn master
    172.16.192.109 node1.bnu.edu.cn node1
    172.16.192.220 node2.bnu.edu.cn node2
    172.16.192.91 node3.bnu.edu.cn node3

2.6 Installing the master daemon on the master host

1) The installation must be run as root.

2) Add the sgeadmin user (sudo useradd sgeadmin). The master daemon installation asks for a non-root administrator, which we normally set to sgeadmin, so the sgeadmin user must exist on this machine.

3) Edit the file /gridware/sge/util/arch. This is one of the sge system files, but it has a syntax problem. Unless it is fixed, running the script /gridware/sge/default/common/settings.sh reports the following error:
    [: 329: 12: unexpected operator
    [: 329: 12: unexpected operator
Open /gridware/sge/util/arch and find this section:
    if [ $? -ne 0 ]; then
       unsupported="UNSUPPORTED-"
       lxrelease="${lxrelease}-GLIBC"
    else
       libc_version=`echo $libc_string | tr ' ,' '\n' | grep "2\." | cut -f 2 -d "."`
       if [ $libc_version -lt 2 ]; then
          unsupported="UNSUPPORTED-"
          lxrelease=24-GLIBC-2.${libc_version}
Between the line
    libc_version=`echo $libc_string | tr ' ,' '\n' | grep "2\." | cut -f 2 -d "."`
and the line
    if [ $libc_version -lt 2 ]; then
add:
    libc_version=12
The reason is explained in detail on my blog, so it is not repeated here. The address is:
    http://hi.baidu.com/heart_eternal/blog/item/8efc2c07b6792dfd37d12219.html

4) The complete master daemon installation session
    cd /gridware/sge
    ./install_qmaster
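The `unexpected operator` message in step 3) is printed by `[`, and the excerpt from `arch` compares `$libc_version` numerically without quoting it. The root cause is left to the linked blog post; the sketch below only illustrates, under that reading, a defensive way to write such a comparison by checking that the value is a single non-negative integer before it reaches `[`. The helpers `is_integer` and `glibc_minor_is_old` and the sample values are hypothetical and are not part of the SGE `arch` script.

```shell
#!/bin/sh
# is_integer: hypothetical helper, succeeds only for a non-empty string of digits.
is_integer() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# glibc_minor_is_old: succeeds when the value is a valid integer below 2.
# Quoting "$1" keeps an empty or multi-word value from being split into
# several arguments to [.
glibc_minor_is_old() {
    is_integer "$1" && [ "$1" -lt 2 ]
}

# Sample values are illustrative only: two integers, an empty string,
# and a value that contains two words.
for candidate in 12 1 "" "13 12"; do
    if glibc_minor_is_old "$candidate"; then
        echo "'$candidate': below 2, would be marked unsupported"
    elif is_integer "$candidate"; then
        echo "'$candidate': 2 or higher"
    else
        echo "'$candidate': not a single integer, comparison skipped"
    fi
done
```

The guide's own fix, hard-coding `libc_version=12`, avoids the error in a different way: it guarantees that `[` always receives one integer.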
Grid Engine qmaster host installation
-------------------------------------

Before you continue with the installation please read these hints:

   - Your terminal window should have a size of at least
     80x24 characters

   - The INTR character is often bound to the key Ctrl-C.
     The term >Ctrl-C< is used during the installation if you
     have the possibility to abort the installation

The qmaster installation procedure will take approximately 5-10 minutes.

Hit <RETURN> to continue >> (press Enter)

Choosing Grid Engine admin user account
---------------------------------------

You may install Grid Engine that all files are created with the user id of an
unprivileged user.

This will make it possible to install and run Grid Engine in directories
where user >root< has no permissions to create and write files and directories.

   - Grid Engine still has to be started by user >root<

   - this directory should be owned by the Grid Engine administrator

Do you want to install Grid Engine
under an user id other than >root< (y/n) [y] >> (press Enter to accept the default, y)

Choosing a Grid Engine admin user name
--------------------------------------

Please enter a valid user name >> sgeadmin

Installing Grid Engine as admin user >sgeadmin<

Hit <RETURN> to continue >> (press Enter)

Checking $SGE_ROOT directory
----------------------------

The Grid Engine root directory is not set!
Please enter a correct path for SGE_ROOT.

If this directory is not correct (e.g. it may contain an automounter
prefix) enter the correct path to this directory or hit <RETURN>
to use default [/gridware/sge] >> (press Enter)

Your $SGE_ROOT directory: /gridware/sge

Hit <RETURN> to continue >> (press Enter)

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_qmaster is currently set as service.

   sge_qmaster service set to port 6444

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form

   sge_qmaster <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<:                           [1]

Using a network service like >/etc/service<, >NIS/NIS+<: [2]

(default: 2) >> (press Enter)

Grid Engine TCP/IP service >sge_qmaster<
----------------------------------------

Using the service

   sge_qmaster

for communication with Grid Engine.

Hit <RETURN> to continue >> (press Enter)

Grid Engine TCP/IP communication service
----------------------------------------

The port for sge_execd is currently set as service.

   sge_execd service set to port 6445

Now you have the possibility to set/change the communication ports by using the
>shell environment< or you may configure it via a network service, configured
in local >/etc/service<, >NIS< or >NIS+<, adding an entry in the form

   sge_execd <port_number>/tcp

to your services database and make sure to use an unused port number.

How do you want to configure the Grid Engine communication ports?

Using the >shell environment<:                           [1]

Using a network service like >/etc/service<, >NIS/NIS+<: [2]

(default: 2) >> (press Enter)

Grid Engine TCP/IP communication service
-----------------------------------------

Using the service

   sge_execd

for communication with Grid Engine.

Hit <RETURN> to continue >> (press Enter)

Grid Engine cells
-----------------

Grid Engine supports multiple cells.

If you are not planning to run multiple Grid Engine clusters or if you don't
know yet what is a Grid Engine cell it is safe to keep the default cell name

   default

If you want to install multiple cells you can enter a cell name now.

The environment variable

   $SGE_CELL=<your_cell_name>

will be set for all further Grid Engine commands.

Enter cell name [default] >> (press Enter)

Using cell >default<.
Hit <RETURN> to continue >> (press Enter)

Unique cluster name
-------------------

The cluster name uniquely identifies a specific Sun Grid Engine cluster.
The cluster name must be unique throughout your organization. The name
is not related to the SGE cell.

The cluster name must start with a letter ([A-Za-z]), followed by letters,
digits ([0-9]), dashes (-) or underscores (_).

Enter new cluster name or hit <RETURN>
to use default [p6444] >> BrainCluster

creating directory: /gridware/sge/default/common

Your $SGE_CLUSTER_NAME: BrainCluster

Hit <RETURN> to continue >> (press Enter)

Grid Engine qmaster spool directory
-----------------------------------

The qmaster spool directory is the place where the qmaster daemon stores
the configuration and the state of the queuing system.

The admin user >sgeadmin< must have read/write access
to the qmaster spool directory.

If you will install shadow master hosts or if you want to be able to start
the qmaster daemon on other hosts (see the corresponding section in the
Grid Engine Installation and Administration Manual for details) the account
on the shadow master hosts also needs read/write access to this directory.

Enter a qmaster spool directory [/gridware/sge/default/spool/qmaster] >> (press Enter to keep the default)

Using qmaster spool directory >/gridware/sge/default/spool/qmaster<.
Hit <RETURN> to continue >> (press Enter)

Windows Execution Host Support
------------------------------

Are you going to install Windows Execution Hosts? (y/n) [n] >> (press Enter)

Verifying and setting file permissions
--------------------------------------

Did you install this version with >pkgadd< or did you already verify
and set the file permissions of your distribution (enter: y) (y/n) [y] >> (press Enter)

We do not verify file permissions. Hit <RETURN> to continue >> (press Enter)

Select default Grid Engine hostname resolving method
----------------------------------------------------

Are all hosts of your cluster in one DNS domain? If this is
the case the hostnames

   >hostA< and >hostA.foo.com<

would be treated as equal, because the DNS domain name >foo.com<
is ignored when comparing hostnames.

Are all hosts of your cluster in a single DNS domain (y/n) [y] >> (press Enter)

Ignoring domain name when comparing hostnames.

Hit <RETURN> to continue >> (press Enter)

Grid Engine JMX MBean server
----------------------------

In order to use the SGE Inspect or the Service Domain Manager (SDM)
SGE adapter you need to configure a JMX server in qmaster. Qmaster
will then load a Java Virtual Machine through a shared library.
NOTE: Java 1.5 or later is required for the JMX MBean server.

Do you want to enable the JMX MBean server (y/n) [y] >> n
(You must answer n here, otherwise errors occur later on. I do not know the reason either.)

Making directories
------------------

creating directory: /gridware/sge/default/spool/qmaster
creating directory: /gridware/sge/default/spool/qmaster/job_scripts
Hit <RETURN> to continue >> (press Enter)

Setup spooling
--------------

Your SGE binaries are compiled to link the spooling libraries
during runtime (dynamically). So you can choose between Berkeley DB
spooling and Classic spooling method.
Please choose a spooling method (berkeleydb|classic) [berkeleydb] >> (press Enter)

The Berkeley DB spooling method provides two configurations!

Local spooling:
The Berkeley DB spools into a local directory on this host (qmaster host)
This setup is faster, but you can't setup a shadow master host

Berkeley DB Spooling Server:
If you want to setup a shadow master host, you need to use
Berkeley DB Spooling Server!
In this case you have to choose a host with a configured RPC service.
The qmaster host connects via RPC to the Berkeley DB. This setup is more
failsafe, but results in a clear potential security hole. RPC communication
(as used by Berkeley DB) can be easily compromised. Please only use this
alternative if your site is secure or if you are not concerned about
security. Check the installation guide for further advice on how to achieve
failsafety without compromising security.

Do you want to use a Berkeley DB Spooling Server? (y/n) [n] >> (press Enter)

Hit <RETURN> to continue >> (press Enter)

Berkeley Database spooling parameters
-------------------------------------

Please enter the database directory now, even if you want to spool locally,
it is necessary to enter this database directory.

Default: [/gridware/sge/default/spool/spooldb] >> (press Enter)

creating directory: /gridware/sge/default/spool/spooldb
Dumping bootstrapping information
Initializing spooling database

Hit <RETURN> to continue >> (press Enter)

Grid Engine group id range
--------------------------

When jobs are started under the control of Grid Engine an additional group id
is set on platforms which do not support jobs. This is done to provide maximum
control for Grid Engine jobs.

This additional UNIX group id range must be unused group id's in your system.
Each job will be assigned a unique id during the time it is running.
Therefore you need to provide a range of id's which will be assigned
dynamically for jobs.

The range must be big enough to provide enough numbers for the maximum number
of Grid Engine jobs running at a single moment on a single host. E.g. a range
like >20000-20100< means, that Grid Engine will use the group ids from
20000-20100 and provides a range for 100 Grid Engine jobs at the same time
on a single host.

You can change at any time the group id range in your cluster configuration.

Please enter a range [20000-20100] >> 20000-21000

Using >20000-21000< as gid range. Hit <RETURN> to continue >> (press Enter)

Grid Engine cluster configuration
---------------------------------

Please give the basic configuration parameters of your Grid Engine
installation:

   <execd_spool_dir>

The pathname of the spool directory of the execution hosts. User >sgeadmin<
must have the right to create this directory and to write into it.

Default: [/gridware/sge/default/spool] >> (press Enter)

Grid Engine cluster configuration (continued)
---------------------------------------------

<administrator_mail>

The email address of the administrator to whom problem reports are sent.

It is recommended to configure this parameter. You may use >none<
if you do not wish to receive administrator mail.

Please enter an email address in the form >user@foo.com<.

Default: [none] >> (An email address can be entered here, and later error reports will be sent to it. It can also be left empty by pressing Enter.)

The following parameters for the cluster configuration were configured:

   execd_spool_dir        /gridware/sge/default/spool
   administrator_mail     none

Do you want to change the configuration parameters (y/n) [n] >> (press Enter)

Creating local configuration
----------------------------

Creating >act_qmaster< file
Adding default complex attributes
Adding default parallel environments (PE)
Adding SGE default usersets
Adding >sge_aliases< path aliases file
Adding >qtask< qtcsh sample default request file
Adding >sge_request< default submit options file
Creating >sgemaster< script
Creating >sgeexecd< script
Creating settings files for >.profile/.cshrc<

Hit <RETURN> to continue >> (press Enter)

qmaster startup script
----------------------

We can install the startup script that will
start qmaster at machine boot (y/n) [y] >> (press Enter)

Grid Engine qmaster startup
---------------------------

Starting qmaster daemon. Please wait ...
   starting sge_qmaster
Hit <RETURN> to continue >> (press Enter)

Adding Grid Engine hosts
------------------------

Please now add the list of hosts, where you will later install your execution
daemons. These hosts will be also added as valid submit hosts.

Please enter a blank separated list of your execution hosts. You may
press <RETURN> if the line is getting too long. Once you are finished
simply press <RETURN> without entering a name.

You also may prepare a file with the hostnames of the machines where you plan
to install Grid Engine. This may be convenient if you are installing Grid
Engine on many hosts.

Do you want to use a file which contains the list of hosts (y/n) [n] >> (press Enter)

Adding admin and submit hosts
-----------------------------

Please enter a blank seperated list of hosts.

Stop by entering <RETURN>. You may repeat this step until you are
entering an empty list. You will see messages from Grid Engine
when the hosts are added.

Host(s): master
adminhost "master.bnu.edu.cn" already exists
master.bnu.edu.cn added to submit host list
Hit <RETURN> to continue >> (press Enter)

Host(s): node1
node1.bnu.edu.cn added to admin host list
node1.bnu.edu.cn added to submit host list
Hit <RETURN> to continue >> (press Enter)

Host(s): node2
node2.bnu.edu.cn added to admin host list
node2.bnu.edu.cn added to submit host list
Hit <RETURN> to continue >> (press Enter)

Host(s): node3
node3.bnu.edu.cn added to admin host list
node3.bnu.edu.cn added to submit host list
Hit <RETURN> to continue >> (press Enter)

Host(s): (press Enter)
Finished adding hosts. Hit <RETURN> to continue >> (press Enter)

Adding Grid Engine shadow hosts
-------------------------------

If you want to use a shadow host, it is recommended to add this host
to the list of administrative hosts.

If you are not sure, it is also possible to add or remove hosts after the
installation with <qconf -ah hostname> for adding and <qconf -dh hostname>
for removing this host

Attention: This is not the shadow host installation
procedure.
You still have to install the shadow host separately

Do you want to add your shadow host(s) now? (y/n) [y] >> n

Scheduler Tuning
----------------

The details on the different options are described in the manual.

Configurations
--------------

1) Normal
          Fixed interval scheduling, report limited scheduling information,
          actual + assumed load

2) High
          Fixed interval scheduling, report limited scheduling information,
          actual load

3) Max
          Immediate Scheduling, report no scheduling information,
          actual load

Enter the number of your preferred configuration and hit <RETURN>!
Default configuration is [1] >> (press Enter)

We're configuring the scheduler with >Normal< settings!
Do you agree? (y/n) [y] >> (press Enter)

Using Grid Engine
-----------------

You should now enter the command:

   source /gridware/sge/default/common/settings.csh

if you are a csh/tcsh user or

   # . /gridware/sge/default/common/settings.sh

if you are a sh/ksh user.

This will set or expand the following environment variables:

   - $SGE_ROOT         (always necessary)
   - $SGE_CELL         (if you are using a cell other than >default<)
   - $SGE_CLUSTER_NAME (always necessary)
   - $SGE_QMASTER_PORT (if you haven't added the service >sge_qmaster<)
   - $SGE_EXECD_PORT   (if you haven't added the service >sge_execd<)
   - $PATH/$path       (to find the Grid Engine binaries)
   - $MANPATH          (to access the manual pages)

Hit <RETURN> to see where Grid Engine logs messages >> (press Enter)

Grid Engine messages
--------------------

Grid Engine messages can be found at:

   /tmp/qmaster_messages (during qmaster startup)
   /tmp/execd_messages   (during execution daemon startup)

After startup the daemons log their messages in their spool directories.

   Qmaster:     /gridware/sge/default/spool/qmaster/messages
   Exec daemon: <execd_spool_dir>/<hostname>/messages

Grid Engine startup scripts
---------------------------

Grid Engine startup scripts can be found at:

   /gridware/sge/default/common/sgemaster (qmaster)
   /gridware/sge/default/common/sgeexecd (execd)

Do you want to see previous screen about using Grid Engine again (y/n) [n] >> (press Enter)

Your Grid Engine qmaster installation is now completed
------------------------------------------------------

Please now login to all hosts where you want to run an execution daemon
and start the execution host installation procedure.

If you want to run an execution daemon on this host, please do not forget
to make the execution host installation in this host as well.

All execution hosts must be administrative hosts during the installation.
All hosts which you added to the list of administrative hosts during t
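The "Using Grid Engine" screen of the installer names the settings file to load and marks `$SGE_ROOT` and `$SGE_CLUSTER_NAME` as always necessary. As a minimal sketch under that description, a sh user could load the file and confirm that both variables are set before going on to the execution hosts. The function `missing_sge_vars` is a hypothetical helper written for this illustration and is not an SGE command. The settings path is the one printed by the installer.

```shell
#!/bin/sh
# Settings file named by the installer for sh/ksh users.
SETTINGS=/gridware/sge/default/common/settings.sh

# missing_sge_vars: hypothetical helper, not part of SGE.
# Prints the names of the "always necessary" variables that are unset or empty.
missing_sge_vars() {
    missing=""
    for name in SGE_ROOT SGE_CLUSTER_NAME; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    # Strip the leading space before printing.
    echo "${missing# }"
}

# Load the settings only if the file is readable, so the check also runs
# on a machine where SGE has not been installed yet.
if [ -r "$SETTINGS" ]; then
    . "$SETTINGS"
fi

result=$(missing_sge_vars)
if [ -n "$result" ]; then
    echo "not set: $result"
else
    echo "required SGE variables are set"
fi
```

Adding the `. /gridware/sge/default/common/settings.sh` line to a login profile is one way to avoid repeating this step in every new shell. Whether that suits a given site is a local decision.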