Re: [torqueusers] start intel mpi in pbs

Daniël Boone daniel.boone at kahosl.be
Wed Jun 27 04:56:25 MDT 2007


Add the name of the server to the nodes file and start a pbs_mom on the
server
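
A rough sketch of that advice follows. It rests on assumptions that are not in this thread: TORQUE lives under /var/spool/torque, the server's node name is "cluster", and it should offer np=4 like the compute nodes. The name add_node is a made-up helper, not a TORQUE command; it appends an entry to a nodes file only when the host is not listed yet. If pbsnodes already lists the server, as in the output quoted further down, only the pbs_mom step applies.

```shell
#!/bin/sh
# Sketch only: register the head node as a compute node in TORQUE.
# Assumed (not from the thread): TORQUE_HOME=/var/spool/torque,
# head node name "cluster", 4 slots (np=4).

# add_node NODES_FILE HOSTNAME NP  (hypothetical helper)
# Appends "HOSTNAME np=NP" unless HOSTNAME already has an entry.
add_node() {
    nodes_file=$1
    host=$2
    np=$3
    # awk exits 0 only if some line's first field equals the host name
    if ! awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' \
            "$nodes_file" 2>/dev/null; then
        printf '%s np=%s\n' "$host" "$np" >> "$nodes_file"
    fi
}

# Typical use on the server, as root. Left commented out so the
# sketch has no side effects when it is sourced:
# add_node "${TORQUE_HOME:-/var/spool/torque}/server_priv/nodes" cluster 4
# pbs_mom       # start the MOM daemon on the server itself
# pbsnodes -a   # the server should now report a state like the others
```

pbs_server reads the nodes file at startup, so it probably needs a restart after the file changes.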

Chaucer Cao wrote:
>
> Hi Donald,
>
> How do I enable the admin node as a compute node in torque? Many thanks!
>
> Best wishes,
>
> Chaucer
>
> ------------------------------------------------------------------------
>
> *From:* Donald Tripp [mailto:dtripp at hawaii.edu]
> *Sent:* June 27, 2007 16:51
> *To:* Chaucer Cao
> *Cc:* torqueusers at supercluster.org; 'Krause, Roland'
> *Subject:* Re: [torqueusers] start intel mpi in pbs
>
> It looks like torque is configured not to run jobs on the admin / main
> node (in this case, "cluster"). That's why you get only 3 hosts
> available: it will launch mpd on the c0-0, c0-1, and c0-2
> nodes, but not on cluster.
>
> By default, torque is set up not to allow jobs to run on the admin /
> main node, but this can be enabled, and will have to be in your case.
>
> - Donald Tripp
>
> dtripp at hawaii.edu
>
> ----------------------------------------------
>
> HPC Systems Administrator
>
> High Performance Computing Center
>
> University of Hawai'i at Hilo
>
> 200 W. Kawili Street
>
> Hilo, Hawaii 96720
>
> http://www.hpc.uhh.hawaii.edu
>
>
>
> On Jun 26, 2007, at 10:38 PM, Chaucer Cao wrote:
>
>
>
> Hi all,
>
> In the pbs script file I can't start mpd (Intel MPI) using the
> following command:
>
> ****************************************************************************
>
> mpdboot --rsh=ssh -v -n `cat mpd.hosts|wc -l` -f mpd.hosts
>
> ****************************************************************************
>
> It gives:
>
> --------------------------------------------------------------------------------------------------
>
> totalnum=4 numhosts=3
>
> there are not enough hosts on which to start all processes
>
> --------------------------------------------------------------------------------------------------
>
> But I can manually start mpd using the same command.
>
> -------------------------------------------------------------------------------------------------
>
> [mpp at cluster std]$ mpdboot --rsh=ssh -v -n 4 -f mpd.hosts
>
> running mpdallexit on cluster
>
> LAUNCHED mpd on cluster via
>
> RUNNING: mpd on cluster
>
> LAUNCHED mpd on c0-0 via cluster
>
> LAUNCHED mpd on c0-1 via cluster
>
> LAUNCHED mpd on c0-2 via cluster
>
> RUNNING: mpd on c0-0
>
> RUNNING: mpd on c0-1
>
> RUNNING: mpd on c0-2
>
> -------------------------------------------------------------------------------------------------
>
> Does anyone know how to fix this? Many thanks!
>
> Best wishes,
>
> Chaucer
>
> ------------------------------------------------------------------------
>
> *From:* Chaucer Cao [mailto:ccao at sgi.com]
> *Sent:* June 26, 2007 14:12
> *To:* 'Krause, Roland'
> *Subject:* Re: [torqueusers] how to get Environment Variables
>
> Hi Roland,
>
> Maybe the pbsnodes output gives the ntype info you asked about:
>
> c0-2
>
> state = free
>
> np = 4
>
> ntype = cluster
>
> status = opsys=linux,uname=Linux *compute-0-2.local*
> 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006
> x86_64,sessions=14316,nsessions=1,nusers=1,idletime=105210,totmem=5045676kb,availmem=4608468kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=483100398328,state=free,jobs=,varattr=,rectime=1182836318
>
> c0-1
>
> state = free
>
> np = 4
>
> ntype = cluster
>
> status = opsys=linux,uname=Linux *compute-0-1.local*
> 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006
> x86_64,sessions=26709,nsessions=1,nusers=1,idletime=234995,totmem=5045672kb,availmem=4592532kb,physmem=4025556kb,ncpus=4,loadave=4.00,netload=697953068235,state=free,jobs=,varattr=,rectime=1182836316
>
> c0-0
>
> state = free
>
> np = 4
>
> ntype = cluster
>
> status = opsys=linux,uname=Linux *compute-0-0.local*
> 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006
> x86_64,sessions=28348,nsessions=1,nusers=1,idletime=220618,totmem=5045676kb,availmem=4557852kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=588068945521,state=free,jobs=,varattr=,rectime=1182836318
>
> cluster
>
> state = free
>
> np = 4
>
> ntype = cluster
>
> status = opsys=linux,uname=Linux *cluster.hpc.org* 2.6.9-42.0.2.ELsmp
> #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=2993 24894 25052
> 25158
> 25307,nsessions=5,nusers=3,idletime=92734,totmem=5045676kb,availmem=4130016kb,physmem=4025560kb,ncpus=4,loadave=4.48,netload=678702222035,state=free,jobs=,varattr=,rectime=1182836315
>
> ------------------------------------------------------------------------
>
> It seems the head node gets a different domain. In /etc/hosts:
>
> #
>
> # Do NOT Edit (generated by dbreport)
>
> #
>
> 127.0.0.1 localhost.localdomain localhost
>
> 10.1.1.1 cluster.local cluster # originally frontend-0-0
>
> 10.255.255.254 compute-0-0.local compute-0-0 c0-0
>
> 10.255.255.253 compute-0-1.local compute-0-1 c0-1
>
> 10.255.255.252 compute-0-2.local compute-0-2 c0-2
>
> 192.168.1.1 cluster.hpc.org
>
> But I don't know how to tell pbs_server that it should use
> cluster.local. :) Thanks!
>
> Best wishes,
>
> Chaucer
>
> ------------------------------------------------------------------------
>
> *From:* Krause, Roland [mailto:Roland.Krause at amtc-dresden.com]
> *Sent:* June 25, 2007 19:37
> *To:* Chaucer Cao
> *Subject:* RE: [torqueusers] how to get Environment Variables
>
> Hi Chaucer,
>
> Besides our production system, we have a test system with two nodes. One
> of them is the server,
>
> but I can still run jobs on it with qsub -l nodes=2.
>
> Do all your nodes have the "ntype" "cluster"?
>
> Regards,
>
> Roland
>
>     ------------------------------------------------------------------------
>
>     *From:* Chaucer Cao [mailto:ccao at sgi.com]
>     *Sent:* Monday, June 25, 2007 10:33 AM
>     *To:* Krause, Roland
>     *Subject:* Re: [torqueusers] how to get Environment Variables
>
>     Hi Roland,
>
>     The environment variables problem is OK now, but I have run into
>     another problem:
>
>     There are four nodes, including the head node, but I can only
>     submit a 3-node job with qsub. When I submit a 4-node job it gives:
>
>     c0-0
>
>     c0-1
>
>     c0-2
>
>     cluster
>
>     totalnum=4 numhosts=3
>
>     there are not enough hosts on which to start all processes
>
>     1. no mpd is running on this host
>
>     2. an mpd is running but was started without a "console" (-n option)
>
>     mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_ccao);
>     possible causes:
>
>     mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_ccao);
>     possible causes:
>
>     1. no mpd is running on this host
>
>     2. an mpd is running but was started without a "console" (-n option)
>
>     It seems I can't run the job on the head node (cluster) with pbs, but I
>     can run a 4-node job directly (without qsub).
>
>     When I check with pbsnodes, all nodes appear to be in the free
>     state. Can you help me with this? Many thanks!
>
>     Best wishes,
>
>     Chaucer
>
>     ------------------------------------------------------------------------
>
>     *From:* Krause, Roland [mailto:Roland.Krause at amtc-dresden.com]
>     *Sent:* June 25, 2007 15:13
>     *To:* Chaucer Cao
>     *Subject:* RE: [torqueusers] how to get Environment Variables
>
>     Hi Chaucer,
>
>     Could you provide the part of your script that reads the PBS
>     env variables?
>
>     Regards,
>
>     Roland
>
>         ------------------------------------------------------------------------
>
>         *From:* torqueusers-bounces at supercluster.org
>         [mailto:torqueusers-bounces at supercluster.org] *On Behalf Of
>         *Chaucer Cao
>         *Sent:* Wednesday, June 20, 2007 7:16 PM
>         *To:* torqueusers at supercluster.org
>         *Subject:* [torqueusers] how to get Environment Variables
>
>         Hi all,
>
>         Does anyone know how I can get the PBS environment
>         variables in the run script file? When I qsub my script file
>         it gives:
>
>         PBS_NODEFILE: Undefined variable.
>
>         PBS_ENVIRONMENT: Undefined variable.
>
>         Many thanks!
>
>         Chaucer
>
> _______________________________________________
>
> torqueusers mailing list
>
> torqueusers at supercluster.org
>
> http://www.supercluster.org/mailman/listinfo/torqueusers
>

