Re: [torqueusers] start intel mpi in pbs

Chaucer Cao ccao at sgi.com
Wed Jun 27 04:09:34 MDT 2007


Hi Donald,

How do I enable the admin node as a compute node in Torque? Many thanks!

Best wishes,

Chaucer

 

 

  _____  

From: Donald Tripp [mailto:dtripp at hawaii.edu] 
Sent: June 27, 2007 16:51
To: Chaucer Cao
Cc: torqueusers at supercluster.org; 'Krause, Roland'
Subject: Re: [torqueusers] start intel mpi in pbs

 

It looks like Torque is configured not to run jobs on the admin / main node
(in this case, "cluster"). That's why you get only 3 hosts available: mpd is
launched on the c0-0, c0-1, and c0-2 nodes, but not on cluster.
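This can be checked from inside the job by comparing the hosts Torque allocated with the hosts handed to mpdboot. A minimal sketch, assuming the job can read $PBS_NODEFILE and the mpd.hosts file used with mpdboot; compare_hosts is a hypothetical helper name. Names are compared literally, so c0-0 and compute-0-0.local would count as different hosts.

```shell
#!/bin/sh
# compare_hosts NODEFILE MPDHOSTS (hypothetical helper)
# Prints hosts that appear in only one of the two lists.
compare_hosts() (
    tmp=${TMPDIR:-/tmp}
    alloc="$tmp/alloc.$$"
    ring="$tmp/ring.$$"
    # The node file repeats a host once per allocated processor slot.
    sort -u "$1" > "$alloc"
    # mpd.hosts lines may carry a ":ncpus" suffix; keep the host part
    # and skip blank lines.
    awk -F: 'NF { print $1 }' "$2" | sort -u > "$ring"
    # comm -23: only in the allocation; comm -13: only in mpd.hosts.
    comm -23 "$alloc" "$ring" | sed 's/^/not in mpd.hosts: /'
    comm -13 "$alloc" "$ring" | sed 's/^/not allocated: /'
    rm -f "$alloc" "$ring"
)
```

Inside the job, `compare_hosts "$PBS_NODEFILE" mpd.hosts` printing `not allocated: cluster` would confirm that Torque did not give the head node to the job.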

 

By default, Torque is set up not to allow jobs to run on the admin / main
node, but this can be enabled, and will have to be in your case.
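One place to look is the node list pbs_server reads at startup. A minimal sketch, assuming the usual Torque layout where that list lives in $TORQUE_HOME/server_priv/nodes (often /var/spool/torque/server_priv/nodes, but the path varies by install) with one "hostname np=N" entry per line; nodes_entry is a hypothetical helper name.

```shell
#!/bin/sh
# nodes_entry NODESFILE HOST NP (hypothetical helper)
# Prints HOST's entry from a Torque nodes file, or a suggested entry
# and a non-zero status if HOST is not listed.
nodes_entry() (
    # Entries look like "hostname np=4 [properties]". A commented-out
    # entry starts with "#", so its first field never equals the bare name.
    entry=$(awk -v h="$2" '$1 == h' "$1")
    if [ -n "$entry" ]; then
        printf '%s\n' "$entry"
    else
        printf 'missing; suggested entry: %s np=%s\n' "$2" "$3"
        exit 1   # the body is a subshell, so this only ends the function
    fi
)
```

For example, `nodes_entry /var/spool/torque/server_priv/nodes cluster 4`. After editing the file, pbs_server generally has to be restarted to reread it, and pbs_mom has to be running on the head node before jobs can be placed there.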

 

 

- Donald Tripp

dtripp at hawaii.edu

----------------------------------------------

HPC Systems Administrator

High Performance Computing Center

University of Hawai'i at Hilo

200 W. Kawili Street

Hilo,   Hawaii   96720

http://www.hpc.uhh.hawaii.edu





 

On Jun 26, 2007, at 10:38 PM, Chaucer Cao wrote:





Hi all,

In the PBS script file I can't start mpd (Intel MPI) using the following
command:

****************************************************************************

mpdboot  --rsh=ssh -v -n `cat mpd.hosts|wc -l`  -f mpd.hosts

****************************************************************************

It gives:

----------------------------------------------------------------------------

totalnum=4  numhosts=3

there are not enough hosts on which to start all processes

----------------------------------------------------------------------------

But I can manually start mpd using the same command. 

----------------------------------------------------------------------------

[mpp at cluster std]$  mpdboot --rsh=ssh -v -n 4 -f mpd.hosts

running mpdallexit on cluster

LAUNCHED mpd on cluster  via

RUNNING: mpd on cluster

LAUNCHED mpd on c0-0  via  cluster

LAUNCHED mpd on c0-1  via  cluster

LAUNCHED mpd on c0-2  via  cluster

RUNNING: mpd on c0-0

RUNNING: mpd on c0-1

RUNNING: mpd on c0-2

----------------------------------------------------------------------------
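For comparison with the manual run above, the job script can size the mpd ring from what Torque actually allocated instead of from a static mpd.hosts file. A minimal sketch, assuming mpdboot is on the PATH inside the job; make_mpd_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# make_mpd_hosts NODEFILE OUTFILE (hypothetical helper)
# Writes one line per unique host in a Torque node file to OUTFILE
# and prints how many hosts were written.
make_mpd_hosts() (
    # The node file lists a host once per allocated processor slot;
    # the mpdboot -f file needs each host only once.
    sort -u "$1" > "$2"
    # Redirecting into wc prints only the count; tr strips the padding
    # some wc implementations add.
    wc -l < "$2" | tr -d ' '
)

# Inside a job, boot one mpd per allocated host. Outside a job
# PBS_NODEFILE is unset and this part is skipped.
if [ -n "${PBS_NODEFILE:-}" ]; then
    nhosts=$(make_mpd_hosts "$PBS_NODEFILE" mpd.hosts)
    mpdboot --rsh=ssh -v -n "$nhosts" -f mpd.hosts
fi
```

With this approach -n can never exceed the number of hosts Torque granted, so a 4-host request that only receives 3 hosts boots 3 mpds rather than failing.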

 

Does anyone know how to fix this? Many thanks!

Best wishes,

Chaucer

 

  _____  

From: Chaucer Cao [mailto:ccao at sgi.com] 
Sent: June 26, 2007 14:12
To: 'Krause, Roland'
Subject: Re: [torqueusers] how to get Environment Variables

 

Hi Roland,

pbsnodes gives the ntype info; all nodes show ntype = cluster:

c0-2

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux compute-0-2.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=14316,nsessions=1,nusers=1,idletime=105210,totmem=5045676kb,availmem=4608468kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=483100398328,state=free,jobs=,varattr=,rectime=1182836318

 

c0-1

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=26709,nsessions=1,nusers=1,idletime=234995,totmem=5045672kb,availmem=4592532kb,physmem=4025556kb,ncpus=4,loadave=4.00,netload=697953068235,state=free,jobs=,varattr=,rectime=1182836316

 

c0-0

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux compute-0-0.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=28348,nsessions=1,nusers=1,idletime=220618,totmem=5045676kb,availmem=4557852kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=588068945521,state=free,jobs=,varattr=,rectime=1182836318

 

cluster

     state = free

     np = 4

     ntype = cluster

     status = opsys=linux,uname=Linux cluster.hpc.org 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=2993 24894 25052 25158 25307,nsessions=5,nusers=3,idletime=92734,totmem=5045676kb,availmem=4130016kb,physmem=4025560kb,ncpus=4,loadave=4.48,netload=678702222035,state=free,jobs=,varattr=,rectime=1182836315

----------------------------------------------------------------------------
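To compare nodes at a glance, a pbsnodes listing like the one above can be reduced to one line per node. A minimal sketch, assuming the layout shown (node name flush left, indented "key = value" attribute lines); summarize_pbsnodes is a hypothetical name.

```shell
#!/bin/sh
# summarize_pbsnodes (hypothetical helper)
# Reads pbsnodes-style output on stdin, prints "name state np ntype".
summarize_pbsnodes() {
    awk '
        # A line that starts with a non-blank character begins a new node.
        /^[^ \t]/ {
            if (name != "") print name, state, np, ntype
            name = $1; state = ""; np = ""; ntype = ""
            next
        }
        # Indented attribute lines have the form "key = value".
        $1 == "state" { state = $3 }
        $1 == "np"    { np = $3 }
        $1 == "ntype" { ntype = $3 }
        END { if (name != "") print name, state, np, ntype }
    '
}
```

Run as `pbsnodes -a | summarize_pbsnodes`. For the listing above, every node, including cluster, reports `free 4 cluster`, so by these fields the head node is listed just like the compute nodes.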

It seems the head node gets a different domain. In /etc/hosts:

#

# Do NOT Edit (generated by dbreport)

#

127.0.0.1       localhost.localdomain   localhost

10.1.1.1        cluster.local cluster # originally frontend-0-0

10.255.255.254  compute-0-0.local compute-0-0 c0-0

10.255.255.253  compute-0-1.local compute-0-1 c0-1

10.255.255.252  compute-0-2.local compute-0-2 c0-2

192.168.1.1     cluster.hpc.org

But I don't know how to tell pbs_server that it should use cluster.local. :-)
Thanks!
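The mismatch shows up when both names are looked up in the hosts file. A minimal sketch; hosts_lookup is a hypothetical helper that roughly mimics a hosts-file lookup (first matching line wins) but does not consult DNS or NIS.

```shell
#!/bin/sh
# hosts_lookup HOSTSFILE NAME (hypothetical helper)
# Prints the address and canonical name that HOSTSFILE gives for NAME.
hosts_lookup() {
    awk -v n="$2" '
        # Drop comments, then skip lines without an address and a name.
        { sub(/#.*/, "") }
        NF < 2 { next }
        # Fields 2..NF are the canonical name followed by aliases.
        {
            for (i = 2; i <= NF; i++)
                if ($i == n) { print $1, $2; exit }
        }
    ' "$1"
}
```

For the file above, `hosts_lookup /etc/hosts cluster` gives `10.1.1.1 cluster.local`, while `hosts_lookup /etc/hosts cluster.hpc.org` gives `192.168.1.1 cluster.hpc.org`, the name that appears in the head node's uname in the pbsnodes output. `getent hosts cluster` answers the same question through the system's full resolver configuration.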

Best wishes,

Chaucer

 

 

  _____  

From: Krause, Roland [mailto:Roland.Krause at amtc-dresden.com] 
Sent: June 25, 2007 19:37
To: Chaucer Cao
Subject: RE: [torqueusers] how to get Environment Variables

 

Hi Chaucer,

 

Besides our production system, we have a test system with two nodes. One of
them is the server, but I can still run jobs with qsub -l nodes=2.

Do all your nodes have the "ntype" "cluster"?

 

Regards,

Roland

 


  _____  


From: Chaucer Cao [mailto:ccao at sgi.com] 
Sent: Monday, June 25, 2007 10:33 AM
To: Krause, Roland
Subject: Re: [torqueusers] how to get Environment Variables

Hi Roland,

The environment variables problem is solved now, but I have run into another
problem:

There are four nodes, including the head node, but I can only submit a
3-node job with qsub. When I submit a 4-node job, it gives:

c0-0

c0-1

c0-2

cluster

totalnum=4  numhosts=3

there are not enough hosts on which to start all processes

  1. no mpd is running on this host

  2. an mpd is running but was started without a "console" (-n option)

mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_ccao); possible
causes:

mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_ccao); possible
causes:

  1. no mpd is running on this host

  2. an mpd is running but was started without a "console" (-n option)

It seems I can't run the job on the head node (cluster) with PBS, but I can
run a 4-node job directly (without qsub).

When I check with pbsnodes, all nodes appear to be in the free state. Can you
help me with this? Many thanks!

Best wishes,

Chaucer

 

 


  _____  


From: Krause, Roland [mailto:Roland.Krause at amtc-dresden.com] 
Sent: June 25, 2007 15:13
To: Chaucer Cao
Subject: RE: [torqueusers] how to get Environment Variables

 

Hi Chaucer,

 

Could you provide the part of your script that reads the PBS environment
variables?

 

Regards,

Roland

 


  _____  


From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Chaucer Cao
Sent: Wednesday, June 20, 2007 7:16 PM
To: torqueusers at supercluster.org
Subject: [torqueusers] how to get Environment Variables

Hi all,

Does anyone know how I can get the PBS environment variables in the run
script file? When I qsub my script file, it gives:

PBS_NODEFILE: Undefined variable.

PBS_ENVIRONMENT: Undefined variable.
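"Undefined variable." is how csh/tcsh report an unset variable, so this script was evidently interpreted by csh. For a batch job, Torque normally sets PBS_NODEFILE and PBS_ENVIRONMENT (PBS_BATCH or PBS_INTERACTIVE). A minimal sh sketch of a guard that fails early with a clear message when either is missing; check_pbs_env is a hypothetical helper name.

```shell
#!/bin/sh
# check_pbs_env (hypothetical helper)
# Returns non-zero and names each missing PBS variable on stderr.
check_pbs_env() (
    missing=0
    for var in PBS_NODEFILE PBS_ENVIRONMENT; do
        # Indirect expansion: read the variable whose name is in $var.
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "$var is not set; is this running under qsub?" >&2
            missing=1
        fi
    done
    exit "$missing"   # the body is a subshell, so this only ends the function
)
```

In a job script, a `#PBS -S /bin/bash` directive asks Torque to run the script with bash instead of the login shell, and `check_pbs_env || exit 1` placed before the first use of "$PBS_NODEFILE" turns a confusing shell error into an explicit message.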

Many thanks!

Chaucer

_______________________________________________

torqueusers mailing list

torqueusers at supercluster.org

http://www.supercluster.org/mailman/listinfo/torqueusers

 


