Re: [torqueusers] start intel mpi in pbs
Chaucer Cao
ccao at sgi.com
Wed Jun 27 04:09:34 MDT 2007
Hi Donald,
How do I enable the admin node as a compute node in torque? Many thanks!
Best wishes,
Chaucer
_____
From: Donald Tripp [mailto:dtripp at hawaii.edu]
Sent: June 27, 2007 16:51
To: Chaucer Cao
Cc: torqueusers at supercluster.org; 'Krause, Roland'
Subject: Re: [torqueusers] start intel mpi in pbs
It looks like torque is configured not to run jobs on the admin / main node
(in this case, "cluster"). That's why you get only 3 hosts available:
it will launch mpd on the c0-0, c0-1, and c0-2 nodes, but not on
cluster.
By default, torque is set up not to allow jobs to run on the admin / main
node, but this can be enabled, and will have to be in your case.
- Donald Tripp
<mailto:dtripp at hawaii.edu> dtripp at hawaii.edu
----------------------------------------------
HPC Systems Administrator
High Performance Computing Center
University of Hawai'i at Hilo
200 W. Kawili Street
Hilo, Hawaii 96720
<http://www.hpc.uhh.hawaii.edu/> http://www.hpc.uhh.hawaii.edu
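The reply above does not say how the head node would be enabled. As a minimal sketch only: TORQUE normally takes its list of compute nodes from a nodes file under the server's private directory, and a host also needs a running pbs_mom to accept jobs. The helper below (node_listed is a hypothetical name, and the path in the usage comment is an assumed default that varies by installation) just checks whether a host has an entry in such a file. The pbsnodes output quoted later in this thread already lists "cluster", so this check may well pass here.

```shell
#!/bin/sh
# Report whether a host has an entry in a TORQUE-style nodes file.
# Assumed format: one node per line, "hostname [np=N] [properties...]";
# lines starting with '#' are treated as comments.
# usage: node_listed NODES_FILE HOSTNAME
node_listed() {
    awk -v host="$2" '
        /^[[:space:]]*#/ { next }         # skip comment lines
        $1 == host       { found = 1 }    # first field is the node name
        END              { exit !found }  # exit status 0 only if the host was seen
    ' "$1"
}

# Example (the path is an assumption; adjust to your installation):
#   node_listed /var/spool/torque/server_priv/nodes cluster \
#       || echo "no entry for cluster; a line such as 'cluster np=4' would be needed"
```

If the entry is present and the node still receives no jobs, the cause lies elsewhere and this check does not help narrow it down.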
On Jun 26, 2007, at 10:38 PM, Chaucer Cao wrote:
Hi all,
In the PBS script file I can't start mpd (Intel MPI) using the
following command:
****************************************************************************
mpdboot --rsh=ssh -v -n `cat mpd.hosts|wc -l` -f mpd.hosts
****************************************************************************
It gives:
----------------------------------------------------------------------------
totalnum=4 numhosts=3
there are not enough hosts on which to start all processes
----------------------------------------------------------------------------
But I can manually start mpd using the same command.
----------------------------------------------------------------------------
[mpp at cluster std]$ mpdboot --rsh=ssh -v -n 4 -f mpd.hosts
running mpdallexit on cluster
LAUNCHED mpd on cluster via
RUNNING: mpd on cluster
LAUNCHED mpd on c0-0 via cluster
LAUNCHED mpd on c0-1 via cluster
LAUNCHED mpd on c0-2 via cluster
RUNNING: mpd on c0-0
RUNNING: mpd on c0-1
RUNNING: mpd on c0-2
----------------------------------------------------------------------------
Does anyone know how to fix this? Many thanks!
Best wishes,
Chaucer
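The error above reports totalnum=4 against numhosts=3, so mpdboot found fewer usable hosts than the line count passed to -n. One thing worth checking, as a diagnostic rather than a fix, is how many distinct hosts mpd.hosts really contains, since `wc -l` counts lines (including duplicates and blanks). The helper name count_hosts is hypothetical, and stripping a ":ncpus" suffix from each entry is an assumption about the hosts-file format.

```shell
#!/bin/sh
# Count distinct host names in an mpd.hosts-style file.
# usage: count_hosts HOSTFILE
count_hosts() {
    # Skip blank lines and '#' comments, keep only the first field,
    # drop an optional ":ncpus" suffix (assumed format), then count unique names.
    awk 'NF && $1 !~ /^#/ { sub(/:.*/, "", $1); print $1 }' "$1" \
        | sort -u | wc -l | tr -d ' '
}

# Example: compare the two numbers inside the job script before calling mpdboot.
#   echo "lines: $(wc -l < mpd.hosts)  distinct hosts: $(count_hosts mpd.hosts)"
```

If the two numbers differ, passing the distinct count to -n instead of the raw line count is the obvious next experiment. If they agree, the mismatch comes from how mpdboot treats one of the listed hosts, which this sketch cannot show.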
_____
From: Chaucer Cao [mailto:ccao at sgi.com]
Sent: June 26, 2007 14:12
To: 'Krause, Roland'
Subject: Re: [torqueusers] how to get Environment Variables
Hi Roland,
Maybe the pbsnodes output gives the ntype info you asked about:
c0-2
     state = free
     np = 4
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-2.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=14316,nsessions=1,nusers=1,idletime=105210,totmem=5045676kb,availmem=4608468kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=483100398328,state=free,jobs=,varattr=,rectime=1182836318

c0-1
     state = free
     np = 4
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-1.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=26709,nsessions=1,nusers=1,idletime=234995,totmem=5045672kb,availmem=4592532kb,physmem=4025556kb,ncpus=4,loadave=4.00,netload=697953068235,state=free,jobs=,varattr=,rectime=1182836316

c0-0
     state = free
     np = 4
     ntype = cluster
     status = opsys=linux,uname=Linux compute-0-0.local 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=28348,nsessions=1,nusers=1,idletime=220618,totmem=5045676kb,availmem=4557852kb,physmem=4025560kb,ncpus=4,loadave=4.00,netload=588068945521,state=free,jobs=,varattr=,rectime=1182836318

cluster
     state = free
     np = 4
     ntype = cluster
     status = opsys=linux,uname=Linux cluster.hpc.org 2.6.9-42.0.2.ELsmp #1 SMP Wed Aug 23 13:38:27 BST 2006 x86_64,sessions=2993 24894 25052 25158 25307,nsessions=5,nusers=3,idletime=92734,totmem=5045676kb,availmem=4130016kb,physmem=4025560kb,ncpus=4,loadave=4.48,netload=678702222035,state=free,jobs=,varattr=,rectime=1182836315
----------------------------------------------------------------------------
It seems the head node gets a different domain. In /etc/hosts:
#
# Do NOT Edit (generated by dbreport)
#
127.0.0.1 localhost.localdomain localhost
10.1.1.1 cluster.local cluster # originally frontend-0-0
10.255.255.254 compute-0-0.local compute-0-0 c0-0
10.255.255.253 compute-0-1.local compute-0-1 c0-1
10.255.255.252 compute-0-2.local compute-0-2 c0-2
192.168.1.1 cluster.hpc.org
But I don't know how to tell pbs_server that it should use cluster.local. :-)
Thanks!
Best wishes,
Chaucer
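The /etc/hosts listing above maps the head node's names to two different addresses. A small sketch for confirming which address each name resolves to in a hosts-format file follows. The function name ip_for_name is hypothetical, and it only reads the file given to it; it does not consult DNS or show which name pbs_server itself uses.

```shell
#!/bin/sh
# Print the address(es) that a hosts-format file assigns to a name.
# usage: ip_for_name HOSTS_FILE NAME
ip_for_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields afterwards
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$1"
}

# Example:
#   ip_for_name /etc/hosts cluster.local
#   ip_for_name /etc/hosts cluster.hpc.org
```

With the file quoted above, the short name "cluster" and "cluster.local" sit on the 10.1.1.1 line while "cluster.hpc.org" sits on 192.168.1.1, which matches the different uname that pbsnodes reports for the head node.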
_____
From: Krause, Roland [mailto:Roland.Krause at amtc-dresden.com]
Sent: June 25, 2007 19:37
To: Chaucer Cao
Subject: RE: [torqueusers] how to get Environment Variables
Hi Chaucer,
Besides our production system, we have a test system with two nodes. One of
them is the server,
but I can still run jobs with qsub -l nodes=2.
Do all your nodes have the "ntype" "cluster"?
Regards,
Roland
_____
From: Chaucer Cao [mailto:ccao at sgi.com]
Sent: Monday, June 25, 2007 10:33 AM
To: Krause, Roland
Subject: Re: [torqueusers] how to get Environment Variables
Hi Roland,
The environment variables problem is OK now, but I have run into another
problem:
There are four nodes including the head node, but I can only submit a 3-node
job with qsub. When I submit a 4-node job it gives:
c0-0
c0-1
c0-2
cluster
totalnum=4 numhosts=3
there are not enough hosts on which to start all processes
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
mpdtrace: cannot connect to local mpd (/tmp/mpd2.console_ccao); possible
causes:
mpdallexit: cannot connect to local mpd (/tmp/mpd2.console_ccao); possible
causes:
1. no mpd is running on this host
2. an mpd is running but was started without a "console" (-n option)
It seems I can't run the job on the head node (cluster) with PBS, but I can run
a 4-node job directly (without qsub).
When I check with pbsnodes, all nodes appear to be in the free state. Can you
help me with this? Many thanks!
Best wishes,
Chaucer
_____
From: Krause, Roland [mailto:Roland.Krause at amtc-dresden.com]
Sent: June 25, 2007 15:13
To: Chaucer Cao
Subject: RE: [torqueusers] how to get Environment Variables
Hi Chaucer,
Could you provide the part of your script that reads the PBS env
variables?
Regards,
Roland
_____
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of Chaucer Cao
Sent: Wednesday, June 20, 2007 7:16 PM
To: torqueusers at supercluster.org
Subject: [torqueusers] how to get Environment Variables
Hi all,
Does anyone know how I can get the PBS environment variables in the run
script file? When I qsub my script file it gives:
PBS_NODEFILE: Undefined variable.
PBS_ENVIRONMENT: Undefined variable.
Many thanks!
Chaucer
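The "Undefined variable." wording above is what csh-family shells print when a script expands a variable that is not set. The thread does not say what the actual cause was, so the following is only a sketch of a defensive check in a POSIX sh job script: it tests the two variables named in the error before using them. The function name require_pbs_env is hypothetical.

```shell
#!/bin/sh
# Return non-zero unless the PBS job variables used by the script are set.
# usage: require_pbs_env
require_pbs_env() {
    # ${VAR:-} expands to an empty string when VAR is unset, so the test
    # itself never trips an "unset variable" error (e.g. under set -u).
    if [ -z "${PBS_NODEFILE:-}" ] || [ -z "${PBS_ENVIRONMENT:-}" ]; then
        echo "PBS_NODEFILE or PBS_ENVIRONMENT is not set; was this script started by qsub?" >&2
        return 1
    fi
    return 0
}

# Example at the top of a job script:
#   require_pbs_env || exit 1
#   cat "$PBS_NODEFILE"
```

Running the script by hand outside a job is the simplest way to see the failure branch, since the batch system only sets these variables for jobs it starts.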