[torqueusers] problem about running jobs on multiple nodes
Chien-Pin Chou
sol.chou at gmail.com
Sun Nov 11 19:30:25 MST 2007
Hello:
I have a problem running jobs on multiple nodes (n > 1).
When I test with qsub -l nodes=2:ppn=2, the job is given 2 CPUs on a
single node instead of 2 CPUs on each of 2 nodes, which would be
4 CPUs in total.
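For reference, here is the arithmetic behind the expected 4 CPUs. nodes=2:ppn=2 requests 2 nodes with 2 processors per node, so $PBS_NODEFILE should list 2 x 2 = 4 slots spread across 2 distinct hosts. The function below is a hypothetical helper (expected_slots is not a TORQUE command). It computes the slot count for the simple N or N:ppn=P form only. Node properties and '+'-joined requests are deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helper, not part of TORQUE: number of $PBS_NODEFILE lines a
# simple "N" or "N:ppn=P" node request should produce.
# Assumption: no node properties and no "+"-joined specs.
expected_slots() {
    spec=$1                            # e.g. "2:ppn=2"
    nodes=${spec%%:*}                  # text before the first ':' -> "2"
    case $spec in
        *ppn=*) ppn=${spec##*ppn=} ;;  # text after the last "ppn=" -> "2"
        *)      ppn=1 ;;               # assume 1 processor per node if ppn is omitted
    esac
    echo $((nodes * ppn))
}

expected_slots 2:ppn=2    # prints 4
```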
My test script is:
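#=========================
# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
# The node file has one line per allocated processor slot
NPROCS=$(wc -l < "$PBS_NODEFILE")
echo "$NPROCS"
cat "$PBS_NODEFILE"
echo "...."
/opt/openmpi/bin/mpirun -np "$NPROCS" -machinefile "$PBS_NODEFILE" hostname
#=========================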
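The total from wc -l cannot tell 4 slots on one host apart from 2 slots on each of 2 hosts. A per-host summary like the minimal sketch below could be added to the test script to show how the slots are distributed. slots_per_host and distinct_hosts are hypothetical names. The only assumption is the one the test script above already makes: $PBS_NODEFILE holds one hostname per allocated slot.

```shell
#!/bin/sh
# Hypothetical helpers (not TORQUE commands) for inspecting a node file that
# lists one hostname per allocated processor slot.

# Print "<count> <host>" for every host, e.g. "      2 venus".
slots_per_host() {
    sort "$1" | uniq -c
}

# Print the number of distinct hosts in the node file.
distinct_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Only runs inside a PBS job, where $PBS_NODEFILE is set.
if [ -n "${PBS_NODEFILE:-}" ]; then
    slots_per_host "$PBS_NODEFILE"
    echo "distinct hosts: $(distinct_hosts "$PBS_NODEFILE")"
fi
```

For a correct nodes=2:ppn=2 allocation, this would print two hosts with a count of 2 each. For the actual output reported in this message, it would print a single line for venus with a count of 2.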
If this worked properly, it should produce output like this:
#==============
4
venus
venus
node2
node2
....
venus
venus
node2
node2
#==========
However, the actual output is:
#========
2
venus
venus
....
venus
venus
#=========
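The actual node file has only 2 lines, both venus. That is the shape of a nodes=1:ppn=2 allocation, not nodes=2:ppn=2. One way to catch this early is to have the job script verify the allocation before calling mpirun, as sketched below under the same one-line-per-slot assumption. check_alloc is a hypothetical helper. The expected node and ppn counts are passed by hand and must be kept in sync with the qsub request.

```shell
#!/bin/sh
# Hypothetical check (not a TORQUE command): succeed only if FILE lists exactly
# NODES distinct hosts and every host appears exactly PPN times.
# Usage: check_alloc FILE NODES PPN
check_alloc() {
    sort "$1" | uniq -c | awk -v n="$2" -v p="$3" '
        $1 != p { bad = 1 }   # this host received the wrong number of slots
        { hosts++ }           # one uniq -c line per distinct host
        END { if (bad || hosts != n) exit 1; exit 0 }
    '
}

# Inside a job submitted with -l nodes=2:ppn=2, stop before mpirun if the
# node file does not match the request.
if [ -n "${PBS_NODEFILE:-}" ]; then
    if ! check_alloc "$PBS_NODEFILE" 2 2; then
        echo "unexpected allocation:" >&2
        sort "$PBS_NODEFILE" | uniq -c >&2
        exit 1
    fi
fi
```

With the node file shown above, check_alloc FILE 2 2 fails, while check_alloc FILE 1 2 succeeds. This confirms that the job received a single-node allocation.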
And my pbsnodes output is:
sol@venus:~$ pbsnodes
venus
     state = free
     np = 7
     ntype = cluster
     status = opsys=linux,uname=Linux venus 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=3588 27920,nsessions=2,nusers=1,idletime=390,totmem=48636244kb,availmem=48281432kb,physmem=33013072kb,ncpus=8,loadave=0.00,netload=426820530,state=free,jobs=,varattr=,rectime=1194834426

node1
     state = down
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux node1 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=4486,nsessions=1,nusers=1,idletime=281,totmem=49013744kb,availmem=31803536kb,physmem=33013044kb,ncpus=8,loadave=5.79,netload=113284598,state=free,jobs=,varattr=,rectime=1194629635

node2
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux node2 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=? 15201,nsessions=? 15201,nusers=0,idletime=1208,totmem=49013744kb,availmem=48817340kb,physmem=33013044kb,ncpus=8,loadave=0.00,netload=60742509,state=free,jobs=,varattr=,rectime=1194834438

node3
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux node3 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=4993,nsessions=1,nusers=1,idletime=387,totmem=33013044kb,availmem=32828948kb,physmem=33013044kb,ncpus=8,loadave=0.00,netload=57023444,state=free,jobs=,varattr=,rectime=1194834401
venus, node2 and node3 are all free (only node1 is down).
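To cross-check this against the server's view, a small filter over the pbsnodes text can list the free nodes and their np values. This sketch assumes pbsnodes' usual layout: a node name on its own unindented line, followed by indented "attribute = value" lines. free_nodes and free_slots are hypothetical names. Other TORQUE versions may format the output differently.

```shell
#!/bin/sh
# Hypothetical filters (not TORQUE commands) over pbsnodes output read from
# stdin. Assumes each node starts with an unindented name line and that its
# attributes follow as indented "name = value" lines.

# Print "<node> <np>" for every node whose state is exactly "free".
free_nodes() {
    awk '
        function emit() {
            if (name != "" && state == "free") print name, np
            name = ""
        }
        # Node header: non-empty, not indented, no "=" sign.
        NF > 0 && $0 !~ /^[ \t]/ && index($0, "=") == 0 {
            emit(); name = $1; state = ""; np = ""; next
        }
        $1 == "state" { state = $3 }
        $1 == "np"    { np = $3 }
        END { emit() }
    '
}

# Sum the np values of all free nodes.
free_slots() {
    free_nodes | awk '{ total += $2 } END { print total + 0 }'
}

# Example (on the TORQUE server host): pbsnodes | free_slots
```

Applied to the pbsnodes output in this message, with each status line unwrapped, this would report venus 7, node2 8 and node3 8. That is 23 free slots, far more than the 4 requested.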
How can I solve this problem?
Thanks for your help.
Chien-Pin Chou