[torqueusers] problem about running jobs on multiple nodes

Chien-Pin Chou sol.chou at gmail.com
Sun Nov 11 19:30:25 MST 2007


Hello:

I have a problem running jobs on multiple nodes (n > 1).

When I submit a test job with qsub -l nodes=2:ppn=2,
it selects only 2 CPUs on one node instead of 2 CPUs on each of two nodes,
which would be 4 CPUs in total.

My test script is:
#=========================
cd $PBS_O_WORKDIR
NPROCS=`wc -l < $PBS_NODEFILE`
echo $NPROCS
cat $PBS_NODEFILE
echo "...."
/opt/openmpi/bin/mpirun -np $NPROCS -machinefile $PBS_NODEFILE hostname
#=========================
If this worked properly, it should produce output like this:
#==============
4
venus
venus
node2
node2
....
venus
venus
node2
node2
#==========

However, the actual output is:
#========
2
venus
venus
....
venus
venus
#=========
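
To see how the slots listed in $PBS_NODEFILE are spread across hosts, a per-host count can help. The following is only a minimal sketch: the function name summarize_nodefile is hypothetical (it is not part of TORQUE or Open MPI), and it assumes the node file has one hostname per line, one line per allocated slot.

```shell
#!/bin/sh
# Hypothetical helper (not a TORQUE or Open MPI command).
# Prints one "host count" line per distinct host in the given node file,
# followed by a "total N" line with the overall number of slots.
summarize_nodefile() {
    sort "$1" | uniq -c | awk '
        { printf "%s %d\n", $2, $1; total += $1 }
        END { printf "total %d\n", total }'
}

# Inside a job script this could be called as:
#   summarize_nodefile "$PBS_NODEFILE"
```

For a request of nodes=2:ppn=2, one would expect two hosts with 2 slots each and a total of 4; the output reported above corresponds to a single host with a total of 2.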

My pbsnodes output is:

sol@venus:~$ pbsnodes
venus
     state = free
     np = 7
     ntype = cluster
     status = opsys=linux,uname=Linux venus 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=3588 27920,nsessions=2,nusers=1,idletime=390,totmem=48636244kb,availmem=48281432kb,physmem=33013072kb,ncpus=8,loadave=0.00,netload=426820530,state=free,jobs=,varattr=,rectime=1194834426

node1
     state = down
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux node1 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=4486,nsessions=1,nusers=1,idletime=281,totmem=49013744kb,availmem=31803536kb,physmem=33013044kb,ncpus=8,loadave=5.79,netload=113284598,state=free,jobs=,varattr=,rectime=1194629635

node2
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux node2 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=? 15201,nsessions=? 15201,nusers=0,idletime=1208,totmem=49013744kb,availmem=48817340kb,physmem=33013044kb,ncpus=8,loadave=0.00,netload=60742509,state=free,jobs=,varattr=,rectime=1194834438

node3
     state = free
     np = 8
     ntype = cluster
     status = opsys=linux,uname=Linux node3 2.6.22-2-amd64 #1 SMP Thu Aug 30 23:43:59 UTC 2007 x86_64,sessions=4993,nsessions=1,nusers=1,idletime=387,totmem=33013044kb,availmem=32828948kb,physmem=33013044kb,ncpus=8,loadave=0.00,netload=57023444,state=free,jobs=,varattr=,rectime=1194834401

venus, node2, and node3 are all free.
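
The long pbsnodes listing can be condensed to one line per node to make the free nodes easier to spot. This is a rough sketch only: list_nodes is a hypothetical helper, and it assumes the layout shown above, where each node name starts in the first column and the "state = " and "np = " lines are indented beneath it.

```shell
#!/bin/sh
# Hypothetical helper: reads pbsnodes-style text on stdin and prints
# "name state np" for each node. Assumes the layout shown in this post.
list_nodes() {
    awk '
        /^[^ \t]/         { name = $1 }              # unindented line: node name
        /^[ \t]+state = / { state = $3 }             # top-level state attribute
        /^[ \t]+np = /    { print name, state, $3 }  # np follows state in the listing
    '
}

# Possible usage on the server:
#   pbsnodes -a | list_nodes
```

The "status = " line is not matched by the state pattern, so the state=... field embedded in it is ignored and only the top-level state of each node is reported.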

How can I solve this problem?

Thanks for your help.

Chien-Pin Chou