[torqueusers] requesting gpus

Gareth.Williams at csiro.au
Thu Feb 2 20:17:36 MST 2012


Hi All,

I added a basic GPU count to one of our compute nodes with:
qmgr -c 's n n121 gpus = 2'
and it seems fine:
> pbsnodes -a n121
n121
     state = free
     np = 12
     ntype = cluster
     status = rectime=1328238593,varattr=,jobs=,state=free,size=133709780kb:144492840kb,netload=156768229618,gres=,loadave=2.00,ncpus=24,physmem=99195396kb,availmem=95103784kb,totmem=101299868kb,idletime=173222,nusers=0,nsessions=0,uname=Linux n121 2.6.32.49-0.3-default #1 SMP 2011-12-02 11:28:04 +0100 x86_64,opsys=sles11,arch=x86_64
     mom_service_port = 15002
     mom_manager_port = 15003
     gpus = 2
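If the same attribute has to be set on several nodes, the qmgr call above can be wrapped in a small loop. This is a minimal sketch: the function name `set_node_gpus` and the `QMGR` override (useful for a dry run) are hypothetical. It issues only the same `s n <node> gpus = <count>` command shown above.

```shell
#!/bin/sh
# Hypothetical helper: set the same "gpus" node attribute on several nodes.
# QMGR may be overridden (e.g. QMGR=echo) to print the commands instead of running them.
set_node_gpus() {
    count=$1
    shift
    for node in "$@"; do
        # Same qmgr syntax as in the post: s n <node> gpus = <count>
        "${QMGR:-qmgr}" -c "s n $node gpus = $count" || return 1
    done
}
```

For example, `QMGR=echo set_node_gpus 2 n121 n122` prints the two qmgr commands without touching the server.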

However, when I submit a job using the recommended syntax described at
http://www.adaptivecomputing.com/resources/docs/torque/3-0-3/3.7schedulinggpus.php
I get:
> qsub -I -q viz -l nodes=1:ppn=1:gpus=1
qsub: Job exceeds queue resource limits MSG=cannot locate feasible nodes
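Since the message says no feasible node was found, one place to look is which nodes the server itself reports as having enough processors and GPUs for `nodes=1:ppn=1:gpus=1`. A minimal sketch, assuming the `pbsnodes -a` layout shown above (node name on an unindented line, indented `attribute = value` lines beneath it); the function name `nodes_with_gpus` is hypothetical, and it only reads what pbsnodes reports rather than reproducing the server's own feasibility check or any queue limits.

```shell
#!/bin/sh
# Hypothetical helper: read pbsnodes -a output on stdin and print the names
# of nodes whose np >= $1 and gpus >= $2 (a missing attribute counts as 0).
nodes_with_gpus() {
    awk -v want_np="$1" -v want_gpus="$2" '
        # An unindented, non-empty line starts a new node record.
        /^[^ \t]/ { flush(); node = $1; np = 0; gpus = 0; next }
        # Indented "attribute = value" lines belong to the current node.
        $1 == "np"   && $2 == "=" { np = $3 }
        $1 == "gpus" && $2 == "=" { gpus = $3 }
        END { flush() }
        # Print the previous node if it satisfies both thresholds.
        function flush() {
            if (node != "" && np + 0 >= want_np + 0 && gpus + 0 >= want_gpus + 0) print node
            node = ""
        }
    '
}
```

Usage: `pbsnodes -a | nodes_with_gpus 1 1` lists the nodes that advertise at least one processor and one GPU.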

The TORQUE version is 3.0.3-snap.201108261653.

Note that this is _not_ the --enable-nvidia-gpus functionality.
Also note that the server has not been restarted.
The scheduler is Moab, but I'm fairly sure the job is rejected well before Moab comes into the picture.

Does anyone have a setup like this working, or can anyone see what is wrong (or suggest where to look)?

Regards,

Gareth