[torqueusers] Trying to get gpu support enabled with Torque 2.5.9

Jagga Soorma jagga13 at gmail.com
Tue Oct 1 12:45:08 MDT 2013

Hi Guys,

I need to enable GPU support on my existing cluster, so I have spun up a
new test environment with the same Torque 2.5.9 version and configured it
as follows:

On the server (which does not have any GPUs):
./configure --enable-nvidia-gpus --with-debug --with-nvidia-gpus
make install

I then updated the config files and started pbs_sched and pbs_server.

On the client (which has 3 GPUs, Tesla M2050s):
./configure --with-debug --enable-nvidia-gpus
make rpm

I then installed the torque and torque-client RPMs, pointed this client at
the server, and started the pbs_mom daemon.

On the server, this client now shows up as connected and free for use, and
I can submit a simple interactive job.

However, I was expecting the pbsnodes command to report the status of the
GPUs attached to my clients, but all I see is:

     state = free
     np = 16
     ntype = cluster
     status =
15201,sessions=? 15201,uname=Linux amber12 #1 SMP
2012-01-27 17:38:56 +0100 x86_64,opsys=linux
     gpus = 3
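
To check this on more than one node, the attribute lines can be parsed with
a short script. This is only a minimal sketch: it assumes the "key = value"
layout shown above, the function name parse_node_attrs and the SAMPLE text
are illustrative, and gpu_status is the attribute name I am assuming a
GPU-enabled pbs_mom would report. The status line is left out of the sample.

```python
def parse_node_attrs(text):
    """Parse 'key = value' attribute lines from one pbsnodes node entry."""
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # ignore lines that are not in 'key = value' form
            attrs[key.strip()] = value.strip()
    return attrs


# Sample copied from the output above (status line omitted).
SAMPLE = """\
     state = free
     np = 16
     ntype = cluster
     gpus = 3
"""

attrs = parse_node_attrs(SAMPLE)
gpu_count = int(attrs.get("gpus", 0))
# Assumption: per-GPU details would appear under an attribute named gpu_status.
has_gpu_status = "gpu_status" in attrs
print(f"gpus={gpu_count} gpu_status_reported={has_gpu_status}")
```

For the output above this reports a GPU count of 3 but no per-GPU status,
which matches what I am seeing.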

Also, if I try to submit a job requesting a gpu I get the following error:

qsub -I -l nodes=1:ppn=1:gpus=2

PBS_Server: LOG_ERROR::Undefined attribute  (15002) in send_job, child
failed in previous commit request for job 7173.xx

How can I get GPU support enabled?  Am I missing something here?  Also,
what I am trying to achieve is to have Torque spread jobs more evenly across
the 3 GPUs.  In our current environment it appears to load up the first GPU
and never balances jobs across the other 2 available GPUs.
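
To make the goal concrete, here is a rough sketch of the balancing behavior
I am after. It is not Torque code and says nothing about how Torque assigns
GPUs; the names pick_gpu and assign_jobs and the per-GPU job counters are
hypothetical, used only to illustrate a least-loaded placement.

```python
def pick_gpu(jobs_per_gpu):
    """Return the index of the GPU with the fewest jobs (lowest index wins ties)."""
    return min(range(len(jobs_per_gpu)), key=lambda i: jobs_per_gpu[i])


def assign_jobs(num_jobs, num_gpus=3):
    """Place jobs one at a time on the currently least-loaded GPU."""
    jobs_per_gpu = [0] * num_gpus
    placement = []
    for _ in range(num_jobs):
        gpu = pick_gpu(jobs_per_gpu)
        jobs_per_gpu[gpu] += 1
        placement.append(gpu)
    return placement


print(assign_jobs(5))  # [0, 1, 2, 0, 1]
```

With 3 GPUs, five jobs would land on GPUs 0, 1, 2, 0, 1 rather than all on
GPU 0, which is the kind of distribution I would like to end up with.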

Any help would be appreciated.
