[torqueusers] Trying to get gpu support enabled with Torque 2.5.9

Jagga Soorma jagga13 at gmail.com
Tue Oct 1 12:45:08 MDT 2013


Hi Guys,

I need to enable GPU support on my existing cluster, so I have spun up a
new test environment with the same Torque version (2.5.9) and configured
it as follows:

On the server (which does not have any GPUs):
./configure --enable-nvidia-gpus --with-debug --with-nvidia-gpus
make
make install

Then I updated the config files and started pbs_sched and pbs_server.

On the client (which has 3 GPUs, all Tesla M2050s):
./configure -with-debug --enable-nvidia-gpus \
  --with-nvml-lib=/var/tmp/Tesla_Deployment_Kit/tdk_3.304.5/nvml/lib64 \
  --with-nvml-include=/var/tmp/Tesla_Deployment_Kit/tdk_3.304.5/nvml/include
make
make rpm

Then I installed the torque and torque-client RPMs, pointed this client at
the server, and started the pbs_mom daemon.

On the server this client now shows up as connected and free for use and I
can submit a simple interactive job.

However, I was expecting the pbsnodes command to report the status of the
GPUs attached to my clients, but all I see is:

--
node1
     state = free
     np = 16
     ntype = cluster
     status = rectime=1380652415,varattr=,jobs=,state=free,netload=674176243914,gres=,loadave=0.01,ncpus=16,physmem=24730388kb,availmem=48833164kb,totmem=49904200kb,idletime=852,nusers=0,nsessions=? 15201,sessions=? 15201,uname=Linux amber12 2.6.32.54-0.3-default #1 SMP 2012-01-27 17:38:56 +0100 x86_64,opsys=linux
     gpus = 3
--
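A minimal sketch of how one might check such output for per-GPU status.
It assumes that a GPU-aware pbs_mom adds a gpu_status attribute to the
pbsnodes output (an assumption about NVML-enabled builds, not something
shown above); the helper name has_gpu_status is hypothetical.

```shell
# Hypothetical helper: reads pbsnodes-style output on stdin and reports
# whether any line carries a gpu_status attribute (assumed attribute name).
has_gpu_status() {
    if grep -q '^[[:space:]]*gpu_status[[:space:]]*='; then
        echo "gpu_status present"
    else
        echo "gpu_status missing"
    fi
}

# Usage on a live system (node1 is the node shown above):
#   pbsnodes node1 | has_gpu_status
```

Run against the output above, this would report "gpu_status missing",
since only the gpus = 3 count is listed.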

Also, if I try to submit a job requesting a gpu I get the following error:

qsub -I -l nodes=1:ppn=1:gpus=2

--
PBS_Server: LOG_ERROR::Undefined attribute  (15002) in send_job, child failed in previous commit request for job 7173.xx
--

How can I get GPU support enabled?  Am I missing something here?
Ultimately, what I am trying to achieve is to have Torque spread jobs more
evenly across the 3 GPUs.  In our current environment it appears to load
up the first GPU and never tries to balance jobs across the other 2
available GPUs.
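
For illustration, a rough sketch of how a job script could pin itself to
the GPUs it was assigned, once GPU scheduling works. It assumes Torque
lists the assigned GPUs in the file named by $PBS_GPUFILE, one entry per
line in the form <host>-gpu<index>; that format and the helper name
gpu_indices are assumptions, not something confirmed on this cluster.

```shell
# Hypothetical helper: turn a PBS_GPUFILE-style file into a comma-separated
# list of GPU indices. Assumes one "<host>-gpu<index>" entry per line and a
# single-node job (entries are not filtered by host name).
gpu_indices() {
    sed -n 's/.*-gpu\([0-9][0-9]*\)$/\1/p' "$1" | paste -sd, -
}

# Inside a job script one might then restrict CUDA to the assigned devices:
#   export CUDA_VISIBLE_DEVICES=$(gpu_indices "$PBS_GPUFILE")
```

With the devices restricted this way, each job would only see the GPUs it
was given rather than defaulting to the first one.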

Any help would be appreciated.

Thanks,
-J