[torquedev] [Bug 95] Support for GPUs

bugzilla-daemon at supercluster.org bugzilla-daemon at supercluster.org
Thu Nov 4 10:40:09 MDT 2010


http://www.clusterresources.com/bugzilla/show_bug.cgi?id=95

Ken Nielson <knielson at adaptivecomputing.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |knielson at adaptivecomputing.
                   |                            |com

--- Comment #5 from Ken Nielson <knielson at adaptivecomputing.com> 2010-11-04 10:40:09 MDT ---
>Counted resources are supported by Bug 67, that ensures correct assignment of
>jobs requesting GPUs.

By elevating the GPU to the same level as ppn, the GPU is now a counted
resource. Moreover, we can now create a node spec that specifies how many
processors and GPUs are needed for a job. For example:

qsub -l nodes=hostA:ppn=2:gpu=1 <job.sh>

This will allocate two processors (np) and one gpu on hostA. We can do
multiple node assignments as well:

qsub -l nodes=2:ppn=2:gpu=1+2:ppn=2:gpu=2,mem=4Gb <job.sh>

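We have now requested two nodes with two processors and one gpu each, plus two
more nodes with two processors and two gpus each. The trailing mem=4Gb is a
separate memory request and is not part of the node spec.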
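
To make the counting in the two examples above concrete, here is a minimal
Python sketch that splits a node spec into its groups and totals the request.
It is an illustration only, not TORQUE code: the function names
parse_node_spec and totals are hypothetical, and it assumes that ppn defaults
to 1 and gpu to 0 when omitted, and that everything after the first comma is a
separate resource.

```python
def parse_node_spec(spec):
    """Split the value of '-l nodes=...' into per-group requests.

    Hypothetical helper for illustration; not part of TORQUE.
    """
    # Assumption: anything after the first comma (e.g. mem=4Gb) is a
    # separate resource and not part of the node spec.
    node_part = spec.split(",", 1)[0]
    groups = []
    for chunk in node_part.split("+"):
        fields = chunk.split(":")
        head = fields[0]
        # The leading field is either a node count or a host name.
        if head.isdigit():
            group = {"count": int(head), "host": None}
        else:
            group = {"count": 1, "host": head}
        group["ppn"] = 1  # assumed default when ppn is omitted
        group["gpu"] = 0  # assumed default when gpu is omitted
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key in ("ppn", "gpu"):
                group[key] = int(value)
        groups.append(group)
    return groups


def totals(groups):
    """Return (nodes, processors, gpus) summed over all groups."""
    nodes = sum(g["count"] for g in groups)
    procs = sum(g["count"] * g["ppn"] for g in groups)
    gpus = sum(g["count"] * g["gpu"] for g in groups)
    return nodes, procs, gpus
```

Under these assumptions, the spec 2:ppn=2:gpu=1+2:ppn=2:gpu=2,mem=4Gb comes out
as 4 nodes, 8 processors and 6 gpus, and hostA:ppn=2:gpu=1 as 1 node, 2
processors and 1 gpu.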

The configuration and syntax fit easily into the current TORQUE build. The
syntax is also generic as to what a gpu is.

Later we can add qsub syntax to support exclusive access and other gpu
features. We could also add an auto-detect feature that populates each host
with the number of gpus available and reports gpu statistics in pbsnodes.

Another advantage of this syntax is that it fits easily into the existing tm
interface. MPI would need few changes, if any, to manage gpus on multiple
MOMs.

-- 
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.

