[torquedev] [Bug 95] New: Support for GPUs
bugzilla-daemon at supercluster.org
Thu Nov 4 07:08:27 MDT 2010
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=95
Summary: Support for GPUs
Product: TORQUE
Version: 3.0.0-alpha
Platform: PC
OS/Version: Linux
Status: NEW
Severity: enhancement
Priority: P5
Component: pbs_server
AssignedTo: glen.beane at gmail.com
ReportedBy: SimonT at mail.muni.cz
CC: torquedev at supercluster.org
Estimated Hours: 0.0
It seems that all that is needed for exclusive GPU access is changing the
ownership of the graphics card device node (for NVIDIA: /dev/nvidiaX).
http://stackoverflow.com/questions/4077790/limiting-access-to-resources-for-cuda-and-opencl
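A minimal sketch of the ownership change described above, written in Python for illustration. The helper names nvidia_device and grant_exclusive are hypothetical. Using mode 0600 to keep other users out is an assumption. On a real node this must run as root.

```python
import os
import stat


def nvidia_device(index: int, dev_dir: str = "/dev") -> str:
    """Return the device node for NVIDIA GPU number `index`, e.g. /dev/nvidia0."""
    return os.path.join(dev_dir, f"nvidia{index}")


def grant_exclusive(index: int, uid: int, gid: int, dev_dir: str = "/dev") -> str:
    """Give one user exclusive access to a GPU by owning its device node.

    Assumed policy: owner read/write only (0600), so processes of other
    users cannot open the card. Returns the path that was changed.
    """
    path = nvidia_device(index, dev_dir)
    os.chown(path, uid, gid)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0600
    return path
```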
If this is true, then we can support the homogeneous use case (all GPUs in a
machine identical) very simply; supporting different cards in one machine
would require much more server and node logic.
Counted resources are supported by Bug 67, which ensures that jobs requesting
GPUs are assigned correctly.
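In the homogeneous case any free card is as good as another. Once the server has granted a job a GPU count, choosing the cards on the node can be as simple as taking the lowest free indices. The helper pick_gpus below is hypothetical and is not part of TORQUE or Bug 67. It only illustrates that node-side selection step.

```python
from typing import List, Optional, Set


def pick_gpus(total: int, busy: Set[int], count: int) -> Optional[List[int]]:
    """Pick `count` free GPU indices on a node with `total` identical cards.

    Because the cards are interchangeable, the lowest free indices are
    chosen. Returns None if the node cannot satisfy the request.
    """
    free = [i for i in range(total) if i not in busy]
    if count > len(free):
        return None
    return free[:count]
```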
As for the node part, I see two possible approaches:
1) Modifying the Linux mom_mach.c file (presumably in mom_set_limits) to find
the GPUs assigned to the job and chown them. I'm not sure where the cleanup
would go (maybe kill_task).
2) Doing the GPU assignment and cleanup in the prologue/epilogue. The node code
would only set environment variables such as GPU_COUNT and GPU_LIST (or similar).
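For approach 2, the node code and the scripts only need to agree on an encoding. The sketch below assumes GPU_LIST is a comma-separated list of device indices, since the report leaves the exact format open. The names gpu_env and parse_gpu_list are hypothetical.

```python
from typing import Dict, List


def gpu_env(indices: List[int]) -> Dict[str, str]:
    """Variables the node code could export (names as proposed in the report)."""
    return {
        "GPU_COUNT": str(len(indices)),
        "GPU_LIST": ",".join(str(i) for i in indices),
    }


def parse_gpu_list(env: Dict[str, str]) -> List[int]:
    """Recover the GPU indices inside a prologue/epilogue.

    Returns an empty list when no GPUs were assigned, and rejects an
    environment whose GPU_COUNT disagrees with GPU_LIST.
    """
    raw = env.get("GPU_LIST", "")
    indices = [int(tok) for tok in raw.split(",") if tok.strip()]
    count = int(env.get("GPU_COUNT", len(indices)))
    if count != len(indices):
        raise ValueError(f"GPU_COUNT={count} but GPU_LIST has {len(indices)} entries")
    return indices
```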
The second approach might be preferable because it would let TORQUE users
easily write or modify their own GPU assignment implementations, which also
makes it easy to port to a different GPU API.
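The following sketch shows how a site-supplied prologue/epilogue pair could implement approach 2. It makes several assumptions. First, GPU_LIST must reach the scripts' environment, which is what the proposal would add. Second, the job's user name must arrive as argv[2], as in TORQUE's prologue/epilogue calling convention; verify this for your version. Third, the restore defaults (root:root, mode 0666) are an assumption about the driver's usual device mode. Only the device-path template is NVIDIA-specific, which is what would make porting to another GPU API cheap. All function names are hypothetical.

```python
import os
import pwd
from typing import Dict, List

# Device naming is the only NVIDIA-specific part; a port to another GPU API
# would swap in that API's device-node template.
NVIDIA_TEMPLATE = "/dev/nvidia{index}"

# Assumed defaults for restoring a device node after the job.
RESTORE_UID, RESTORE_GID, RESTORE_MODE = 0, 0, 0o666


def device_paths(indices: List[int], template: str = NVIDIA_TEMPLATE) -> List[str]:
    """Map GPU indices to device nodes using a pluggable path template."""
    return [template.format(index=i) for i in indices]


def set_owner(paths: List[str], uid: int, gid: int, mode: int) -> None:
    """Apply ownership and permissions to every device node in `paths`."""
    for path in paths:
        os.chown(path, uid, gid)
        os.chmod(path, mode)


def prologue(user: str, indices: List[int], template: str = NVIDIA_TEMPLATE) -> None:
    """Hand the job's GPUs to `user` exclusively (mode 0600)."""
    pw = pwd.getpwnam(user)
    set_owner(device_paths(indices, template), pw.pw_uid, pw.pw_gid, 0o600)


def epilogue(indices: List[int], template: str = NVIDIA_TEMPLATE,
             uid: int = RESTORE_UID, gid: int = RESTORE_GID,
             mode: int = RESTORE_MODE) -> None:
    """Return the job's GPUs to their default owner and mode."""
    set_owner(device_paths(indices, template), uid, gid, mode)


def main(argv: List[str], environ: Dict[str, str],
         template: str = NVIDIA_TEMPLATE) -> int:
    """Dispatch on the script name; argv[2] is taken as the job's user name."""
    indices = [int(t) for t in environ.get("GPU_LIST", "").split(",") if t.strip()]
    if os.path.basename(argv[0]).startswith("prologue"):
        prologue(argv[2], indices, template)
    else:
        epilogue(indices, template)
    return 0
```

To install the pair as the root prologue and epilogue on a MOM node, each script would call main(sys.argv, os.environ) and exit with its return value. Supporting a different GPU API would then only require passing a different template.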