[torquedev] [Bug 95] Support for GPUs

bugzilla-daemon at supercluster.org bugzilla-daemon at supercluster.org
Thu Nov 18 10:00:41 MST 2010


--- Comment #22 from Simon Toth <SimonT at mail.muni.cz> 2010-11-18 10:00:41 MST ---
(In reply to comment #21)
> > 
> > Looking at the code in 2.5-fixes, how will the program actually know which
> > cards are allocated? Will the ids match the devices?
> Our first pass idea is to do what TORQUE does with cores (ppn, virtual
> processors, however they should be referred to). An admin is allowed to
> overload their cores if desired - they can set a 4 core machine to ppn=8 or
> anything they like. There is also no guarantee that, if they are assigned
> host/0 (theoretically the 0th core) that the job will actually execute on the
> 0th core.
> We can imagine that some site is going to want to overload their gpus, just as
> some sites do with cpus, and so our initial approach is to handle gpus exactly
> the same way cores are handled by default. It is up to the user to guarantee
> that they actually execute on the GPU(s) assigned to their job, by reading the
> file $PBS_GPUFILE. Eventually, we will add options to lock GPUs to their jobs
> (like cpusets) and to autodetect the number and types of GPUs on each system.
> This is something we will eventually do but not something TORQUE can handle at
> this point.

This is actually a different issue. GPU APIs require you to select a card at
initialization, so the job needs to know which GPUs are allocated. What I'm
asking about is the mapping between the allocated ids and the actual devices.
If there is no such mapping, how will the user know which cards are OK to use?
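
To make the question concrete, here is a minimal sketch of what a job would
have to do under the approach quoted above: read $PBS_GPUFILE and pick out the
entries for the local host. The line format "<hostname>-gpu<index>" is an
assumption on my part (the thread does not state the file format), and the
helper names parse_gpufile and allocated_gpus are hypothetical. Whether the
returned index matches the device number the GPU API expects is exactly the
open question.

```python
import os
import re
import socket

# ASSUMPTION: each line of $PBS_GPUFILE looks like "<hostname>-gpu<index>".
# The thread does not specify the file format.
_GPU_LINE = re.compile(r"^(?P<host>.+)-gpu(?P<index>\d+)$")


def parse_gpufile(text, hostname):
    """Return the sorted GPU indices listed for `hostname` in the file text."""
    indices = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        match = _GPU_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognized GPU file line: {line!r}")
        if match.group("host") == hostname:
            indices.add(int(match.group("index")))
    return sorted(indices)


def allocated_gpus(environ=os.environ):
    """Read $PBS_GPUFILE and return the indices listed for the local host.

    ASSUMPTION: the host name in the file equals socket.gethostname();
    sites using fully qualified names may need to normalize first.
    """
    path = environ.get("PBS_GPUFILE")
    if not path:
        return []  # not running under a job that was given GPUs
    with open(path) as handle:
        return parse_gpufile(handle.read(), socket.gethostname())
```

The job would then pass one of the returned indices to its GPU API at
initialization, which only works if the ids do correspond to the devices.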
