[torqueusers] configuring node to use subset of available physical GPUs
dbeer at adaptivecomputing.com
Mon Mar 31 15:01:07 MDT 2014
I believe NVIDIA offers values you can set to prevent users from being
able to access specific GPUs.
I know that TORQUE has a feature coming in 4.2.8 to set an environment
variable (CUDA_VISIBLE_DEVICES) for gpu jobs. This makes the job only see
the gpus with the index that you set. This is coming, but it isn't released yet.
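
To illustrate what that variable does (this is only a sketch of the general mechanism, not TORQUE's implementation): CUDA applications started with CUDA_VISIBLE_DEVICES set see only the listed device indices. The helper name gpu_env below is hypothetical.

```python
import os


def gpu_env(indices, base_env=None):
    """Return a copy of an environment that exposes only the given GPU indices.

    gpu_env is a hypothetical helper for illustration; it is not part of TORQUE.
    """
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads a comma-separated list of device indices from this variable.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return env
```

The result can be passed as the env argument of subprocess.run to launch a program that should only see, say, GPUs 0 and 1. Note that the variable is advisory: a process that resets it can still reach the other devices.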
On Mon, Mar 31, 2014 at 2:42 PM, Lev Givon <lev at columbia.edu> wrote:
> Received from David Beer on Mon, Mar 31, 2014 at 04:34:25PM EDT:
> > On Mon, Mar 31, 2014 at 12:47 PM, Lev Givon <lev at columbia.edu> wrote:
> > > If I configure a compute node (in its server_priv/nodes file) to use X
> > > number of GPUs where X < N and N = total number of physical GPUs in the
> > > system, are the first X physical GPUs in the system always the ones that are
> > > allocated to jobs that require GPUs? In other words, does the above
> > > configuration guarantee that torque will never allocate the remaining
> > > GPUs to jobs?
> > >
> > > I'm using torque 4.5.0pre1 on Ubuntu 13.10 with the built-in scheduler.
> > Let's say you have 4 gpus but only want 2 to be used for jobs:
> > 1. Make sure you aren't allowing it to auto-detect gpus. (This happens if
> > you configure the moms to report on each gpu, with the -nvidia configure
> > options).
> > 2. In the nodes file, add gpus=2 to the line with the node.
> > This doesn't guarantee that a job is unable to access the other gpus on the
> > system, but it guarantees that TORQUE will only tell the scheduler about 2
> > gpus, so more than 2 should never be scheduled at a time.
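
As a sketch of step 2 above, the entry in server_priv/nodes might look like the line below. The host name node01 and the np=8 core count are assumptions for illustration; only gpus=2 comes from the steps above.

```
node01 np=8 gpus=2
```
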
> Is there any way to prevent torque from ever touching a specific GPU (or GPUs)
> on a system? The motivation for the question is to set aside those GPUs for
> non-torque-related use by potentially more than one simultaneous user and have
> torque use the remaining GPUs exclusively for submitted jobs.
> Lev Givon
> Bionet Group
David Beer | Senior Software Engineer