[torqueusers] configuring node to use subset of available physical GPUs

David Beer dbeer at adaptivecomputing.com
Mon Mar 31 15:01:07 MDT 2014


I believe NVIDIA offers settings you can use to prevent users from being
able to access specific GPUs.

I know that TORQUE has a feature coming in 4.2.8 that sets an environment
variable (CUDA_VISIBLE_DEVICES) for gpu jobs. With it, the job only sees
the gpus whose indices are listed in that variable. This is coming, but it
isn't available yet.
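
Until then, a job script or wrapper can set the variable itself. Below is a
minimal sketch of that, not TORQUE's implementation. The helper name
restrict_visible_gpus and the indices 0 and 1 are made up for illustration;
the only thing it relies on is that CUDA_VISIBLE_DEVICES takes a
comma-separated list of device indices.

```python
import os
import subprocess
import sys


def restrict_visible_gpus(indices, env=None):
    """Return a copy of env with CUDA_VISIBLE_DEVICES limited to indices.

    Hypothetical helper for illustration only; it is not part of TORQUE.
    """
    # Copy so the caller's environment (or os.environ) is not modified.
    new_env = dict(os.environ if env is None else env)
    # CUDA reads a comma-separated list of device indices from this variable.
    new_env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in indices)
    return new_env


if __name__ == "__main__":
    # Launch a child process that can only see GPUs 0 and 1 (assumed indices).
    child_env = restrict_visible_gpus([0, 1])
    subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
        env=child_env,
        check=True,
    )
```

Note that this only hides the other gpus from CUDA programs started with
that environment. A user can still override the variable, so it is not an
access control mechanism.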


On Mon, Mar 31, 2014 at 2:42 PM, Lev Givon <lev at columbia.edu> wrote:

> Received from David Beer on Mon, Mar 31, 2014 at 04:34:25PM EDT:
> > On Mon, Mar 31, 2014 at 12:47 PM, Lev Givon <lev at columbia.edu> wrote:
> >
> > > If I configure a compute node (in its server_priv/nodes file) to use X
> > > GPUs, where X < N and N = total number of physical GPUs in the system,
> > > are the first X physical GPUs in the system always the ones that are
> > > allocated to jobs that require GPUs? In other words, does the above
> > > configuration guarantee that torque will never allocate the remaining
> > > N-X GPUs to jobs?
> > >
> > > I'm using torque 4.5.0pre1 on Ubuntu 13.10 with the built-in scheduler.
> >
> > Let's say you have 4 gpus but only want 2 to be used for jobs:
> >
> > 1. Make sure you aren't allowing it to auto-detect gpus. (This happens
> > when you configure the moms to report on each gpu, i.e. with the -nvidia
> > configure options.)
> > 2. In the nodes file, add gpus=2 to the line with the node.
> >
> > This doesn't guarantee that a job is unable to access the other gpus on
> > the system, but it guarantees that TORQUE will only tell the scheduler
> > about 2 gpus, so more than 2 should never be scheduled at a time.
>
> Is there any way to prevent torque from ever touching a specific GPU (or
> GPUs) on a system? The motivation for the question is to set aside those
> GPUs for non-torque-related use by potentially more than one simultaneous
> user and have torque use the remaining GPUs exclusively for submitted jobs.
> --
> Lev Givon
> Bionet Group
> http://www.columbia.edu/~lev/
> http://lebedov.github.io/
>



-- 
David Beer | Senior Software Engineer
Adaptive Computing