[torqueusers] configuring node to use subset of available physical GPUs

Lev Givon lev at columbia.edu
Mon Mar 31 17:26:12 MDT 2014


Received from David Beer on Mon, Mar 31, 2014 at 05:01:07PM EDT:
> On Mon, Mar 31, 2014 at 2:42 PM, Lev Givon <lev at columbia.edu> wrote:

(snip)

> Is there any way to prevent torque from ever touching a specific GPU (or GPUs)
> on a system? The motivation for the question is to set aside those GPUs for
> non-torque-related use by potentially more than one simultaneous user and have
> torque use the remaining GPUs exclusively for submitted jobs.
>
> I believe NVIDIA offers values you can set to prevent users from being
> able to access them.

Right - I was planning to set users' default CUDA_VISIBLE_DEVICES to 0 so that
they can only access that GPU for non-torque-managed purposes and let torque
manage the remaining GPUs on the system.

> I know that TORQUE has a feature coming in 4.2.8 to set an environment
> variable (CUDA_VISIBLE_DEVICES) for GPU jobs.

Is this in 4.5.0? Is there any documentation somewhere for it?

> This makes the job only see the gpus with the index that you set. This is
> coming, but it isn't available yet.

I'm actually already doing something like that via the submit filter I alluded
to earlier [1]; it uses the values in $PBS_GPUFILE to set CUDA_VISIBLE_DEVICES.
However, if pbs_sched may allocate any of the 8 GPUs in the system to a job
(via $PBS_GPUFILE) even when I specify only 6 in server_priv/nodes, then listing
fewer GPUs there cannot permanently exclude specific GPUs from being allocated
to torque jobs. Or is there some additional functionality in the post-4.2.7
pbs_sched that makes this possible?
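A minimal Python sketch of the $PBS_GPUFILE to CUDA_VISIBLE_DEVICES mapping
described above. It is not the contents of the gist in [1]. It assumes each
line of $PBS_GPUFILE has the form <hostname>-gpu<index> (e.g. node01-gpu3), and
RESERVED_GPUS, parse_gpufile and visible_devices are hypothetical names. The
reserved-GPU check can only detect, not prevent, the case where pbs_sched hands
a job a GPU meant for non-torque use.

```python
import os

# Hypothetical: GPU indices set aside for non-torque use on this node.
RESERVED_GPUS = {0, 1}


def parse_gpufile(path):
    """Return the GPU indices listed in a TORQUE $PBS_GPUFILE.

    Assumes every non-blank line looks like "<hostname>-gpu<index>".
    rpartition() splits on the last "-gpu", so hostnames that happen to
    contain "-gpu" are still handled.
    """
    indices = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            _, sep, idx = line.rpartition("-gpu")
            if not sep:
                raise ValueError(f"unexpected $PBS_GPUFILE line: {line!r}")
            indices.append(int(idx))
    return indices


def visible_devices(indices, reserved=RESERVED_GPUS):
    """Build a CUDA_VISIBLE_DEVICES string, refusing any reserved GPU."""
    clash = sorted(set(indices) & reserved)
    if clash:
        raise RuntimeError(f"torque assigned reserved GPU(s): {clash}")
    return ",".join(str(i) for i in indices)


if __name__ == "__main__":
    # Print the value so the calling job script can export it; setting
    # os.environ here would not affect the parent shell.
    gpufile = os.environ.get("PBS_GPUFILE")
    if gpufile:
        print(visible_devices(parse_gpufile(gpufile)))
```

Inside a job script this could be used as
export CUDA_VISIBLE_DEVICES=$(python3 gpus.py), where gpus.py is a hypothetical
filename for the sketch above.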

[1] https://gist.github.com/lebedov/9728629
-- 
Lev Givon
Bionet Group
http://www.columbia.edu/~lev/
http://lebedov.github.io/