[torqueusers] configuring node to use subset of available physical GPUs
lev at columbia.edu
Mon Mar 31 17:26:12 MDT 2014
Received from David Beer on Mon, Mar 31, 2014 at 05:01:07PM EDT:
> On Mon, Mar 31, 2014 at 2:42 PM, Lev Givon <lev at columbia.edu> wrote:
>> Is there any way to prevent torque from ever touching a specific GPU (or GPUs)
>> on a system? The motivation for the question is to set aside those GPUs for
>> non-torque-related use by potentially more than one simultaneous user and have
>> torque use the remaining GPUs exclusively for submitted jobs.
> I believe NVIDIA offers settings you can use to prevent users from being
> able to access them.
Right - I was planning to set users' default CUDA_VISIBLE_DEVICES to 0 so that
they can only access that GPU for non-torque-managed purposes and let torque
manage the remaining GPUs on the system.
> I know that TORQUE has a feature coming in 4.2.8 to set an environment
> variable (CUDA_VISIBLE_DEVICES) for GPU jobs.
Is this in 4.5.0? Is there any documentation somewhere for it?
> This makes the job see only the GPUs whose indices you set. This is
> coming, but it isn't available yet.
I'm actually already doing something like that via the submit filter I alluded
to earlier; it uses the GPU indices listed in $PBS_GPUFILE to set
CUDA_VISIBLE_DEVICES.
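For illustration, here is a minimal Python sketch of that kind of step (not the actual filter). It assumes each line of $PBS_GPUFILE has the form `<hostname>-gpu<index>` and that only GPUs listed for the current host should be exposed. The function names are hypothetical and not part of Torque. Because $PBS_GPUFILE exists only once the job is running, a step like this would presumably run inside the job, for example from a line the submit filter adds to the job script.

```python
import os
import re
import socket

# Assumed $PBS_GPUFILE line format: "<hostname>-gpu<index>", one GPU per line.
_GPU_LINE = re.compile(r"^(?P<host>.+)-gpu(?P<index>\d+)$")


def _short(name):
    """Compare hosts by short name so "node01" matches "node01.example.org"."""
    return name.split(".")[0]


def gpu_indices(lines, hostname=None):
    """Return the GPU indices in file order, optionally only those for hostname."""
    indices = []
    for raw in lines:
        match = _GPU_LINE.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        if hostname is not None and _short(match.group("host")) != _short(hostname):
            continue  # GPU belongs to another node of a multi-node job
        index = int(match.group("index"))
        if index not in indices:
            indices.append(index)
    return indices


def cuda_visible_devices(lines, hostname=None):
    """Format the allocated indices as a CUDA_VISIBLE_DEVICES value, e.g. "3,1"."""
    return ",".join(str(i) for i in gpu_indices(lines, hostname))


if __name__ == "__main__":
    gpufile = os.environ.get("PBS_GPUFILE")
    if gpufile:
        with open(gpufile) as handle:
            print(cuda_visible_devices(handle, socket.gethostname()))
```

A job script could capture the printed value into CUDA_VISIBLE_DEVICES before launching the CUDA program. Note that this only maps whatever pbs_sched allocated; it does not influence which GPUs are allocated.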
However, if pbs_sched may allocate any of the 8 GPUs in the system to a job
(as listed in $PBS_GPUFILE) even when I specify only 6 in server_priv/nodes,
then there is effectively no way to permanently exclude specific GPUs from
being allocated to torque jobs. Or is there some additional functionality in
the post-4.2.7 pbs_sched that does make this possible?
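The concern can be made concrete with a small Python check that a job-side wrapper could run on the indices it reads from $PBS_GPUFILE. It assumes GPU 0 is the one reserved for interactive, non-torque use, matching the CUDA_VISIBLE_DEVICES=0 default mentioned earlier. RESERVED_GPUS and both functions are hypothetical. This sketch can only detect a conflicting allocation after the fact; it cannot stop pbs_sched from making it.

```python
import sys

# Hypothetical: GPU indices set aside for non-torque use (here only GPU 0).
RESERVED_GPUS = frozenset({0})


def reserved_conflicts(allocated, reserved=RESERVED_GPUS):
    """Return the sorted allocated GPU indices that fall in the reserved set."""
    return sorted(set(allocated) & set(reserved))


def check_allocation(allocated, reserved=RESERVED_GPUS):
    """Report on stderr and return False if any allocated GPU is reserved."""
    conflicts = reserved_conflicts(allocated, reserved)
    if conflicts:
        print("job was allocated reserved GPU(s): "
              + ",".join(str(i) for i in conflicts), file=sys.stderr)
        return False
    return True
```

If the scheduler can hand out any of the 8 GPUs, check_allocation would sometimes return False. That is exactly the case where a per-job workaround cannot guarantee the exclusion.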