[torqueusers] gpu mode shared not working with torque

Eva Hocks hocks at sdsc.edu
Mon Sep 30 18:32:51 MDT 2013



I ran some tests using the gpu feature of torque (torque 4.2.5 built with
--enable-nvidia-gpus).

The problem is that torque accepts the request gpus=1:shared but does NOT
reset the allocated gpu to "default" mode. If the gpu is already in
"default" mode, gpus=1:shared works just fine; the problem is the missing
reset after another job has used an exclusive mode.
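
One possible stopgap (not a fix for torque itself) would be to reset the
compute mode by hand, for example from a prologue or epilogue that runs as
root. The sketch below is only an illustration: the function names are
hypothetical, and it assumes that nvidia-smi is on the PATH and that the
numeric codes accepted by its -c/--compute-mode flag (0 = DEFAULT,
1 = EXCLUSIVE_THREAD, 2 = PROHIBITED, 3 = EXCLUSIVE_PROCESS) apply to the
installed driver.

```python
import subprocess

# Numeric codes for nvidia-smi's -c/--compute-mode flag (assumed to match
# the installed driver; check `nvidia-smi -h` on the node).
COMPUTE_MODES = {
    "default": 0,            # shared: several processes may use the gpu
    "exclusive_thread": 1,
    "prohibited": 2,
    "exclusive_process": 3,
}


def reset_command(gpu_index, mode="default"):
    """Build the nvidia-smi argument list that sets one gpu's compute mode."""
    return ["nvidia-smi", "-i", str(gpu_index), "-c", str(COMPUTE_MODES[mode])]


def reset_gpu(gpu_index, mode="default"):
    """Run the command. Needs root and a node with the NVIDIA driver loaded."""
    return subprocess.run(reset_command(gpu_index, mode), check=True)
```

Calling reset_gpu(0) on gpu-2-16 before a shared job starts would be
equivalent to running "nvidia-smi -i 0 -c 0" as root.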


Any help out there?

We are running with driver_ver=325.15

Thanks
Eva

GPU mode after node start:
---------------------------
gpu-2-16
Mon Sep 30 16:06:35 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   32C  N/A     N/A /  N/A |       14MB /  6143MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+


Test with torque default mode:
-------------------------------
$ qsub -I -l nodes=gpu-2-16:ppn=1:gpus=1  -q active


$ cat $PBS_GPUFILE
gpu-2-16-gpu0


[hocks at gpu-2-16 ~]$ nvidia-smi
Mon Sep 30 16:08:42 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   32C  N/A     N/A /  N/A |       14MB /  6143MB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+


Test with gpu shared (default) mode
------------------------------------
Now requesting shared mode for the gpu does not seem to reset the
device mode (it stays at E. Thread from the previous job):

$ qsub -I -l nodes=gpu-2-16:ppn=1:gpus=1:shared  -q active

[hocks at gpu-2-16 ~]$ cat $PBS_GPUFILE
gpu-2-16-gpu0
[hocks at gpu-2-16 ~]$ nvidia-smi
Mon Sep 30 16:21:07 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   32C  N/A     N/A /  N/A |       14MB /  6143MB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+


Test with exclusive_process
--------------------------------
Requesting exclusive_process does reset the previous mode:

$ qsub -I -l nodes=gpu-2-16:ppn=1:gpus=1:exclusive_process -q active

[hocks at gpu-2-16 ~]$ cat $PBS_GPUFILE
gpu-2-16-gpu0
[hocks at gpu-2-16 ~]$ nvidia-smi
Mon Sep 30 16:35:01 2013
+------------------------------------------------------+
| NVIDIA-SMI 5.325.15   Driver Version: 325.15         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN   Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   33C  N/A     N/A /  N/A |       14MB /  6143MB |     N/A   E. Process |
+-------------------------------+----------------------+----------------------+
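
To repeat these checks without reading the tables by eye, the compute mode
can be pulled out of the nvidia-smi table text. This is a minimal sketch,
assuming the table layout shown above, where the mode is the last field of
each gpu's second row. The function name is hypothetical, and the list of
mode labels is taken from the output in this post plus "Prohibited", which
is an assumption.

```python
import re

# Matches the last table cell of a gpu row: "<GPU-Util>  <Compute M.> |"
MODE_RE = re.compile(
    r"\|\s*\S+\s+(Default|E\. Thread|E\. Process|Prohibited)\s*\|\s*$"
)


def compute_modes(smi_text):
    """Return the compute mode label of each gpu row found in the table text."""
    modes = []
    for line in smi_text.splitlines():
        match = MODE_RE.search(line)
        if match:
            modes.append(match.group(1))
    return modes
```

Feeding it the captured output of nvidia-smi inside a job makes it easy to
compare the mode the device is in with the mode that was requested.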








