[torqueusers] Performance of non-GPU codes on GPU nodes reduced by nvidia-smi overhead
djohnson at osc.edu
Wed Feb 15 16:15:08 MST 2012
Has anyone noticed the overhead when enabling GPU support in TORQUE?
The nvidia-smi process requires about 4 CPU seconds for each
invocation. When a non-GPU code is using all the cores, this
oversubscribes the cores slightly. Since nvidia-smi is executed
every 30 seconds to collect card state, this results in a
measurable decrease in performance.
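
To put a rough number on that overhead, here is a minimal sketch of the arithmetic using only the two figures quoted above (about 4 CPU seconds per call, one call every 30 seconds). The function name is made up for illustration, and the slowdown a real job sees depends on how it reacts to losing part of a core, which this does not model.

```python
def core_fraction_lost(cpu_seconds_per_call: float, interval_seconds: float) -> float:
    """Fraction of one core consumed by a process that runs periodically.

    Hypothetical helper: it only divides CPU time per call by the
    polling interval, and ignores scheduling and cache effects.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return cpu_seconds_per_call / interval_seconds


# Figures from the post: ~4 CPU seconds per nvidia-smi call, every 30 seconds.
lost = core_fraction_lost(4.0, 30.0)
print(f"about {lost:.1%} of one core")  # about 13.3% of one core
```

If the job is tightly synchronized across cores, the core that gives up this time may hold back the others, so the visible slowdown could exceed this per-core figure.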
As a workaround I've enabled 'persistence mode' for the card. When
not in use, the card is apparently not initialized, so each
nvidia-smi call presumably has to initialize it first. With
persistence mode enabled the CPU time to execute the command is
reduced to ~0.02 seconds. This will also help with the execution
time of short kernels, as the card will be ready to go.
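
For anyone who wants to compare the before and after on their own nodes, here is a small sketch that measures the CPU time consumed by a child command. It is Unix-only (it relies on the standard `resource` module), the helper name is made up, and the demo runs a trivial Python child as a stand-in so the script works on machines without a GPU. Persistence mode is normally switched on with `nvidia-smi -pm 1` as root. The exact nvidia-smi arguments TORQUE uses are not given in this post, so substitute whatever your pbs_mom actually runs.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(cmd):
    """Run cmd to completion and return the user + system CPU seconds it used.

    Hypothetical helper: it takes the difference in RUSAGE_CHILDREN
    before and after the child has been waited on.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    user = after.ru_utime - before.ru_utime
    system = after.ru_stime - before.ru_stime
    return user + system


# Stand-in command so the sketch runs anywhere; on a GPU node, pass the
# nvidia-smi command line here instead (for example ["nvidia-smi", "-q"]).
demo_cmd = [sys.executable, "-c", "pass"]
print(f"{child_cpu_seconds(demo_cmd):.2f} CPU seconds")
```

Running it once with persistence mode off and once with it on should show whether your nodes see the same drop from roughly 4 seconds to roughly 0.02 seconds.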
Do other people run with persistence mode enabled? Are there any
PS. I think if X were running this would not be an issue.