[torqueusers] Performance of non-GPU codes on GPU nodes reduced by nvidia-smi overhead
dbeer at adaptivecomputing.com
Wed Feb 15 16:22:09 MST 2012
Have you tried the --with-nvml-include=<path> option to configure?
This makes pbs_mom use the NVIDIA API (NVML) for these calls instead of
running nvidia-smi, and should speed things up a bit. The path should be
the path to the nvml.h file and is
On Wed, Feb 15, 2012 at 4:15 PM, Doug Johnson <djohnson at osc.edu> wrote:
> Has anyone noticed the overhead when enabling GPU support in torque?
> The nvidia-smi process requires about 4 cpu seconds for each
> invocation. When executing a non-GPU code that uses all the cores,
> this results in a bit of oversubscription of the cores. Since
> nvidia-smi is executed every 30 seconds to collect card state, this
> results in a measurable decrease in performance.
> As a workaround I've enabled 'persistence mode' for the card. When
> not in use, the card is apparently not initialized. With persistence
> mode enabled, the cpu time to execute the command is reduced to ~0.02 seconds.
> This will also help with the execution time of short kernels, as the
> card will be ready to go.
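
To put the numbers above in perspective, here is a rough sketch (not part of TORQUE; the helper names child_cpu_seconds and core_fraction are made up for illustration). It turns the quoted figures into an average share of one core and shows one way to measure the CPU time of a single invocation. It assumes a Unix-like node with Python 3, and it times a harmless stand-in command so it runs anywhere; on a GPU node you would pass ["nvidia-smi"] instead.

```python
import resource
import subprocess
import sys


def child_cpu_seconds(cmd):
    """Run cmd to completion and return the user+system CPU seconds it consumed."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)


def core_fraction(cpu_seconds, interval_seconds):
    """Average share of one core used by a task costing cpu_seconds every interval_seconds."""
    return cpu_seconds / interval_seconds


# Figures quoted in this thread: ~4 cpu seconds per call, one call every 30 seconds.
print(f"without persistence mode: {core_fraction(4.0, 30.0):.1%} of one core")
# ~0.02 cpu seconds per call once persistence mode is enabled.
print(f"with persistence mode:    {core_fraction(0.02, 30.0):.2%} of one core")

# Stand-in command (hypothetical placeholder for nvidia-smi) so the sketch runs anywhere.
print(f"measured: {child_cpu_seconds([sys.executable, '-c', 'pass']):.3f} cpu seconds")
```

With the quoted figures, the polling costs roughly 13% of one core on average without persistence mode, and a negligible amount with it. The average understates the effect on a tightly coupled job, since each 4-second burst stalls whichever rank shares that core.
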
> Do other people run with persistence mode enabled? Are there any
> PS. I think if X were running this would not be an issue.
David Beer | Software Engineer