[torqueusers] Performance of non-GPU codes on GPU nodes reduced by nvidia-smi overhead

David Beer dbeer at adaptivecomputing.com
Wed Feb 15 16:22:09 MST 2012


Doug,

Have you tried using the --with-nvml-include=<path> option in configure?
This has pbs_mom use the NVIDIA NVML API for these calls rather than
forking nvidia-smi, which should speed things up a bit. The path should
point to the directory containing nvml.h, which is usually:
/usr/local/cuda/CUDAToolsSDK/NVML/
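
For example, a GPU-enabled build would be configured with something
along the lines of the following (paths are site-specific, and I'm
assuming GPU support is being switched on with the usual
--enable-nvidia-gpus flag):

  ./configure --enable-nvidia-gpus \
              --with-nvml-include=/usr/local/cuda/CUDAToolsSDK/NVML/

After reconfiguring, rebuild and restart pbs_mom on the GPU nodes so it
picks up the NVML code path.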

David

On Wed, Feb 15, 2012 at 4:15 PM, Doug Johnson <djohnson at osc.edu> wrote:

> Hi,
>
> Has anyone noticed the overhead when enabling GPU support in torque?
> The nvidia-smi process requires about 4 CPU seconds for each
> invocation.  When a non-GPU code is running on all of the cores, this
> results in a bit of oversubscription.  Since nvidia-smi is executed
> every 30 seconds to collect card state, the result is a measurable
> decrease in performance.
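>
> (For anyone who wants to check their own nodes, the per-invocation
> cost shows up directly if you time a single query on an otherwise
> idle node, e.g. something like 'time nvidia-smi -q'.)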
>
> As a workaround I've enabled 'persistence mode' for the card.  When
> not in use, the card is apparently not kept initialized.  With
> persistence mode enabled, the CPU time to execute the command is
> reduced to ~0.02 seconds.  This will also help with the execution
> time of short kernels, as the card will be ready to go.
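>
> (Persistence mode can be turned on with something like
> 'nvidia-smi -pm 1' run as root, typically at boot; the exact flag
> may vary with driver version.)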
>
> Do other people run with persistence mode enabled?  Are there any
> downsides?
>
> Doug
>
> PS. I think if X were running this would not be an issue, since the X
> server would keep the card initialized.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>



-- 
David Beer | Software Engineer
Adaptive Computing

