[torqueusers] Performance of non-GPU codes on GPU nodes reduced by nvidia-smi overhead

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Wed Feb 15 16:22:00 MST 2012

On Wed, Feb 15, 2012 at 6:15 PM, Doug Johnson <djohnson at osc.edu> wrote:
> Hi,
> Has anyone noticed the overhead when enabling GPU support in torque?
> The nvidia-smi process requires about 4 cpu seconds for each
> invocation.  When executing a non-GPU code that uses all the cores
> this results in a bit of oversubscription of the cores.  Since
> nvidia-smi is executed every 30 seconds to collect card state this
> results in a measurable decrease in performance.
> As a workaround I've enabled 'persistence mode' for the card.  When
> not in use, the card is apparently not initialized.  With persistence
> mode enabled the cpu time to execute the command is reduced to ~0.02 seconds.
> This will also help with the execution time of short kernels, as the
> card will be ready to go.

this usually doesn't affect applications, since they open
a GPU "context" and keep it open until the application ends.
applications whose total runtime is shorter than that
initialization overhead are pointless to run on the GPU anyway.

> Do other people run with persistence mode enabled?  Are there any
> downsides?

yes, this is the way to go. before nvidia supported persistence
mode, i kept an nvidia-smi process in the background doing a
(very infrequent) log to the same effect. this carries over
to other setups, too. check out:


it doesn't happen on desktops because the X server is running,
which also holds a GPU context.
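for reference, a minimal sketch of enabling and checking persistence
mode with nvidia-smi (requires root; the setting does not survive a
reboot, so sites typically put it in an init script):

```shell
# enable persistence mode on all GPUs (run as root)
nvidia-smi -pm 1

# confirm the setting took effect
nvidia-smi -q | grep "Persistence Mode"

# with persistence mode on, a status query should return quickly
# instead of paying the full driver/card initialization cost
time nvidia-smi -q > /dev/null
```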


> Doug
> PS. I think if X were running this would not be an issue.
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers

Dr. Axel Kohlmeyer    akohlmey at gmail.com

Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.
