[torqueusers] Performance of non-GPU codes on GPU nodes reduced by nvidia-smi overhead
akohlmey at cmm.chem.upenn.edu
Wed Feb 15 16:22:00 MST 2012
On Wed, Feb 15, 2012 at 6:15 PM, Doug Johnson <djohnson at osc.edu> wrote:
> Has anyone noticed the overhead when enabling GPU support in torque?
> The nvidia-smi process requires about 4 cpu seconds for each
> invocation. When executing a non-GPU code that uses all the cores
> this results in a bit of oversubscription of the cores. Since
> nvidia-smi is executed every 30 seconds to collect card state this
> results in a measurable decrease in performance.
> As a workaround I've enabled 'persistence mode' for the card. When
> not in use, the card is apparently not initialized. With persistence
> mode enabled the cpu time to execute the command is reduced to ~0.02.
> This will also help with the execution time of short kernels, as the
> card will be ready to go.
This usually doesn't affect GPU applications, since they open a
GPU "context" when they start and hold it open until they exit.
Applications whose total runtime is shorter than that initialization
overhead are pointless to run on the GPU anyway.
> Do other people run with persistence mode enabled? Are there any
Yes, this is the way to go. Before NVIDIA added support for that,
I kept an nvidia-smi process doing (very infrequent) logging in the
background to hold the driver state. This carries over to other stuff,
too. Check out:
The problem doesn't show up on desktops, because the X server is
running and also holds a GPU context.
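The pre-persistence background-logging trick mentioned above can be sketched as a simple loop (the interval and log path here are purely illustrative, not what was actually used):

```shell
# Hold the driver initialized by polling the card at a long interval;
# the redirected output doubles as a coarse utilization log.
# (300 s and the log path are illustrative choices)
while true; do
    nvidia-smi >> /var/log/nvidia-smi.log 2>&1
    sleep 300
done &
```

nvidia-smi's built-in loop mode (`nvidia-smi -l <seconds>`) achieves the same effect in a single foreground process.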
> PS. I think if X were running this would not be an issue.
Dr. Axel Kohlmeyer akohlmey at gmail.com
Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.