[torqueusers] Running either GPGPU or GL GPU jobs on nodes

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Fri Nov 11 10:41:56 MST 2011


On Fri, Nov 11, 2011 at 12:07 PM, Ti Leggett <leggett at mcs.anl.gov> wrote:
> Can you run CUDA gdb while X is running? I have a user trying to do this and this is the error they're getting:
>
> "error: All CUDA devices are used for X11 and cannot be used while debugging."

no. i have never tried cuda-gdb myself, but from what i know
about how it is supposed to work, this is likely a case where
the debugger has to keep a kernel alive on the GPU for a long time,
which collides with X using the same GPU. the error message
confirms that.

the question is whether a single person needing to do some debugging
justifies a queue reconfiguration. unless this happens on a regular
basis, i would just set up a reservation for this user and then
turn off X on that/those node(s) for the time being.
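
if the need does turn out to be regular, the prologue/epilogue route that gus describes in the quoted text below could look roughly like this. it is only a sketch of the decision step, not a tested prologue. the GL_GPU property name and runlevels 3 and 5 come from the thread; the assumptions are that torque hands the prologue the requested resource list as its fifth argument, and that the list has the comma-separated form shown in the docstring. the function names are made up for illustration.

```python
#!/usr/bin/env python
"""Sketch: pick a runlevel from the resource list a Torque job requested."""
import sys

GL_PROPERTY = "GL_GPU"  # node property name suggested in the thread
RUNLEVEL_X = 5          # X running, for GL jobs
RUNLEVEL_NO_X = 3       # no X, for GPGPU jobs


def requested_properties(resource_list):
    """Return the colon-separated tokens of the nodes=/neednodes= entries.

    ASSUMPTION: resource_list looks like
    'neednodes=1:ppn=4:GL_GPU,nodes=1:ppn=4:GL_GPU,walltime=01:00:00'.
    """
    tokens = set()
    for entry in resource_list.split(","):
        key, _, value = entry.partition("=")
        if key.strip() in ("nodes", "neednodes"):
            # a multi-part request such as '1:ppn=2+1:GL_GPU' is split on '+'
            for part in value.split("+"):
                tokens.update(t.strip() for t in part.split(":"))
    return tokens


def target_runlevel(resource_list):
    """Runlevel 5 if the job asked for the GL property, otherwise 3."""
    if GL_PROPERTY in requested_properties(resource_list):
        return RUNLEVEL_X
    return RUNLEVEL_NO_X


if __name__ == "__main__":
    # ASSUMPTION: argv[5] carries the requested resource list.
    # a real prologue would pass the result to telinit as root;
    # this sketch only prints it.
    if len(sys.argv) > 5:
        print(target_runlevel(sys.argv[5]))
```

the epilogue would do the reverse and go back to runlevel 3. note that switching the runlevel affects everything on the node, so this only makes sense if GL and GPGPU jobs never share a node.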

axel.


>
> On Nov 11, 2011, at 10:46 AM, Gustavo Correa wrote:
>
>>
>> On Nov 11, 2011, at 11:18 AM, Axel Kohlmeyer wrote:
>>
>>> On Fri, Nov 11, 2011 at 10:35 AM, Ti Leggett <leggett at mcs.anl.gov> wrote:
>>>> We have NV GPUs and we have some users who want to run GPGPU jobs (like CUDA) and we have other users who want to run GL GPU jobs. GL jobs require the machine to have X started (runlevel 5) and GPGPU jobs can't run when X is running. Does anyone have a good method of letting users specify which type of GPU job they need to run and changing the runlevel appropriately?
>>>
>>> with nvidia hardware GPGPU jobs _can_ run when X
>>> is running. i am doing that on my desktop all the time.
>>> you may need to tweak the timeout that is set to
>>> keep GPGPU applications from hogging the GPU
>>> when X is running, if your GPGPU users write kernels
>>> that run excessively long. in most cases, that is
>>> just bad program design.
>>>
>>> axel.
>>>
>>
>> Hi Ti
>>
>> I guess you don't want to let users change the machine runlevel.
>> However, I presume you could check if the job requires X and change the runlevel
>> in a prologue script,
>> then return to runlevel 3 in an epilogue script at the end of the job.
>>
>> I suppose you could identify the GL GPU jobs by associating them
>> with a node property, e.g. one named GL_GPU, added to the appropriate nodes in the server_priv/nodes file.
>> The user would then request nodes with the 'GL_GPU' property in her/his Torque qsub
>> script or command line, and your prologue script could respond by changing the runlevel
>> to 5.
>>
>> Just a suggestion.
>>
>> Gus Correa
>>
>>>
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Dr. Axel Kohlmeyer    akohlmey at gmail.com
>>> http://sites.google.com/site/akohlmey/
>>>
>>> Institute for Computational Molecular Science
>>> Temple University, Philadelphia PA, USA.



-- 
Dr. Axel Kohlmeyer    akohlmey at gmail.com
http://sites.google.com/site/akohlmey/

Institute for Computational Molecular Science
Temple University, Philadelphia PA, USA.

