[torqueusers] Running either GPGPU or GL GPU jobs on nodes

Ti Leggett leggett at mcs.anl.gov
Fri Nov 11 14:48:00 MST 2011


I like the simplicity of that idea :)
Thanks!

On Nov 11, 2011, at 11:41 AM, Axel Kohlmeyer wrote:

> On Fri, Nov 11, 2011 at 12:07 PM, Ti Leggett <leggett at mcs.anl.gov> wrote:
>> Can you run cuda-gdb while X is running? I have a user trying to do this, and this is the error they're getting:
>> 
>> "error: All CUDA devices are used for X11 and cannot be used while debugging."
> 
> no. i never even tried using cuda-gdb, but from what i know
> about how it is supposed to work, this is likely a case where
> you have to have a long-lived kernel, which will then collide
> with using X at the same time. the error message confirms that.
> 
> the question is: does a single person needing to do some debugging
> require a queue reconfiguration? unless this happens on a regular
> basis, i would just set up a reservation for this user and then
> turn off X on that/those node(s) for the time being.
> 
> axel.
> 
> 
>> 
>> On Nov 11, 2011, at 10:46 AM, Gustavo Correa wrote:
>> 
>>> 
>>> On Nov 11, 2011, at 11:18 AM, Axel Kohlmeyer wrote:
>>> 
>>>> On Fri, Nov 11, 2011 at 10:35 AM, Ti Leggett <leggett at mcs.anl.gov> wrote:
>>>>> We have NV GPUs and we have some users who want to run GPGPU jobs (like CUDA) and we have other users who want to run GL GPU jobs. GL jobs require the machine to have X started (runlevel 5) and GPGPU jobs can't run when X is running. Does anyone have a good method of letting users specify which type of GPU job they need to run and changing the runlevel appropriately?
>>>> 
>>>> with nvidia hardware, GPGPU jobs _can_ run while X
>>>> is running. i am doing that on my desktop all the time.
>>>> you may need to tweak the timeout that is set to
>>>> keep GPGPU applications from hogging the GPU
>>>> while X is running, if your GPGPU users write kernels
>>>> that run excessively long. in most cases, that is
>>>> just bad program design.
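[Editor's note: if long-running kernels under X are unavoidable, the driver's watchdog timeout Axel mentions can reportedly be relaxed in xorg.conf. The option name below is taken from the NVIDIA driver README; treat it as something to verify against the README for your driver version, and the Identifier is made up.]

```shell
# xorg.conf Device section (config fragment; verify the "Interactive"
# option against the NVIDIA driver README for your driver version):
#
#   Section "Device"
#       Identifier "nvidia-gpu0"          # made-up identifier
#       Driver     "nvidia"
#       Option     "Interactive" "False"  # relax the kernel-launch watchdog
#   EndSection
```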
>>>> 
>>>> axel.
>>>> 
>>> 
>>> Hi Ti
>>> 
>>> I guess you don't want to let users change the machine runlevel.
>>> However, I presume you could check whether the job requires X and
>>> change the runlevel in a prologue script,
>>> then return to runlevel 3 in an epilogue script when the job ends.
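[Editor's note: a minimal prologue sketch along the lines Gus describes. The script path, the GL_GPU property name, and the use of telinit are assumptions; Torque passes the job id as $1, and how the job's node request string is obtained (e.g. via qstat -f) is left open here.]

```shell
#!/bin/sh
# Hypothetical Torque prologue sketch (assumed location:
# /var/spool/torque/mom_priv/prologue). Torque passes the job id as $1;
# looking up the job's requested node properties (e.g. via qstat -f)
# is assumed and not shown.

# Decide the runlevel from the job's node request string: runlevel 5
# (X running) if the hypothetical GL_GPU property was requested,
# runlevel 3 otherwise.
target_runlevel() {
    case "$1" in
        *GL_GPU*) echo 5 ;;  # GL job: needs X
        *)        echo 3 ;;  # GPGPU job: no X
    esac
}

# Example: a request string of "1:GL_GPU:ppn=4" yields runlevel 5.
# In the real prologue one would then run something like:
#   /sbin/telinit "$(target_runlevel "$neednodes")"
# and the matching epilogue would run: /sbin/telinit 3
```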
>>> 
>>> I suppose you could identify the GL_GPU jobs if you associate them
>>> with a node property, e.g. one named GL_GPU, added to the appropriate nodes in the server_priv/nodes file.
>>> Then the user would request nodes with the 'GL_GPU' property in her/his Torque qsub
>>> script/command line, which your prologue script could then handle by changing the runlevel
>>> to 5.
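[Editor's note: for concreteness, the tagging and the matching request might look like this; the node names, counts, and job script name are made up for illustration.]

```shell
# server_priv/nodes -- give the X-capable nodes the GL_GPU property
# (node names are made up):
#
#   gpu01 np=8 GL_GPU
#   gpu02 np=8 GL_GPU
#
# A GL user then asks for such a node explicitly:
#
#   qsub -l nodes=1:GL_GPU:ppn=1 myglrun.sh
#
# GPGPU users simply omit the property.
```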
>>> 
>>> Just a suggestion.
>>> 
>>> Gus Correa
>>> 
>>>> 
>>>>> _______________________________________________
>>>>> torqueusers mailing list
>>>>> torqueusers at supercluster.org
>>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Dr. Axel Kohlmeyer    akohlmey at gmail.com
>>>> http://sites.google.com/site/akohlmey/
>>>> 
>>>> Institute for Computational Molecular Science
>>>> Temple University, Philadelphia PA, USA.
>>> 
>> 
>> 
>> 
>> 
>> 
> 
> 
> 


