[torqueusers] Fwd: ncpus anyone?

Kamil Marcinkowski kamil at ualberta.ca
Tue Mar 2 11:25:55 MST 2010


Hello Josh

You should use the (procs=32) specification for parallel jobs 
that don't care where they run.

ncpus used to have two different and opposite meanings: on
SMPs (nodes=1:ppn=32) and on clusters (nodes=32:ppn=1).

I vote for defining -lncpus=32 as -lnodes=1:ppn=32.
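For reference, a minimal job-script sketch showing the three request forms
discussed in this thread (a sketch only; exact resource names and their
behaviour vary by TORQUE version, and the application name is hypothetical):

```shell
#!/bin/sh
# Request 32 processors anywhere on the cluster (the procs syntax
# recommended above for jobs that don't care where they run):
#PBS -l procs=32

# Equivalent single-node request, which this thread proposes as the
# meaning of -lncpus=32:
#   #PBS -l nodes=1:ppn=32
#
# Older, ambiguous form (meant different things on SMPs vs. clusters):
#   #PBS -l ncpus=32

cd "$PBS_O_WORKDIR"
mpiexec ./my_parallel_app   # hypothetical application
```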

Cheers,

Kamil


Kamil Marcinkowski           Westgrid System Administrator
kamil at ualberta.ca         University of Alberta site
Tel. 780 492-0354            Research Computing Support
Fax. 780 492-1729            Academic ICT
Edmonton, Alberta, CANADA    University of Alberta


"This communication is intended for the use of the recipient to which it is
addressed, and may contain confidential, personal, and/or privileged
information.  Please contact us immediately if you are not the intended
recipient of this communication.  If you are not the intended recipient of
this communication, do not copy, distribute, or take action on it. Any
communication received in error, or subsequent reply, should be deleted or
destroyed."



On 2010-03-02, at 10:58 AM, Josh Bernstein wrote:

> I vote for maintaining ncpus. It's very helpful for embarrassingly
> parallel jobs that just need 32 CPUs but don't care where they come
> from.
> 
> -Josh
> 
> On Mar 2, 2010, at 9:53 AM, "David Beer" <dbeer at adaptivecomputing.com>  
> wrote:
> 
>> Just to let everyone know, the qstat -a output has been changed to  
>> read both the value stored in nodes and ncpus, using nodes when both  
>> are specified.
>> 
>>> Changing the code so that qstat -a correctly displays the number of
>>> tasks with -lnodes=1:ppn=32 would be great. Then, you could also make
>>> sure that -lncpus=32 is a complete synonym of -lnodes=1:ppn=32.
>> 
>> Is this the behavior that everyone expects/hopes for? If so, we can
>> look at working on it. At the same time, TORQUE 3.0 is likely to
>> include a much superior specification for how we request resources,
>> which may or may not end up including ncpus. We're looking to remove
>> a lot of ambiguity and enhance capability. By the way, we're still
>> open to input on how all that will work, but maybe we'll send out
>> some ideas shortly if nobody has any input yet.
>> 
>> Cheers,
>> 
>> David
>> 
>> ----- "Michel Béland" <michel.beland at rqchp.qc.ca> wrote:
>> 
>>> David Beer wrote:
>>> 
>>>> So, if I understand correctly, ncpus really only works for people
>>> that are running SMP or similar systems? It seems like we definitely
>>> need to update our documentation as I feel it is misleading on the
>>> matter. Among other things, it seems that a clarification needs to be
>>> made that ncpus isn't compatible with the nodes attribute.
>>> 
>>> It is possible to specify both. In fact, at our site we have a qsub
>>> wrapper script that makes sure, among other things, that everybody
>>> specifies both on our Altix systems.
>>> 
>>>> On a related note, in the qstat -a output we have the TSK field,
>>> which I believe is meant to mean task (I couldn't find anything about
>>> it in the man page, the variable in the code is named tasks). I
>>> noticed that in the implementation we're just writing whatever value
>>> is stored in ncpus for this field. It seems like this could be made
>>> more accurate by checking the nodes attribute as well and using that
>>> value where it is defined, since it seems to override ncpus when both
>>> are present. What are your thoughts on this?
>>> 
>>> I agree. This is exactly why we make sure that all the jobs have both
>>> resource requests. If one specifies -lnodes=1:ppn=32, the output of
>>> qstat -a does not show how many cores you really use. On the other
>>> hand, if one specifies -lncpus=32, Torque does not create cpusets
>>> correctly (they always contain only processor 0). So if I specify
>>> -lncpus=32 -lnodes=1:ppn=32, cpusets are created correctly and
>>> qstat -a correctly shows how many cores the job is using. Maui does
>>> not have any problem dealing with this job.
>>> 
>>> -- 
>>> Michel Béland, analyste en calcul scientifique
>>> michel.beland at rqchp.qc.ca
>>> bureau S-250, pavillon Roger-Gaudry (principal), Université de
>>> Montréal
>>> téléphone : 514 343-6111 poste 3892     télécopieur : 514 343-2155
>>> RQCHP (Réseau québécois de calcul de haute performance)
>>> www.rqchp.qc.ca
>> 
>> -- 
>> David Beer | Senior Software Engineer
>> Adaptive Computing
>> 
>> 
>> 
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
> 


