[torqueusers] Fwd: ncpus anyone?
Dr. Stephan Raub
raub at uni-duesseldorf.de
Tue Mar 2 11:52:05 MST 2010
Hi,
Our institution had been using PBSPro for some time. We dropped it and are
now using Torque/Maui for several reasons that don't belong in this forum
(no, it was not because of money), BUT: I liked the idea of their chunks.
For example: as a quantum chemist I'm using TurboMole a lot. For parallel
runs with n compute processes it requires an additional master process, which
must run on the same node as the first compute process. However, this master
process doesn't need a lot of memory. So I used a statement like
select=1:ncpus=2:mem=15gb+15:ncpus=1:mem=15gb. Up to now I haven't figured
out an equivalent statement for Torque/Maui. nodes=1:ppn=2+15:ppn=1 with
pmem=8gb is not the same, as I am not able to allocate ALL the memory of a
node for the job.
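To make the difference concrete, here is a minimal sketch of the per-node memory each request would reserve. It assumes that pmem is accounted per process, so a node's share is ppn * pmem; that accounting rule and the helper names chunk_memory_gb and pmem_memory_gb are illustrative assumptions, not taken from PBSPro or Torque/Maui.

```python
# Sketch only: compares per-node memory under the two request styles.
# Assumption: pmem is counted per process, so a node gets ppn * pmem.

def chunk_memory_gb(chunks):
    """chunks: list of (count, ncpus, mem_gb), as in select=1:ncpus=2:mem=15gb+..."""
    return [mem_gb for count, _ncpus, mem_gb in chunks for _ in range(count)]

def pmem_memory_gb(nodes, pmem_gb):
    """nodes: list of (count, ppn), as in nodes=1:ppn=2+15:ppn=1."""
    return [ppn * pmem_gb for count, ppn in nodes for _ in range(count)]

# select=1:ncpus=2:mem=15gb+15:ncpus=1:mem=15gb
per_node_chunks = chunk_memory_gb([(1, 2, 15), (15, 1, 15)])

# nodes=1:ppn=2+15:ppn=1 with pmem=8gb
per_node_pmem = pmem_memory_gb([(1, 2), (15, 1)], 8)

print(per_node_chunks[:2])  # first two nodes under the chunk request
print(per_node_pmem[:2])    # first two nodes under the nodes/pmem request
```

Under that assumption the chunk request reserves the same 15 GB on every node regardless of the process count, while the nodes/pmem request ties each node's memory to its ppn.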
The same applies to jobs using heterogeneous MPI topologies (e.g. Itanium and
Xeon in the same MPI topology).
I don't want to say that PBSPro is better than Torque/Maui (we found out
the opposite the hard way), but the purely theoretical concept of these
resource chunks was quite useful.
Stephan
--
---------------------------------------------------------
| | Dr. rer. nat. Stephan Raub
| | Dipl. Chem.
| | Lehrstuhl für IT-Management / ZIM
| | Heinrich-Heine-Universität Düsseldorf Universitätsstr. 1 /
| | 25.41.O2.25-2
| | 40225 Düsseldorf / Germany
| |
| | Tel: +49-211-811-3911
---------------------------------------------------------
Important Note: This e-mail may contain trade secrets or privileged,
undisclosed or otherwise confidential information. If you have received this
e-mail in error, you are hereby notified that any review, copying or
distribution of it is strictly prohibited. Please inform us immediately and
destroy the original transmittal. Thank you for your cooperation.
From: torqueusers-bounces at supercluster.org
[mailto:torqueusers-bounces at supercluster.org] On Behalf Of kamil
Marcinkowski
Sent: Tuesday, March 2, 2010 19:26
To: Josh Bernstein
Cc: torqueusers
Subject: Re: [torqueusers] Fwd: ncpus anyone?
Hello Josh,
You should use the (procs=32) specification for parallel jobs
that don't care where they run.
ncpus used to have two different and opposite meanings on
SMPs (nodes=1:ppn=32) and clusters (nodes=32:ppn=1).
I vote for defining -lncpus=32 as equivalent to -lnodes=1:ppn=32.
Cheers,
Kamil
Kamil Marcinkowski Westgrid System Administrator
kamil at ualberta.ca University of Alberta site
Tel.780 492-0354 Research Computing Support
Fax.780 492-1729 Academic ICT
Edmonton, Alberta, CANADA University of Alberta
"This communication is intended for the use of the recipient to which it is
addressed, and may contain confidential, personal, and/or privileged
information. Please contact us immediately if you are not the intended
recipient of this communication. If you are not the intended recipient of
this communication, do not copy, distribute, or take action on it. Any
communication received in error, or subsequent reply, should be deleted or
destroyed."
On 2010-03-02, at 10:58 AM, Josh Bernstein wrote:
I vote for maintaining ncpus. It's very helpful for embarrassingly
parallel jobs that just need 32 CPUs but don't care where they come
from.
-Josh
On Mar 2, 2010, at 9:53 AM, "David Beer" <dbeer at adaptivecomputing.com>
wrote:
Just to let everyone know, the qstat -a output has been changed to
read both the value stored in nodes and ncpus, using nodes when both
are specified.
Changing the code so that qstat -a correctly displays the number of
tasks with -lnodes=1:ppn=32 would be great. Then, you could also make
sure that -lncpus=32 is a complete synonym for -lnodes=1:ppn=32.
Is this the behavior that everyone expects/hopes for? If so, we can
look at working on it. At the same time, TORQUE 3.0 is likely to
include a much better specification for requesting resources, which
may or may not end up including ncpus. We're looking to remove a lot
of ambiguity and enhance capability. By the way, we're still open to
input as to how all that will work, but maybe we'll send out some
ideas shortly if nobody has any input yet.
Cheers,
David
----- "Michel Béland" <michel.beland at rqchp.qc.ca> wrote:
David Beer wrote:
So, if I understand correctly, ncpus really only works for people
that are running SMP or similar systems? It seems like we definitely
need to update our documentation as I feel it is misleading on the
matter. Among other things, it seems that a clarification needs to be
made that ncpus isn't compatible with the nodes attribute.
It is possible to specify both. In fact, at our site we have a qsub
wrapper script that makes sure, among other things, that everybody
specifies both on our Altix systems.
On a related note, in the qstat -a output we have the TSK field,
which I believe is meant to mean task (I couldn't find anything about
it in the man page, the variable in the code is named tasks). I
noticed that in the implementation we're just writing whatever value
is stored in ncpus for this field. It seems like this could be made
more accurate by checking the nodes attribute as well and using that
value where it is defined, since it seems to override ncpus when both
are present. What are your thoughts on this?
I agree. This is exactly why we make sure that all the jobs have both
resource requests. If one specifies -lnodes=1:ppn=32, the output of
qstat -a does not show how many cores you really use. On the other
hand, if one specifies -lncpus=32, Torque does not create cpusets
correctly (they always contain only processor 0). So if I specify
-lncpus=32 -lnodes=1:ppn=32, cpusets are created correctly and
qstat -a correctly shows how many cores the job is using. Maui does
not have any problem dealing with this job.
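The task count that qstat -a could report for a nodes request can be sketched as below. This is a minimal illustration, not TORQUE's actual code: the function names count_tasks and tsk_value are hypothetical, and the parser only handles the simple count:ppn=N forms used in this thread, treating a non-numeric first field as a single named host and a missing ppn as 1.

```python
# Sketch only: derive a task count from a nodes spec, falling back to ncpus.
# count_tasks and tsk_value are hypothetical names, not part of TORQUE.

def count_tasks(nodes_spec):
    """Sum count * ppn over the '+'-separated parts of a nodes spec."""
    total = 0
    for part in nodes_spec.split("+"):
        fields = part.split(":")
        # A numeric first field is a node count; anything else is
        # treated here as one named host.
        count = int(fields[0]) if fields[0].isdigit() else 1
        ppn = 1  # assumed default when ppn is not given
        for field in fields[1:]:
            if field.startswith("ppn="):
                ppn = int(field[len("ppn="):])
        total += count * ppn
    return total

def tsk_value(nodes_spec=None, ncpus=None):
    """Prefer the nodes request when both are given, as proposed above."""
    if nodes_spec is not None:
        return count_tasks(nodes_spec)
    return ncpus

print(tsk_value(nodes_spec="1:ppn=32"))                # SMP-style request
print(tsk_value(nodes_spec="32:ppn=1"))                # cluster-style request
print(tsk_value(nodes_spec="1:ppn=2+15:ppn=1"))        # mixed request
print(tsk_value(ncpus=32))                             # ncpus only
```

Both the SMP-style and the cluster-style request come out at 32 tasks here, which is the number the TSK column would need to show for either form.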
--
Michel Béland, analyste en calcul scientifique
michel.beland at rqchp.qc.ca
bureau S-250, pavillon Roger-Gaudry (principal), Université de Montréal
téléphone : 514 343-6111 poste 3892 télécopieur : 514 343-2155
RQCHP (Réseau québécois de calcul de haute performance)
www.rqchp.qc.ca
--
David Beer | Senior Software Engineer
Adaptive Computing
_______________________________________________
torqueusers mailing list
torqueusers at supercluster.org
http://www.supercluster.org/mailman/listinfo/torqueusers