[torqueusers] Using "ncpus" confuses scheduler

Nick Lindberg nlindberg at mkei.org
Thu Mar 27 08:09:03 MDT 2014


I am seeing some odd behavior. I think I know the culprit, but would like a second pair of eyes.  I have a user who has been submitting jobs using

#PBS -l ncpus=4

The job gets scheduled on a 16-core node, but Moab thinks it is taking all 16 processors when it is really only requesting 4.  There is also an odd “Attributes” line in the checknode output, which I’ve pasted below.  You can see there is one reservation requesting 4 processors, yet dedicated resources is at 16, and it says

Attributes:       Processors=4

almost as if it’s multiplying the requested processors by that attribute (4 x 4 = 16).  I have no idea where that attribute comes from.  The result is that Moab thinks these nodes are full when they are not, and my cluster is only running at 60% utilization (which showq reports correctly).

What does Torque do with ncpus, and is there a way for me to not only discourage but disallow this Torque pragma?  I think “procs=4” or “nodes=1:ppn=4” behaves normally.  Has anybody seen this before?

[root@bright ~]# checknode -v compute-002
node compute-002

State:      Busy  (in current state for 00:00:23)
Configured Resources: PROCS: 16  MEM: 62G  SWAP: 74G  DISK: 1M
Utilized   Resources: PROCS: 16  SWAP: 6285M
Dedicated  Resources: PROCS: 16
Attributes:         Processors=4
  MTBF(longterm):   INFINITY  MTBF(24h):   INFINITY
Opsys:      linux     Arch:      ---
Speed:      1.00      CPULoad:   1.000
Partition:  torque  Rack/Slot:  ---  NodeIndex:  2
IdleTime:   58:18:24:38
Classes:    [batch]
RM[torque]* TYPE=PBS
EffNodeAccessPolicy: SHARED

Total Time:    174days  Up:    172days (98.94%)  Active: 93:14:24:12 (53.57%)

  17035x4  Job:Running  -2:17:28:55 -> 7:06:31:05 (10:00:00:00)
Jobs:        17035
ALERT:  node is in state Busy but load is low (1.000)
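
On the question of disallowing ncpus outright: one possible approach, sketched here under assumptions rather than as a tested recipe, is a qsub submit filter. As far as I know, qsub can pass each job script to a site-supplied filter on stdin, hand it the qsub command-line arguments, and use the filter's stdout as the script (the filter path is set with SUBMITFILTER in torque.cfg). The function names below are hypothetical, and returning 1 to signal rejection is an assumption; check the submit filter documentation for the exit status your Torque version expects.

```python
import io
import re

# Matches "ncpus=<n>" anywhere in a directive line or qsub argument.
NCPUS_RE = re.compile(r"ncpus=\d+")


def find_ncpus(script_lines, qsub_args):
    """Return the #PBS directives and qsub arguments that request ncpus."""
    hits = []
    for line in script_lines:
        stripped = line.strip()
        # Only #PBS directive lines count; ordinary script lines are ignored.
        if stripped.startswith("#PBS") and NCPUS_RE.search(stripped):
            hits.append(stripped)
    # Requests given on the command line, e.g. qsub -l ncpus=4 job.sh
    for arg in qsub_args:
        if NCPUS_RE.search(arg):
            hits.append(arg)
    return hits


def run_filter(qsub_args, stdin, stdout, stderr):
    """Pass the script through unchanged, or refuse it if ncpus is requested.

    Returns the exit status the filter process should use
    (assumption: non-zero makes qsub refuse the job).
    """
    lines = stdin.readlines()
    hits = find_ncpus(lines, qsub_args)
    if hits:
        stderr.write("ncpus is not accepted here; use nodes=1:ppn=N instead\n")
        for hit in hits:
            stderr.write("  offending request: %s\n" % hit)
        return 1
    stdout.writelines(lines)
    return 0


# Small in-memory demonstration, no Torque installation needed.
demo_script = io.StringIO("#!/bin/bash\n#PBS -l ncpus=4\necho hello\n")
demo_out, demo_err = io.StringIO(), io.StringIO()
demo_status = run_filter([], demo_script, demo_out, demo_err)
print(demo_status)
print(demo_err.getvalue(), end="")
```

To use this as an actual filter, the demonstration lines at the bottom would be replaced by a call such as sys.exit(run_filter(sys.argv[1:], sys.stdin, sys.stdout, sys.stderr)). Rejecting rather than silently rewriting ncpus to nodes=1:ppn=N is a deliberate choice in this sketch, so that users see why the job was refused.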

Nick Lindberg
Director of Engineering
Milwaukee Institute
414-727-6413 (office)
608-215-3508 (mobile)
nlindberg at mkei.org | www.mkei.org
