[torquedev] processor affinity

Toni L. Harbaugh-Blackford [Contr] harbaugh at ncifcrf.gov
Sun Jun 3 08:10:41 MDT 2007


On Sun, 3 Jun 2007, Menshutin Anton wrote:

  >
  > I still can't understand why you are saying that torque does not assign cpus
  > on a node for a job. It does.

It does not assign *SPECIFIC* cpus in time shared mode *OR* when jobs are submitted
using "ncpus=X" instead of "ppn=X".

In either of these cases, on a 64-processor machine, if a 4-cpu job comes in, Torque
DOES NOT assign that job to, say, cpus 7-10.  Torque only "allocates" four cpus to the
job for accounting purposes, to keep from oversubscribing the whole system.
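
For example, these two hypothetical submissions (job.sh is made up) both ask for four
cpus, but only the ppn form gets individual virtual processors listed in exec_host:

  $ qsub -l nodes=1:ppn=4 job.sh    # exec_host lists four node/slot entries
  $ qsub -l ncpus=4 job.sh          # exec_host shows a single entry (see below)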

  > First of all, I am not using the node in shared
  > mode. One process - one cpu. If I have an SMP node with 4 cpus (which is my case) and
  > I have set np=4, then this is equivalent to 4 single-CPU nodes. Doing so,
  > torque will assign jobs to these virtual nodes, whose names are 'node5/3',
  > 'node4/1' and so on.
  >
  > Of course, if I want to use shared mode - more tasks than the number of CPUs - I
  > do need CPU sets (as far as I understand what this feature does).
  > But in the case when several different jobs do not share cpus,
  > sched_setaffinity() is enough. This is also mentioned here -
  > http://www.bullopensource.org/cpuset/.
  >

In timeshared mode or for jobs with "ncpus=X", you will need to decide which cpus to do
"sched_setaffinity()" on.

  > It seems that I have to parse the job attribute exec_host to find out the cpu
  > numbers assigned to the job.
  >

You will need to do more than that.  Even if a node is designated as type "cluster",
a user can submit a job using "ncpus=X" instead of "nodes=1:ppn=X", and the exec_host
will not show the individual cpus broken out.  For example:

  $ qstat -n -1
                                                                     Req'd  Req'd   Elap
  Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
  -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
  25446.mandark.ncifcr harbaugh small    STDIN       11422     1   4    --  24:00 R   --    dexter/0

25446 is a four-cpu job started with "ncpus=4".
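
For a "nodes=1:ppn=X" job the exec_host string does break the slots out (e.g.
"node5/3+node5/2"), so the parsing Anton describes might look roughly like the
hypothetical helper below (not TORQUE code); for an ncpus job like 25446 it would
only ever see one entry:

  #include <stdio.h>
  #include <string.h>

  /* Walk an exec_host string of the form "node5/3+node5/2" and print each
   * host/slot pair.  For an "ncpus=X" job the slot indices alone do not
   * tell you which cpus to pin the job to. */
  static void walk_exec_host(const char *exec_host)
    {
    char  buf[4096];
    char *entry;
    char *slash;

    strncpy(buf, exec_host, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (entry = strtok(buf, "+"); entry != NULL; entry = strtok(NULL, "+"))
      {
      slash = strchr(entry, '/');

      if (slash == NULL)
        continue;

      *slash = '\0';

      printf("host %s  slot %s\n", entry, slash + 1);
      }
    }

  int main(void)
    {
    walk_exec_host("node5/3+node5/2+node4/1");
    return(0);
    }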

If you know that users at your site will never use "ncpus=X", then it doesn't matter
for you, but for sites in general you cannot assume this.

Toni


  >
  > -----Original Message-----
  > From: Toni L. Harbaugh-Blackford [Contr] [mailto:harbaugh at ncifcrf.gov]
  > Sent: Sunday, June 03, 2007 5:21 PM
  > To: Menshutin Anton
  > Cc: 'Sergio Gelato'; torquedev at supercluster.org
  > Subject: RE: [torquedev] processor affinity
  >
  >
  >
  >
  > On Sun, 3 Jun 2007, Menshutin Anton wrote:
  >
  >   > Well, I found this code about cpusets in the latest snapshot of torque (I
  >   > was using 2.6.1). But it seems unfinished.
  >
  > The code is not unfinished, it is just system specific.  It is for systems
  > that have the libcpuset library.
  >
  >   > Maybe cpusets are more powerful than the approach with sched_setaffinity().
  >   > Here is some info from a web page about cpusets - Many applications (as it
  >   > is often the case for HPC apps) use to have a "one process on one
  >   > processor" policy. They can use sched_setaffinity() to do so...
  >   >
  >   > So sched_setaffinity() is my choice. The statement that
  >   > Torque does not assign specific cpus seems to be not entirely correct.
  >
  > Torque does not assign cpus for systems that do not support libcpuset.
  > Look at the functions in resmom/linux/cpuset.c.  If your system has them
  > then it is possible you could run a cpuset-aware mom.
  >
  >   > If I have SMP nodes and I set np=4, for example, for each node, then
  >   > torque treats it as virtual nodes, each with a single processor.
  >   > Here is an example line from 'qstat -f' output from my system:
  >   > exec_host = node5/3
  >   > I could treat this info as node5, cpu 3.  After that I could set
  >   > cpu affinity.
  >   >
  >
  > You'd best think about this in terms of big SMPs.  To implement cpu affinity,
  > you need to keep track of which cpus you've already assigned to which jobs.
  > You don't want to accidentally assign two jobs to run on the same cpus.
  >
  > If you have a 128p system and a mix of 8, 4, and 1 cpu jobs come in, how
  > do you manage where they run?  How do you track which cpus are freed when
  > the jobs exit, so you can reassign those cpus to another job?
  >
  > Toni
  >
  >   >
  >   > -----Original Message-----
  >   > From: Sergio Gelato [mailto:Sergio.Gelato at astro.su.se]
  >   > Sent: Sunday, June 03, 2007 12:49 AM
  >   > To: Menshutin Anton
  >   > Cc: torquedev at supercluster.org
  >   > Subject: Re: [torquedev] processor affinity
  >   >
  >   > * Menshutin Anton [2007-06-02 17:02:12 +0400]:
  >   > > I found that there is no processor affinity in torque. Jobs assigned to
  >   > > run on some cpus selected by the scheduler could also run on other cpus
  >   > > on this node.
  >   >
  >   > Really? I see in the trunk's src/resmom/start_exec.c a few
  >   > #elif defined(PENABLE_LINUX26_CPUSETS)
  >   > which make me think that something related to what you are looking for
  >   > is already implemented.
  >   >
  >   > > This property is inherited by the child process. It is obvious that
  >   > > setting it after fork() and before exec() will be enough. The only thing
  >   > > I don't know is where I can get info about the cpus assigned to me by
  >   > > the scheduler.
  >   >
  >   > The code already uses
  >   >   pattr = &pjob->ji_wattr[(int)JOB_ATR_resource];
  >   >   prd = find_resc_def(svr_resc_def,"ncpus",svr_resc_size);
  >   >   presc = find_resc_entry(pattr,prd);
  >   > to find the value of the job resource "ncpus".
  >   >
  >   > > Qstat shows this info in the exec_host attribute, and I suppose I can
  >   > > get this string, parse it, find the local hostname and get the CPU
  >   > > numbers. But maybe there is a better way of getting this info?
  >   > >
  >   > > I'm asking for help from the torque-dev mailing list :) Given some
  >   > > advice, I could try to implement and test it myself, or maybe somebody
  >   > > could send me a patch?
  >   >
  >
  > -------------------------------------------------------------------
  > Toni Harbaugh-Blackford                       harbaugh at ncifcrf.gov
  > System Administrator
  > Advanced Biomedical Computing Center (ABCC)
  > National Cancer Institute
  > Contractor - SAIC/Frederick
  >

-------------------------------------------------------------------
Toni Harbaugh-Blackford                       harbaugh at ncifcrf.gov
System Administrator
Advanced Biomedical Computing Center (ABCC)
National Cancer Institute
Contractor - SAIC/Frederick

