[torquedev] processor affinity

Toni L. Harbaugh-Blackford [Contr] harbaugh at ncifcrf.gov
Sun Jun 3 09:45:56 MDT 2007


On Sun, 3 Jun 2007, Anton Menshutin wrote:

  > Thanks Toni, now it is clear.
  >
  > I have always used the ppn syntax rather than ncpus (we have been using
  > Torque in production for only two months), because in my opinion the
  > behavior of ncpus is undefined.  The rules for how ncpus is converted
  > into nodes and numbers of processors are rather complicated, so I avoid
  > ncpus myself and advise my users not to use it.  With the ppn syntax
  > everything just works as expected.
  >
  > I can forbid the use of ncpus with a prologue script that filters it out
  > or rejects such jobs.  There are also no time-shared nodes in our
  > cluster.
  >
  > Now, the next question is - where should I put a call to
  > sched_setaffinity()?  Could you tell me the function name that is most
  > suitable?
  >

Probably TMomFinalizeChild(), in start_exec.c, before the setuid() call
where the job takes on the new user's id.

Look for the text:

  /*
   * become the user, execv the shell and become the real job
   */

Toni


  >
  >
  >
  > > -----Original Message-----
  > > From: Toni L. Harbaugh-Blackford [Contr] [mailto:harbaugh at ncifcrf.gov]
  > > Sent: Sunday, June 03, 2007 6:11 PM
  > > To: Menshutin Anton
  > > Cc: torquedev at supercluster.org
  > > Subject: RE: [torquedev] processor affinity
  > >
  > > On Sun, 3 Jun 2007, Menshutin Anton wrote:
  > >
  > >   >
  > >   > I still can't understand why you are saying that torque does not
  > >   > assign cpus on a node for a job.  It does.
  > >
  > > It does not assign *SPECIFIC* cpus in time shared mode *OR* when jobs
  > > are submitted using "ncpus=X" instead of "ppn=X".
  > >
  > > In either of these cases, on a 64-processor machine, if a 4-cpu job
  > > comes in, Torque DOES NOT assign that job to cpus 7-10, for instance.
  > > Torque only "allocates" four cpus to the job for accounting purposes,
  > > to keep from oversubscribing the whole system.
  > >
  > >   > First of all, I am not using nodes in shared mode.  One process -
  > >   > one cpu.  If I have an SMP node with 4 cpus (which is my case) and
  > >   > I have set np=4, then this is equivalent to 4 single-CPU nodes.
  > >   > Torque will then assign jobs to these virtual nodes, whose names
  > >   > are 'node5/3', 'node4/1', and so on.
  > >   >
  > >   > Of course, if I want to use shared mode - more tasks than the
  > >   > number of CPUs - I do need cpusets (as far as I understand what
  > >   > this feature does).  But in the case where several different jobs
  > >   > do not share cpus, sched_setaffinity() is enough.  This is also
  > >   > mentioned here - http://www.bullopensource.org/cpuset/.
  > >   >
  > >
  > > In timeshared mode or for jobs with "ncpus=X", you will need to decide
  > > which cpus to do "sched_setaffinity()" on.
  > >
  > >   > It seems that I have to parse the job attribute exec_host to find
  > >   > out the cpu numbers assigned to the job.
  > >   >
  > >
  > > You will need to do more than that.  Even if a node is designated as
  > > type "cluster", a user can submit a job using "ncpus=X" instead of
  > > "nodes=1:ppn=X", and the exec_host will not appear with the individual
  > > cpus broken out.  For example:
  > >
  > >   $ qstat -n -1
  > >                                                                   Req'd  Req'd   Elap
  > >   Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
  > >   -------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
  > >   25446.mandark.ncifcr harbaugh small    STDIN       11422    1   4    -- 24:00 R   --    dexter/0
  > >
  > > 25446 is a four-cpu job started with "ncpus=4".
  > >
  > > If you know that users at your site will never use "ncpus=X", then it
  > > doesn't matter for you, but for sites in general you cannot assume
  > > this.
  > >
  > > Toni
  > >
  > >
  > >   >
  > >   > -----Original Message-----
  > >   > From: Toni L. Harbaugh-Blackford [Contr] [mailto:harbaugh at ncifcrf.gov]
  > >   > Sent: Sunday, June 03, 2007 5:21 PM
  > >   > To: Menshutin Anton
  > >   > Cc: 'Sergio Gelato'; torquedev at supercluster.org
  > >   > Subject: RE: [torquedev] processor affinity
  > >   >
  > >   >
  > >   >
  > >   >
  > >   > On Sun, 3 Jun 2007, Menshutin Anton wrote:
  > >   >
  > >   >   > Well, I found this code about cpusets in the latest snapshot
  > >   >   > of torque (I was using 2.6.1).  But it seems unfinished.
  > >   >
  > >   > The code is not unfinished, it is just system specific.  It is for
  > >   > systems that have the libcpuset library.
  > >   >
  > >   >   > Maybe cpusets are more powerful than the approach with
  > >   >   > sched_setaffinity().  Here is some info from a web page about
  > >   >   > cpusets - Many applications (as is often the case for HPC
  > >   >   > apps) tend to have a "one process on one processor" policy.
  > >   >   > They can use sched_setaffinity() to do so...
  > >   >   >
  > >   >   > So sched_setaffinity() is my choice.  The statement that
  > >   >   > Torque does not assign specific cpus seems not to be
  > >   >   > absolutely correct.
  > >   >
  > >   > Torque does not assign cpus for systems that do not support
  > >   > libcpuset.  Look at the functions in resmom/linux/cpuset.c.  If
  > >   > your system has them then it is possible you could run a
  > >   > cpuset-aware mom.
  > >   >
  > >   >   > If I have SMP nodes and I set np=4 for each node, for example,
  > >   >   > then torque treats each one as 4 virtual nodes with a single
  > >   >   > processor each.
  > >   >   > Here is an example line from 'qstat -f' output on my system:
  > >   >   >   exec_host = node5/3
  > >   >   > I could treat this info as node5, cpu 3.  After that I could
  > >   >   > set cpu affinity.
  > >   >   >
  > >   >
  > >   > You'd best think about this in terms of big SMPs.  To implement
  > >   > cpu affinity, you need to keep track of which cpus you've already
  > >   > assigned to which jobs.  You don't want to accidentally assign two
  > >   > jobs to be running on the same cpus.
  > >   >
  > >   > If you have a 128p system and a mix of 8-, 4-, and 1-cpu jobs come
  > >   > in, how do you manage where they run?  How do you track which cpus
  > >   > are freed when the jobs exit, so you can reassign those cpus to
  > >   > another job?
  > >   >
  > >   > Toni
  > >   >
  > >   >   >
  > >   >   > -----Original Message-----
  > >   >   > From: Sergio Gelato [mailto:Sergio.Gelato at astro.su.se]
  > >   >   > Sent: Sunday, June 03, 2007 12:49 AM
  > >   >   > To: Menshutin Anton
  > >   >   > Cc: torquedev at supercluster.org
  > >   >   > Subject: Re: [torquedev] processor affinity
  > >   >   >
  > >   >   > * Menshutin Anton [2007-06-02 17:02:12 +0400]:
  > >   >   > > I found that there is no processor affinity in torque.  Jobs
  > >   >   > > assigned to run on some cpus selected by the scheduler could
  > >   >   > > also run on other cpus on the same node.
  > >   >   >
  > >   >   > Really? I see in the trunk's src/resmom/start_exec.c a few
  > >   >   >   #elif defined(PENABLE_LINUX26_CPUSETS)
  > >   >   > which make me think that something related to what you are
  > >   >   > looking for is already implemented.
  > >   >   >
  > >   >   > > This property is inherited by child processes.  It is
  > >   >   > > obvious that setting it after fork() and before exec() will
  > >   >   > > be enough.  The only thing I don't know is where I can get
  > >   >   > > info about the cpus assigned to me by the scheduler.
  > >   >   >
  > >   >   > The code already uses
  > >   >   >   pattr = &pjob->ji_wattr[(int)JOB_ATR_resource];
  > >   >   >   prd = find_resc_def(svr_resc_def,"ncpus",svr_resc_size);
  > >   >   >   presc = find_resc_entry(pattr,prd);
  > >   >   > to find the value of the job resource "ncpus".
  > >   >   >
  > >   >   > > Qstat shows this info in the exec_host attribute, and I
  > >   >   > > suppose I can get this string, parse it, find out the local
  > >   >   > > hostname, and get the CPU numbers.  But maybe there is a
  > >   >   > > better way of getting this info?
  > >   >   > >
  > >   >   > > I'm asking for help from the torque-dev mailing list :)
  > >   >   > > Given some advice, I could try to implement and test it
  > >   >   > > myself, or maybe somebody could send me a patch?
  > >   >   >
  > >   >
  > >   >
  > >
  >

-------------------------------------------------------------------
Toni Harbaugh-Blackford                       harbaugh at ncifcrf.gov
System Administrator
Advanced Biomedical Computing Center (ABCC)
National Cancer Institute
Contractor - SAIC/Frederick

