[torquedev] processor affinity
Toni L. Harbaugh-Blackford [Contr]
harbaugh at ncifcrf.gov
Sun Jun 3 08:10:41 MDT 2007
On Sun, 3 Jun 2007, Menshutin Anton wrote:
>
> I still can't understand why you are saying that torque does not assign cpus
> on a node for a job. It does.
It does not assign *SPECIFIC* cpus in time shared mode *OR* when jobs are submitted
using "ncpus=X" instead of "ppn=X".
In either of these cases, on a 64 processor machine, if a 4 cpu job comes in Torque
DOES NOT assign that job to cpus 7-10, for instance. Torque only "allocates" four
cpus to the job for accounting purposes, to keep from oversubscribing the whole system.
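
To illustrate the kind of count-only accounting described above, here is a minimal sketch in Python. It is not Torque's code; the class name NodeAccounting and its methods are hypothetical and exist only to show that the bookkeeping knows how many cpus are in use, not which ones.

```python
class NodeAccounting:
    """Count-only bookkeeping: knows how many cpus are in use, not which ones.

    Hypothetical illustration, not Torque's actual data structure.
    """

    def __init__(self, total_cpus):
        self.total_cpus = total_cpus
        self.in_use = 0

    def allocate(self, ncpus):
        # Refuse the job if it would oversubscribe the node.
        if self.in_use + ncpus > self.total_cpus:
            return False
        self.in_use += ncpus
        return True

    def release(self, ncpus):
        # Give the cpus back when the job exits; never go below zero.
        self.in_use = max(0, self.in_use - ncpus)


node = NodeAccounting(total_cpus=64)
accepted = node.allocate(4)                  # a 4 cpu job arrives
remaining = node.total_cpus - node.in_use    # cpus still available, by count only
```

Nothing in this bookkeeping says whether the job sits on cpus 7-10 or anywhere else, which is why it is not enough for setting affinity.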
> First of all, I am not using nodes in shared
> mode. One process - one cpu. If I have an SMP with 4 cpus (which is my case) and
> I have set np=4, then this is equivalent to 4 single CPU nodes. Doing so,
> torque will assign jobs to these virtual nodes, whose names are 'node5/3',
> 'node4/1' and so on.
>
> Of course, if I want to use shared mode - more tasks than number of CPUs - I
> do need CPU sets (as far as I understand what this feature does).
> But in the case where several different jobs do not share cpus,
> sched_setaffinity() is enough. This is also mentioned here -
> http://www.bullopensource.org/cpuset/.
>
In timeshared mode or for jobs with "ncpus=X", you will need to decide which cpus to do
"sched_setaffinity()" on.
> It seems that I have to parse the job attribute exec_host to find out the cpu
> numbers assigned to the job.
>
You will need to do more than that. Even if a node is designated as type "cluster",
a user can submit a job using "ncpus=X" instead of "nodes=1:ppn=X", and the exec_host
will not appear with the individual cpus broken out. For example:
$ qstat -n -1
                                                                   Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
25446.mandark.ncifcr harbaugh small    STDIN       11422     1   4     -- 24:00 R    -- dexter/0
25446 is a four cpu job started with "ncpus=4".
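
To make the problem concrete, here is a rough Python sketch of parsing exec_host and checking whether it accounts for all of a job's cpus. It assumes entries have the form host/index joined by "+", as in the 'node5/3' and 'dexter/0' values in this thread; both function names are hypothetical.

```python
def cpus_from_exec_host(exec_host, hostname):
    """Return the cpu indices listed for hostname in an exec_host string.

    Assumes entries look like "host/index" joined by "+".
    """
    cpus = []
    for entry in exec_host.split("+"):
        host, _, index = entry.partition("/")
        if host == hostname and index.isdigit():
            cpus.append(int(index))
    return cpus


def exec_host_is_complete(exec_host, hostname, ncpus):
    """True only if exec_host names at least ncpus cpus on hostname."""
    return len(cpus_from_exec_host(exec_host, hostname)) >= ncpus
```

For job 25446, exec_host_is_complete("dexter/0", "dexter", 4) is false: only one index is listed for a four cpu job, so the other three cpus cannot be read from exec_host.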
If you know that users at your site will never use "ncpus=X", then it doesn't matter
for you, but for sites in general you cannot assume this.
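
For the general case, something has to choose specific cpus and remember them until the job exits. The following is only a sketch of that bookkeeping in Python, under the assumption that one process per node owns the table. CpuPool and pin are hypothetical names; os.sched_setaffinity is the Python wrapper around the Linux call and is available on Linux only.

```python
import os


class CpuPool:
    """Tracks which specific cpu ids on one node are free or owned by a job."""

    def __init__(self, total_cpus):
        self.free = set(range(total_cpus))
        self.owned = {}  # job id -> set of cpu ids

    def assign(self, job_id, ncpus):
        if job_id in self.owned:
            raise ValueError("job already has cpus assigned")
        if ncpus > len(self.free):
            raise RuntimeError("not enough free cpus")
        # Simple policy for illustration: take the lowest-numbered free cpus.
        cpus = set(sorted(self.free)[:ncpus])
        self.free -= cpus
        self.owned[job_id] = cpus
        return cpus

    def release(self, job_id):
        # Return the job's cpus to the free set when the job exits.
        self.free |= self.owned.pop(job_id, set())


def pin(pid, cpus):
    """Restrict pid (0 means the calling process) to the given cpu ids."""
    os.sched_setaffinity(pid, cpus)
```

In this sketch the child would call pin(0, cpus) after fork() and before exec(), and release() would run when the job exits so the cpus can be reassigned. The lowest-numbered-first policy is just a placeholder; a big SMP would probably want something smarter.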
Toni
>
> -----Original Message-----
> From: Toni L. Harbaugh-Blackford [Contr] [mailto:harbaugh at ncifcrf.gov]
> Sent: Sunday, June 03, 2007 5:21 PM
> To: Menshutin Anton
> Cc: 'Sergio Gelato'; torquedev at supercluster.org
> Subject: RE: [torquedev] processor affinity
>
>
>
>
> On Sun, 3 Jun 2007, Menshutin Anton wrote:
>
> > Well, I found this code about cpusets in the latest snapshot of torque (I was
> > using 2.6.1). But it seems unfinished.
>
> The code is not unfinished, it is just system specific. It is for systems
> that have the libcpuset library.
>
> > Maybe cpusets are more powerful than the approach with sched_setaffinity.
> > Here is some info from the web page about cpuset - Many applications (as it is
> > often the case for HPC apps) use to have a "one process on one processor"
> > policy. They can use sched_setaffinity() to do so...
> >
> > So sched_setaffinity() is my choice. The statement that
> > Torque does not assign specific cpus seems not to be entirely correct.
>
> Torque does not assign cpus for systems that do not support libcpuset.
> Look at the functions in resmom/linux/cpuset.c. If your system has them
> then it is possible you could run a cpuset-aware mom.
>
> > If I have SMP nodes and I set np=4, for example, for each node, then torque
> > treats it as a set of virtual nodes with a single processor each.
> > Here is an example line from 'qstat -f' output from my system.
> > exec_host = node5/3
> > I could treat this info as node5 cpu3. After that I could set cpu affinity.
> >
>
> It is best to think about this in terms of big SMPs. To implement cpu affinity,
> you need to keep track of which cpus you've already assigned to which jobs.
> You don't want to accidentally assign two jobs to run on the same cpus.
>
> If you have a 128p system and a mix of 8, 4, and 1 cpu jobs come in, how
> do you manage where they run? How do you track which cpus are freed when
> the jobs exit, so you can reassign those cpus to another job?
>
> Toni
>
> >
> > -----Original Message-----
> > From: Sergio Gelato [mailto:Sergio.Gelato at astro.su.se]
> > Sent: Sunday, June 03, 2007 12:49 AM
> > To: Menshutin Anton
> > Cc: torquedev at supercluster.org
> > Subject: Re: [torquedev] processor affinity
> >
> > * Menshutin Anton [2007-06-02 17:02:12 +0400]:
> > > I found that there is no processor affinity in torque. Jobs assigned to run
> > > on some cpus selected by the scheduler could also run on other cpus on this
> > > node.
> >
> > Really? I see in the trunk's src/resmom/start_exec.c a few
> > #elif defined(PENABLE_LINUX26_CPUSETS)
> > which make me think that something related to what you are looking for
> > is already implemented.
> >
> > > This property is inherited by child processes. It is obvious that setting it
> > > after fork() and before exec() will be enough. The only thing I don't know is
> > > where I can get info about the cpus assigned to me by the scheduler.
> >
> > The code already uses
> > pattr = &pjob->ji_wattr[(int)JOB_ATR_resource];
> > prd = find_resc_def(svr_resc_def,"ncpus",svr_resc_size);
> > presc = find_resc_entry(pattr,prd);
> > to find the value of the job resource "ncpus".
> >
> > > Qstat shows this info in the exec_host attribute, and I suppose I can get this
> > > string, parse it, find the local hostname and get the CPU numbers. But maybe
> > > there is a better way of getting this info?
> > >
> > > I'm asking for help from the torque-dev mailing list :) Given some advice, I could
> > > try to implement and test it myself, or maybe somebody could send me a patch?
> >
>
> -------------------------------------------------------------------
> Toni Harbaugh-Blackford harbaugh at ncifcrf.gov
> System Administrator
> Advanced Biomedical Computing Center (ABCC)
> National Cancer Institute
> Contractor - SAIC/Frederick
>
-------------------------------------------------------------------
Toni Harbaugh-Blackford harbaugh at ncifcrf.gov
System Administrator
Advanced Biomedical Computing Center (ABCC)
National Cancer Institute
Contractor - SAIC/Frederick