[torqueusers] Submitting jobs to use multiprocessors.

Gus Correa gus at ldeo.columbia.edu
Thu Mar 20 16:05:53 MDT 2014


Hi Hitesh

1) Did you actually write the "less than" ("<") and
"greater than" (">") characters in your $TORQUE/server_priv/nodes file?
Or are those "<" and ">" just typos in your email?
Or perhaps you don't want the actual node names to appear on this
mailing list?

 >>     Did you create a $TORQUE/pbs_server/nodes file? *Yes*
 >>
 >>     What are the contents of that file?
 >>     *<node1> np=2
 >>     <node2> np=2*
 >>


The "<" and ">" shouldn't be there, unless you have very unusual
names for your nodes.
There are also some "*" in the lines above that should not be there,
but you may have added those to the email as highlighting, I don't know.

I expected something like this for the file contents (2 lines only,
no "<" or ">"):

node1 np=2
node2 np=8

(You said the nodes have 2 and 8 cores/cpus, so one of them should
have np=2, and the other np=8, unless you don't want to use all
cores.
I am assuming node2 is the one with 8 cores, otherwise
you need to adjust the numbers above accordingly.)
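As a quick sanity check, you could grep the file for stray characters.
The /tmp copy and node names below are just an illustration, not your
real file:

```shell
# make a sample nodes file just for illustration
# (your real file is $TORQUE/server_priv/nodes)
cat > /tmp/nodes.sample <<'EOF'
node1 np=2
node2 np=8
EOF

# any "<" or ">" in the file would reproduce the problem above;
# grep finds none in this sample, so the fallback message is printed
grep -n '[<>]' /tmp/nodes.sample || echo "no stray angle brackets"
```

As far as I remember, pbs_server only reads the nodes file at startup,
so restart it after editing.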

2) You say Maui is enabled.
So, I assume the maui daemon is running, right?

However, you must also enable scheduling on the Torque/PBS server.
Did you enable that option?
What is the output of this?

qmgr -c 'p s' | grep scheduling

If it says "False", you need to do:

qmgr -c 'set server scheduling = True'
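To illustrate what I mean, here is how the "False" case would look.
I am faking the qmgr output with a sample string, since I obviously
can't query your server:

```shell
# illustrative sample of what `qmgr -c 'p s' | grep scheduling`
# might print -- this string is NOT from your server
sample='set server scheduling = False'

# with scheduling = False the server never asks the scheduler to
# start jobs, which would match your symptom: queued jobs that
# only run via qrun
echo "$sample" | grep -q 'False' \
  && echo "fix with: qmgr -c 'set server scheduling = True'"
```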

I hope this helps,
Gus Correa

On 03/20/2014 02:49 PM, hitesh chugani wrote:
> Hi Sven,
>
> These are the parameters in the job file
>
> #!/bin/bash
> #PBS -l nodes=2:ppn=2
> #PBS -k o
> #PBS -m abe
> #PBS -N JobName
> #PBS -V
> #PBS -j oe
>
> Thanks,
> Hitesh Chugani.
>
>
>
>
>
>
>
> On Thu, Mar 20, 2014 at 2:45 PM, Sven Schumacher
> <schumacher at tfd.uni-hannover.de> wrote:
>
>     Hello,
>
>     what PBS-specific parameters do you specify for your qsub-command or
>     in your job-file?
>     I noticed once, that specifying "mem=" with the total amount of
>     memory needed by the job, results in not starting jobs, because maui
>     can't decide if it is the memory requirement of the job on one of
>     the nodes or of all jobs together... so please tell us your used
>     qsub-parameters...
>
>     Thanks
>
>     Sven Schumacher
>
>     On 20.03.2014 19:30, hitesh chugani wrote:
>>     Hi Gus,
>>
>>
>>     Did you create a $TORQUE/pbs_server/nodes file? *Yes*
>>
>>     What are the contents of that file?
>>     *<node1> np=2
>>     <node2> np=2*
>>
>>     What is the output of "pbsnodes -a"?
>>     *<node1>
>>     *
>>     *     state = free
>>          np = 2
>>          ntype = cluster
>>          status =
>>     rectime=1395339913,varattr=,jobs=,state=free,netload=8159659934,gres=,loadave=0.00,ncpus=2,physmem=3848508kb,availmem=15671808kb,totmem=16300340kb,idletime=89,nusers=2,nsessions=22,sessions=2084
>>     2619 2839 2855 2873 2877 2879 2887 2889 2916 2893 2891 3333 6665
>>     3053 8036 25960 21736 22263 23582 26141 30680,uname=Linux lws81
>>     2.6.18-371.4.1.el5 #1 SMP Wed Jan 8 18:42:07 EST 2014
>>     x86_64,opsys=linux
>>          mom_service_port = 15002
>>          mom_manager_port = 15003
>>
>>     *
>>     *<node2>
>>     *
>>     *     state = free
>>          np = 2
>>          ntype = cluster
>>          status =
>>     rectime=1395339913,varattr=,jobs=,state=free,netload=2817775035,gres=,loadave=0.00,ncpus=8,physmem=16265764kb,availmem=52900464kb,totmem=55259676kb,idletime=187474,nusers=3,nsessions=4,sessions=11923
>>     17547 20030 29392,uname=Linux lws10.uncc.edu 2.6.18-371.4.1.el5 #1 SMP Wed Jan 8
>>     18:42:07 EST 2014 x86_64,opsys=linux
>>          mom_service_port = 15002
>>          mom_manager_port = 15003*
>>
>>
>>     Did you enable scheduling in the pbs_server? *Maui is enabled*
>>
>>
>>     Did you keep the --enable-cpuset configuration option? *No. I have
>>     disabled it*
>>
>>
>>     I am able to run single/two node single processor
>>     job(nodes=1(and2):ppn=1). But when i am trying to run
>>     multiprocessor jobs(nodes=2:ppn=2 with nodes having 2 and 8 ncpu),
>>     the job is remaining in queue . I am able to forcefully run the
>>     job via qrun. I am using Maui scheduler.
>>
>>
>>     Please help.
>>
>>
>>     Thanks,
>>     Hitesh chugani.
>>
>>
>>
>>
>>
>>     On Mon, Mar 17, 2014 at 7:35 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>>
>>         Hi Hitesh
>>
>>         Did you create a $TORQUE/pbs_server/nodes file?
>>         What are the contents of that file?
>>         What is the output of "pbsnodes -a"?
>>
>>         Make sure the nodes file is there.
>>         If not, create it again, and restart pbs_server.
>>
>>         Did you enable scheduling in the pbs_server?
>>
>>         Also:
>>
>>         Did you keep the --enable-cpuset configuration option?
>>         If you did:
>>         Do you have a /dev/cpuset directory on your nodes?
>>         Do you have a type cpuset filesystem mounted on /dev/cpuset
>>         on the nodes?
>>
>>         Check this link:
>>
>>         http://docs.adaptivecomputing.com/torque/Content/topics/3-nodes/linuxCpusetSupport.htm
>>
>>         Still in the topic of cpuset:
>>
>>         Are you perhaps running cgroups on the nodes (the cgconfig
>>         service)?
>>
>>         I hope this helps,
>>         Gus Correa
>>
>>         On 03/17/2014 05:45 PM, hitesh chugani wrote:
>>         > Hello,
>>         >
>>         > I have reconfigured torque to disable NUMA support. I am
>>         able to run
>>         > single node single processor job(nodes=1:ppn=1). But when i
>>         am trying to
>>         > run multiprocessor jobs(nodes=2:ppn=2 with nodes having 2
>>         and 8 ncpu),
>>         > the job is remaining in queue . I am able to forcefully run
>>         the job via
>>         > qrun. I am using Maui scheduler.  Can anyone please tell me
>>         what may be
>>         > the issue? is it something to do with Maui scheduler? Thanks.
>>         >
>>         > Regards,
>>         > Hitesh Chugani.
>>         >
>>         >
>>         > On Mon, Mar 17, 2014 at 12:40 PM, hitesh chugani
>>         > <hiteshschugani at gmail.com> wrote:
>>         >
>>         >     I tried nodes=X:ppn=Y option. It still didn't work . I
>>         guess it is
>>         >     something to deal with NUMA option enabling. I am
>>         looking into this
>>         >     issue and will let you guys know . Thanks a lot
>>         >
>>         >
>>         >
>>         >     On Thu, Mar 13, 2014 at 10:22 AM, Ken Nielson
>>         >     <knielson at adaptivecomputing.com> wrote:
>>         >
>>         >         Glen is right. There is a regression with procs.
>>         >
>>         >
>>         >         On Wed, Mar 12, 2014 at 5:29 PM,
>>         <glen.beane at gmail.com> wrote:
>>         >
>>         >             I think there is a regression in Torque and
>>         procs only works
>>         >             with Moab now. Try nodes=X:ppn=Y
>>         >
>>         >
>>         >             On Mar 12, 2014, at 6:26 PM, hitesh chugani
>>         >             <hiteshschugani at gmail.com>
>>         >             wrote:
>>         >
>>         >>             Hi all,
>>         >>
>>         >>
>>         >>             I am trying to submit a job with to use
>>         >>             multiprocessors(Added #PBS -l procs=4 in the
>>         job script)
>>         >>             but the job is remaining queued forever. I am
>>         using 2
>>         >>             computes nodes (ncpus=8 and 2). Any idea why is
>>         it not
>>         >>             running? Please help.
>>         >>
>>         >>             I have installed torque using this
>>         configuration option.
>>         >>             *./configure --enable-unixsockets --enable-cpuset
>>         >>             --enable-geometry-requests --enable-numa-support *
>>         >>
>>         >>
>>         >>
>>         >>
>>         >>             Thanks,
>>         >>             Hitesh Chugani.
>>         >>             Student Linux specialist
>>         >>             University of North Carolina, Charlotte
>>         >> _______________________________________________
>>         >>             torqueusers mailing list
>>         >> torqueusers at supercluster.org
>>         >> http://www.supercluster.org/mailman/listinfo/torqueusers
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >         --
>>         >         Ken Nielson
>>         > +1 801.717.3700 office  +1 801.717.3738 fax
>>         >         1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
>>         > www.adaptivecomputing.com
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >
>>         >
>>
>>
>>
>>
>>
>
>
>     --
>     Sven Schumacher - Systemadministrator Tel: (0511)762-2753
>     Leibniz Universitaet Hannover
>     Institut für Turbomaschinen und Fluid-Dynamik       - TFD
>     Appelstraße 9 - 30167 Hannover
>     Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
>     Callinstraße 36 - 30167 Hannover
>
>
>
>
>
>
>


