[torqueusers] Submitting jobs to use multiprocessors.

hitesh chugani hiteshschugani at gmail.com
Fri Mar 21 09:00:01 MDT 2014


Hi Gus,

Sorry for the confusion. I didn't actually use the symbols "<" and ">". I
have something like this:
node1 np=2
node2 np=8

I did change the numbers to match the number of cores, but the issue still
shows up.
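
One thing worth double-checking: pbs_server reads server_priv/nodes only
at startup, so the file edit needs a server restart to take effect. A
minimal sketch, run as root on the server host:

    qterm -t quick     # stop pbs_server gracefully
    pbs_server         # restart; re-reads $TORQUE/server_priv/nodes
    pbsnodes -a        # verify np=2 and np=8 are now reported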

The Maui daemon is running and scheduling is enabled. The qmgr output is:

#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
*set server scheduling = True*
set server acl_hosts = lws7
set server managers = hchugani at lws7.uncc.edu
set server operators = hchugani at lws7.uncc.edu
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 10
set server moab_array_compatible = True
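
For reference, when a job just sits in the queue under Maui, the
scheduler's own diagnostics usually say why. A minimal sketch, assuming
the Maui client tools are on PATH and <jobid> is the stuck job's id:

    showq              # jobs as Maui sees them: running/idle/blocked
    checkjob <jobid>   # per-job analysis, including why it cannot start
    diagnose -n        # per-node view: configured vs. available procs

If checkjob reports that no resources match the request, the np values
the server advertises are the first thing to compare against it.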

Thanks,
Hitesh Chugani.




On Thu, Mar 20, 2014 at 6:05 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:

> Hi Hitesh
>
> 1) Did you actually write the "less than" ("<") and
> "greater than" (">") characters in your $TORQUE/server_priv/nodes file?
> Or are those "<" and ">" just typos in your email?
> Or perhaps you don't want the actual node's names to appear on this
> mailing list?
>
>  >>     Did you create a $TORQUE/pbs_server/nodes file? *Yes*
>  >>
>  >>     What are the contents of that file?
>  >>     *<node1> np=2
>  >>     <node2> np=2*
>  >>
>
>
> The "<" and ">" shouldn't be there, unless you have very unusual
> names for your nodes.
> There are also some "*" in the lines above that should not be there,
> but you may have added that to the email as a highlight, I don't know.
>
> I expected something like this for the file contents (2 lines only,
> no "<" or ">").
>
> node1 np=2
> node2 np=8
>
> (You said the nodes have 2 and 8 cores/cpus, so one of them should
> have np=2, and the other np=8, unless you don't want to use all
> cores.
> I am assuming node2 is the one with 8 cores, otherwise
> you need to adjust the numbers above accordingly.)
>
> 2) You say Maui is enabled.
> So, I assume the maui daemon is running, right?
>
> However, you must also enable scheduling on the Torque/PBS server.
> Did you enable that option?
> What is the output of this?
>
> qmgr -c 'p s' | grep scheduling
>
> If it says "False", you need to do:
>
> qmgr -c  'set server scheduling = True'
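>
> Also worth confirming that Maui is the only scheduler attached to the
> server: if the bundled pbs_sched daemon happens to be running as well,
> the two can interfere. A quick check (just a sketch):
>
>     ps -e | egrep 'pbs_sched|maui'
>
> Only maui should appear in the output.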
>
> I hope this helps,
> Gus Correa
>
> On 03/20/2014 02:49 PM, hitesh chugani wrote:
> > Hi Sven,
> >
> > These are the parameters in the job file
> >
> > #!/bin/bash
> > #PBS -l nodes=2:ppn=2
> > #PBS -k o
> > #PBS -m abe
> > #PBS -N JobName
> > #PBS -V
> > #PBS -j oe
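> >
> > For completeness, a runnable variant would end with an actual command;
> > the mpirun line below is a sketch, and the program name is hypothetical
> > and depends on the installed MPI stack:
> >
> > #!/bin/bash
> > #PBS -l nodes=2:ppn=2
> > #PBS -N JobName
> > #PBS -j oe
> > cd $PBS_O_WORKDIR
> > # $PBS_NODEFILE holds one hostname line per allocated core
> > mpirun -np $(wc -l < $PBS_NODEFILE) ./my_program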
> >
> > Thanks,
> > Hitesh Chugani.
> >
> >
> >
> >
> >
> >
> >
> > On Thu, Mar 20, 2014 at 2:45 PM, Sven Schumacher
> > <schumacher at tfd.uni-hannover.de> wrote:
> >
> >     Hello,
> >
> >     which PBS-specific parameters do you pass to qsub or set in your
> >     job file?
> >     I once noticed that specifying "mem=" with the total amount of
> >     memory needed by the job keeps jobs from starting, because Maui
> >     cannot decide whether it is the memory requirement on one of the
> >     nodes or for the whole job... so please tell us the qsub
> >     parameters you used.
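> >
> >     For illustration, the distinction in PBS directives (the resource
> >     names are standard, the values here are made up):
> >
> >         #PBS -l mem=4gb    # total memory for the whole job
> >         #PBS -l pmem=1gb   # memory per process/core
> >
> >     requesting per-process memory with pmem avoids the ambiguity Maui
> >     runs into with a job-wide mem value.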
> >
> >     Thanks
> >
> >     Sven Schumacher
> >
> >     On 20.03.2014 19:30, hitesh chugani wrote:
> >>     Hi Gus,
> >>
> >>
> >>     Did you create a $TORQUE/pbs_server/nodes file? *Yes*
> >>
> >>     What are the contents of that file?
> >>     *<node1> np=2
> >>     <node2> np=2*
> >>
> >>     What is the output of "pbsnodes -a"?
> >>     <node1>
> >>          state = free
> >>          np = 2
> >>          ntype = cluster
> >>          status = rectime=1395339913,varattr=,jobs=,state=free,
> >>            netload=8159659934,gres=,loadave=0.00,ncpus=2,
> >>            physmem=3848508kb,availmem=15671808kb,totmem=16300340kb,
> >>            idletime=89,nusers=2,nsessions=22,sessions=2084 2619 2839
> >>            2855 2873 2877 2879 2887 2889 2916 2893 2891 3333 6665
> >>            3053 8036 25960 21736 22263 23582 26141 30680,
> >>            uname=Linux lws81 2.6.18-371.4.1.el5 #1 SMP
> >>            Wed Jan 8 18:42:07 EST 2014 x86_64,opsys=linux
> >>          mom_service_port = 15002
> >>          mom_manager_port = 15003
> >>
> >>     <node2>
> >>          state = free
> >>          np = 2
> >>          ntype = cluster
> >>          status = rectime=1395339913,varattr=,jobs=,state=free,
> >>            netload=2817775035,gres=,loadave=0.00,ncpus=8,
> >>            physmem=16265764kb,availmem=52900464kb,totmem=55259676kb,
> >>            idletime=187474,nusers=3,nsessions=4,sessions=11923 17547
> >>            20030 29392,uname=Linux lws10.uncc.edu 2.6.18-371.4.1.el5
> >>            #1 SMP Wed Jan 8 18:42:07 EST 2014 x86_64,opsys=linux
> >>          mom_service_port = 15002
> >>          mom_manager_port = 15003
> >>
> >>
> >>     Did you enable scheduling in the pbs_server? *Maui is enabled*
> >>
> >>
> >>     Did you keep the --enable-cpuset configuration option? *No. I have
> >>     disabled it*
> >>
> >>
> >>     I am able to run single-processor jobs on one or two nodes
> >>     (nodes=1:ppn=1 and nodes=2:ppn=1). But when I try to run
> >>     multiprocessor jobs (nodes=2:ppn=2, with the nodes having 2 and
> >>     8 cpus), the job remains queued. I can force the job to run via
> >>     qrun. I am using the Maui scheduler.
> >>
> >>
> >>     Please help.
> >>
> >>
> >>     Thanks,
> >>     Hitesh chugani.
> >>
> >>
> >>
> >>
> >>
> >>     On Mon, Mar 17, 2014 at 7:35 PM, Gus Correa
> >>     <gus at ldeo.columbia.edu> wrote:
> >>
> >>         Hi Hitesh
> >>
> >>         Did you create a $TORQUE/pbs_server/nodes file?
> >>         What are the contents of that file?
> >>         What is the output of "pbsnodes -a"?
> >>
> >>         Make sure the nodes file is there.
> >>         If not, create it again, and restart pbs_server.
> >>
> >>         Did you enable scheduling in the pbs_server?
> >>
> >>         Also:
> >>
> >>         Did you keep the --enable-cpuset configuration option?
> >>         If you did:
> >>         Do you have a /dev/cpuset directory on your nodes?
> >>         Do you have a type cpuset filesystem mounted on /dev/cpuset
> >>         on the nodes?
> >>
> >>         Check this link:
> >>
> >>         http://docs.adaptivecomputing.com/torque/Content/topics/3-nodes/linuxCpusetSupport.htm
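> >>
> >>         For reference, cpuset support expects the cpuset filesystem
> >>         mounted there; a typical setup (a sketch, run as root on
> >>         each node):
> >>
> >>             mkdir -p /dev/cpuset
> >>             mount -t cpuset none /dev/cpuset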
> >>
> >>         Still in the topic of cpuset:
> >>
> >>         Are you perhaps running cgroups on the nodes (the cgconfig
> >>         service)?
> >>
> >>         I hope this helps,
> >>         Gus Correa
> >>
> >>         On 03/17/2014 05:45 PM, hitesh chugani wrote:
> >>         > Hello,
> >>         >
> >>         > I have reconfigured Torque to disable NUMA support. I am
> >>         > able to run a single-node, single-processor job
> >>         > (nodes=1:ppn=1). But when I try to run multiprocessor jobs
> >>         > (nodes=2:ppn=2, with the nodes having 2 and 8 cpus), the
> >>         > job remains queued. I can force the job to run via qrun. I
> >>         > am using the Maui scheduler. Can anyone please tell me what
> >>         > the issue may be? Is it something to do with the Maui
> >>         > scheduler? Thanks.
> >>         >
> >>         > Regards,
> >>         > Hitesh Chugani.
> >>         >
> >>         >
> >>         > On Mon, Mar 17, 2014 at 12:40 PM, hitesh chugani
> >>         > <hiteshschugani at gmail.com> wrote:
> >>         >
> >>         >     I tried the nodes=X:ppn=Y option. It still didn't
> >>         >     work. I suspect it has something to do with enabling
> >>         >     the NUMA option. I am looking into this issue and will
> >>         >     let you know. Thanks a lot.
> >>         >
> >>         >
> >>         >
> >>         >     On Thu, Mar 13, 2014 at 10:22 AM, Ken Nielson
> >>         >     <knielson at adaptivecomputing.com> wrote:
> >>         >
> >>         >         Glen is right. There is a regression with procs.
> >>         >
> >>         >
> >>         >         On Wed, Mar 12, 2014 at 5:29 PM,
> >>         >         <glen.beane at gmail.com> wrote:
> >>         >
> >>         >             I think there is a regression in Torque and
> >>         procs only works
> >>         >             with Moab now. Try nodes=X:ppn=Y
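> >>         >
> >>         >             For example, instead of
> >>         >                 #PBS -l procs=4
> >>         >             request the same four cores as
> >>         >                 #PBS -l nodes=2:ppn=2
> >>         >             (the values here just mirror this thread's
> >>         >             case).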
> >>         >
> >>         >
> >>         >             On Mar 12, 2014, at 6:26 PM, hitesh chugani
> >>         >             <hiteshschugani at gmail.com> wrote:
> >>         >
> >>         >>             Hi all,
> >>         >>
> >>         >>
> >>         >>             I am trying to submit a job that uses
> >>         >>             multiple processors (I added #PBS -l procs=4
> >>         >>             to the job script), but the job remains
> >>         >>             queued forever. I am using 2 compute nodes
> >>         >>             (ncpus=8 and 2). Any idea why it is not
> >>         >>             running? Please help.
> >>         >>
> >>         >>             I have installed Torque with these configure
> >>         >>             options:
> >>         >>             ./configure --enable-unixsockets \
> >>         >>                 --enable-cpuset \
> >>         >>                 --enable-geometry-requests \
> >>         >>                 --enable-numa-support
> >>         >>
> >>         >>
> >>         >>
> >>         >>
> >>         >>             Thanks,
> >>         >>             Hitesh Chugani.
> >>         >>             Student Linux specialist
> >>         >>             University of North Carolina, Charlotte
> >>         >
> >>         >
> >>         >
> >>         >
> >>         >
> >>         >         --
> >>         >         Ken Nielson
> >>         >         +1 801.717.3700 office   +1 801.717.3738 fax
> >>         >         1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
> >>         >         www.adaptivecomputing.com
> >>         >
> >>         >
> >>         >
> >>         >
> >>         >
> >>         >
> >>         >
> >>         >
> >>
> >>
> >>
> >>
> >>
> >
> >
> >     --
> >     Sven Schumacher - System Administrator  Tel: (0511) 762-2753
> >     Leibniz Universitaet Hannover
> >     Institut für Turbomaschinen und Fluid-Dynamik       - TFD
> >     Appelstraße 9 - 30167 Hannover
> >     Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
> >     Callinstraße 36 - 30167 Hannover
> >
> >
> >
> >
> >
> >
> >
>
>

