[torqueusers] Submitting jobs to use multiprocessors.

Sven Schumacher schumacher at tfd.uni-hannover.de
Thu Mar 20 12:45:09 MDT 2014


Hello,

what PBS-specific parameters do you specify on your qsub command line or
in your job file?
I noticed once that specifying "mem=" with the total amount of memory
needed by the job results in jobs not starting, because Maui cannot
decide whether it is the memory requirement of the job on one of the
nodes or of the whole job together. So please tell us which qsub
parameters you are using.
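
A minimal sketch of the difference (resource amounts and the script name
are placeholders; pmem requests memory per process, which avoids the
ambiguity):

    # ambiguous in some Maui setups: is 4gb per node or for the whole job?
    qsub -l nodes=2:ppn=2,mem=4gb job.sh

    # unambiguous: 1gb for each requested process
    qsub -l nodes=2:ppn=2,pmem=1gb job.sh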

Thanks

Sven Schumacher

On 20.03.2014 at 19:30, hitesh chugani wrote:
> Hi Gus,
>
>
> Did you create a $TORQUE/pbs_server/nodes file? Yes
>
> What are the contents of that file?
>
>     <node1> np=2
>     <node2> np=2
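>
> (If <node2> really has 8 cores, as the "ncpus=8" in the pbsnodes output
> below suggests, a nodes file would more typically declare
>
>     <node1> np=2
>     <node2> np=8
>
> so that pbs_server offers all of the cores that are actually there.)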
>
> What is the output of "pbsnodes -a"?
>
> <node1>
>      state = free
>      np = 2
>      ntype = cluster
>      status =
> rectime=1395339913,varattr=,jobs=,state=free,netload=8159659934,gres=,loadave=0.00,ncpus=2,physmem=3848508kb,availmem=15671808kb,totmem=16300340kb,idletime=89,nusers=2,nsessions=22,sessions=2084
> 2619 2839 2855 2873 2877 2879 2887 2889 2916 2893 2891 3333 6665 3053
> 8036 25960 21736 22263 23582 26141 30680,uname=Linux lws81
> 2.6.18-371.4.1.el5 #1 SMP Wed Jan 8 18:42:07 EST 2014 x86_64,opsys=linux
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
> <node2>
>      state = free
>      np = 2
>      ntype = cluster
>      status =
> rectime=1395339913,varattr=,jobs=,state=free,netload=2817775035,gres=,loadave=0.00,ncpus=8,physmem=16265764kb,availmem=52900464kb,totmem=55259676kb,idletime=187474,nusers=3,nsessions=4,sessions=11923
> 17547 20030 29392,uname=Linux lws10.uncc.edu
> 2.6.18-371.4.1.el5 #1 SMP Wed Jan 8 18:42:07 EST 2014 x86_64,opsys=linux
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
>
> Did you enable scheduling in the pbs_server? Maui is enabled.
>
>
> Did you keep the --enable-cpuset configuration option? No, I have
> disabled it.
>
>
> I am able to run single-processor jobs on one and on two nodes
> (nodes=1:ppn=1 and nodes=2:ppn=1). But when I try to run
> multiprocessor jobs (nodes=2:ppn=2, with the nodes having 2 and 8
> ncpus), the job remains in the queue. I am able to run the job
> forcibly via qrun. I am using the Maui scheduler.
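>
> Concretely, the submission that stays queued is along these lines (the
> script name is a placeholder):
>
>     qsub -l nodes=2:ppn=2 job.sh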
>
>
> Please help.
>
>
> Thanks,
> Hitesh Chugani.
>
>
>
>
>
> On Mon, Mar 17, 2014 at 7:35 PM, Gus Correa <gus at ldeo.columbia.edu>
> wrote:
>
>     Hi Hitesh
>
>     Did you create a $TORQUE/pbs_server/nodes file?
>     What are the contents of that file?
>     What is the output of "pbsnodes -a"?
>
>     Make sure the nodes file is there.
>     If not, create it again, and restart pbs_server.
>
>     Did you enable scheduling in the pbs_server?
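>
>     For example, a quick check (stock qmgr syntax):
>
>         qmgr -c 'print server' | grep scheduling
>         qmgr -c 'set server scheduling = true'
>
>     (With Maui running, Maui itself makes the scheduling decisions.)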
>
>     Also:
>
>     Did you keep the --enable-cpuset configuration option?
>     If you did:
>     Do you have a /dev/cpuset directory on your nodes?
>     Do you have a type cpuset filesystem mounted on /dev/cpuset
>     on the nodes?
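>
>     For example, on a node (/dev/cpuset is the mount point TORQUE
>     expects):
>
>         grep cpuset /proc/mounts
>         mkdir -p /dev/cpuset
>         mount -t cpuset none /dev/cpuset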
>
>     Check this link:
>
>     http://docs.adaptivecomputing.com/torque/Content/topics/3-nodes/linuxCpusetSupport.htm
>
>     Still in the topic of cpuset:
>
>     Are you perhaps running cgroups on the nodes (the cgconfig service)?
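>
>     For example, on RHEL-family nodes of that vintage:
>
>         service cgconfig status
>         chkconfig --list cgconfig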
>
>     I hope this helps,
>     Gus Correa
>
>     On 03/17/2014 05:45 PM, hitesh chugani wrote:
>     > Hello,
>     >
>     > I have reconfigured Torque to disable NUMA support. I am able
>     > to run a single-node, single-processor job (nodes=1:ppn=1). But
>     > when I try to run multiprocessor jobs (nodes=2:ppn=2, with the
>     > nodes having 2 and 8 ncpus), the job remains in the queue. I am
>     > able to run the job forcibly via qrun. I am using the Maui
>     > scheduler. Can anyone please tell me what the issue may be? Is
>     > it something to do with the Maui scheduler? Thanks.
>     >
>     > Regards,
>     > Hitesh Chugani.
>     >
>     >
>     > On Mon, Mar 17, 2014 at 12:40 PM, hitesh chugani
>     > <hiteshschugani at gmail.com> wrote:
>     >
>     >     I tried the nodes=X:ppn=Y option. It still didn't work. I
>     >     guess it has something to do with the NUMA option being
>     >     enabled. I am looking into this issue and will let you guys
>     >     know. Thanks a lot.
>     >
>     >
>     >
>     >     On Thu, Mar 13, 2014 at 10:22 AM, Ken Nielson
>     >     <knielson at adaptivecomputing.com> wrote:
>     >
>     >         Glen is right. There is a regression with procs.
>     >
>     >
>     >         On Wed, Mar 12, 2014 at 5:29 PM, <glen.beane at gmail.com>
>     >         wrote:
>     >
>     >             I think there is a regression in Torque, and procs
>     >             only works with Moab now. Try nodes=X:ppn=Y.
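>     >
>     >             For example, in the job script (the values are
>     >             placeholders for your node and core counts):
>     >
>     >                 #PBS -l nodes=2:ppn=2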
>     >
>     >
>     >             On Mar 12, 2014, at 6:26 PM, hitesh chugani
>     >             <hiteshschugani at gmail.com> wrote:
>     >
>     >>             Hi all,
>     >>
>     >>
>     >>             I am trying to submit a job that uses multiple
>     >>             processors (I added #PBS -l procs=4 to the job
>     >>             script), but the job remains queued forever. I am
>     >>             using 2 compute nodes (ncpus=8 and 2). Any idea why
>     >>             it is not running? Please help.
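>     >>
>     >>             A minimal script of that shape (a sketch; the job
>     >>             name and the application line are placeholders):
>     >>
>     >>                 #!/bin/bash
>     >>                 #PBS -N procs-test
>     >>                 #PBS -l procs=4
>     >>                 #PBS -l walltime=00:10:00
>     >>                 cd $PBS_O_WORKDIR
>     >>                 mpirun -np 4 ./my_app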
>     >>
>     >>             I have installed Torque using these configuration
>     >>             options:
>     >>             ./configure --enable-unixsockets --enable-cpuset
>     >>             --enable-geometry-requests --enable-numa-support
>     >>
>     >>
>     >>
>     >>
>     >>             Thanks,
>     >>             Hitesh Chugani.
>     >>             Student Linux specialist
>     >>             University of North Carolina, Charlotte
>     >
>     >         --
>     >         Ken Nielson
>     >         +1 801.717.3700 office   +1 801.717.3738 fax
>     >         1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
>     >         www.adaptivecomputing.com
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


-- 
Sven Schumacher - System Administrator  Tel: (0511) 762-2753
Leibniz Universitaet Hannover
Institut für Turbomaschinen und Fluid-Dynamik       - TFD
Appelstraße 9 - 30167 Hannover
Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
Callinstraße 36 - 30167 Hannover


