[torqueusers] Submitting jobs to use multiprocessors.

hitesh chugani hiteshschugani at gmail.com
Thu Mar 20 12:49:47 MDT 2014


Hi Sven,

These are the parameters in the job file:

#!/bin/bash
#PBS -l nodes=2:ppn=2
#PBS -k o
#PBS -m abe
#PBS -N JobName
#PBS -V
#PBS -j oe
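
A minimal script body just to confirm the nodes=2:ppn=2 allocation once the
job actually starts could look like this (illustrative only, not the real
application):

# Illustrative body: list what Torque actually allocated to the job
echo "Torque assigned $(wc -l < "$PBS_NODEFILE") slots:"   # expect 4 for nodes=2:ppn=2
cat "$PBS_NODEFILE"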

Thanks,
Hitesh Chugani.







On Thu, Mar 20, 2014 at 2:45 PM, Sven Schumacher <schumacher at tfd.uni-hannover.de> wrote:

>  Hello,
>
> What PBS-specific parameters do you specify for your qsub command or in
> your job file?
> I noticed once that specifying "mem=" with the total amount of memory
> needed by the job results in jobs not starting, because Maui can't decide
> whether it is the memory requirement of the job on one of the nodes or of
> the whole job... so please tell us the qsub parameters you used...
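>
> For example (values only illustrative), a per-process request avoids that
> ambiguity, because "pmem" is applied per process rather than to the job as
> a whole:
>
> #PBS -l nodes=2:ppn=2
> #PBS -l pmem=1gb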
>
> Thanks
>
> Sven Schumacher
>
> On 20.03.2014 19:30, hitesh chugani wrote:
>
>  Hi Gus,
>
>
> Did you create a $TORQUE/pbs_server/nodes file? Yes
>
> What are the contents of that file?
>
> <node1> np=2
> <node2> np=2
>
> What is the output of "pbsnodes -a"?
>
> <node1>
>      state = free
>      np = 2
>      ntype = cluster
>      status = rectime=1395339913,varattr=,jobs=,state=free,netload=8159659934,gres=,loadave=0.00,ncpus=2,physmem=3848508kb,availmem=15671808kb,totmem=16300340kb,idletime=89,nusers=2,nsessions=22,sessions=2084 2619 2839 2855 2873 2877 2879 2887 2889 2916 2893 2891 3333 6665 3053 8036 25960 21736 22263 23582 26141 30680,uname=Linux lws81 2.6.18-371.4.1.el5 #1 SMP Wed Jan 8 18:42:07 EST 2014 x86_64,opsys=linux
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
> <node2>
>      state = free
>      np = 2
>      ntype = cluster
>      status = rectime=1395339913,varattr=,jobs=,state=free,netload=2817775035,gres=,loadave=0.00,ncpus=8,physmem=16265764kb,availmem=52900464kb,totmem=55259676kb,idletime=187474,nusers=3,nsessions=4,sessions=11923 17547 20030 29392,uname=Linux lws10.uncc.edu 2.6.18-371.4.1.el5 #1 SMP Wed Jan 8 18:42:07 EST 2014 x86_64,opsys=linux
>      mom_service_port = 15002
>      mom_manager_port = 15003
>
>
> Did you enable scheduling in the pbs_server? Maui is enabled.
>
>
> Did you keep the --enable-cpuset configuration option? No, I have
> disabled it.
>
>
> I am able to run single-processor jobs on one or two nodes
> (nodes=1:ppn=1 and nodes=2:ppn=1). But when I try to run a multiprocessor
> job (nodes=2:ppn=2, with the nodes having 2 and 8 CPUs), the job remains
> in the queue. I am able to run the job forcefully via qrun. I am using the
> Maui scheduler.
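>
> If it is useful, Maui's own diagnostics for a job stuck in the queue would
> be along these lines (<jobid> being the id that qstat reports):
>
> checkjob <jobid>    # Maui's reason for not starting the job
> diagnose -n         # how Maui sees the nodes and their processors
> showq               # the queue as Maui sees it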
>
>
>  Please help.
>
>
>  Thanks,
>  Hitesh chugani.
>
>
>
>
>
> On Mon, Mar 17, 2014 at 7:35 PM, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
>> Hi Hitesh
>>
>> Did you create a $TORQUE/pbs_server/nodes file?
>> What are the contents of that file?
>> What is the output of "pbsnodes -a"?
>>
>> Make sure the nodes file is there.
>> If not, create it again, and restart pbs_server.
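>>
>> For example (the exact restart command depends on how Torque was installed):
>>
>> qterm -t quick      # stop pbs_server cleanly
>> pbs_server          # start it again so it rereads the nodes file
>> pbsnodes -a         # both nodes should come back as state = free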
>>
>> Did you enable scheduling in the pbs_server?
>>
>> Also:
>>
>> Did you keep the --enable-cpuset configuration option?
>> If you did:
>> Do you have a /dev/cpuset directory on your nodes?
>> Do you have a type cpuset filesystem mounted on /dev/cpuset
>> on the nodes?
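>>
>> A quick way to check both, on each node (illustrative commands):
>>
>> ls -ld /dev/cpuset     # does the mount point exist?
>> mount -t cpuset        # should list a cpuset filesystem on /dev/cpuset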
>>
>> Check this link:
>>
>>
>> http://docs.adaptivecomputing.com/torque/Content/topics/3-nodes/linuxCpusetSupport.htm
>>
>> Still in the topic of cpuset:
>>
>> Are you perhaps running cgroups on the nodes (the cgconfig service)?
>>
>> I hope this helps,
>> Gus Correa
>>
>> On 03/17/2014 05:45 PM, hitesh chugani wrote:
>> > Hello,
>> >
>> > I have reconfigured Torque to disable NUMA support. I am able to run a
>> > single-node, single-processor job (nodes=1:ppn=1). But when I try to run
>> > multiprocessor jobs (nodes=2:ppn=2, with the nodes having 2 and 8 CPUs),
>> > the job remains in the queue. I am able to run the job forcefully via
>> > qrun. I am using the Maui scheduler. Can anyone please tell me what the
>> > issue may be? Is it something to do with the Maui scheduler? Thanks.
>> >
>> > Regards,
>> > Hitesh Chugani.
>> >
>> >
>> > On Mon, Mar 17, 2014 at 12:40 PM, hitesh chugani <hiteshschugani at gmail.com> wrote:
>> >
>> >     I tried the nodes=X:ppn=Y option. It still didn't work. I guess it
>> >     is something to do with the NUMA option being enabled. I am looking
>> >     into this issue and will let you guys know. Thanks a lot.
>> >
>> >
>> >
>> >     On Thu, Mar 13, 2014 at 10:22 AM, Ken Nielson <knielson at adaptivecomputing.com> wrote:
>> >
>> >         Glen is right. There is a regression with procs.
>> >
>> >
>> >             On Wed, Mar 12, 2014 at 5:29 PM, <glen.beane at gmail.com> wrote:
>> >
>> >             I think there is a regression in Torque and procs only works
>> >             with Moab now. Try nodes=X:ppn=Y
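>> >
>> >             For example (numbers only illustrative):
>> >
>> >             #PBS -l procs=4          # affected by the regression; Moab only
>> >             #PBS -l nodes=2:ppn=2    # explicit geometry that Maui understands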
>> >
>> >
>> >             On Mar 12, 2014, at 6:26 PM, hitesh chugani <hiteshschugani at gmail.com> wrote:
>> >
>> >>             Hi all,
>> >>
>> >>
>> >>             I am trying to submit a job that uses multiple processors
>> >>             (I added #PBS -l procs=4 in the job script), but the job
>> >>             remains queued forever. I am using 2 compute nodes (ncpus=8
>> >>             and 2). Any idea why it is not running? Please help.
>> >>
>> >>             I have installed Torque using this configuration option:
>> >>             ./configure --enable-unixsockets --enable-cpuset
>> >>             --enable-geometry-requests --enable-numa-support
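>> >>
>> >>             (If the NUMA/cpuset options are dropped later, as a later
>> >>             message in this thread describes, the rebuild is roughly the
>> >>             following; the exact flags depend on the site:)
>> >>
>> >>             ./configure --enable-unixsockets
>> >>             make && make install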
>> >>
>> >>
>> >>
>> >>
>>  >>             Thanks,
>> >>             Hitesh Chugani.
>> >>             Student Linux specialist
>> >>             University of North Carolina, Charlotte
>> >
>> >
>> >
>> >
>> >         --
>> >         Ken Nielson
>> >         +1 801.717.3700 office   +1 801.717.3738 fax
>> >         1712 S. East Bay Blvd, Suite 300  Provo, UT  84606
>> >         www.adaptivecomputing.com
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>
>
>
>
>
> --
> Sven Schumacher - System Administrator   Tel: (0511) 762-2753
> Leibniz Universitaet Hannover
> Institut für Turbomaschinen und Fluid-Dynamik       - TFD
> Appelstraße 9 - 30167 Hannover
> Institut für Kraftwerkstechnik und Wärmeübertragung - IKW
> Callinstraße 36 - 30167 Hannover
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>

