[torqueusers] Re: torqueusers Digest, Vol 38, Issue 24

vanilla vanilla0111 at gmail.com
Tue Sep 18 21:33:51 MDT 2007


Hi,
I still have the PBS job submission and run problem.
I have installed OSCAR 5.0 successfully,
but when I qsub a job, it stays in the Q state and, after a few seconds,
qstat shows nothing. I can't see any intermediate state, and there are no
output or error logs. As far as I can tell, there is no mistake in the job script.

Maui is the default scheduler. The Maui log shows:
-----------------------------------------------------------------------------------
09/19 12:17:21 INFO:     2 PBS jobs detected on RM base
09/19 12:17:21 INFO:     jobs detected: 2
09/19 12:17:21 MStatClearUsage(node,Active)
09/19 12:17:21 MClusterUpdateNodeState()
09/19 12:17:21 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
09/19 12:17:21 INFO:     job '308' Priority:        1
00.0)  Res:      0(00.0)  Us:      0(00.0)
09/19 12:17:21 INFO:     job '309' Priority:        1
00.0)  Res:      0(00.0)  Us:      0(00.0)

--------------------------------------------------------------------------------
09/19 12:45:25 INFO:     node 'oscarnode1.oscardomain' returned to idle pool
09/19 12:45:25 INFO:     job '               312' completed.  QueueTime:     11  RunTime:     11  Accuracy:  0.61  XFactor:  0.01
09/19 12:45:25 INFO:     overall statistics.  Accuracy:  0.00  XFactor:  0.00
09/19 12:45:25 INFO:     job '312' completed  X: 0.012222  T: 11  PS: 11  A: 0.006111
09/19 12:45:25 MJobSendFB(312)
09/19 12:45:25 MSysLaunchAction(ASList,2)
09/19 12:45:25 INFO:     job usage sent for job '312'
-----------------------------------------------------------------------------------
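
To make it easier to compare several runs, here is a minimal sketch that pulls the completion records out of a Maui log like the excerpt above. The function name and the sample text are only for illustration, and I am assuming the "T:" field is the run time because it matches "RunTime: 11" in the same excerpt. The pattern tolerates a record that has been wrapped onto a second line.

```python
import re

# Matches the short completion record seen in the log excerpt, e.g.
#   INFO:     job '312' completed  X: 0.012222  T: 11  PS: 11  A: 0.006111
# \s also matches newlines, so a record wrapped across two lines still matches.
COMPLETED = re.compile(
    r"INFO:\s+job '\s*(?P<job>\d+)' completed\s+"
    r"X:\s*(?P<x>[\d.]+)\s+T:\s*(?P<t>\d+)\s+PS:\s*(?P<ps>\d+)\s+A:\s*(?P<a>[\d.]+)"
)


def completed_jobs(log_text):
    """Return {job_id: T value} for each completion record in log_text."""
    return {m.group("job"): int(m.group("t")) for m in COMPLETED.finditer(log_text)}


# Sample taken from the excerpt above (second line is the wrapped remainder).
SAMPLE = (
    "09/19 12:45:25 INFO:     job '312' completed  X: 0.012222  T: 11  PS: 11  A:\n"
    "0.006111\n"
    "09/19 12:45:25 MJobSendFB(312)\n"
)

print(completed_jobs(SAMPLE))  # {'312': 11}
```

A job that shows up here with a T of only a few seconds was started and finished by the scheduler, so it did not simply sit in the queue.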

Can anyone tell me what the problem is?
Is it a problem with the Maui configuration, or something else?

Thanks for help!

On 9/19/07, torqueusers-request at supercluster.org <
torqueusers-request at supercluster.org> wrote:
>
> Send torqueusers mailing list submissions to
>         torqueusers at supercluster.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://www.supercluster.org/mailman/listinfo/torqueusers
> or, via email, send a message with subject or body 'help' to
>         torqueusers-request at supercluster.org
>
> You can reach the person managing the list at
>         torqueusers-owner at supercluster.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of torqueusers digest..."
>
>
> Today's Topics:
>
>    1. problems running jobs: Error:Number of meshes not equal to
>       number of threads (Nilesh Mistry)
>    2. Re: defining queues by user defined node features
>       (P Spencer Davis)
>    3. Re: defining queues by user defined node features
>       (Garrick Staples)
>    4. about multiserver (vanilla)
>    5. Re: about multiserver (Jacques Foury)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 17 Sep 2007 14:35:43 -0400
> From: Nilesh Mistry <Nilesh.Mistry at senecac.on.ca>
> Subject: [torqueusers] problems running jobs: Error:Number of meshes
>         not equal to number of threads
> To: torqueusers at supercluster.org, oscar-users at lists.sourceforge.net,
>         mauiusers at supercluster.org
> Message-ID: <46EEC8FF.5000102 at senecac.on.ca>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello
>
> I am having problems submitting a job that requires 23 threads.  I keep
> getting the following error:
>
> ERROR: Number of meshes not equal to number of thread
>
> Hardware:
> 10 quad core nodes (therefore 40 processors available)
>
> What do I need to ensure in my job queue (qmgr), maui (maui.cfg) and
> my submit script when using qsub?
>
> Any and all help is greatly appreciated.
>
> --
> Thanks
>
> Nilesh Mistry
> Academic Computing Services
> Seneca at York & TEL Campus
> Seneca College of Applied Arts & Technology
> 70 The Pond Road
> Toronto, Ontario
> M3J 3M6 Canada
> Phone 416 491 5050 ext 3788
> Fax 416 661 4695
> http://acs.senecac.on.ca
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 17 Sep 2007 15:12:57 -0400
> From: P Spencer Davis <psdavis at bsu.edu>
> Subject: Re: [torqueusers] defining queues by user defined node
>         features
> To: torqueusers at supercluster.org
> Message-ID: <46EED1B9.5020203 at bsu.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> One final problem: I had to change the queues so that they all have
> resource_min.nodes=1:x86 or resource_min.nodes=1:em64 in order for jobs
> that request more than one processor to get queued. However, this means
> that qsub -l nodes=em64 will no longer work, nor will qsub -l
> nodes=n35:em64. Have I just made a mess of this, or do I need to add a
> set of serial queues as well?
>                      Spencer
>
> P Spencer Davis wrote:
> > Ok, I figured out my problem. It boils down to renaming the x86-64
> > variable in my nodes file. When it was changed to em64, with the
> > available_resource.nodes=em64 set for the short-64 and long-64 queues,
> > the jobs were being sorted into the proper queues. Then I set the
> > acl_hosts=n(n)+...+n(n+1), set acl_host_enable=false, restarted maui and
> >  torque and everything works.
> >               Hope this helps someone else,
> >               and thanks to the group for listening to me think my way
> >                out of the problem
> >                           Spencer Davis
> >
> > P Spencer Davis wrote:
> >> I tried shutting down Maui and running the default pbs_sched instead.
> >> No change in behavior.  I've set the resource_available.nodes to x86
> >> or x86-64 in the execution queues thinking that the routing queue
> >> would then route the 32 bit requests to short or long and the 64 bit
> >> jobs to short-64 or long-64 depending on the wall time requested, but
> >> that has no effect. At this point I have no idea what I am doing
> >> wrong. Any ideas?
> >>                   Thanks,
> >>                      Spencer
> >>
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 17 Sep 2007 14:27:48 -0700
> From: Garrick Staples <garrick at usc.edu>
> Subject: Re: [torqueusers] defining queues by user defined node
>         features
> To: torqueusers at supercluster.org
> Message-ID: <20070917212747.GZ19043 at polop.usc.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> On Fri, Sep 14, 2007 at 03:47:43PM -0400, P Spencer Davis alleged:
> > Hello,
> >   I'm running v 2.1.6 of PBS as a resource manager with v 3.2.6p19 of
> > the Maui scheduler. All the compute nodes are running RHEL 4 with the
> > 2.6.9-55 kernel. The cluster is heterogeneous: 32 of the nodes are 32 bit
> > dual processor, and the other 32 are 64 bit dual processor. The nodes
> > file in server_priv is configured as follows (edited for brevity)
> > ...
> > n31 np=2 x86
> > n32 np=2 x86-64
> > ...
>
> My advice is a completely different direction.  Don't use the arch as a
> node property.  There is already a node attribute called "arch" that you can
> use for this.
>
> If you look at 'pbsnodes -a', you'll see arch=i686 and arch=x86_64 associated
> with the different nodes.  Then just add that arch to your resource request.
>
> In general, if you've compiled and installed software correctly, 32bit
> binaries run correctly on 64bit hosts.  This means that users of 32bit
> binaries can simply omit the arch because their jobs run everywhere.  Users
> of 64bit binaries add "arch=x86_64" to their request and it will only run on
> 64bit nodes.
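
As I understand this suggestion, the request could be put together roughly as in the sketch below. The helper name and the script names are hypothetical, and I am assuming that "arch" can be given as one more comma-separated item in the qsub -l resource list; I have not checked this against the versions discussed in this thread.

```python
def qsub_command(script, nodes=1, ppn=1, arch=None):
    """Build a qsub argument list; arch is added only when it is requested."""
    resources = [f"nodes={nodes}:ppn={ppn}"]
    if arch is not None:
        # Assumption: arch is accepted as its own resource in the -l list.
        resources.append(f"arch={arch}")
    return ["qsub", "-l", ",".join(resources), script]


# A 64-bit job asks for the architecture explicitly (hypothetical script name).
print(qsub_command("job64.sh", ppn=2, arch="x86_64"))
# ['qsub', '-l', 'nodes=1:ppn=2,arch=x86_64', 'job64.sh']

# A 32-bit job omits arch, so it can be placed on any node.
print(qsub_command("job32.sh", ppn=2))
# ['qsub', '-l', 'nodes=1:ppn=2', 'job32.sh']
```

The list could be passed to subprocess.run() on a submit host; here it is only printed.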
>
> ------------------------------
>
> Message: 4
> Date: Tue, 18 Sep 2007 11:04:54 +0800
> From: vanilla <vanilla0111 at gmail.com>
> Subject: [torqueusers] about multiserver
> To: torqueusers at supercluster.org
> Message-ID:
>         <81dd40cd0709172004t312f277cge596a3642299321c at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I am having trouble submitting and running PBS jobs. I believe it is because of
> multiserver, but I can't fix it.
> The cluster (OSCAR 5.0) has one head node and one compute node, as
> follows:
> cat /etc/hosts
> ----------------------
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1       localhost.localdomain   localhost
> 192.168.190.1   oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar
> 192.168.22.107  dchen-linux.localdomain     dchen-linux
>
> # These entries are managed by SIS, please don't modify them.
> 192.168.190.2        oscarnode1.oscardomain     oscarnode1
> ---------------------------
> 1. When I configure the /var/spool/pbs/torque.cfg file as follows:
> -----------------------------
> QSUBSLEEP   2
> SERVERHOST  dchen-linux
> ALLOWCOMPUTEHOSTSUMBIT  true
> ------------------------------
> qsub succeeds and I can see all the jobs in qstat, but they all just stay
> in the queue and never run.
>
> 2. When I configure the /var/spool/pbs/torque.cfg file another way:
> ---------------------------------
> QSUBSLEEP   2
> SERVERHOST  oscar_server
> ALLOWCOMPUTEHOSTSUMBIT  true
> ----------------------------------
> qsub fails.
>
> How should I configure this so that qsub runs successfully?
> Thanks for help.
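
To see what each of the two SERVERHOST settings above actually points at, here is a small sketch that looks the name up in the /etc/hosts entries from this message. The function names are made up for illustration, and the parsing is a simplification of both file formats. It does not diagnose the failure; it only shows which address each setting resolves to.

```python
def parse_hosts(text):
    """Map every hostname and alias in /etc/hosts-style text to its IP."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)
    return table


def parse_cfg(text):
    """Read 'KEY value' pairs from torque.cfg-style text (simplified)."""
    cfg = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and not parts[0].startswith("#"):
            cfg[parts[0]] = parts[1].strip()
    return cfg


# Entries copied from the /etc/hosts listing in this message.
HOSTS = """\
127.0.0.1       localhost.localdomain   localhost
192.168.190.1   oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar
192.168.22.107  dchen-linux.localdomain     dchen-linux
192.168.190.2        oscarnode1.oscardomain     oscarnode1
"""


def server_ip(cfg_text, hosts_text=HOSTS):
    """Return (SERVERHOST value, IP it maps to in hosts_text, or None)."""
    host = parse_cfg(cfg_text).get("SERVERHOST")
    return host, parse_hosts(hosts_text).get(host)


print(server_ip("QSUBSLEEP   2\nSERVERHOST  dchen-linux\n"))
# ('dchen-linux', '192.168.22.107')
print(server_ip("QSUBSLEEP   2\nSERVERHOST  oscar_server\n"))
# ('oscar_server', '192.168.190.1')
```

The two settings map to addresses on different networks, and oscarnode1 (192.168.190.2) is on the same 192.168.190.x network as oscar_server.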
>
> ------------------------------
>
> Message: 5
> Date: Tue, 18 Sep 2007 18:30:40 +0200
> From: Jacques Foury <Jacques.Foury at math.u-bordeaux1.fr>
> Subject: Re: [torqueusers] about multiserver
> To: torqueusers at supercluster.org
> Message-ID: <46EFFD30.6000607 at math.u-bordeaux1.fr>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> vanilla wrote:
> > I am having trouble submitting and running PBS jobs. I believe it is
> > because of multiserver, but I can't fix it.
> What is a "multiserver"? Torque can only have a single server, as far
> as I know...
> > The cluster (OSCAR 5.0) has one head node and one compute node, as
> > follows:
> > cat /etc/hosts
> > ----------------------
> > # Do not remove the following line, or various programs
> > # that require network functionality will fail.
> > 127.0.0.1       localhost.localdomain   localhost
> > 192.168.190.1   oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar
> > 192.168.22.107  dchen-linux.localdomain     dchen-linux
> >
> > # These entries are managed by SIS, please don't modify them.
> > 192.168.190.2        oscarnode1.oscardomain     oscarnode1
> > ---------------------------
> > 1. When I configure the /var/spool/pbs/torque.cfg file as follows:
> > -----------------------------
> > QSUBSLEEP   2
> > SERVERHOST  dchen-linux
> > ALLOWCOMPUTEHOSTSUMBIT  true
> > ------------------------------
> > qsub succeeds and I can see all the jobs in qstat, but they all just
> > stay in the queue and never run.
>
> Do you have a scheduler? Is it running? It is the scheduler that
> tells the jobs to start!
> Anyway, I don't know that file; maybe it's OSCAR-specific... Can you run
> qmgr -c "p s" and tell us what the Torque server is?
>
> What version of Torque are you using? Recent versions of Torque are
> preferably installed in /var/lib/torque ... and the config file is only read
> when creating the database for Torque. After that first start, use qmgr to
> change parameters... and stop/start the services.
> >
> > 2. When I configure the /var/spool/pbs/torque.cfg file another way:
> > ---------------------------------
> > QSUBSLEEP   2
> > SERVERHOST  oscar_server
> > ALLOWCOMPUTEHOSTSUMBIT  true
> > ----------------------------------
> > qsub fails.
> >
> > How should I configure this so that qsub runs successfully?
> > Thanks for help.
>
> Is what you want a submit host?
> If so, just add your submit host to the server's /etc/hosts.equiv and install
> the Torque client package on the submit host.
>
> --
>
> Jacques Foury
> Institut de Mathématiques de Bordeaux
> Université Bordeaux 1 / CNRS
> Tel : 05 4000 69 56
> Fax : 05 4000 21 23
> http://www.math.u-bordeaux.fr/maths/cellule
>
>
>
> ------------------------------
>
>
> End of torqueusers Digest, Vol 38, Issue 24
> *******************************************
>