[torqueusers] Re: torqueusers Digest, Vol 38, Issue 24
vanilla
vanilla0111 at gmail.com
Tue Sep 18 21:33:51 MDT 2007
Hi,
Still the PBS job submission and run problem.
I have installed OSCAR 5.0 successfully.
But when I qsub a job, it sits in the Q state, and after a few seconds
qstat shows nothing. I can't see any intermediate state, and there are no output or
error logs. There is no mistake in the job script itself.
Maui is the default scheduler; its log shows:
-----------------------------------------------------------------------------------
09/19 12:17:21 INFO: 2 PBS jobs detected on RM base
09/19 12:17:21 INFO: jobs detected: 2
09/19 12:17:21 MStatClearUsage(node,Active)
09/19 12:17:21 MClusterUpdateNodeState()
09/19 12:17:21 MQueueSelectAllJobs(Q,HARD,ALL,JIList,DP,Msg)
09/19 12:17:21 INFO: job '308' Priority: 1
00.0) Res: 0(00.0) Us: 0(00.0)
09/19 12:17:21 INFO: job '309' Priority: 1
00.0) Res: 0(00.0) Us: 0(00.0)
--------------------------------------------------------------------------------
09/19 12:45:25 INFO: node 'oscarnode1.oscardomain' returned to idle pool
09/19 12:45:25 INFO: job ' 312' completed. QueueTime: 11 RunTime: 11 Accuracy: 0.61 XFactor: 0.01
09/19 12:45:25 INFO: overall statistics. Accuracy: 0.00 XFactor: 0.00
09/19 12:45:25 INFO: job '312' completed X: 0.012222 T: 11 PS: 11 A: 0.006111
09/19 12:45:25 MJobSendFB(312)
09/19 12:45:25 MSysLaunchAction(ASList,2)
09/19 12:45:25 INFO: job usage sent for job '312'
-----------------------------------------------------------------------------------
Can anyone tell me what the problem is?
Is it a problem with the Maui config, or something else?
Thanks for help!
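
For anyone hitting the same symptom, here is a minimal sketch of the checks that could narrow it down. It assumes the standard Torque and Maui client tools (qstat, tracejob, checkjob, pbsnodes, qmgr) are installed on the head node. The job id 312 is taken from the log above, and run_if_present is a made-up helper name, not part of either package.

```shell
#!/bin/sh
# Sketch only: run a diagnostic command if its tool is installed,
# otherwise report it as skipped.
# run_if_present is a hypothetical helper, not part of Torque or Maui.
run_if_present() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $*"
        "$@"
    else
        echo "skipped: $tool not found"
    fi
}

# Job id from the Maui log; pass another id as the first argument.
jobid=${1:-312}

run_if_present qstat -f "$jobid"    # full status, while pbs_server still knows the job
run_if_present tracejob "$jobid"    # gathers Torque log entries for the job
run_if_present checkjob "$jobid"    # Maui's view of the job
run_if_present pbsnodes -a          # node states as seen by pbs_server
run_if_present qmgr -c 'p s'        # print the server configuration
```

Since the Maui log reports job 312 as completed with a run time of 11 seconds, the tracejob output is probably the most useful place to start looking for why no output file came back.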
On 9/19/07, torqueusers-request at supercluster.org <
torqueusers-request at supercluster.org> wrote:
>
> Send torqueusers mailing list submissions to
> torqueusers at supercluster.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://www.supercluster.org/mailman/listinfo/torqueusers
> or, via email, send a message with subject or body 'help' to
> torqueusers-request at supercluster.org
>
> You can reach the person managing the list at
> torqueusers-owner at supercluster.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of torqueusers digest..."
>
>
> Today's Topics:
>
> 1. problems running jobs: Error:Number of meshes not equal to
> number of threads (Nilesh Mistry)
> 2. Re: defining queues by user defined node features
> (P Spencer Davis)
> 3. Re: defining queues by user defined node features
> (Garrick Staples)
> 4. about multiserver (vanilla)
> 5. Re: about multiserver (Jacques Foury)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 17 Sep 2007 14:35:43 -0400
> From: Nilesh Mistry <Nilesh.Mistry at senecac.on.ca>
> Subject: [torqueusers] problems running jobs: Error:Number of meshes
> not equal to number of threads
> To: torqueusers at supercluster.org, oscar-users at lists.sourceforge.net,
> mauiusers at supercluster.org
> Message-ID: <46EEC8FF.5000102 at senecac.on.ca>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hello
>
> I am having problems submitting a job that requires 23 threads. I keep
> getting the following error:
>
> ERROR: Number of meshes not equal to number of thread
>
> Hardware:
> 10 quad core nodes (therefore 40 processors available)
>
> What do I need to ensure in my job queue (qmgr), maui (maui.cfg) and
> my submit script when using qsub?
>
> Any and all help is greatly appreciated.
>
> --
> Thanks
>
> Nilesh Mistry
> Academic Computing Services
> Seneca at York & TEL Campus
> Seneca College Of Applied Arts & Technology
> 70 The Pond Road
> Toronto, Ontario
> M3J 3M6 Canada
> Phone 416 491 5050 ext 3788
> Fax 416 661 4695
> http://acs.senecac.on.ca
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 17 Sep 2007 15:12:57 -0400
> From: P Spencer Davis <psdavis at bsu.edu>
> Subject: Re: [torqueusers] defining queues by user defined node
> features
> To: torqueusers at supercluster.org
> Message-ID: <46EED1B9.5020203 at bsu.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> One final problem, I had to change the queues so that they all have
> resource_min.nodes=1:x86 or resource_min.nodes=1:em64 in order to have jobs
> that request more than one processor to get queued, however this means
> that qsub -l nodes=em64 will no longer work, nor will qsub -l
> nodes=n35:em64. Have I just made a mess of this, or do I need to add a
> set of serial queues as well?
> Spencer
>
> P Spencer Davis wrote:
> > Ok, I figured out my problem. It boils down to renaming the x86-64
> > variable in my nodes file. When it was changed to em64, with the
> > available_resource.nodes=em64 set for the short-64 and long-64 queues,
> > the jobs were being sorted into the proper queues. Then I set the
> > acl_hosts=n(n)+...+n(n+1), set acl_host_enable=false, restarted maui and
> > torque and everything works.
> > Hope this helps someone else,
> > and thanks to the group for listening to me think my way
> > out of the problem
> > Spencer Davis
> >
> > P Spencer Davis wrote:
> >> I tried shutting down Maui and running the default pbs_sched instead.
> >> No change in behavior. I've set the resource_available.nodes to x86
> >> or x86-64 in the execution queues thinking that the routing queue
> >> would then route the 32 bit requests to short or long and the 64 bit
> >> jobs to short-64 or long-64 depending on the wall time requested, but
> >> that has no effect. At this point I have no idea what I am doing
> >> wrong, Any ideas?
> >> Thanks,
> >> Spencer
> >>
> >
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torqueusers
> >
> >
>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 17 Sep 2007 14:27:48 -0700
> From: Garrick Staples <garrick at usc.edu>
> Subject: Re: [torqueusers] defining queues by user defined node
> features
> To: torqueusers at supercluster.org
> Message-ID: <20070917212747.GZ19043 at polop.usc.edu>
> Content-Type: text/plain; charset="us-ascii"
>
> On Fri, Sep 14, 2007 at 03:47:43PM -0400, P Spencer Davis alleged:
> > Hello,
> > I'm running v 2.1.6 of PBS as a resource manager with v 3.2.6p19 of
> > the Maui scheduler. All the compute nodes are running RHEL 4 with the
> > 2.6.9-55 kernel. The cluster is heterogeneous: 32 of the nodes are 32 bit
> > dual processor, and the other 32 are 64 bit dual processor. The nodes
> > file in server_priv is configured as follows (edited for brevity)
> > ...
> > n31 np=2 x86
> > n32 np=2 x86-64
> > ...
>
> My advice is a completely different direction. Don't use the arch as a
> node property. There is already a node attribute called "arch" that you can
> use for this.
>
> If you look at 'pbsnodes -a', you'll see arch=i686 and arch=x86_64 associated
> with the different nodes. Then just add that arch to your resource request.
>
> In general, if you've compiled and installed software correctly, 32bit
> binaries run correctly on 64bit hosts. This means that users of 32bit
> binaries can simply omit the arch because their jobs run everywhere. Users
> of 64bit binaries add "arch=x86_64" to their request and it will only run
> on 64bit nodes.
>
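
A minimal sketch of the arch suggestion above, assuming the nodes report arch values the way Garrick describes. arch_request is a made-up helper name and job.sh is a placeholder script; the node and ppn counts are only illustrative.

```shell
#!/bin/sh
# Sketch: build a qsub resource request that optionally pins a job
# to one architecture.
# arch_request is a hypothetical helper; job.sh is a placeholder name.
arch_request() {
    nodes=$1; ppn=$2; arch=$3
    if [ -n "$arch" ]; then
        printf 'nodes=%s:ppn=%s,arch=%s\n' "$nodes" "$ppn" "$arch"
    else
        # 32-bit jobs: leave arch out so they can land on any node
        printf 'nodes=%s:ppn=%s\n' "$nodes" "$ppn"
    fi
}

req=$(arch_request 2 2 x86_64)
echo "qsub -l $req job.sh"

# Only submit when qsub exists and the placeholder script is really there.
if command -v qsub >/dev/null 2>&1 && [ -f job.sh ]; then
    qsub -l "$req" job.sh
fi
```

Passing an empty third argument gives the request a 32-bit user would submit, with no arch restriction at all.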
>
> ------------------------------
>
> Message: 4
> Date: Tue, 18 Sep 2007 11:04:54 +0800
> From: vanilla <vanilla0111 at gmail.com>
> Subject: [torqueusers] about multiserver
> To: torqueusers at supercluster.org
> Message-ID:
> <81dd40cd0709172004t312f277cge596a3642299321c at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> I have some trouble in pbs job submission and run. I know it is because of
> multiserver, but I can't mend it.
> The cluster (oscar 5.0) has one head node and one compute node, as the
> following:
> cat /etc/hosts
> ----------------------
> # Do not remove the following line, or various programs
> # that require network functionality will fail.
> 127.0.0.1 localhost.localdomain localhost
> 192.168.190.1 oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar
> 192.168.22.107 dchen-linux.localdomain dchen-linux
>
> # These entries are managed by SIS, please don't modify them.
> 192.168.190.2 oscarnode1.oscardomain oscarnode1
> ---------------------------
> 1. when I config /var/spool/pbs/torque.cfg file as the following:
> -----------------------------
> QSUBSLEEP 2
> SERVERHOST dchen-linux
> ALLOWCOMPUTEHOSTSUMBIT true
> ------------------------------
> qsub is successful and I can see all jobs in qstat , but all jobs just in
> queue, can't run.
>
> 2. when I config /var/spool/pbs/torque.cfg file in another way:
> ---------------------------------
> QSUBSLEEP 2
> SERVERHOST oscar_server
> ALLOWCOMPUTEHOSTSUMBIT true
> ----------------------------------
> qsub failed.
>
> How to config and run qsub successfully?
> Thanks for help.
>
> ------------------------------
>
> Message: 5
> Date: Tue, 18 Sep 2007 18:30:40 +0200
> From: Jacques Foury <Jacques.Foury at math.u-bordeaux1.fr>
> Subject: Re: [torqueusers] about multiserver
> To: torqueusers at supercluster.org
> Message-ID: <46EFFD30.6000607 at math.u-bordeaux1.fr>
> Content-Type: text/plain; charset=ISO-8859-15; format=flowed
>
> vanilla wrote:
> > I have some trouble in pbs job submission and run. I know it is
> > because of multiserver, but I can't mend it.
> What is a "multiserver" ? Torque can only have a single server, as far
> as I know...
> > The cluster (oscar 5.0) has one head node and one compute node, as the
> > following:
> > cat /etc/hosts
> > ----------------------
> > # Do not remove the following line, or various programs
> > # that require network functionality will fail.
> > 127.0.0.1 localhost.localdomain localhost
> > 192.168.190.1 oscar_server.oscardomain oscar_server nfs_oscar pbs_oscar
> > 192.168.22.107 dchen-linux.localdomain dchen-linux
> >
> > # These entries are managed by SIS, please don't modify them.
> > 192.168.190.2 oscarnode1.oscardomain oscarnode1
> > ---------------------------
> > 1. when I config /var/spool/pbs/torque.cfg file as the following:
> > -----------------------------
> > QSUBSLEEP 2
> > SERVERHOST dchen-linux
> > ALLOWCOMPUTEHOSTSUMBIT true
> > ------------------------------
> > qsub is successful and I can see all jobs in qstat , but all jobs
> > just in queue, can't run.
>
> Do you have a scheduler? Is it running? It is the scheduler that
> tells the jobs to start!
> Anyway, I don't know that file, maybe it's OSCAR-specific... Can you run
> qmgr -c "p s" and tell us what the Torque server is?
>
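
One thing worth looking for in the qmgr -c "p s" output is the server's scheduling attribute. A minimal sketch, assuming qmgr is run on the Torque server host; scheduling_enabled is a made-up helper name, and whether this attribute is actually the cause here is only a guess.

```shell
#!/bin/sh
# Sketch: read `qmgr -c 'p s'` output on stdin and report whether the
# server's scheduling attribute is set to true.
# scheduling_enabled is a hypothetical helper name.
scheduling_enabled() {
    grep -qi '^set server scheduling = true'
}

if command -v qmgr >/dev/null 2>&1; then
    if qmgr -c 'p s' | scheduling_enabled; then
        echo "scheduling is enabled"
    else
        echo "scheduling is off; enable it with: qmgr -c 'set server scheduling = true'"
    fi
else
    echo "qmgr not found; run this on the Torque server host"
fi
```

If scheduling is off, pbs_server does not ask the scheduler to run, so jobs would stay queued even with Maui up.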
> What version of Torque are you using? Recent Torque versions
> preferably use /var/lib/torque ... and the config file is only read when
> creating the database for Torque. After that first start, use qmgr to
> change parameters... and stop/start the services.
> >
> > 2. when I config /var/spool/pbs/torque.cfg file in another way:
> > ---------------------------------
> > QSUBSLEEP 2
> > SERVERHOST oscar_server
> > ALLOWCOMPUTEHOSTSUMBIT true
> > ----------------------------------
> > qsub failed.
> >
> > How to config and run qsub successfully?
> > Thanks for help.
>
> Is what you want a submit host?
> Just add your submit host to the server's /etc/hosts.equiv and install the
> Torque client package on the submit host.
>
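
A minimal sketch of the hosts.equiv step above, written so it can be tried on a scratch file first. add_submit_host is a made-up helper name and submithost1 is a placeholder hostname; on the real server the file would be /etc/hosts.equiv and editing it needs root.

```shell
#!/bin/sh
# Sketch: add a submit host to a hosts.equiv-style file only if it is
# not already listed, so repeated runs do not create duplicates.
# add_submit_host is a hypothetical helper; submithost1 is a placeholder.
add_submit_host() {
    file=$1; host=$2
    touch "$file"
    if grep -qxF "$host" "$file"; then
        echo "$host already listed in $file"
    else
        echo "$host" >> "$file"
        echo "added $host to $file"
    fi
}

# Dry run against a scratch file instead of /etc/hosts.equiv.
scratch=$(mktemp)
add_submit_host "$scratch" submithost1
add_submit_host "$scratch" submithost1   # second call leaves the file unchanged
cat "$scratch"
rm -f "$scratch"
```

The whole-line match (grep -x) keeps a host like submithost1 from being mistaken for submithost10.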
> --
>
> Jacques Foury
> Institut de Mathématiques de Bordeaux
> Université Bordeaux 1 / CNRS
> Tel : 05 4000 69 56
> Fax : 05 4000 21 23
> http://www.math.u-bordeaux.fr/maths/cellule
>
>
>
> ------------------------------
>
>
>
> End of torqueusers Digest, Vol 38, Issue 24
> *******************************************
>