[Mauiusers] Multiple job request peculiarities

Peter Crosta pmc2107 at columbia.edu
Mon Mar 28 07:37:38 MDT 2011


Marvin,

 

We use Maui 3.3 and Torque 2.5.4, and our maui config looks like yours
(except we have NODEALLOCATIONPOLICY set to PRIORITY).
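
For reference, the relevant part of our maui.cfg looks roughly like this (a sketch from memory rather than a verbatim copy, so treat the exact values as approximate):

ENABLEMULTINODEJOBS[0]            TRUE
ENABLEMULTIREQJOBS[0]             TRUE
JOBNODEMATCHPOLICY[0]             EXACTNODE
NODEALLOCATIONPOLICY[0]           PRIORITY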

 

Your first qsub asks for 12 processors on one node and 1 processor on one
node, so 2 nodes in total and 13 processors. Your second asks for 12
processors on each of 3 nodes (36 total) and one processor on one node, so 4
nodes and 37 processors. How many nodes and processors do you have according
to showq?
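
To spell out how I am reading those "+"-separated requests:

nodes=1:ppn=12+1:ppn=1  ->  1 node x 12 ppn  +  1 node x 1 ppn  =  2 nodes, 13 processors
nodes=3:ppn=12+1:ppn=1  ->  3 nodes x 12 ppn  +  1 node x 1 ppn  =  4 nodes, 37 processors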

 

You also noted that $ qsub -l nodes=4:ppn=12+1:ppn=1 worked, which I find
strange, as that request requires 49 processors and 5 nodes. Are there any
other processor or node restrictions in your Torque or Maui config?
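
If it helps, a few standard Torque/Maui commands should show anything that might be capping the request (checkjob ships with Maui; DEFAULT is just the queue name from your config, and <jobid> is a placeholder):

qmgr -c "print server"           # server-wide settings and limits
qmgr -c "print queue DEFAULT"    # per-queue resource limits, if any
pbsnodes -a | grep "np = "       # processors configured on each node
checkjob -v <jobid>              # Maui's reason for leaving the job idle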

 

Peter





From: Marvin Novaglobal [mailto:marvin.novaglobal at gmail.com] 
Sent: Thursday, March 24, 2011 10:56 PM
To: Peter Michael Crosta
Cc: mauiusers at supercluster.org
Subject: Re: [Mauiusers] Multiple job request peculiarities

 

Sorry, I just had a look at my original post again. The description there was
missing a '+' sign, but in my actual testing I did use one. Therefore, 

qsub -l nodes=1:ppn=12+1:ppn=1 (works)

while

qsub -l nodes=3:ppn=12+1:ppn=1 (does not work, job goes to idle)

Weird stuff. May I know if you guys have encountered this?
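
In case anyone wants to try to reproduce it, this is roughly what I am doing (test.sh here is just a stand-in for a trivial script):

echo "sleep 60" > test.sh
qsub -l nodes=1:ppn=12+1:ppn=1 test.sh    # starts normally
qsub -l nodes=3:ppn=12+1:ppn=1 test.sh    # sits in the idle queue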

 

 

Regards,
Marvin

 

 

On Fri, Mar 25, 2011 at 10:46 AM, Marvin Novaglobal
<marvin.novaglobal at gmail.com> wrote:

Hi Peter,

    It doesn't work on my setup. I meant that the problem only shows up with
nodes=3 and nodes=5 so far; we don't have enough resources to test nodes=7. So again,

qsub -l nodes=1:ppn=12+1:ppn=1 will work but

qsub -l nodes=3:ppn=12+1:ppn=1 will not work

    May I know which versions of Maui and Torque you are using? Could you also
share your Maui and Torque configs, please?

 

 


Regards,

Marvin

 

 

On Fri, Mar 25, 2011 at 12:20 AM, Peter Michael Crosta
<pmc2107 at columbia.edu> wrote:

Hi Marvin,

I have gotten multiple resource requests to work by using the "+" sign. Have
you tried

qsub -l nodes=3:ppn=12+1:ppn=1 ?
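
For example, a full submission in that form would look something like this (script name and walltime are just placeholders):

qsub -l nodes=3:ppn=12+1:ppn=1 -l walltime=01:00:00 yourscript.sh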

Best,
Peter



On Thu, 24 Mar 2011, Marvin Novaglobal wrote:

Hi,

On my setup,
$ qsub -l nodes=1:ppn=12:1:ppn=1 (works)
$ qsub -l nodes=2:ppn=12:1:ppn=1 (works)
$ qsub -l nodes=3:ppn=12:1:ppn=1 (job goes to idle and never gets executed)
$ qsub -l nodes=4:ppn=12:1:ppn=1 (works)
$ qsub -l nodes=5:ppn=12:1:ppn=1 (job goes to idle and never gets executed)

<Maui.cfg>
...
ENABLEMULTINODEJOBS[0]            TRUE
ENABLEMULTIREQJOBS[0]              TRUE
JOBNODEMATCHPOLICY[0]             EXACTNODE
NODEALLOCATIONPOLICY[0]           MINRESOURCE


<Torque.cfg>
set server scheduling = True
set server acl_hosts = aquarius.local
set server managers = torque at aquarius
set server operators = torque at aquarius
set server default_queue = DEFAULT
set server log_events = 511
set server mail_from = adm
set server resources_available.nodect = 2048
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server mom_job_sync = True
set server keep_completed = 300
set server next_job_number = 377

<maui.log>
03/24 20:23:48 MResDestroy(377)
03/24 20:23:48 MResChargeAllocation(377,2)
03/24 20:23:48
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,EVERY,FReason,TRUE)
03/24 20:23:48 INFO:     total jobs selected in partition ALL: 1/1
03/24 20:23:48
MQueueSelectJobs(SrcQ,DstQ,SOFT,5120,4096,2140000000,DEFAULT,FReason,TRUE)
03/24 20:23:48 INFO:     total jobs selected in partition DEFAULT: 1/1
03/24 20:23:48 MQueueScheduleIJobs(Q,DEFAULT)
03/24 20:23:48 INFO:     72 feasible tasks found for job 377:0 in partition
DEFAULT (36 Needed)
03/24 20:23:48 INFO:     72 feasible tasks found for job 377:1 in partition
DEFAULT (1 Needed)
03/24 20:23:48 ALERT:    inadequate tasks to allocate to job 377:1 (0 < 1)
03/24 20:23:48 ERROR:    cannot allocate nodes to job '377' in partition
DEFAULT
03/24 20:23:48 MJobPReserve(377,DEFAULT,ResCount,ResCountRej)
03/24 20:23:48 MJobReserve(377,Priority)
03/24 20:23:48 INFO:     72 feasible tasks found for job 377:0 in partition
DEFAULT (36 Needed)
03/24 20:23:48 INFO:     72 feasible tasks found for job 377:1 in partition
DEFAULT (1 Needed)
03/24 20:23:48 INFO:     72 feasible tasks found for job 377:0 in partition
DEFAULT (36 Needed)
03/24 20:23:48 INFO:     72 feasible tasks found for job 377:1 in partition
DEFAULT (1 Needed)
03/24 20:23:48 INFO:     located resources for 36 tasks (144) in best
partition DEFAULT for job 377 at time 00:00:01
03/24 20:23:48 INFO:     tasks located for job 377:  37 of 36 required (144
feasible)
03/24 20:23:48 MResJCreate(377,MNodeList,00:00:01,Priority,Res)
03/24 20:23:48 INFO:     job '377' reserved 36 tasks (partition DEFAULT) to
start in 00:00:01 on Thu Mar 24 20:23:49
 (WC: 2592000)

<pbs_server.log>
03/24/2011 20:23:17;0100;PBS_Server;Job;377.aquarius;enqueuing into DEFAULT,
state 1 hop 1
03/24/2011 20:23:17;0008;PBS_Server;Job;377.aquarius;Job Queued at request
of torque at aquarius, owner = torque at aquarius, job name = parallel.sh, queue =
DEFAULT
03/24/2011 20:23:17;0040;PBS_Server;Svr;aquarius;Scheduler was sent the
command new


Has anyone encountered problems with multiple job requests?


Regards,
Marvin



 

 
