[Mauiusers] Job cannot be started, 15062, Unknown node

Jan Ploski Jan.Ploski at offis.de
Tue Sep 25 04:18:23 MDT 2007


Hello,

I have a job stuck at the front of the queue, apparently blocking all
other, lower-priority jobs from executing. checkjob reports only the
unhelpful error message "Unknown node" (see subject).

I tracked the cause of the problem down to an invalid nodelist
specification that Maui produces for the job. More precisely, this
is what I request:

nodes=18:ib:ppn=4+1:ib:ppn=2

and this is what Maui gives me:

node3:ppn=4+node4:ppn=8+node5:ppn=4+node6:ppn=4+node7:ppn=4+node8:ppn=4+node9:ppn=4+node10:ppn=4+node11:ppn=4+node12:ppn=4+node13:ppn=4+node14:ppn=4+node15:ppn=4+node16:ppn=4+node18:ppn=4+node20:ppn=4+node22:ppn=4+node17:ppn=2

If you sum up the ppn values in this list, the total matches the 74
processors I requested, but they are spread over 18 nodes instead of 19:
two 4-processor tasks have apparently been placed on node4. As a result,
it tries to give me node4:ppn=8, even though node4 is configured with only
4 processors. This is why TORQUE rejects the job.
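
The two specifications can be compared mechanically rather than by eye.
The following is only a minimal sketch in Python, not part of Maui or
TORQUE: the function names are made up for illustration, a missing ppn is
assumed to mean 1, the request is assumed to use numeric node counts only,
and CONFIGURED_PPN = 4 is assumed for every node (the message above
confirms it only for node4).

```python
from collections import Counter

# Assumption: every node has 4 processors; only node4 is confirmed above.
CONFIGURED_PPN = 4

REQUEST = "nodes=18:ib:ppn=4+1:ib:ppn=2"
NODELIST = (
    "node3:ppn=4+node4:ppn=8+node5:ppn=4+node6:ppn=4+node7:ppn=4+"
    "node8:ppn=4+node9:ppn=4+node10:ppn=4+node11:ppn=4+node12:ppn=4+"
    "node13:ppn=4+node14:ppn=4+node15:ppn=4+node16:ppn=4+node18:ppn=4+"
    "node20:ppn=4+node22:ppn=4+node17:ppn=2"
)


def ppn_of(fields):
    """Return the ppn value among colon-separated fields (assumed 1 if absent)."""
    for field in fields:
        if field.startswith("ppn="):
            return int(field[len("ppn="):])
    return 1


def parse_request(spec):
    """Return (node_count, processor_count) for a 'nodes=...' request."""
    if spec.startswith("nodes="):
        spec = spec[len("nodes="):]
    nodes = procs = 0
    for chunk in spec.split("+"):
        fields = chunk.split(":")
        count = int(fields[0])  # numeric node count assumed, not a hostname
        nodes += count
        procs += count * ppn_of(fields[1:])
    return nodes, procs


def parse_nodelist(spec):
    """Return a Counter mapping each host to the processors assigned to it."""
    alloc = Counter()
    for chunk in spec.split("+"):
        fields = chunk.split(":")
        alloc[fields[0]] += ppn_of(fields[1:])
    return alloc


req_nodes, req_procs = parse_request(REQUEST)
alloc = parse_nodelist(NODELIST)
oversubscribed = {h: p for h, p in alloc.items() if p > CONFIGURED_PPN}

print("requested:", req_nodes, "nodes,", req_procs, "processors")
print("allocated:", len(alloc), "nodes,", sum(alloc.values()), "processors")
print("over the configured ppn:", oversubscribed)
```

Run on the two strings exactly as quoted in this message, it counts 19
nodes and 74 processors requested against 18 nodes and 74 processors
allocated, and flags only node4.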

I can attach a debugger to the maui process and see that the TC is 8
instead of 4 in the job's NodeList, while the nodes are allocated as
expected (TC=4) in the job's reqs. What I don't know is where the job's
NodeList comes from, or whether it is overwritten with the wrong value on
each scheduling cycle or was computed once when the job was created. I'd
be grateful for some debugging tips from the Maui developers.

Regards,
Jan Ploski

--
Dipl.-Inform. (FH) Jan Ploski
OFFIS
Betriebliches Informationsmanagement
Escherweg 2  - 26121 Oldenburg - Germany
Fon: +49 441 9722 - 184 Fax: +49 441 9722 - 202
E-Mail: Jan.Ploski at offis.de - URL: http://www.offis.de

