[torqueusers] Some jobs not starting with Torque 2.3.1 and Moab

Chris Samuel csamuel at vpac.org
Sat Jul 5 03:27:54 MDT 2008


Hi there,

I'm not sure if this is a Torque or Moab bug or just the result
of a change in interaction between the two, so I'm reporting this
to both. :-)

Torque 2.3.1 official release.

# moab --version
moab server version 5.2.3 (revision 10590)

We have a number of jobs that are not starting and are ending
up in BatchHold due to repeated failures.  They are all logging
similar information:

Message[30] cannot start job on reserved resources - job cannot be started on RM base - cannot set hostlist: cannot set job '472817.tango-m.vpac.org' attr 'Resource_List:neednodes' to 'tango048' - job may have been removed externally (rc: 15001 'Unknown Job Id')
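
For anyone wanting to pull the same details out of several held jobs,
here is a rough Python sketch that extracts the job id, attribute,
target host and return code from a message like the one above. The
pattern is assumed from this single message only, and the function
name parse_hostlist_failure is hypothetical; other Moab versions may
word the message differently.

```python
import re

# Pattern assumed from the one message quoted above; it is not taken
# from any Moab documentation and may not match other versions.
HOSTLIST_FAILURE = re.compile(
    r"cannot set job '(?P<job>[^']+)' "
    r"attr '(?P<attr>[^']+)' "
    r"to '(?P<value>[^']+)'"
    r".*\(rc: (?P<rc>\d+) '(?P<reason>[^']+)'\)"
)


def parse_hostlist_failure(message):
    """Return a dict of job, attr, value, rc and reason, or None if no match."""
    match = HOSTLIST_FAILURE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    fields["rc"] = int(fields["rc"])  # numeric return code, e.g. 15001
    return fields


# The message exactly as logged above.
message = (
    "Message[30] cannot start job on reserved resources - job cannot be "
    "started on RM base - cannot set hostlist: cannot set job "
    "'472817.tango-m.vpac.org' attr 'Resource_List:neednodes' to 'tango048' "
    "- job may have been removed externally (rc: 15001 'Unknown Job Id')"
)

failure = parse_hostlist_failure(message)
print(failure)
```

Grouping the results by the 'value' field would show whether the
failures cluster on particular nodes.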

There are no log messages on tango048 relating to this job:

[root at tango048 ~]# grep 472817 /usr/spool/PBS/mom_logs/20080*
[root at tango048 ~]# grep 472817 /var/log/messages
[root at tango048 ~]#

The PBS server log has this message:

07/05/2008 18:08:35;0008;PBS_Server;Job;472817.tango-m.vpac.org;MOM rejected modify request, error: 15001

[root at tango-m ~]# tracejob -n 2 -q 472817

Job: 472817.tango-m.vpac.org

07/04/2008 19:54:44  S    enqueuing into run_1_month, state 1 hop 1
07/04/2008 19:54:44  S    Job Queued at request of XXXX at tango.vpac.org, owner = XXXX at tango.vpac.org, job name = box_N3000, queue = run_1_month
07/04/2008 19:54:44  A    queue=run_1_month
07/04/2008 19:54:50  S    Job Run at request of root at tango-m.vpac.org
07/04/2008 19:54:58  S    unable to run job, MOM rejected/rc=2
07/05/2008 11:11:14  S    Holds uso released at request of root at tango-m.vpac.org
07/05/2008 11:11:19  S    Job Modified at request of root at tango-m.vpac.org
07/05/2008 11:11:19  S    MOM rejected modify request, error: 15001
07/05/2008 16:26:52  S    Holds uso released at request of root at tango-m.vpac.org
07/05/2008 17:30:21  S    Holds uso released at request of root at tango-m.vpac.org
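
To compare the rejections across the affected jobs, a minimal Python
sketch like the following could pick the "MOM rejected" events out of
saved tracejob output. The line layout (date, time, one-letter source,
message) is assumed from the output above, and the function name
mom_rejections is hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the tracejob output above:
#   MM/DD/YYYY HH:MM:SS  <source letter>  <message>
LINE = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")


def mom_rejections(tracejob_text):
    """Return (timestamp, source, message) for each 'MOM rejected' line."""
    events = []
    for line in tracejob_text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip headers and blank lines
        stamp, source, text = match.groups()
        if "MOM rejected" in text:
            when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
            events.append((when, source, text))
    return events


# A few lines copied from the tracejob output above.
sample = """\
07/04/2008 19:54:50  S    Job Run at request of root at tango-m.vpac.org
07/04/2008 19:54:58  S    unable to run job, MOM rejected/rc=2
07/05/2008 11:11:19  S    Job Modified at request of root at tango-m.vpac.org
07/05/2008 11:11:19  S    MOM rejected modify request, error: 15001
"""

rejections = mom_rejections(sample)
for when, source, text in rejections:
    print(when.isoformat(sep=" "), source, text)
```

For this job it would report both the initial run rejection (rc=2) and
the later modify rejection (error 15001).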

I'm wondering if the recent changes to unbreak pbs_mom
reporting non-existent jobs have changed an assumption
that Moab was making?

I'm at something of a loss; any ideas?

cheers!
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

