[Mauiusers] all jobs are killed when one job crashes

Kevin Van Workum vanw at tticluster.com
Wed Nov 24 10:12:21 MST 2004


When one of my users' jobs crashes, all the running and queued jobs are
killed and removed from the queue. I'm using torque-1.1.0p4 and
maui-3.2.6p9. When maui kills the running jobs, it logs, for example:

ERROR:    job '7930' has NULL WCLimit field
MRMJobCancel(7930,MOAB_INFO:  job exceeded wallclock limit,SC)
MPBSJobCancel(7930,base,CMsg,Msg,MOAB_INFO:  job exceeded wallclock limit)

But I don't set a wallclock limit. If I use pbs_sched instead of maui, 
the crashing job doesn't cause the other jobs to be killed.
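
In case it helps anyone reproduce this, here is a rough sketch of how the
affected job IDs could be pulled out of maui.log. It assumes the log lines
look exactly like the three quoted above (other Maui versions may format
them differently), and the function name summarize_cancellations is just
an illustrative name, not part of Maui or TORQUE.

```python
import re

# Patterns are based only on the sample log lines quoted above (assumption:
# other Maui versions may word or space these messages differently).
NULL_WCLIMIT = re.compile(r"ERROR:\s+job '(\d+)' has NULL WCLimit field")
CANCEL = re.compile(
    r"MRMJobCancel\((\d+),MOAB_INFO:\s+job exceeded wallclock limit"
)


def summarize_cancellations(lines):
    """Return two sorted lists of job IDs (as strings):
    jobs reported with a NULL WCLimit field, and jobs cancelled
    with the 'job exceeded wallclock limit' message."""
    null_wclimit = set()
    cancelled = set()
    for line in lines:
        match = NULL_WCLIMIT.search(line)
        if match:
            null_wclimit.add(match.group(1))
        match = CANCEL.search(line)
        if match:
            cancelled.add(match.group(1))
    return sorted(null_wclimit), sorted(cancelled)


# Usage with the lines quoted in this message; for a real log, pass an
# open file object instead of the list.
sample = [
    "ERROR:    job '7930' has NULL WCLimit field",
    "MRMJobCancel(7930,MOAB_INFO:  job exceeded wallclock limit,SC)",
    "MPBSJobCancel(7930,base,CMsg,Msg,MOAB_INFO:  job exceeded wallclock limit)",
]
null_jobs, cancelled_jobs = summarize_cancellations(sample)
print("NULL WCLimit:", null_jobs)
print("cancelled for wallclock:", cancelled_jobs)
```

Comparing the two lists should show whether every cancelled job was first
reported with a NULL WCLimit field.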

I also observed that when the job that crashes is queued and started, 
maui reports:
ALERT:    unexpected node transition on node 'blue84'  Idle -> Busy
Why would this be an unexpected transition?

My maui.cfg, given below, is simple since I don't have any real
scheduling priorities right now:

SERVERHOST            tti
ADMIN1                root
ADMIN2                vanw
ADMIN3                jdvw
RMCFG[base]           TYPE=PBS
RMPOLLINTERVAL        00:00:30
SERVERPORT            42559
SERVERMODE            NORMAL
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              1
QUEUETIMEWEIGHT       1
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE




