[Mauiusers] all jobs are killed when one job crashes
Kevin Van Workum
vanw at tticluster.com
Wed Nov 24 10:12:21 MST 2004
I have a situation where all the running and queued jobs are being
killed and removed from the queue when one of my users' jobs crashes.
I'm using torque-1.1.0p4 and maui-3.2.6p9. When maui kills the running
jobs, it logs, for example:

    ERROR: job '7930' has NULL WCLimit field
    MRMJobCancel(7930,MOAB_INFO: job exceeded wallclock limit,SC)
    MPBSJobCancel(7930,base,CMsg,Msg,MOAB_INFO: job exceeded wallclock limit)

But I don't set a wallclock limit. If I use pbs_sched instead of maui,
the crashing job doesn't cause the other jobs to be killed.
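
A minimal sketch for pulling the affected job IDs out of the maui log,
assuming every relevant line has the same shape as the ones quoted in this
message. The function name summarize and both regular expressions are
illustrative and are not part of maui or torque:

```python
import re

# Assumption: these patterns mirror the log lines quoted in this message;
# other maui versions or log levels may format them differently.
NULL_WCLIMIT = re.compile(r"ERROR:\s+job '(\d+)' has NULL WCLimit field")
CANCEL = re.compile(
    r"MRMJobCancel\((\d+),MOAB_INFO: job exceeded wallclock limit"
)


def summarize(lines):
    """Return two sorted lists of job IDs (as strings):
    jobs reported with a NULL WCLimit, and jobs cancelled for
    exceeding the wallclock limit."""
    null_jobs, cancelled = set(), set()
    for line in lines:
        match = NULL_WCLIMIT.search(line)
        if match:
            null_jobs.add(match.group(1))
        match = CANCEL.search(line)
        if match:
            cancelled.add(match.group(1))
    return sorted(null_jobs, key=int), sorted(cancelled, key=int)


# Hypothetical usage; the log path depends on the local maui setup:
#   with open("maui.log") as fh:
#       null_jobs, cancelled = summarize(fh)
```

If the two lists match, every job that maui cancelled was also one it
reported as having no wallclock limit.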
I also observed that when the job that crashes is queued and started,
maui reports:

    ALERT: unexpected node transition on node 'blue84' Idle -> Busy

Why would this be an unexpected transition?
My maui.cfg, given below, is simple since I don't have any real
scheduling priorities right now:

    SERVERHOST            tti
    ADMIN1                root
    ADMIN2                vanw
    ADMIN3                jdvw
    RMCFG[base]           TYPE=PBS
    RMPOLLINTERVAL        00:00:30
    SERVERPORT            42559
    SERVERMODE            NORMAL
    LOGFILE               maui.log
    LOGFILEMAXSIZE        10000000
    LOGLEVEL              1
    QUEUETIMEWEIGHT       1
    BACKFILLPOLICY        FIRSTFIT
    RESERVATIONPOLICY     CURRENTHIGHEST
    NODEALLOCATIONPOLICY  MINRESOURCE