[Mauiusers] Maui/Torque stopped running jobs
Kevin Hildebrand
kevin at umd.edu
Fri Oct 26 14:50:37 MDT 2007
Hello, at some point today my Maui/Torque installation stopped running
jobs. It appears that Maui is able to select an available set of nodes
but then can't start the job. I'm not getting any errors on the Torque
side; in fact, I'm not seeing any Torque log entries indicating that the
job is being started at all. Here's what I'm seeing in the Maui logs:
10/26 16:43:25 INFO: tasks located for job 21542: 2 of 2 required (36 feasible)
10/26 16:43:25 INFO: allocated MNode[000]x2 'compute-2-1.deepthought.umd.edu' to 21542:0
10/26 16:43:25 MJobStart(21542)
10/26 16:43:25 MJobDistributeTasks(21542,DEEPTHOUGHT.UMD.EDU,NodeList,TaskMap)
10/26 16:43:25 INFO: 1 node(s)/2 task(s) added to 21542:0
10/26 16:43:25 INFO: MNode[000] 'compute-2-1.deepthought.umd.edu'(x2) added to job '21542'
[020] compute-2-1.deepthought.umd.edu: (P:4,S:5405,M:3946,D:1) [Idle][linux][[NONE]]<0.020000> C:[debug 4:4][narrow-med 4:4][narrow-long 4:4][narrow-extended 4:4][med-extended 4:4][wide-debug 4:4][wide-short 4:4][wide-med 4:4][serial 4:4][grid 4:4][dev 4:4][DEFAULT] [noib][prod][dell1950] [debug 4:4][narrow-med 4:4][narrow-long 4:4][narrow-extended 4:4][med-extended 4:4][wide-debug 4:4][wide-short 4:4][wide-med 4:4][serial 4:4][grid 4:4][dev 4:4]
10/26 16:43:25 INFO: end of list reached. 1 nodes found
10/26 16:43:25 INFO: tasks distributed: 2 (Round-Robin)
10/26 16:43:25 MAMAllocJReserve(21542,RIndex,ErrMsg)
10/26 16:43:25 MRMJobStart(21542,Msg,SC)
10/26 16:43:25 INFO: cannot start job 21542 (cannot start job - fail iteration)
10/26 16:43:25 WARNING: cannot start job '21542' through resource manager
10/26 16:43:25 ERROR: MBFFirstFit: cannot start job 21542.0
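
To check whether other jobs are hitting the same pattern, the log can be
scanned for jobs where MRMJobStart is called and a "cannot start job"
message follows. This is only a rough sketch: the regular expressions are
based solely on the excerpt above, the function name failed_starts is made
up for illustration, and the message wording may differ in other Maui
versions or log levels.

```python
import re

# Patterns inferred from the log excerpt above (an assumption, not a
# documented Maui log format).
START_RE = re.compile(r"MRMJobStart\((\d+),")
FAIL_RE = re.compile(r"cannot start job '?(\d+)'?")

def failed_starts(lines):
    """Return IDs of jobs that were handed to the resource manager
    and then reported as 'cannot start', in first-seen order."""
    attempted = set()
    failed = []
    for line in lines:
        m = START_RE.search(line)
        if m:
            attempted.add(m.group(1))
            continue
        m = FAIL_RE.search(line)
        if m and m.group(1) in attempted and m.group(1) not in failed:
            failed.append(m.group(1))
    return failed

# Sample lines copied from the excerpt above (with wrapped lines rejoined).
sample = [
    "10/26 16:43:25 MJobStart(21542)",
    "10/26 16:43:25 MRMJobStart(21542,Msg,SC)",
    "10/26 16:43:25 INFO: cannot start job 21542 (cannot start job - fail iteration)",
    "10/26 16:43:25 WARNING: cannot start job '21542' through resource manager",
    "10/26 16:43:25 ERROR: MBFFirstFit: cannot start job 21542.0",
]
print(failed_starts(sample))
```

To run it against a real log, pass an open file object instead of the
sample list, since the function only iterates over lines.
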
Does anybody have a clue as to what's going on? (I've tried restarting
both Torque and Maui, and the problem continues.)
Thanks!
Kevin Hildebrand
University of Maryland, College Park