Bug 73 - Reported By Stuart Barkley
Product: TORQUE
Version: 2.5.x
Platform: PC Linux
Importance: P5 critical
Assigned To: David Beer
Reported: 2010-07-22 15:02 MDT by David Beer
Modified: 2010-07-28 20:25 MDT

Patch (904 bytes, patch)
2010-07-22 15:04 MDT, David Beer

Description David Beer 2010-07-22 15:02:07 MDT
Problem 1: pbs_server crash:

For a while I was seeing pbs_server crash each time moab was
restarted.  I was experimenting with the moab REMAPCLASS and
REMAPCLASSLIST configurations.  With REMAPCLASS disabled, pbs_server
did not crash.

Guess: I had something queued that was being remapped upon moab
restart, which would crash pbs_server (the jobs did not get remapped).
After restarting pbs_server, things were okay and new jobs were
remapped correctly.

Additional note: There was a large array job with both running (~1500)
and queued (~1000) tasks.  There may have been some confusion when
remapping of the queued tasks was attempted.

Some more notes: I've just seen another instance of this problem.  If
I submit several jobs in quick succession that need to be remapped,
pbs_server will die.  If only a single job needs to be remapped when
moab restarts, pbs_server does not die and the remapping happens.

It looks like pbs_server dies if multiple remaps happen either too
quickly or simultaneously.

Queuing a single array job does not crash pbs_server.  I see the
individual tasks get remapped over time.

Comment 1 David Beer 2010-07-22 15:02:57 MDT
A patch for the crash is being reviewed by Glen and me.

Comment 2 David Beer 2010-07-22 15:04:10 MDT
Created an attachment (id=43)

Comment 3 David Beer 2010-07-23 15:44:02 MDT
Fix has been checked in to 2.5.

Comment 4 Glen 2010-07-28 20:25:03 MDT
(In reply to comment #3)
> Fix has been checked in to 2.5

I merged the fix into trunk as well.