[torquedev] [Bug 73] New: Reported By Stuart Barkley

bugzilla-daemon at supercluster.org
Thu Jul 22 15:02:08 MDT 2010


           Summary: Reported By Stuart Barkley
           Product: TORQUE
           Version: 2.5.x
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: critical
          Priority: P5
         Component: pbs_server
        AssignedTo: glen.beane at gmail.com
        ReportedBy: dbeer at adaptivecomputing.com
                CC: torquedev at supercluster.org
   Estimated Hours: 0.0

Problem 1: pbs_server crash:

For a while I was seeing pbs_server crash each time moab was
restarted.  I was playing with moab REMAPCLASS and REMAPCLASSLIST
configurations.  With REMAPCLASS disabled pbs_server did not crash.

Guess: I had something queued which was being remapped upon moab
restart, and that crashed pbs_server (the jobs do not get remapped).
After restarting pbs_server things were okay and new jobs were
remapped correctly.

Additional note: There was a large array job with both running (~1500)
and queued (~1000) tasks.  There may have been some confusion when
remapping of the queued tasks was attempted.

Some more notes: I've just seen another instance of this problem.  If
I submit several jobs in quick succession that need to be remapped,
pbs_server will die.  If there is only a single job needing to be
remapped when moab restarts, pbs_server does not die and the remapping
happens.

It looks like pbs_server dies if multiple remaps happen either too
quickly or simultaneously.
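
For anyone trying to reproduce this, a minimal sketch of the "several
jobs submitted quickly" case might look like the following.  It is not
the script used when the crash was observed.  The queue name "batch",
the job names, and the sleep payload are assumptions for illustration;
the queue must be one that the moab REMAPCLASS configuration actually
remaps.  By default it only prints the qsub commands (dry run).

```python
import shlex
import subprocess


def build_submit_commands(n_jobs, queue):
    """Return one qsub argument list per job.

    The queue name and the "remapN" job names are hypothetical;
    substitute a queue that moab is configured to remap.
    """
    return [["qsub", "-q", queue, "-N", f"remap{i}"] for i in range(n_jobs)]


def submit_burst(n_jobs, queue, script="sleep 300\n", dry_run=True):
    """Submit n_jobs back to back, or just print the commands if dry_run."""
    cmds = build_submit_commands(n_jobs, queue)
    for cmd in cmds:
        if dry_run:
            # Show what would be run without touching pbs_server.
            print(shlex.join(cmd))
        else:
            # With no script file argument, qsub reads the job script
            # from standard input.
            subprocess.run(cmd, input=script, text=True, check=True)
    return cmds


if __name__ == "__main__":
    # Dry run: prints five qsub command lines for the assumed queue.
    submit_burst(5, "batch")
```

Calling submit_burst(5, "batch", dry_run=False) on a test system would
queue the jobs for real.  Whether five jobs is enough to trigger the
crash is not known; the count is a guess.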

Queuing a single array job does not crash pbs_server.  I see the
individual tasks get remapped over time.
