[torquedev] [Bug 73] New: Reported By Stuart Barkley
bugzilla-daemon at supercluster.org
Thu Jul 22 15:02:08 MDT 2010
http://www.clusterresources.com/bugzilla/show_bug.cgi?id=73
Summary: Reported By Stuart Barkley
Product: TORQUE
Version: 2.5.x
Platform: PC
OS/Version: Linux
Status: NEW
Severity: critical
Priority: P5
Component: pbs_server
AssignedTo: glen.beane at gmail.com
ReportedBy: dbeer at adaptivecomputing.com
CC: torquedev at supercluster.org
Estimated Hours: 0.0
Problem 1: pbs_server crash:
For a while I was seeing pbs_server crash each time moab was
restarted. I was playing with moab REMAPCLASS and REMAPCLASSLIST
configurations. With REMAPCLASS disabled pbs_server did not crash.
Guess: I had something queued that was being remapped upon moab
restart, which would crash pbs_server (the jobs do not get remapped).
After restarting pbs_server things were okay and new jobs were
remapped correctly.
Additional note: There was a large array job with both running (~1500)
and queued (~1000) tasks. There may have been some confusion when
remapping of the queued tasks was attempted.
Some more notes: I've just seen another instance of this problem. If
I quickly submit several jobs that need to be remapped, pbs_server
will die. If there is only a single job needing to be remapped when
moab restarts, pbs_server does not die and the remapping happens.
It looks like pbs_server dies if multiple remaps happen either too
quickly or simultaneously.
Queuing a single array job does not crash pbs_server. I see the
individual tasks get remapped over time.
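A possible way to reproduce the burst-submission case described above
is sketched here. It is not part of the original report and has not
been run against an affected server. The queue name "remapme", the job
count, and the job body are hypothetical placeholders; the sketch
assumes qsub reads the job script from standard input and that the
chosen queue is one that moab remaps via REMAPCLASS. It defaults to a
dry run that only prints the commands.

```python
import subprocess


def submit_burst(n_jobs, queue="remapme", script="sleep 300\n", dry_run=True):
    """Submit n_jobs small jobs back to back to a single queue.

    queue and script are hypothetical placeholders; pick a queue that
    the scheduler is configured to remap. Returns the command lists used.
    """
    cmds = [["qsub", "-q", queue] for _ in range(n_jobs)]
    for cmd in cmds:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # qsub takes the job script on stdin when no file is given.
            subprocess.run(cmd, input=script, text=True, check=True)
    return cmds


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False on a test system only.
    submit_burst(5)
```

With dry_run=False on a test system, watching whether pbs_server is
still running after the burst would show whether the crash depends on
several remaps arriving close together.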
--
Configure bugmail: http://www.clusterresources.com/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.