Bugzilla – Bug 174
pbs_mom kills running jobs despite -p flag
Last modified: 2012-03-08 09:49:36 MST
I want to restart pbs_mom on a node where it has died for whatever reason
without killing the jobs that are still running on the node. We used to be
able to do this by starting pbs_mom with the -p flag, but apparently this
no longer works: every time I start the mom using "pbs_mom -p", all
running jobs get killed. My feeling is that -p stopped working when we started
to use cpusets (I am not absolutely sure about this, since we have also upgraded
torque versions since then). We are currently running torque-2.5.10.
When I replace the line 214 in cpuset.c
if (cpuset_delete(pdirent->d_name) == 0)
with "if (0)" then the jobs do not get killed when I restart pbs_mom.
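A less blunt variant of this workaround might skip the deletion only when the mom was started with -p, instead of disabling it unconditionally. The following is just a sketch of that idea under stated assumptions, not the actual TORQUE code or the eventual fix: should_delete_cpuset and the preserve_jobs flag are hypothetical names; only cpuset_delete and pdirent->d_name come from cpuset.c as quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper (not part of TORQUE): decide whether a cpuset
 * directory entry found at pbs_mom startup may be removed. */
bool should_delete_cpuset(const char *entry_name, bool preserve_jobs)
{
    assert(entry_name != NULL);

    /* "." and ".." are directory bookkeeping entries, never job cpusets. */
    if (strcmp(entry_name, ".") == 0 || strcmp(entry_name, "..") == 0)
        return false;

    /* Assumption: when the mom is started with -p, running jobs are meant
     * to survive, so their cpusets must be left in place. */
    if (preserve_jobs)
        return false;

    /* Otherwise the entry is treated as stale and may be cleaned up. */
    return true;
}
```

With such a helper, the condition at line 214 could read "if (should_delete_cpuset(pdirent->d_name, preserve_jobs) && cpuset_delete(pdirent->d_name) == 0)", where preserve_jobs would have to be set from the -p option elsewhere in the mom.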
(In reply to comment #1)
> When I replace the line 214 in cpuset.c
> if (cpuset_delete(pdirent->d_name) == 0)
> with "if (0)" then the jobs do not get killed when I restart pbs_mom.
I have added this to the AC internal ticketing system so we can get it fixed.
I can confirm that we have experienced the same issue since switching cpuset support on.
We currently run 2.5.10 and the problem persists.
The good thing is that cpusets should ease process tracking in such a case, since all
child processes spawned by a given job are available in:
Looks like this has been fixed in SVN with commits 5855 and 5856.
Unfortunately, the commits don't reference this BZ number.
Fixed in 2.5.11 revision 5855