[torqueusers] removal of "stray jobs"
nieroda.lech at uni-koeln.de
Mon Dec 10 02:28:22 MST 2012
We are currently running Torque 4.1.3 with Maui 3.3.1, and the
"mom_job_sync" option is enabled. Nevertheless, we quite often get "stray"
jobs: jobs that remain in the EXITING state for no apparent reason and
whose <jobid>.JB files are frequently left behind.
Our workaround: at first we tried deleting the JB files and restarting
the pbs_mom daemon, but it turns out that a simple "momctl -h <host> -c
<jobid>" does the job as well. A script that does this now runs daily via
cron and removes such jobs.
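For illustration, here is a minimal Python sketch of what such a daily cleanup could look like. It is not the script mentioned above. It assumes that "qstat -f" prints each job as a "Job Id: <id>" line followed by indented "attribute = value" lines (with tab-indented continuation lines for wrapped values), treats every job whose job_state is E as stray, and issues "momctl -h <host> -c <jobid>" for every distinct host in exec_host. The function names are hypothetical. Because a job can legitimately pass through E for a short time, a real script should probably only act on jobs seen in E on two consecutive runs; this sketch just prints the commands unless called with --apply.

```python
import subprocess
import sys


def parse_qstat_full(text):
    """Parse `qstat -f` output into {job_id: {attribute: value}} (assumed layout)."""
    jobs = {}
    current = None
    last_key = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            # Start a new job record.
            current = jobs.setdefault(line.split(":", 1)[1].strip(), {})
            last_key = None
        elif current is not None and last_key and line.startswith("\t"):
            # Tab-indented line: continuation of the previous (wrapped) value.
            current[last_key] += line.strip()
        elif current is not None and " = " in line:
            key, value = line.strip().split(" = ", 1)
            current[key] = value
            last_key = key
    return jobs


def stray_cleanup_commands(jobs):
    """Build momctl commands for jobs in the EXITING ('E') state (assumed = stray)."""
    commands = []
    for job_id, attrs in sorted(jobs.items()):
        if attrs.get("job_state") != "E":
            continue
        # exec_host looks like "node01/0+node01/1"; clear once per distinct host.
        parts = attrs.get("exec_host", "").split("+")
        hosts = sorted({p.split("/")[0] for p in parts if p})
        for host in hosts:
            commands.append(["momctl", "-h", host, "-c", job_id])
    return commands


def main(apply=False):
    out = subprocess.run(["qstat", "-f"], capture_output=True,
                         text=True, check=True).stdout
    for cmd in stray_cleanup_commands(parse_qstat_full(out)):
        print(" ".join(cmd))
        if apply:
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    main(apply="--apply" in sys.argv)
```

Run without arguments first to review the generated commands; a crontab entry along the lines of "0 3 * * * python3 /path/to/clear_stray.py --apply" (hypothetical path) would then reproduce the daily schedule.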
So when the server detects a "stray" job, it has the means to send a
cleanup command to the pbs_mom, but apparently it doesn't, and we have to
do it manually.
Is there an option to fix this? Is it a bug?
Dipl.-Wirt.-Inf. Lech Nieroda
Regionales Rechenzentrum der Universität zu Köln (RRZK)
Universität zu Köln
Raum 309 (3. Etage)
Tel.: +49 (221) 470-89606
E-Mail: nieroda.lech at uni-koeln.de