[torqueusers] removal of "stray jobs"

Lech Nieroda nieroda.lech at uni-koeln.de
Mon Dec 10 02:28:22 MST 2012


Dear list,

we are currently running Torque 4.1.3 with Maui 3.3.1, with the
"mom_job_sync" option enabled. Nevertheless, we quite often get "stray"
jobs: jobs that remain in the "EXITING" state for no apparent reason,
often with their <jobid>.JB files left lying around.

Our workaround: at first we deleted the JB files and restarted the
pbs_mom daemon, but it turns out that a simple "momctl -h <host> -c
<jobid>" does the job as well. A script now runs daily from cron and
removes such jobs.

So when the server discovers a "stray" job, it has the means to send a
"cleaning" command to the pbs_mom, but apparently it does not do so, and
we have to do it manually.

Is there an option to fix that? Is it a bug?

Regards,
Lech Nieroda

-- 
Dipl.-Wirt.-Inf. Lech Nieroda
Regionales Rechenzentrum der Universität zu Köln (RRZK)
Universität zu Köln
Weyertal 121
Raum 309 (3. Etage)
D-50931 Köln
Deutschland

Tel.: +49 (221) 470-89606
E-Mail: nieroda.lech at uni-koeln.de

