[torqueusers] Updated: killbaduser, a tool to clean up rogue user processes

Åke Sandgren ake.sandgren at hpc2n.umu.se
Wed Nov 1 13:14:36 MST 2006


On Wed, 2006-11-01 at 20:55 +0100, Ole Holm Nielsen wrote:
> Garrick Staples <garrick at clusterresources.com> wrote:
> >> We've been using killbaduser, a tool to clean up rogue user processes,
> >> for a while now and it seems to do the job well.  I've made some
> >> minor improvements to the bash script "killbaduser" version 1.3
> >> (attached file, or available from ftp://ftp.fysik.dtu.dk/pub/PBS/).
> >>
> >> This script should be executed on each individual Torque compute node,
> >> either from a cron job, perhaps in the job prologue script (?), or from
> >> the master server in a loop over all compute nodes.
> > 
> > 
> > Why does it ask the server instead of just checking the local job files?
> > That would seem much faster.
> 
> Good suggestion.  I've looked at that now, but it doesn't seem feasible offhand.
> The Torque job spool files are, for example on a node:
> 
> # ls -1 /var/spool/torque/mom_priv/jobs/*.JB
> /var/spool/torque/mom_priv/jobs/7471.audhum.JB
> /var/spool/torque/mom_priv/jobs/7472.audhum.JB
> 
> So this node runs jobs 7471 and 7472, but qstat doesn't understand such
> numeric job-IDs:

Use printjob to read the local job file directly:

# printjob /var/spool/torque/mom_priv/jobs/7471.audhum.JB
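For a node-local check along the lines Garrick suggests, the job IDs can be taken straight from the *.JB spool file names, without asking the server. A minimal sketch, assuming the spool layout shown above (/var/spool/torque/mom_priv/jobs, which may differ per installation); list_local_jobs is a hypothetical helper name, not part of Torque or killbaduser:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Torque): print the IDs of jobs that
# have a spool file on this node, e.g. "7471.audhum" for 7471.audhum.JB.
# Assumes the default mom_priv path shown in the thread; pass another
# directory as $1 if your installation differs.
# The body runs in a subshell so that nullglob does not leak to the caller.
list_local_jobs() (
    jobdir="${1:-/var/spool/torque/mom_priv/jobs}"
    shopt -s nullglob            # no matches -> no output, not a literal "*.JB"
    for f in "$jobdir"/*.JB; do
        f="${f##*/}"             # strip the leading directory
        echo "${f%.JB}"          # strip the .JB suffix
    done
)
```

Each ID can then be handed to printjob on the corresponding file, e.g. `for j in $(list_local_jobs); do printjob "/var/spool/torque/mom_priv/jobs/$j.JB"; done`. Reading the spool directory keeps the check local to the node, which is the speed advantage Garrick points out.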
