[torqueusers] Updated: killbaduser, a tool to clean up rogue user processes

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Oct 31 01:50:02 MST 2006


Dear Torque users,

We've been using killbaduser, a tool to clean up rogue user processes,
for a while now and it seems to do the job well.  I've made some
minor improvements to the bash script "killbaduser" version 1.3
(attached file, or available from ftp://ftp.fysik.dtu.dk/pub/PBS/).

This script should be executed on each individual Torque compute node,
either from a cron job, perhaps in the job prologue script (?), or from
the master server in a loop over all compute nodes.
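
For the last option, a loop from the master server could look roughly like
the sketch below. It is not part of the attached script and rests on
assumptions: passwordless ssh to the compute nodes, a node list file with one
hostname per line, and the install path /usr/local/sbin/killbaduser are all
hypothetical.

```shell
#!/bin/bash
# Sketch only: run killbaduser on every compute node from the master server.
# ASSUMPTIONS (not from the original post): passwordless ssh to the nodes,
# a plain-text node list with one hostname per line, and the install path below.
KILLBADUSER=${KILLBADUSER:-/usr/local/sbin/killbaduser}   # hypothetical path
REMOTE=${REMOTE:-ssh}                                     # set to "echo" for a dry run

run_on_nodes() {
	# $1 = node list file; remaining arguments are passed on to killbaduser
	local nodefile=$1 node
	shift
	while read -r node
	do
		# Skip blank lines and comment lines in the node list
		case $node in
			''|'#'*) continue;;
		esac
		# </dev/null keeps ssh from swallowing the rest of the node list
		$REMOTE "$node" "$KILLBADUSER" "$@" </dev/null
	done < "$nodefile"
}

# Example: run_on_nodes /path/to/nodelist -k
```

Setting REMOTE=echo prints the commands instead of running them, which gives a
cheap dry run. When the nodes run the script from cron instead, the -s option
spreads their queries to the pbs_server over a few seconds.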

-- 
Ole Holm Nielsen
Department of Physics, Technical University of Denmark
-------------- next part --------------
#!/bin/bash

#
# On a Torque/PBS compute node, list and kill any user processes not belonging to batch jobs.
#
# Usage: killbaduser [-k] [-s] [-v]
#    -k will execute the kill command 
#    -s will sleep a random number of seconds so the pbs_server doesn't get overloaded
#    -v verbose output for debugging
# Author: Ole Holm Nielsen, Department of Physics, Technical University of Denmark
# Version: 1.3
#

###  CONFIGURE:  ###
# The list of OK system user-ids:
USERLIST="root rpc rpcuser daemon ntp smmsp sshd hpsmh named dbus"
# Don't kill processes with UID < UIDMIN
UIDMIN=250

###  CONFIGURE:  ###
# Commands which we use:
PBSNODES=/usr/local/bin/pbsnodes
QSTAT=/usr/local/bin/qstat

#
# Process command options
#
DOKILL=0
DOSLEEP=0
VERBOSE=0
while getopts "ksv" options; do
	case $options in
		k ) DOKILL=1;;
		s ) DOSLEEP=1;;
		v ) VERBOSE=1;;
		* ) echo Usage: $0 "[-k] [-s] [-v]"
			exit 1;;
	esac
done

# Get the Torque nodename for this node.
# Strip the domain name (would be nice if there existed a Torque function for the current nodename)
NODENAME=`echo $HOSTNAME | awk -F. '{print $1}'`
if test ${VERBOSE} -eq 1
then
	echo This node has name: $NODENAME
fi

#
# Sleep a random number of seconds so Torque server doesn't get overloaded
# if all nodes run this script simultaneously.
#
if test ${DOSLEEP} -eq 1
then
	# Initialize /bin/bash built-in random number generator with PID
	RANDOM=$$
	MAXSLEEP=10
	INTERVAL=$(($RANDOM % $MAXSLEEP))
	if test ${VERBOSE} -eq 1
	then
		echo Sleeping $INTERVAL seconds
	fi
	sleep $INTERVAL
fi

#
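# Get job list on this node and write one line for each unique job.
# Merge stderr into stdout (grep then drops it) because pbsnodes complains
# if this node isn't part of the cluster.
# Strip the whole "slot/" prefix (also multi-digit slots) and use sort -u,
# since uniq alone only removes adjacent duplicates.
#
JOBLIST=`$PBSNODES -a $NODENAME 2>&1 | grep 'jobs = ' | sed -e s/,//g -e 's/     jobs = //' -e 's/[0-9]*\///g' | tr ' ' '\n' | sort -u`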
if test ${VERBOSE} -eq 1
then
	echo Torque job list for node $NODENAME: $JOBLIST
fi

# Get batch job user-ids and append to USERLIST
for job in $JOBLIST
do
	# Get the user-id from the Job_Owner attribute
	# (the "euser" variable seems to be unavailable on Torque compute nodes).
	EUSER=`$QSTAT -f $job | grep 'Job_Owner =' | awk '{print $3}' | awk -F@ '{print $1}'`
	if test ${VERBOSE} -eq 1
	then
		echo Job $job with user-id $EUSER
	fi
	USERLIST="$USERLIST $EUSER"
done
if test ${VERBOSE} -eq 1
then
	echo List of OK users: $USERLIST
fi

#
# Print the process list, deselecting acceptable user-ids.
#
if test ${VERBOSE} -eq 1
then
	echo List of rogue processes:
fi
PSFLAGS="--no-headers -o pid,state,uid,user,command"
ps --deselect -u "$USERLIST" $PSFLAGS

#
# Kill rogue user processes
#
if test ${DOKILL} -eq 1
then
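	# Select PIDs with UID >= UIDMIN. Inside awk the variable is plain UIDMIN;
	# "$UIDMIN" would be read as a field reference and defeat the threshold.
	PIDLIST=`ps --deselect -u "$USERLIST" $PSFLAGS | awk -v UIDMIN=$UIDMIN '
	{
		PID=$1; UID=$3
		if (UID >= UIDMIN) PIDLIST = PIDLIST sprintf("%d ", PID)
	} END {
		if (length(PIDLIST) > 0) print PIDLIST
	}'`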
	# Kill rogue processes, if any
	if test -n "$PIDLIST"
	then
		echo Killing rogue processes $PIDLIST
		# Troy Baer safe version: SIGCONT; sleep; SIGTERM; sleep; SIGKILL
		if test ${VERBOSE} -eq 1
		then
			echo Sending CONT signal
		fi
		kill -s CONT $PIDLIST
		sleep 1
		if test ${VERBOSE} -eq 1
		then
			echo Sending TERM signal
		fi
		kill -s TERM $PIDLIST
		sleep 5
		if test ${VERBOSE} -eq 1
		then
			echo Sending KILL signal
		fi
		kill -s KILL $PIDLIST
	fi
fi
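
The job-list parsing can be tried without a cluster. The sketch below feeds
a made-up "jobs = " line through a similar grep/sed/tr pipeline. It assumes
that pbsnodes prints the entries as "slot/jobid" separated by ", "; the
function name parse_joblist and the sample host name are invented for
illustration.

```shell
#!/bin/bash
# Sketch: turn the "jobs = ..." line of pbsnodes output into unique job ids.
# ASSUMPTION: entries look like "slot/jobid", separated by ", ".
parse_joblist() {
	grep 'jobs = ' |
	sed -e 's/^ *jobs = //' -e 's/,//g' -e 's/[0-9]*\///g' |
	tr ' ' '\n' |
	sort -u
}

# Made-up sample for illustration, not real cluster output
sample='     state = job-exclusive
     np = 2
     jobs = 0/1234.master.example.org, 1/1234.master.example.org, 10/1240.master.example.org'

printf '%s\n' "$sample" | parse_joblist
```

For this sample the function prints the two job ids 1234.master.example.org
and 1240.master.example.org, one per line. The two-digit slot "10/" is the
case to watch: a pattern that strips only a single digit before the slash
would leave a stray "1" glued to the job id.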

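The UID threshold can be checked the same way on canned ps output. In awk a
"$" prefix denotes a field reference, so a shell-style "$UIDMIN" inside the
awk program means "field number UIDMIN", not the variable. The sketch below
passes the threshold in with -v and compares against the plain variable; the
function name select_pids and the sample processes are made up.

```shell
#!/bin/bash
# Sketch: pick the PIDs whose numeric uid is at least the given minimum.
# Input columns follow the script's ps format: pid,state,uid,user,command
select_pids() {
	# $1 = minimum uid that may be killed
	awk -v uidmin="$1" '
	$3 >= uidmin { printf "%s%d", sep, $1; sep = " " }
	END { if (sep) print "" }'
}

# Made-up ps output for illustration
sample='  812 S    38 ntp      ntpd -u ntp:ntp
 4321 R  1001 alice    ./a.out
 4400 S  1002 bob      sleep 1000'

printf '%s\n' "$sample" | select_pids 250
```

With a minimum of 250 this prints "4321 4400" and leaves the ntp process
(uid 38) alone; with a minimum above every uid in the sample it prints
nothing.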

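A possible variation on the CONT/TERM/KILL sequence sends KILL only to
processes that are still present after TERM, using "kill -0" as an existence
test. This is not what the attached script does; the function name
escalate_kill and the grace-period argument are invented for this sketch.

```shell
#!/bin/bash
# Sketch: CONT, TERM, wait, then KILL only what is still there.
escalate_kill() {
	# $1 = seconds to wait after TERM; remaining arguments are PIDs
	local grace=$1 pid
	shift
	# Wake stopped processes first so they can act on TERM
	kill -s CONT "$@" 2>/dev/null
	kill -s TERM "$@" 2>/dev/null
	sleep "$grace"
	for pid in "$@"
	do
		# kill -0 sends no signal; it only tests whether the process exists
		if kill -0 "$pid" 2>/dev/null
		then
			kill -s KILL "$pid" 2>/dev/null
		fi
	done
}

# Example: escalate_kill 5 $PIDLIST
```

The existence test narrows, but does not close, the window in which a PID
could be reused by an unrelated process between TERM and KILL.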