[torqueusers] Nodes that pbs reports are busy which are actually running a job

John S. Urban urbanjost at comcast.net
Thu Aug 12 05:22:58 MDT 2010


We use a variety of schedulers, and find that cleaning up parallel processes
is an issue with all of them, but that:
1) Utilities that control the remote processes are usually available to help
   prevent the problem in the first place, such as blaunch with LSF and some
   versions of mpiexec for PBS/TORQUE when using MPI; -kill options on
   mpirun(1) commands sometimes help as well.
2) If you want to make sure everything is killed on the initial node when a
   job wraps up, it is usually best to start your job in a process group,
   and then kill the process group in the epilogue rather than going after
   individual processes (sketches of this, and of item 3 below, follow the
   list).
3) Orphaned processes on remote nodes are a little trickier, but they can be
   handled by a cron(1) entry that:
   A) CAREFULLY makes sure the scheduler is running on the node and that you
      have gotten a good list of what jobs should be on the node. From that
      list, determine what usernames should be on the node. Then look
      through the process table on the node and build a list of all
      usernames that own processes there, making sure you skip any "system"
      usernames. If a regular username has processes on the node but no job
      on the node, kill the processes.
   B) This assumes users can only use the node via a scheduled process.
   C) Only kill processes that have several minutes of wall clock time
      accumulated, to avoid killing something that just started while you
      were building your lists.
   D) These steps will not clean up bad processes from job X if the same
      user is legitimately using the node with a job Y, which is
      increasingly possible as newer nodes have more and more cores on them.
   E) It is assumed that you have only given users IDs with UIDs in a unique
      range, so that you can easily distinguish "regular" usernames from
      "system" names often used for running daemons in the background, and
      that users have no cron jobs or any other legitimate way of accessing
      these nodes other than via scheduled tasks (be they interactive or
      batch).
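To make item 2 concrete, here is a minimal sketch of the process-group
approach. The pgid file location and the run_work.sh name are stand-ins of
mine, and setsid(1) is assumed to be available:

    #!/bin/ksh
    # job script fragment: run the workload in its own session and
    # process group so the whole tree can be signaled at once
    setsid ./run_work.sh &
    child=$!
    # record the new process group id where the epilogue can find it
    ps -o pgid= -p "$child" | tr -d ' ' > /tmp/pgid.$PBS_JOBID
    wait "$child"

and then the epilogue (TORQUE passes the job id as $1) can take the whole
group down at once:

    #!/bin/ksh
    # epilogue fragment: kill the recorded process group, if any
    pgid=$(cat /tmp/pgid.$1 2>/dev/null) || exit 0
    # a negative id after "--" signals every process in the group
    kill -TERM -- -"$pgid" 2>/dev/null
    rm -f /tmp/pgid.$1

Item 3 condenses to something like the sketch below. The qstat parsing, the
UID cutoff, the minimum age, and the log location are all assumptions to
adapt to your site ("ps -o etimes=", elapsed seconds, is a Linux procps
option; older systems have to parse etime instead), and the real thing needs
more care than this:

    #!/bin/ksh
    # shouldnotbehere (sketch only): kill processes whose owners have
    # no job on this node; adapt the parsing and limits to your site
    LOG=/var/log/shouldnotbehere.log    # assumed log location
    MINUID=1000                         # assumed: real users >= 1000
    MINAGE=300                          # C) spare anything < 5 min old
    me=$(hostname)

    # A) ask the scheduler what should be here; if it cannot answer,
    #    exit and kill NOTHING (a downed server must not trigger kills)
    qstat -rn1 > /tmp/snbh.$$ 2>/dev/null || exit 0
    # column 2 of "qstat -n" output is the username; with -n1 the node
    # list is appended to each line, so match this hostname
    allowed=$(awk -v n="$me" '$0 ~ n {print $2}' /tmp/snbh.$$ | sort -u)
    rm -f /tmp/snbh.$$

    # walk the process table: regular users with no job here get killed
    ps -e -o user= -o pid= -o etimes= | while read user pid secs; do
        uid=$(id -u "$user" 2>/dev/null) || continue
        [ "$uid" -ge "$MINUID" ]  || continue   # E) skip system ids
        [ "$secs" -ge "$MINAGE" ] || continue   # C) too young to judge
        echo "$allowed" | grep -qx "$user" && continue  # has a job here
        echo "$(date): killing $pid owned by $user" >> "$LOG"
        kill "$pid"
    done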
All of item "3" has been running as a utility called "shouldnotbehere" via
cron(1) for years, and I spend no time manually cleaning up errant
processes; but I know of other sites that do. Be very careful that your
"shouldnotbehere" utility knows when the scheduler is not giving it a good
list of user jobs, or you might kill all your jobs because you turned off
your scheduler. Make sure all kills done by the utility are recorded to a
central log so you can always know what is being killed. Prevention is
still the best medicine, so if possible use methods that keep the job clean
in the first place, like mpiexec(1). If you are using commercial codes that
don't allow you to modify the launch mechanism to a robust one that cleans
up after itself, you can get the same effect with a little effort using two
other common methods:
     1) Change the remote process startup command (typically ssh, remsh, or
          rsh) to your own script; several things then become possible to
          help you clean up jobs (see the sketch after this list).
     2) Make the command executed on the remote machines a script instead
          of the actual executable.
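As a small example of method 1, the replacement startup command can simply
record every remote launch before handing off to the real ssh, so cleanup
knows exactly which nodes a job touched (the log location and the ssh path
here are assumptions):

    #!/bin/ksh
    # ssh stand-in (sketch): placed ahead of the real ssh in the job's
    # PATH so every remote launch gets recorded before it happens
    echo "$PBS_JOBID $LOGNAME $(date +%s) $*" >> /var/spool/remote_launch.log
    exec /usr/bin/ssh "$@"

Method 2 works from the other end: the script standing in for the
executable can record its own PID and process group on the remote node,
giving the cleanup pass something exact to kill.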
The details of why these are useful things to do are a bit long for this
discussion at this point, but I can elaborate if any of this sounds useful
to you. If you have a small number of users on a small number of nodes with
long-running jobs, creating a utility like "shouldnotbehere" is probably
overkill; but having just one overloaded node can cause many types of large
parallel jobs to run very poorly ("one bad apple spoils the bunch"). You
can easily end up where the nodes of your jobs act like racehorses that
stop after each lap and wait for the slowest one to catch up before
starting the next lap. Depending on various system and MPI/PVM settings,
those waiting nodes can look deceptively busy and "productive". So at least
for me, making a "shouldnotbehere" utility has been well worth it. As
always, use the concept at your own risk.
Sorry, I can't publish "shouldnotbehere" here (it's really not all that
complicated, and is just a ksh(1) script). But depending on your OS,
something as simple as "ps -e -ouser=|sort|uniq|xargs" can give you a list
of usernames on a node; and something like pbsnodes -a `hostname` (or LSF's
bjobs -u all -m `hostname`) is one of many ways to list jobs on a node. And
"runaway" processes often (but not always) have a parent process of 1 if
things have gone badly.
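Combining those, something like this lists candidate runaways: regular-UID
processes that have been reparented to init (the UID cutoff of 1000 is the
same assumption as above):

    # regular-looking processes whose parent is now init (PID 1)
    ps -e -o user= -o uid= -o ppid= -o pid= -o comm= |
        awk '$2 >= 1000 && $3 == 1 {print $1, $4, $5}'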

----- Original Message ----- 
From: "Rushton Martin" <JMRUSHTON at qinetiq.com>
To: "Torque Users Mailing List" <torqueusers at supercluster.org>
Sent: Thursday, August 12, 2010 5:56 AM
Subject: Re: [torqueusers] Nodes that pbs reports are busy which are 
actually running a job


> Be careful about assuming that one user = one job.  When our new cluster
> was delivered someone had configured the epilogue to kill off all
> processes belonging to the user, but with 8 or 16 cores per node we were
> caught when one user had several jobs running.  The first job to finish
> killed off the user's other jobs.
>
> Martin Rushton
> Weapons Technologies
> Tel: 01959 514777, Mobile: 07939 219057
> email: jmrushton at QinetiQ.com
> www.QinetiQ.com
> QinetiQ - Delivering customer-focused solutions
>
> -----Original Message-----
> From: torqueusers-bounces at supercluster.org
> [mailto:torqueusers-bounces at supercluster.org] On Behalf Of Garrick
> Staples
> Sent: 11 August 2010 23:13
> To: Torque Users Mailing List
> Subject: Re: [torqueusers] Nodes that pbs reports are busy which are
> actually running a job
>
> On Wed, Aug 11, 2010 at 04:59:07PM -0500, Rahul Nabar alleged:
>> On Wed, Aug 11, 2010 at 4:53 PM, Garrick Staples <garrick at usc.edu>
>> wrote:
>> >
>> > Nope, it doesn't have a job. What you have are stale processes from
>> > an old job.
>>
>> Thanks! I killed them. Does PBS clean up processes after a job ends
>> automatically? Or is there a suitable flag? These are non-shared nodes,
>> so no risk of stepping on another job's processes. All 8 cores are
>> always assigned to the same user.
>>
>> If not, is it an OK fix to put a pkill in the epilogue for all normal
>> usernames? Any caveats? Or better ideas?
>
> It will kill processes that it knows about. This includes any children
> of the batch script and any processes launched through the TM interface.
> Any remote processes started through a remote shell are unknown to PBS
> and can't be killed. It is up to your epilogue to figure out what else
> needs to be killed.
>
> --
> Garrick Staples, GNU/Linux HPCC SysAdmin University of Southern
> California
>
> Life is Good!
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
> 


