[torquedev] job owners purging own jobs?

Andrew J Caird acaird at umich.edu
Fri Feb 24 06:54:06 MST 2006


On Fri, 24 Feb 2006, Garrick Staples wrote:

> On Thu, Feb 23, 2006 at 11:28:16PM -0500, Caird, Andrew J alleged:
>> Thanks Garrick,
>>
>> What makes it dangerous, and can it be made safer?  I'm willing to give 
>> it a try if it's possible.
>
> qdel -p simply tells pbs_server to purge everything it knows about the 
> job, without talking to MS (the job's mother superior).  The dangerous part is that it doesn't talk 
> to MS.  The job may possibly still be running, epilogue scripts may 
> still be run, and possibly random processes are going to get killed. 
> It basically breaks the entire job state machine on purpose and is 
> only a last resort for removing a job.
>
> This functionality should not be needed by users.  If you frequently 
> find that jobs aren't killable, then maybe that is something we need to 
> fix.
>
>> My goal is to keep the functionality we had with PBSPro ('-W force').
>
> I don't know PBSPro, but I doubt a user-accessible -W force is 
> equivalent to -p.

   I, of course, haven't seen the PBSPro source, but from the man page:
    "The -W force option, where force is the literal character string
     force, directs that the job is to be deleted even if the node
     on which the job is executing is unreachable"

   This is what we need.

   When a node crashes, we often see PBS continue to consider the job 
active.  Because of our per-user and per-group job limits, those stale 
jobs still count against our users' limits, and they feel cheated.

   If there is a way for PBS to remove a job when the node is marked 
'down', that would also be fine.  My experience with Torque is that when a 
node is down, 'qdel <jobid>' doesn't work.

   Thanks.
--andy

