[torquedev] keep completed jobs

Garrick Staples garrick at usc.edu
Sat Jan 7 00:54:10 MST 2006


On Sat, Jan 07, 2006 at 12:36:41AM -0700, David B Jackson alleged:
> Garrick,
> 
>   Did you fix the 'KEEPCOMPLETED' code?  It was in before from another
> developer but would SEGV on occasion (once per 1000 jobs/once per day?) 
> We never got back to isolating the failure and backed it out.

I think so.  There were a few problems.  The work task created wasn't
tied to the job, so it wasn't getting cleaned up correctly.
set_statechar() didn't know about the new state, so *statechar was being
read off the end of the string.  PJobState[] didn't have an entry for the
new state.  And PJobSubState[] was missing a comma at number 29, so
logged messages for all higher substates were off by one.
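
For anyone wondering how one missing comma shifts every later message:
C concatenates adjacent string literals, so the array silently loses an
entry.  The sketch below is only an illustration under assumed names.
The tables, state letters, and helper functions are hypothetical and
are not TORQUE's real PJobSubState[] or set_statechar() code.  It also
shows a bounds check of the kind that keeps a lookup from reading off
the end of a state string.

```c
#include <stddef.h>
#include <string.h>

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical substate names, NOT the real PJobSubState[] contents. */
const char *const broken_names[] = {
    "QUEUED",
    "RUNNING"     /* missing comma: merges with the next literal */
    "EXITING",
    "COMPLETE",
};

const char *const fixed_names[] = {
    "QUEUED",
    "RUNNING",
    "EXITING",
    "COMPLETE",
};

/* The broken table has one entry fewer than intended. */
const size_t broken_count = COUNT(broken_names);
const size_t fixed_count = COUNT(fixed_names);

/* Hypothetical state letters, indexed by state number. */
static const char state_chars[] = "QREC";

/* Return the letter for a state, or '?' if the state is out of range,
 * instead of indexing past the end of the string. */
char state_letter(int state)
{
    if (state < 0 || (size_t)state >= sizeof(state_chars) - 1)
        return '?';
    return state_chars[state];
}

/* Bounds-checked name lookup for either table. */
const char *substate_name(const char *const *names, size_t count,
                          int substate)
{
    if (substate < 0 || (size_t)substate >= count)
        return "UNKNOWN";
    return names[substate];
}
```

With these tables, index 1 of the broken array holds the merged string
"RUNNINGEXITING", and every index after it names the state that should
have been one slot later.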

 
>   I will roll another snapshot now and we will start testing in house.

I'll throw 1000 jobs at this and see what happens.

> 
> Thanks!
> Dave
> 
> > Dave, can you roll a snapshot please?
> >
> > I've just checked in the "keep completed jobs" support.  This keeps jobs
> > around for a bit after they've exited in state "C".  Set
> > TORQUEKEEPCOMPLETED in $PBSHOME/pbs_environment, restart pbs_server, and
> > watch the magic!
> >
> > 'qdel' for a completed job is not allowed, but an admin can 'qdel -p'
> > it.
> >
> > Currently this is hardwired at 300 seconds, but if everything seems to
> > work, I'll throw in a server attribute.
> >
> >
> > --
> > Garrick Staples, Linux/HPCC Administrator
> > University of Southern California
> > _______________________________________________
> > torquedev mailing list
> > torquedev at supercluster.org
> > http://www.supercluster.org/mailman/listinfo/torquedev
> >
> 

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California