[torquedev] keep completed jobs
garrick at usc.edu
Sat Jan 7 00:54:10 MST 2006
On Sat, Jan 07, 2006 at 12:36:41AM -0700, David B Jackson alleged:
> Did you fix the 'KEEPCOMPLETED' code? It was in before from another
> developer but would SEGV on occasion (once per 1000 jobs/once per day?)
> We never got back to isolating the failure and backed it out.
I think so. There were a few problems. The work task created wasn't
tied to the job, so it wasn't getting cleaned up correctly.
set_statechar() didn't know about the new state, so *statechar was being
read off the end of the string. PJobState didn't have an entry for the
new state. And PJobSubState was missing a comma at number 29, so
logged messages for all higher substates were off by one.
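
To illustrate the last two problems, here is a minimal sketch in C. It is
not the TORQUE source: the enum, the tables, and the helper names
(lookup_name, state_char) are all hypothetical and much shorter than the
real PJobSubState array. It shows how a missing comma makes the compiler
join two adjacent string literals, shifting every later entry down by
one, and how a bounds check keeps a lookup from reading off the end of a
state-character string.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Hypothetical substate numbering; not TORQUE's real values. */
enum { SUB_QUEUED, SUB_RUNNING, SUB_EXITING, SUB_COMPLETE, SUB_COUNT };

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Buggy table: the comma after "RUNNING" is missing, so the compiler
 * concatenates the adjacent literals into "RUNNINGEXITING". The table
 * ends up one entry short and every later name is off by one. */
static const char *const buggy_names[] = {
    "QUEUED",
    "RUNNING"
    "EXITING",
    "COMPLETE"
};

/* Fixed table: one literal per enum value. */
static const char *const fixed_names[] = {
    "QUEUED",
    "RUNNING",
    "EXITING",
    "COMPLETE"
};

/* Compile-time guard: fails the build if the table and the enum ever
 * drift apart (which is exactly what the missing comma causes). */
static_assert(ARRAY_LEN(fixed_names) == SUB_COUNT,
              "name table out of sync with enum");

/* Bounds-checked name lookup; returns "UNKNOWN" for a bad index. */
static const char *lookup_name(const char *const *table, size_t len, int idx)
{
    if (idx < 0 || (size_t)idx >= len)
        return "UNKNOWN";
    return table[idx];
}

/* Hypothetical one-letter codes, indexed by state. */
static const char state_chars[] = "QREC";

/* Bounds-checked state letter; returns '?' instead of reading past
 * the end of state_chars when handed a state it does not know. */
static char state_char(int state)
{
    size_t n = sizeof(state_chars) - 1;   /* exclude the trailing NUL */

    if (state < 0 || (size_t)state >= n)
        return '?';
    return state_chars[state];
}
```

With the buggy table, looking up SUB_EXITING returns "COMPLETE", which
is the same kind of off-by-one seen in the logged substate messages.
Adding the same static_assert to the buggy table would have stopped the
build instead.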
> I will roll another snapshot now and we will start testing in house.
I'll throw 1000 jobs at this and see what happens.
> > Dave, can you roll a snapshot please?
> > I've just checked in the "keep completed jobs" support. This keeps jobs
> > around for a bit after they've exited in state "C". Set
> > TORQUEKEEPCOMPLETED in $PBSHOME/pbs_environment, restart pbs_server, and
> > watch the magic!
> > 'qdel' for a completed job is not allowed, but an admin can 'qdel -p'
> > it.
> > Currently this is hardwired at 300 seconds, but if everything seems to
> > work, I'll throw in a server attribute.
> > --
> > Garrick Staples, Linux/HPCC Administrator
> > University of Southern California
Garrick Staples, Linux/HPCC Administrator
University of Southern California