[torquedev] keep completed jobs
David B Jackson
jacksond at clusterresources.com
Sat Jan 7 01:23:18 MST 2006
Awesome! Nice job!
> On Sat, Jan 07, 2006 at 12:36:41AM -0700, David B Jackson alleged:
>> Did you fix the 'KEEPCOMPLETED' code? It was in before from another
>> developer but would SEGV on occasion (once per 1000 jobs/once per day?)
>> We never got back to isolating the failure and backed it out.
> I think so. There were a few problems. The work task created wasn't
> tied to the job, so it wasn't getting cleaned up correctly.
> set_statechar() didn't know about the new state, so *statechar was being
> read off the end of the string. PJobState didn't have an entry for the
> new state. And PJobSubState was missing a comma at number 29, so
> logged messages for all higher substates were off by one.
>> I will roll another snapshot now and we will start testing in house.
> I'll throw 1000 jobs at this and see what happens.
>> > Dave, can you roll a snapshot please?
>> > I've just checked in the "keep completed jobs" support. This keeps jobs
>> > around for a bit after they've exited, in state "C". Set
>> > TORQUEKEEPCOMPLETED in $PBSHOME/pbs_environment, restart pbs_server,
>> > watch the magic!
>> > 'qdel' for a completed job is not allowed, but an admin can 'qdel -p'
>> > it.
>> > Currently this is hardwired at 300 seconds, but if everything seems to
>> > work, I'll throw in a server attribute.
>> > --
>> > Garrick Staples, Linux/HPCC Administrator
>> > University of Southern California
>> > _______________________________________________
>> > torquedev mailing list
>> > torquedev at supercluster.org
>> > http://www.supercluster.org/mailman/listinfo/torquedev
> Garrick Staples, Linux/HPCC Administrator
> University of Southern California