[torquedev] failing job init is fubar
Garrick Staples
garrick at clusterresources.com
Wed Mar 7 16:11:49 MST 2007
Turns out that any failure to initialize a job when pbs_server is
restarting is entirely mishandled and generally causes it to segfault.
An easy way to trigger this is to create a temp execution queue, submit
a job to that queue, stop pbs_server, remove the queue state file, and
start pbs_server again. Trying to re-enqueue the job into a nonexistent
queue fails the job init.
pbsd_init_job()
  -> pbsd_init_reque()
       -> svr_enquejob()
       <- returns PBSE_UNKQUE
       -> job_abt()
       <- returns after completely free()ing the job struct
  <- returns void
pbsd_init_job() has no idea anything went wrong, so it continues to
access pjob and blows up.
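
For illustration only, here is one possible shape of a fix, not the
actual TORQUE code or patch. Every name below (sketch_job, sketch_enque,
sketch_reque, sketch_init_job, SKETCH_OK, SKETCH_UNKQUE) is a
hypothetical stand-in. The idea is that the requeue step returns a
status instead of void, and clears the caller's pointer when it frees
the job, so the caller can tell the job is gone.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; stand-ins for 0 and PBSE_UNKQUE. */
enum { SKETCH_OK = 0, SKETCH_UNKQUE = 1 };

/* Hypothetical, drastically reduced job struct. */
struct sketch_job {
    char queue[32];
};

/* Stand-in for svr_enquejob(): fails when the job's queue is unknown.
 * For the sketch, the server knows exactly one queue, known_queue. */
static int sketch_enque(const struct sketch_job *pjob, const char *known_queue)
{
    return strcmp(pjob->queue, known_queue) == 0 ? SKETCH_OK : SKETCH_UNKQUE;
}

/* Stand-in for pbsd_init_reque(), changed to return a status.
 * On failure it frees the job (as job_abt() does) and also sets the
 * caller's pointer to NULL so a stale pointer cannot be reused. */
static int sketch_reque(struct sketch_job **ppjob, const char *known_queue)
{
    int rc = sketch_enque(*ppjob, known_queue);

    if (rc != SKETCH_OK) {
        free(*ppjob);
        *ppjob = NULL;
    }
    return rc;
}

/* Stand-in for pbsd_init_job(): checks the status and bails out
 * instead of touching a freed job. */
static int sketch_init_job(const char *job_queue, const char *known_queue)
{
    struct sketch_job *pjob = calloc(1, sizeof(*pjob));
    int rc;

    if (pjob == NULL)
        return -1;

    /* calloc zeroed the buffer, so the copy stays NUL-terminated. */
    strncpy(pjob->queue, job_queue, sizeof(pjob->queue) - 1);

    rc = sketch_reque(&pjob, known_queue);
    if (rc != SKETCH_OK)
        return rc;          /* pjob was freed; do not access it */

    /* ... here it would be safe to keep using pjob ... */

    free(pjob);             /* sketch only: a real server keeps the job */
    return SKETCH_OK;
}
```

With this shape, sketch_init_job("tmpq", "batch") returns SKETCH_UNKQUE
without ever dereferencing the freed job, while
sketch_init_job("batch", "batch") returns SKETCH_OK.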
So, um, this might be fun for someone else to fix :)