[torqueusers] pbs_mom prolog timeout

Stijn De Weirdt stijn.deweirdt at ugent.be
Sun Dec 20 15:22:00 MST 2009


hi all,

more playing with per-job prolog scripts, and i found another problem
(still torque 2.4.2).

pbs_mom has a configuration parameter, prologalarm, and the pbs_mom manual
has the following to say about it: "Specifies maximum duration (in
seconds) which the mom will wait for the job prolog or job epilog to
complete."

but i changed the parameter and i still get "child not started after 300
seconds, server will retry" errors, after which the prolog script is
started again (and, btw, not all child processes of the first attempt
are killed properly, which is very unclean behaviour!)
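
for reference, a minimal sketch of what such a setting looks like in the
mom config file. the file location (mom_priv/config) and the value of 600
seconds are assumptions for illustration only, not the exact values from
this setup:

    # mom_priv/config (location assumed)
    # example value only: wait up to 600 seconds for prolog/epilog
    $prologalarm 600

pbs_mom has to be restarted (or told to re-read its config) before a
changed value is picked up.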

the error seems to come from a variable in mom_main.c called
TJobStartTimeout, and in the code i can't see any relation between it
and the prologalarm setting.

could anyone explain the difference between these two timeouts?

stijn

-- 
http://hasthelhcdestroyedtheearth.com/



