[torqueusers] Bug? in torque 2.0.0p2
garrick at usc.edu
Tue Dec 20 16:21:05 MST 2005
On Mon, Dec 19, 2005 at 11:17:12AM +0100, Åke Sandgren alleged:
> On Sat, 2005-12-17 at 11:29 +0100, Åke Sandgren wrote:
> > > This means the child process that will eventually become the job has
> > > died. Unfortunately, this is really really hard to debug.
> > >
> > > Have you tried 2.0.0p4? We're always fixing bugs and have improved
> > > logging in this area. Be sure you configure with --enable-syslog.
> > >
> > > One unfixed bug that can cause this is with the job's environmental
> > > variables. Vars with newlines and commas can break things.
> > It's still there in 2.0.0p4 although the error message this time says
> > Bad file descriptor (9) in TMomFinalizeJob3, read of pipe for sid failed
> > for job 200045.ingrid-h.hpc2n.umu.se (0 of 8 bytes)
> > I'll try to find this next week...
> The problem was a newline in one of the env variables.
> I have written a patch for qsub to handle this and a few other things
> regarding the set_job_env function. Take a look at it and use the parts
> you feel are worth using.
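
For anyone hitting the same symptom, a quick way to see whether the
submitting environment holds such values is a small check like the one
below. It is only a sketch: it assumes the trigger is a newline or comma
in a variable's value, as described in this thread, and the name
find_risky_env_vars is made up for illustration. It is not part of
TORQUE or of the qsub patch.

```python
import os

# Characters reported in this thread as breaking job environment handling.
RISKY_CHARS = ("\n", ",")


def find_risky_env_vars(env=None):
    """Map each variable name to the risky characters found in its value.

    'env' is any mapping of names to string values; it defaults to the
    current process environment. The function is hypothetical and exists
    only to illustrate the check.
    """
    if env is None:
        env = os.environ
    risky = {}
    for name, value in env.items():
        found = [c for c in RISKY_CHARS if c in value]
        if found:
            risky[name] = found
    return risky


if __name__ == "__main__":
    # Print one line per suspicious variable in the current environment.
    for name, chars in sorted(find_risky_env_vars().items()):
        print(f"{name}: contains {', '.join(repr(c) for c in chars)}")
```

Running it in the shell you submit from (for example before a qsub -V,
which exports the whole environment to the job) lists the variables that
might be worth unsetting while narrowing the problem down.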
I tried your patch, and it stops the segfaulting, but things are still
not quite right.
In a job, I get...
Garrick Staples, Linux/HPCC Administrator
University of Southern California