[torqueusers] read of pipe for sid job error, more info

Glen Beane beaneg at umcs.maine.edu
Tue Sep 21 16:38:52 MDT 2004


Here is some more info that will hopefully help in tracking down the 
problem.

This problem seems to stem from some kind of bizarre race condition I'm 
seeing.  I don't know who it affects (all OS X, or maybe a particular 
configuration?).

What happens is this: if the node is mother superior for its first job 
after boot, then there are no problems.  If the node is a 'slave' node 
for its first job, then that node will not work, and the read of pipe 
errors show up in its log file.

If I boot the cluster and then 'bless' each node by running a simple 
1-node job on it, this problem does not seem to appear.
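
For anyone trying to read the two log lines quoted below: a read() that 
returns 0 on a pipe normally means end-of-file, i.e. every write end was 
closed before any data arrived.  That is not an error return, so errno is 
not meaningful afterwards and in practice holds whatever an earlier call 
left behind, which would fit seeing both "Invalid Argument (22)" and 
"Unknown error 0" with the same "got 0 not 8".  Below is a minimal 
standalone sketch of that behaviour.  It is not TORQUE code: struct 
fake_sjr and read_child_record() are hypothetical names made up for the 
illustration, and the stale EINVAL is planted on purpose.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the small record the parent expects. */
struct fake_sjr {
    int code;
    int session;
};

/*
 * Fork a child that either writes one record to a pipe or exits
 * without writing anything.  Returns what read() returned in the
 * parent and stores the parent's errno after the read in *err_after.
 * Returns -1 if pipe() or fork() fails.
 */
ssize_t read_child_record(int child_writes, int *err_after)
{
    int fds[2];
    struct fake_sjr rec;
    ssize_t n;
    pid_t pid;

    assert(err_after != NULL);

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: optionally report back, then exit. */
        close(fds[0]);
        if (child_writes) {
            struct fake_sjr out = { 0, (int)getpid() };
            if (write(fds[1], &out, sizeof(out)) != (ssize_t)sizeof(out))
                _exit(1);
        }
        _exit(0);
    }

    /* Parent: close its own write end so end-of-file can be seen. */
    close(fds[1]);

    do {
        errno = EINVAL;  /* planted: pretend an earlier call failed */
        n = read(fds[0], &rec, sizeof(rec));
    } while (n == -1 && errno == EINTR);
    *err_after = errno;

    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n;
}
```

With child_writes set to 0 the call should return 0 and leave the planted 
EINVAL in *err_after; with child_writes set to 1 it should return 
sizeof(struct fake_sjr).  This only shows how to interpret the message, 
not why the child on a 'slave' node goes away before reporting back.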

>> On Mon, 2004-09-20 at 16:07, Glen Beane wrote:
>>
>>> I got Invalid Argument (22) in start_process, read of pipe...got 0 not 8
>>>
>>> and
>>>
>>> Unknown error 0: (0) in start_process, read of pipe... got 0 not 8
>>>
>>> On Mon, 2004-09-20 at 15:46, jacksond at supercluster.org wrote:
>>>
>>>> Glen,
>>>>
>>>>    The logged error message should include the 'errno' value associated
>>>> with the read of the pipe.  This would definitely be helpful to get us
>>>> started.
>>>>
>>>> Thanks,
>>>> Dave
>>>>
>>>> On Mon, 20 Sep 2004, Glen Beane wrote:
>>>>
>>>>> On my OS X cluster, I keep getting errors from pbs_mom in the form of
>>>>> "read of pipe for sid job xxx got 0 not 8".
>>>>>
>>>>> If I kill pbs_mom on the node with signal 15, then reboot the node,
>>>>> often the problem will seem to go away. Just restarting pbs_mom never
>>>>> fixes the problem.
>>>>>
>>>>>
>>>>> This error is coming from the start_process function; this particular
>>>>> block of code starts around line number 2199:
>>>>>
>>>>> if (i != sizeof(sjr))
>>>>> {
>>>>>  sprintf(log_buffer, "read of pipe for sid job %s got %d not %d",
>>>>>    pjob->ji_qs.ji_jobid,
>>>>>    i,
>>>>>    sizeof(sjr));
>>>>>
>>>>>  log_err(j,id,log_buffer);
>>>>>
>>>>>  return(-1);
>>>>> }
>>>>>
>>>>>
>>>>> Any help troubleshooting this problem would be greatly appreciated.
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> torqueusers mailing list
>>>>> torqueusers at supercluster.org
>>>>> http://supercluster.org/mailman/listinfo/torqueusers
>>>>>
>>>
>>>