[torqueusers] Could someone please comment on this?
Joseph Norris
jnorris at ucmerced.edu
Thu May 22 08:50:18 MDT 2008
Thanks for responding, Jan. I have repeatedly run 'cluster-fork cleanipcs'
and 'cluster-fork ipcrm', but there has been no change.
Jan Ploski wrote:
> torqueusers-bounces at supercluster.org wrote on 05/21/2008 08:42:31 PM:
>
>
>> Hello,
>>
>> I am really having trouble with the qsub error:
>>
>> rm_18238: p4_error: semget failed for setnum: 0
>> p0_3227: (1.105469) net_recv failed for fd = 15
>> p0_3227: p4_error: net_recv read, errno = : 104
>> p0_3227: (139.230469) net_send: could not write to fd=4, errno = 32
>>
>>
>> I do not know how to begin to trace this error. I have looked at the
>> nodes file named by the PBS_NODEFILE variable and thought I had found
>> the offending node. I re-installed Rocks on it, but there was no change.
>>
>
> I don't think it is a qsub issue. It looks more like a problem with shared
> memory management in MPICH. Google results suggest that you should run
> 'ipcs' as root on the node to find out if there are any non-released
> shared memory segments. If needed, remove them manually with 'cleanipcs'
> or 'ipcrm'.
>
> Regards,
> Jan Ploski
>
>
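For anyone following the 'ipcs' / 'ipcrm' suggestion above by hand, here is a
minimal sketch of how the leftover semaphore sets could be picked out of the
'ipcs -s' listing. It assumes the usual Linux column layout (key, semid,
owner, perms, nsems); the function names semaphore_ids and removal_commands
are made up for illustration and are not part of TORQUE, MPICH or Rocks. It
only builds the 'ipcrm -s <semid>' command lines and does not run them.

```python
def semaphore_ids(ipcs_output, owner):
    """Return the semaphore set ids owned by `owner`.

    `ipcs_output` is the text printed by `ipcs -s`. Assumption: data rows
    have the Linux layout `key semid owner perms nsems`.
    """
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Skip banners, column headers and blank lines; keep only data rows.
        if len(fields) >= 5 and fields[0].startswith("0x") and fields[1].isdigit():
            if fields[2] == owner:
                ids.append(int(fields[1]))
    return ids


def removal_commands(ids):
    """Build one `ipcrm -s <semid>` argument list per semaphore set id."""
    return [["ipcrm", "-s", str(semid)] for semid in ids]
```

To use it, capture the output of 'ipcs -s' on the suspect node, pass it in
together with the name of the user who ran the MPI job, and review the
resulting commands before running any of them as root.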
--
Joseph Norris
Programmer III/System Administrator
UC Merced School of Natural Sciences
Phone: 209-228-4576
Cell: 209-201-3410