[torqueusers] Could someone please comment on this?
Jan Ploski
Jan.Ploski at offis.de
Thu May 22 08:46:54 MDT 2008
torqueusers-bounces at supercluster.org wrote on 05/21/2008 08:42:31 PM:
> Hello,
>
> I am really having trouble with the qsub error:
>
> rm_18238: p4_error: semget failed for setnum: 0
> p0_3227: (1.105469) net_recv failed for fd = 15
> p0_3227: p4_error: net_recv read, errno = : 104
> p0_3227: (139.230469) net_send: could not write to fd=4, errno = 32
>
>
> I do not know how to begin to trace this error. I have looked at my
> nodes file in the PBS_NODES_FILE var and thought I found the offending
> node - re-installed rocks but no change.
I don't think this is a qsub issue. The failed semget call points to a
problem with System V IPC resources (semaphores and shared memory) in
MPICH. Google results suggest that you should run 'ipcs' as root on the
node to check for semaphores or shared memory segments that were never
released. If you find any, remove them manually with 'cleanipcs' or
'ipcrm'.
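
In case it helps, here is a rough sketch of how the check could be scripted.
It assumes the column layout that 'ipcs -s' prints on Linux (key, semid,
owner, ...), and the function names are made up for illustration. It only
builds the 'ipcrm -s' commands for one user's semaphore sets, so you can
review them before running anything:

```python
import subprocess


def parse_ipcs_semaphores(ipcs_output):
    """Return (semid, owner) pairs from the text printed by `ipcs -s`.

    Assumes the Linux layout, where each data row starts with a hex key
    followed by the semid and the owner; header lines are skipped.
    """
    entries = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("0x"):
            entries.append((int(fields[1]), fields[2]))
    return entries


def ipcrm_commands(entries, owner):
    """Build one `ipcrm -s <semid>` command per semaphore set owned by `owner`."""
    return [["ipcrm", "-s", str(semid)]
            for semid, sem_owner in entries
            if sem_owner == owner]


def leftover_semaphore_commands(owner):
    """Run `ipcs -s` on this node and return the matching ipcrm commands."""
    result = subprocess.run(["ipcs", "-s"], capture_output=True,
                            text=True, check=True)
    return ipcrm_commands(parse_ipcs_semaphores(result.stdout), owner)
```

Calling leftover_semaphore_commands("jdoe") on the affected node (with the
job owner's real user name in place of the placeholder "jdoe") returns the
commands without executing them. Make sure no MPI job of that user is still
running on the node before removing anything.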
Regards,
Jan Ploski