[torqueusers] Could someone please comment on this?

Joseph Norris jnorris at ucmerced.edu
Thu May 22 08:50:18 MDT 2008


Thanks for responding, Jan.  I have repeatedly run cluster-fork cleanipcs 
and cluster-fork ipcrm, but there has been no change.

Jan Ploski wrote:
> torqueusers-bounces at supercluster.org wrote on 05/21/2008 08:42:31 PM:
>
>   
>> Hello,
>>
>> I am really having trouble with the qsub error:
>>
>> rm_18238:  p4_error: semget failed for setnum: 0
>> p0_3227: (1.105469) net_recv failed for fd = 15
>> p0_3227:  p4_error: net_recv read, errno = : 104
>> p0_3227: (139.230469) net_send: could not write to fd=4, errno = 32
>>
>>
>> I do not know how to begin to trace this error.  I looked at the nodes 
>> file named by the PBS_NODEFILE variable and thought I had found the 
>> offending node. I reinstalled Rocks, but nothing changed.
>>     
>
> I don't think it is a qsub issue. It looks more like a problem with shared 
> memory management in MPICH. Google results suggest that you should run 
> 'ipcs' as root on the node to find out if there are any non-released 
> shared memory segments. If needed, remove them manually with 'cleanipcs' 
> or 'ipcrm'.
>
> Regards,
> Jan Ploski
>
>   
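For reference, the failing call in the log is semget, which deals with 
System V semaphore arrays, so those may be worth checking alongside the 
shared memory segments. Below is a minimal sketch of how leftover 
semaphore arrays could be found on one node. It assumes the Linux 
column layout of 'ipcs -s' (key, semid, owner, perms, nsems). The 
function name stale_sem_cmds and the user name "someuser" are 
placeholders, not part of any tool. It only prints the ipcrm commands 
so they can be reviewed before anything is removed.

```shell
#!/bin/sh
# Print (do not run) one "ipcrm -s <semid>" command for every semaphore
# array owned by the given user.
# Input: the output of `ipcs -s` on stdin.
# Assumption: Linux column order "key semid owner perms nsems".
stale_sem_cmds() {
    owner="$1"
    # Data rows start with a hex key (0x...); header lines are skipped.
    awk -v owner="$owner" '$1 ~ /^0x/ && $3 == owner { print "ipcrm -s " $2 }'
}

# Possible usage on a single node, as root ("someuser" is a placeholder):
#   ipcs -s | stale_sem_cmds someuser
```

If the printed list looks right, it could be piped to sh to actually 
remove the arrays. An empty list on every node would suggest the 
problem lies somewhere other than stale semaphores.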

-- 


Joseph Norris
Programmer III/System Administrator
UC Merced School of Natural Sciences

Phone: 209-228-4576
Cell:  209-201-3410



