[torqueusers] p4_error: latest msg from perror: Bad file descriptor

Garrick Staples garrick at clusterresources.com
Tue Oct 10 11:56:34 MDT 2006


On Tue, Oct 10, 2006 at 12:35:48PM +0100, Vadivelan Ranjith alleged:
> 
> Hi,
> Thank you all for your help.
> Today I got an error message when submitting a job. First I
> ran the code using the explicit method. I got accurate results, and no
> problem occurred when I submitted the job. Now I have changed my code to
> the implicit method, and I get an error when I submit the job.
> I checked carefully: it reads all the files and the
> iteration starts. After one iteration it gives the following error. The same
> code runs on another machine and gives correct results. So please help
> me fix it.

Looks to me like one of the processes is segfaulting, though I don't
actually see a segfault reported. The reported errors are SIGPIPE
(signal 13), Bad File Descriptor (errno 9), and Broken Pipe (errno 32).

I'd run your program under a parallel debugger and ask again on an MPI
or MPICH list.
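
For reference, the numeric codes in the quoted job.o output can be decoded
with a few lines of Python. This is only a sketch, and it assumes the usual
Linux numbering, where 9 is EBADF, 32 is EPIPE, and signal 13 is SIGPIPE.
The helper names describe_errno and describe_signal are made up for
illustration; they are not part of MPICH or TORQUE.

```python
import errno
import os
import signal

# Numeric codes copied from the quoted job.o log
# ("errno = 9", "errno = 32", "interrupt SIGx: 13").
LOG_ERRNOS = [9, 32]
LOG_SIGNAL = 13


def describe_errno(code):
    """Return the symbolic name and message for an errno value on this platform."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"errno {code} = {name} ({os.strerror(code)})"


def describe_signal(number):
    """Return the symbolic name for a signal number on this platform."""
    return f"signal {number} = {signal.Signals(number).name}"


if __name__ == "__main__":
    for code in LOG_ERRNOS:
        print(describe_errno(code))
    print(describe_signal(LOG_SIGNAL))
```

Errors of this kind usually mean the peer end of a connection has already
gone away, so they tend to be symptoms of an earlier failure in one rank
rather than the root cause.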

 

> Thanks in advance,
> Velan
> 
> ----------------------------------------------------------------
> job.e file:
>     p4_error: latest msg from perror: Bad file descriptor
>     p4_error: latest msg from perror: Bad file descriptor
>     p4_error: latest msg from perror: Bad file descriptor
>     p4_error: latest msg from perror: Bad file descriptor
> -----------------------------------------------------------------
> job.o file:
> 3
> node18.local
> node19.local
> node17.local
> # Allocating   5 nodes to block  1
> # Allocating   1 nodes to block  2
> # Require mxb >=   97
> # Require mxa >=   26 mya >=   97 and mza >=   75
> # Maximum load imbalance =  71.69%
> # Navier-Stokes Simulation
> # Implicit Full Matrix DP-LUR
> # Reading restart files...( 0.34 seconds)
> # Freestream Mach Number =  6.50
> 
>  1   0.3670E+01   0.7803E+05   16   15    7    2  0.1222E-08
> p5_2609:  p4_error: interrupt SIGx: 13 
> bm_list_17559: (3.666982) wakeup_slave: unable to interrupt slave 0 pid 17542
> rm_l_1_18696: (2.738297) net_send: could not write to fd=6, errno = 9
> rm_l_1_18696:  p4_error: net_send write: -1
> rm_l_2_2605: (2.614927) net_send: could not write to fd=6, errno = 9
> rm_l_4_18718: (2.373120) net_send: could not write to fd=6, errno = 9
> rm_l_4_18718:  p4_error: net_send write: -1
> rm_l_2_2605:  p4_error: net_send write: -1
> rm_l_3_17584: (2.496277) net_send: could not write to fd=6, errno = 9
> rm_l_3_17584:  p4_error: net_send write: -1
> rm_l_5_2626: (2.249144) net_send: could not write to fd=5, errno = 32
> p5_2609: (2.251356) net_send: could not write to fd=5,errno = 32
> -------------------------------------------------------------------
> job file:
> #!/bin/bash
> #PBS -l nodes=3:ppn=1
> 
> cd $PBS_O_WORKDIR
> n=`/usr/local/bin/pbs.py $PBS_NODEFILE hosts`
> echo $n
> cat hosts
> /opt/mpich/intel/bin/mpirun -nolocal -machinefile hosts -np 6 pg3d.exe
> -------------------------------------------------------------------
> Machine configuration:
> CPU: Intel(R) Dual Processor Xeon(R) CPU 3.2GHz
> Installation: Rocks 4.1
> 
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers


