[torqueusers] p4_error: latest msg from perror: Bad file descriptor
Garrick Staples
garrick at clusterresources.com
Tue Oct 10 11:56:34 MDT 2006
On Tue, Oct 10, 2006 at 12:35:48PM +0100, Vadivelan Ranjith alleged:
>
> Hi,
> Thank you all for your help.
> Today I got an error message when submitting a job. First I
> ran the code using the explicit method. I got accurate results, and no
> problem occurred when I submitted the job. Now I have changed my code to
> the implicit method, and I get an error when I submit the job.
> I checked carefully: it reads all the files and the
> iteration starts. After one iteration it gives the following error. The same
> code runs on another machine and gives correct results. So please help
> me fix it.
Looks to me like one of the processes is segfaulting, though I don't
actually see a segfault reported. The reported errors are SIGPIPE
(signal 13), Bad File Descriptor (errno 9), and Broken Pipe (errno 32).
I'd run your program under a parallel debugger and ask again on an MPI
or MPICH list.
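
For reference, the numeric codes in the job.o log can be decoded with a few lines of shell. This is only a minimal sketch, assuming the standard Linux errno numbering (other platforms may number them differently); `errno_name` is a hypothetical helper written for this illustration, not part of MPICH or TORQUE.

```shell
#!/bin/bash
# Hypothetical helper: map the errno values seen in the log to their
# Linux names. Only the two values from the log are covered.
errno_name() {
    case "$1" in
        9)  echo "EBADF (Bad file descriptor)" ;;
        32) echo "EPIPE (Broken pipe)" ;;
        *)  echo "unknown errno $1" ;;
    esac
}

# "p4_error: interrupt SIGx: 13" refers to signal 13; the shell's
# kill builtin prints the signal name for a given number.
echo "signal 13 = SIG$(kill -l 13)"
echo "errno 9   = $(errno_name 9)"
echo "errno 32  = $(errno_name 32)"
```

A broken pipe or SIGPIPE usually means the process at the other end of the connection has already gone away, so these messages are more likely a symptom of one rank dying than the root cause.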
> Thanks in advance,
> Velan
>
> ----------------------------------------------------------------
> job.e file:
> p4_error: latest msg from perror: Bad file descriptor
> p4_error: latest msg from perror: Bad file descriptor
> p4_error: latest msg from perror: Bad file descriptor
> p4_error: latest msg from perror: Bad file descriptor
> -----------------------------------------------------------------
> job.o file:
> 3
> node18.local
> node19.local
> node17.local
> # Allocating 5 nodes to block 1
> # Allocating 1 nodes to block 2
> # Require mxb >= 97
> # Require mxa >= 26 mya >= 97 and mza >= 75
> # Maximum load imbalance = 71.69%
> # Navier-Stokes Simulation
> # Implicit Full Matrix DP-LUR
> # Reading restart files...( 0.34 seconds)
> # Freestream Mach Number = 6.50
>
> 1 0.3670E+01 0.7803E+05 16 15 7 2 0.1222E-08
> p5_2609: p4_error: interrupt SIGx: 13
> bm_list_17559: (3.666982) wakeup_slave: unable to interrupt slave 0 pid 17542
> rm_l_1_18696: (2.738297) net_send: could not write to fd=6, errno = 9
> rm_l_1_18696: p4_error: net_send write: -1
> rm_l_2_2605: (2.614927) net_send: could not write to fd=6, errno = 9
> rm_l_4_18718: (2.373120) net_send: could not write to fd=6, errno = 9
> rm_l_4_18718: p4_error: net_send write: -1
> rm_l_2_2605: p4_error: net_send write: -1
> rm_l_3_17584: (2.496277) net_send: could not write to fd=6, errno = 9
> rm_l_3_17584: p4_error: net_send write: -1
> rm_l_5_2626: (2.249144) net_send: could not write to fd=5, errno = 32
> p5_2609: (2.251356) net_send: could not write to fd=5,errno = 32
> -------------------------------------------------------------------
> job file:
> #!/bin/bash
> #PBS -l nodes=3:ppn=1
>
> cd $PBS_O_WORKDIR
> n=`/usr/local/bin/pbs.py $PBS_NODEFILE hosts`
> echo $n
> cat hosts
> /opt/mpich/intel/bin/mpirun -nolocal -machinefile hosts -np 6 pg3d.exe
> -------------------------------------------------------------------
> Machine configuration:
> CPU: Intel(R) Dual Processor Xeon(R) CPU 3.2GHz
> Installation using rocks4.1
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers