[torqueusers] Problem with one node : " pbs_mom; Job; 46.master; task not started, '/bin/sh', stdio setup failed (see syslog) "
Abraham Zamudio
abraham.zamudio at gmail.com
Tue Sep 28 09:24:20 MDT 2010
Syslog:

[root at quad2 mpiX]# cat /var/log/messages | grep pbs
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=51586:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=51586:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=51586:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=51586:51586
Sep 28 08:48:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=34675:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=34675:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=34675:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=34675:34675
Sep 28 09:12:07 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=48625:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=48625:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=48625:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::No route to host (113) in
open_demux, open_demux: connect 10.10.10.3:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in search_env_and_open, failed connect to stdio on
MPIEXEC_STDOUT_PORT=48625:48625
Sep 28 09:29:29 quad2 pbs_mom: LOG_ERROR::Inappropriate ioctl for device
(25) in start_process, cannot locate MPIEXEC_STDOUT_PORT
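
Every failure above begins with "No route to host (113)" while pbs_mom on quad2 connects to 10.10.10.3. A minimal sketch, assuming Python 3 is available on quad2, of how the same connection could be tried outside of Torque. The function name probe and the command-line handling are illustrative only. The port must be one on which something is actually listening on the other node, since the MPIEXEC_STDOUT_PORT values in the log change with every job.

```python
import errno
import socket
import sys


def probe(host, port, timeout=3.0):
    """Attempt a single TCP connection and report how it ended.

    Returns (True, "connected") on success, otherwise (False, reason),
    where reason names the errno (e.g. EHOSTUNREACH is errno 113 on Linux).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"


# Hypothetical usage: python probe.py 10.10.10.3 <port>
if __name__ == "__main__" and len(sys.argv) == 3:
    ok, reason = probe(sys.argv[1], int(sys.argv[2]))
    print(sys.argv[1], sys.argv[2], "OK" if ok else "FAILED", reason)
```

If this reports EHOSTUNREACH for a port that is known to be open on 10.10.10.3, the problem is most likely in routing or in a packet filter between the two nodes rather than in Torque itself. That is an assumption to verify, not something the log proves.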
My /etc/hosts:

[root at quad2 mpiX]# cat /etc/hosts
#127.0.0.1 jro-operations localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
127.0.0.1 localhost
10.10.10.241 master
############# NODOS MPICH V2 ########################
10.10.10.3 quad4
10.10.10.4 quad2 jro-operations localhost localhost.localdomain
10.10.10.236 gauss
############# NODOS MPICH V2 ########################
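
One detail in the file above may be worth checking: the names localhost and localhost.localdomain appear both on 127.0.0.1 and on the 10.10.10.4 line. Whether this is related to the failure is not established. A minimal sketch, with a hypothetical helper name misplaced_localhost, that lists such entries in a hosts file:

```python
import ipaddress


def misplaced_localhost(hosts_text):
    """Return (address, name) pairs where a localhost-style name
    is attached to an address that is not a loopback address."""
    hits = []
    for raw in hosts_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line (e.g. a wrapped fragment)
        for name in fields[1:]:
            if name.startswith("localhost") and not addr.is_loopback:
                hits.append((fields[0], name))
    return hits


# The active (uncommented) entries from the file shown above.
SAMPLE = """\
127.0.0.1 localhost
10.10.10.241 master
10.10.10.3 quad4
10.10.10.4 quad2 jro-operations localhost localhost.localdomain
10.10.10.236 gauss
"""

print(misplaced_localhost(SAMPLE))
# [('10.10.10.4', 'localhost'), ('10.10.10.4', 'localhost.localdomain')]
```

To check the real file, the contents of /etc/hosts can be read and passed in instead of SAMPLE.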
On Tue, Sep 28, 2010 at 10:17 AM, Abraham Zamudio <abraham.zamudio at gmail.com
> wrote:
> The output of qstat:
>
> [mpiX at master mpi_fitting]$ qstat
> Job id                    Name             User            Time Use S Queue
> ------------------------- ---------------- --------------- -------- - -----
> 46.master                 mpi_fitting      mpiX            00:00:00 R batch
>
>
>
> I will ask permission from the administrator to view syslog
> (/var/log/messages)
>
>
> On Tue, Sep 28, 2010 at 10:04 AM, Ken Nielson <
> knielson at adaptivecomputing.com> wrote:
>
>> On 09/28/2010 08:57 AM, Abraham Zamudio wrote:
>>
>> Hi everybody ,
>>
>> I have a problem with one of my nodes :
>>
>> [mpiX at quad2 ~]$ cat /var/spool/torque/mom_logs/20100928 | grep 46.master
>> 09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;JOIN JOB as node 1
>> 09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
>> 09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
>> 09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
>> 09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
>> 09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
>> 09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
>> 09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
>> 09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
>>
>> The status of the job is active:
>>
>> [mpiX at master mpi_fitting]$ showq
>> ACTIVE JOBS--------------------
>> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
>>
>> 46                     mpiX    Running    12    00:35:52  Tue Sep 28 09:32:56
>>
>> 1 Active Job    12 of 12 Processors Active (100.00%)
>>                  2 of  2 Nodes Active      (100.00%)
>>
>> IDLE JOBS----------------------
>> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>>
>>
>> 0 Idle Jobs
>>
>> BLOCKED JOBS----------------
>> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>>
>>
>> Total Jobs: 1   Active Jobs: 1   Idle Jobs: 0   Blocked Jobs: 0
>>
>> The same software (mpich2+gsl) runs without problems on a single node with
>> 8 cores. The problem occurs only when the job uses two nodes.
>>
>>
>>
>> --
>> Abraham Zamudio Ch.
>>
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>
>> What does qstat show? Did you look at syslog?
>>
>> Ken Nielson
>> Adaptive Computing
>>
>>
>
>
> --
> Abraham Zamudio Ch.
>
>
--
Abraham Zamudio Ch.