[torqueusers] Problem with one node : " pbs_mom; Job; 46.master; task not started, '/bin/sh', stdio setup failed (see syslog) "

Abraham Zamudio abraham.zamudio at gmail.com
Tue Sep 28 09:17:40 MDT 2010


The output of qstat:

[mpiX at master mpi_fitting]$ qstat
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
46.master                 mpi_fitting      mpiX            00:00:00 R batch



I have not looked at syslog yet. I will ask the administrator for
permission to read it (/var/log/messages).
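
Once read access is granted, something like the following could narrow
syslog down to what pbs_mom wrote around the time of the failure. This is
only a rough sketch: the function name mom_syslog is made up for
illustration, and it assumes the traditional syslog line format
("Sep 28 09:29:29 host program: message") and that pbs_mom logs under
its own name.

```shell
# Print the syslog lines written by pbs_mom whose text contains a given
# time prefix. (Hypothetical helper; assumes traditional syslog format.)
# Usage: mom_syslog SYSLOGFILE "Sep 28 09:29"
mom_syslog() {
    syslog=$1
    when=$2
    # -F treats both patterns as fixed strings, not regular expressions.
    grep -F "pbs_mom" "$syslog" | grep -F "$when"
}
```

On the failing node this would be called as
mom_syslog /var/log/messages "Sep 28 09:29", most likely as root or with
whatever access the administrator grants.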


On Tue, Sep 28, 2010 at 10:04 AM, Ken Nielson <
knielson at adaptivecomputing.com> wrote:

>  On 09/28/2010 08:57 AM, Abraham Zamudio wrote:
>
> Hi everybody,
>
>   I have a problem with one of my nodes:
>
> [mpiX at quad2 ~]$ cat /var/spool/torque/mom_logs/20100928 | grep 46.master
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;JOIN JOB as node 1
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
>
>  Even so, the job is reported as active:
>
>  [mpiX at master mpi_fitting]$ showq
> ACTIVE JOBS--------------------
> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
>
> 46                     mpiX    Running    12    00:35:52  Tue Sep 28 09:32:56
>
>       1 Active Job       12 of   12 Processors Active (100.00%)
>                          2 of    2 Nodes Active      (100.00%)
>
> IDLE JOBS----------------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>
>
> 0 Idle Jobs
>
> BLOCKED JOBS----------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>
>
> Total Jobs: 1   Active Jobs: 1   Idle Jobs: 0   Blocked Jobs: 0
>
>  The same software (mpich2 + gsl) runs without problems on a single node
> with 8 cores. The problem occurs only when the job uses two nodes.
>
>
>
> --
> Abraham Zamudio Ch.
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>  What does qstat show? Did you look at syslog?
>
> Ken Nielson
> Adaptive Computing
>
>
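
The mom log quoted above repeats the same two messages several times, so
a per-message count is easier to read than the raw grep output. A rough
sketch, assuming the mom log keeps the ';'-separated layout seen in that
excerpt (timestamp;event code;daemon;class;job id;message) and that the
message itself contains no ';'. The function name mom_job_summary is made
up for illustration.

```shell
# Count how often each distinct message appears for one job in a mom log.
# (Hypothetical helper; assumes the ';'-separated layout described above.)
# Usage: mom_job_summary LOGFILE JOBID
mom_job_summary() {
    logfile=$1
    jobid=$2
    # Field 5 is the job id, field 6 the message text.
    awk -F';' -v job="$jobid" '
        $5 == job { count[$6]++ }
        END { for (m in count) printf "%d\t%s\n", count[m], m }
    ' "$logfile" | sort -rn   # most frequent message first
}
```

On quad2 this would be run as
mom_job_summary /var/spool/torque/mom_logs/20100928 46.master.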


-- 
Abraham Zamudio Ch.