[torqueusers] Problem with one node : " pbs_mom; Job; 46.master; task not started, '/bin/sh', stdio setup failed (see syslog) "

Ken Nielson knielson at adaptivecomputing.com
Tue Sep 28 09:04:41 MDT 2010


On 09/28/2010 08:57 AM, Abraham Zamudio wrote:
> Hi everybody,
>
> I have a problem with one of my nodes:
>
> [mpiX@quad2 ~]$ cat /var/spool/torque/mom_logs/20100928 | grep 46.master
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;JOIN JOB as node 1
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
> 09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
> 09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
>
> The status of the job is active:
>
> [mpiX@master mpi_fitting]$ showq
> ACTIVE JOBS--------------------
> JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
>
> 46                     mpiX    Running    12    00:35:52  Tue Sep 28 09:32:56
>
>      1 Active Job       12 of   12 Processors Active (100.00%)
>                          2 of    2 Nodes Active      (100.00%)
>
> IDLE JOBS----------------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>
>
> 0 Idle Jobs
>
> BLOCKED JOBS----------------
> JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
>
>
> Total Jobs: 1   Active Jobs: 1   Idle Jobs: 0   Blocked Jobs: 0
>
> The same software (mpich2+gsl) runs on a single node with 8 cores; this
> problem occurs only when two nodes are used.
>
>
>
> -- 
> Abraham Zamudio Ch.
>
>
> _______________________________________________
> torqueusers mailing list
> torqueusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/torqueusers
>    
What does qstat show? Did you look at syslog?
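When chasing a failure like this, it can help to pull the per-job records out of the mom log in structured form and then check the system log that the "(see syslog)" message points to. Below is a minimal Python sketch, not part of TORQUE and not from this thread, assuming the six semicolon-separated fields visible in the mom_logs excerpt above (timestamp, event code, daemon, object type, object name, message). The names `MomRecord`, `parse_mom_record`, `summarize_job`, and `syslog_lines_for` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Optional


class MomRecord(NamedTuple):
    timestamp: str   # e.g. "09/28/2010 09:29:29"
    event_code: str  # e.g. "0001" or "0008"
    daemon: str      # e.g. "pbs_mom"
    obj_type: str    # e.g. "Job"
    obj_name: str    # e.g. "46.master"
    message: str


def parse_mom_record(line: str) -> Optional[MomRecord]:
    """Split one mom_logs line into its six fields (layout assumed from the excerpt).

    maxsplit=5 keeps any ';' that appears inside the message text intact.
    Returns None for lines that do not have six fields.
    """
    parts = line.rstrip("\n").split(";", 5)
    if len(parts) != 6:
        return None
    return MomRecord(*(p.strip() for p in parts))


def summarize_job(lines: Iterable[str], job_id: str) -> Counter:
    """Count how often each distinct message was logged for one job ID."""
    counts: Counter = Counter()
    for line in lines:
        rec = parse_mom_record(line)
        if rec is not None and rec.obj_name == job_id:
            counts[rec.message] += 1
    return counts


def syslog_lines_for(lines: Iterable[str], needle: str = "pbs_mom") -> List[str]:
    """Return the syslog lines that mention pbs_mom (or another needle)."""
    return [l.rstrip("\n") for l in lines if needle in l]
```

For example, passing an open handle on /var/spool/torque/mom_logs/20100928 and "46.master" to `summarize_job` would show the repeated "stdio setup failed" and SPAWN_TASK errors as counts; `syslog_lines_for` can then be run on the node's syslog file, whose location (often /var/log/messages) depends on the distribution and syslog configuration, so treat that path as an assumption.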

Ken Nielson
Adaptive Computing

