[torqueusers] Problem with one node : " pbs_mom; Job; 46.master; task not started, '/bin/sh', stdio setup failed (see syslog) "

Abraham Zamudio abraham.zamudio at gmail.com
Tue Sep 28 08:57:29 MDT 2010


Hi everybody,

I have a problem with one of my nodes:

[mpiX@quad2 ~]$ cat /var/spool/torque/mom_logs/20100928 | grep 46.master
09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;JOIN JOB as node 1
09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
09/28/2010 09:29:29;0001;   pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008;   pbs_mom;Job;46.master;ERROR:    received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
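
Since the same two records repeat several times, a log like this is easier to read when condensed to one line per distinct message. The following is only a rough sketch: the function name summarize_job is hypothetical, and it assumes the semicolon-separated layout seen in the excerpt above (timestamp; event code; daemon; object class; job id; message), with no semicolons inside the message text.

```shell
# Hypothetical helper: count each distinct pbs_mom message for one job.
# Assumed record layout (inferred from the excerpt, not from TORQUE docs):
#   timestamp;event code;daemon;object class;job id;message
summarize_job() {
    # $1 is the job id to match against the fifth field, e.g. 46.master
    awk -F';' -v job="$1" '
        $5 == job { count[$6]++ }
        END { for (m in count) printf "%d\t%s\n", count[m], m }
    ' | sort -rn    # most frequent message first
}
```

It reads the log on standard input, for example: summarize_job 46.master < /var/spool/torque/mom_logs/20100928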

The job's status is shown as active:

[mpiX@master mpi_fitting]$ showq
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

46                     mpiX    Running    12    00:35:52  Tue Sep 28 09:32:56

     1 Active Job       12 of   12 Processors Active (100.00%)
                         2 of    2 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME


Total Jobs: 1   Active Jobs: 1   Idle Jobs: 0   Blocked Jobs: 0

The same software (mpich2+gsl) runs on a single node with 8 cores. The problem
occurs only when the job uses two nodes.



-- 
Abraham Zamudio Ch.