[torqueusers] Problem with one node : " pbs_mom; Job; 46.master; task not started, '/bin/sh', stdio setup failed (see syslog) "
Abraham Zamudio
abraham.zamudio at gmail.com
Tue Sep 28 08:57:29 MDT 2010
Hi everybody,
I have a problem with one of my nodes:
[mpiX@quad2 ~]$ cat /var/spool/torque/mom_logs/20100928 | grep 46.master
09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;JOIN JOB as node 1
09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)
09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)
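Each entry above appears to be six semicolon-separated fields: timestamp, a numeric code, the daemon name, an object class, the job id, and the message. A minimal Python sketch, assuming that field order holds for every line (the field names and the helper functions parse_mom_line and summarize are my own labels, not anything TORQUE defines), collapses the repeated entries into one count per distinct message:

```python
from collections import Counter

# Illustrative labels for the six semicolon-separated fields seen in the
# log above; these are assumptions, not official TORQUE field names.
FIELDS = ("timestamp", "code", "daemon", "kind", "job_id", "message")


def parse_mom_line(line):
    """Split one pbs_mom log line into a dict, or return None if it does not fit."""
    # maxsplit=5 keeps any semicolons inside the message text intact.
    parts = [p.strip() for p in line.split(";", 5)]
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))


def summarize(lines, job_id):
    """Count how often each distinct message appears for one job."""
    counts = Counter()
    for line in lines:
        entry = parse_mom_line(line)
        if entry and entry["job_id"] == job_id:
            counts[entry["message"]] += 1
    return counts


# Three of the entries from the log above, one per line.
sample = [
    "09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;JOIN JOB as node 1",
    "09/28/2010 09:29:29;0001; pbs_mom;Job;46.master;task not started, '/bin/sh', stdio setup failed (see syslog)",
    "09/28/2010 09:29:29;0008; pbs_mom;Job;46.master;ERROR: received request 'SPAWN_TASK' from 10.10.10.3:1023 for job '46.master' (cannot start task)",
]

for message, count in summarize(sample, "46.master").items():
    print(count, message)
```

Run against the whole mom log file instead of the sample list, this would show at a glance that the node logs the same two failures once per spawn attempt.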
The job's status is active:
[mpiX@master mpi_fitting]$ showq
ACTIVE JOBS--------------------
JOBNAME    USERNAME    STATE      PROC    REMAINING    STARTTIME

46         mpiX        Running    12      00:35:52     Tue Sep 28 09:32:56

     1 Active Job    12 of 12 Processors Active (100.00%)
                      2 of  2 Nodes Active      (100.00%)

IDLE JOBS----------------------
JOBNAME    USERNAME    STATE      PROC    WCLIMIT      QUEUETIME

0 Idle Jobs

BLOCKED JOBS----------------
JOBNAME    USERNAME    STATE      PROC    WCLIMIT      QUEUETIME

Total Jobs: 1   Active Jobs: 1   Idle Jobs: 0   Blocked Jobs: 0
The same software (mpich2 + gsl) runs on a single node with 8 cores. The problem
occurs only when the job uses two nodes.
--
Abraham Zamudio Ch.