[torqueusers] unable to run parallel jobs with Torque 2.1.9 and openMPI 1.2.4
Wilko Keegstra
w.keegstra at rug.nl
Wed Oct 17 04:24:07 MDT 2007
I had our cluster running with Torque 2.1.8 and openMPI 1.2.4 on
openSUSE 10.2 (Linux).
After switching to openSUSE 10.3 and installing Torque 2.1.9 and
openMPI 1.2.4, I can submit single-CPU jobs, but parallel jobs no longer work.
My input script:
#PBS -q cluster2
#PBS -l nodes=5:ppn=2
#PBS -l walltime=01:55:20
#PBS -j oe
#PBS -o hemo-mix-psml-pre-msa.log
/usr/local/bin/mpiexec -v -machinefile $PBS_NODEFILE \
    --mca mpi_paffinity_alone 1 \
    /pcs/programs/grip/bin/msaMpiRun \
    /pcs/pc14/keegstra/work/hemo/hemo-mix-psml-pre-3dt.img \
    /pcs/pc14/keegstra/work/hemo/hemo-mix-psml-pre-msa 64 64 2497 var
In the job log file the following message appears:
[rugem41:11405] pls:tm: failed to poll for a spawned proc, return status = 17002
[rugem41:11405] [0,0,0] ORTE_ERROR_LOG: In errno in file rmgr_urm.c at line 462
[rugem41:11405] mpiexec: spawn failed with errno=-11
In the pbs_mom log file on rugem41 the following messages appear:
10/17/2007 12:13:53;0001; pbs_mom;Job;TMomFinalizeJob3;job 27.rugem14.chem.rug.nl started, pid = 11353
10/17/2007 12:13:54;0001; pbs_mom;Svr;pbs_mom;Bad file descriptor (9) in tm_request, bad header Negative sign on an unsigned datum
10/17/2007 12:13:54;0008; pbs_mom;Job;27.rugem14.chem.rug.nl;kill_task: killing pid 11354 task 1 with sig 9
10/17/2007 12:13:54;0080; pbs_mom;Job;27.rugem14.chem.rug.nl;scan_for_terminated: job 27.rugem14.chem.rug.nl task 1 terminated, sid 11353
10/17/2007 12:13:54;0008; pbs_mom;Job;27.rugem14.chem.rug.nl;job was terminated
Could anyone please help me with this?
Wilko Keegstra
--
+------------------------------------------------------------+
| Dr. Wilko Keegstra priv.phone: +31594514153,+31610477915|
| Groningen University email: W.Keegstra at rug.nl |
| Dept.of Biophys.Chemistry |
| Nijenborgh 4 phone: +31503634224 |
| 9747 AG GRONINGEN fax : +31503634800 |
| The Netherlands |
+------------------------------------------------------------+