[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 15 13:00:34 MDT 2005
Garrick Staples wrote:
> Can you repeat that with a single-node, single-proc job please?
>
> How is the job requested? Any special limits like mem, vmem, file,
> etc.? Is -d or -D used?
I don't do anything special.
Here is the result, and on 1 node pbsdsh works:
# qsub -I -l nodes=1:d510
qsub: waiting for job 154.ymer.fysik.dtu.dk to start
qsub: job 154.ymer.fysik.dtu.dk ready
[ohnielse@n469 ~]$ pbsdsh -v hostname
pbsdsh: spawned task 0
pbsdsh: waiting on 1 spawned and 0 obits
n469.dcsc.fysik.dtu.dk
spawn event returned: 0
pbsdsh: sending obit for task 2
pbsdsh: waiting on 0 spawned and 1 obits
obit event returned: 0
pbsdsh: task 0 exit status 0
However, on 3 nodes it still fails:
# qsub -I -l nodes=3:d510
qsub: waiting for job 155.ymer.fysik.dtu.dk to start
qsub: job 155.ymer.fysik.dtu.dk ready
[ohnielse@n469 ~]$ pbsdsh -v hostname
pbsdsh: spawned task 0
pbsdsh: spawned task 1
pbsdsh: spawned task 2
pbsdsh: waiting on 3 spawned and 0 obits
spawn event returned: 0
error 17000 on spawn
pbsdsh: waiting on 2 spawned and 0 obits
spawn event returned: 2
error 15010 on spawn
pbsdsh: waiting on 1 spawned and 0 obits
spawn event returned: 1
error 15010 on spawn
What about Troy Baer's suggestion that $clienthost is required
in the MOM config file?
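For reference, a minimal sketch of what such an entry might look like in
the MOM config file (mom_priv/config under the TORQUE spool directory on
each compute node). The server name is only an assumption inferred from
the job IDs above, and whether this setting is actually required here is
exactly the open question:

```
# mom_priv/config on each compute node (location assumed)
# Server host name assumed from job IDs such as 154.ymer.fysik.dtu.dk
$clienthost ymer.fysik.dtu.dk
```

pbs_mom would presumably need to be restarted (or told to reread its
config) before a change like this takes effect.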
Thanks,
Ole