[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Thu Sep 15 13:00:34 MDT 2005


Garrick Staples wrote:
> Can you repeat that with a single-node, single-proc job please?
> 
> How is the job requested?  Any special limits like mem, vmem, file,
> etc.?  Is -d or -D used?

I don't do anything special.

Here is the result. On 1 node, pbsdsh works:

# qsub -I -l nodes=1:d510
qsub: waiting for job 154.ymer.fysik.dtu.dk to start
qsub: job 154.ymer.fysik.dtu.dk ready

[ohnielse at n469 ~]$ pbsdsh -v hostname
pbsdsh: spawned task 0
pbsdsh: waiting on 1 spawned and 0 obits
n469.dcsc.fysik.dtu.dk
spawn event returned: 0
pbsdsh: sending obit for task 2
pbsdsh: waiting on 0 spawned and 1 obits
obit event returned: 0
pbsdsh: task 0 exit status 0

However, on 3 nodes it still fails:

# qsub -I -l nodes=3:d510
qsub: waiting for job 155.ymer.fysik.dtu.dk to start
qsub: job 155.ymer.fysik.dtu.dk ready

[ohnielse at n469 ~]$ pbsdsh -v hostname
pbsdsh: spawned task 0
pbsdsh: spawned task 1
pbsdsh: spawned task 2
pbsdsh: waiting on 3 spawned and 0 obits
spawn event returned: 0
error 17000 on spawn
pbsdsh: waiting on 2 spawned and 0 obits
spawn event returned: 2
error 15010 on spawn
pbsdsh: waiting on 1 spawned and 0 obits
spawn event returned: 1
error 15010 on spawn
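
To compare runs like the two above at a glance, the verbose output can be reduced to a pair of counts. This is only a rough sketch: the function name summarize_pbsdsh is made up for illustration, and it assumes the message formats are exactly those shown in the transcripts above.

```shell
#!/bin/sh
# Summarize `pbsdsh -v` output read from stdin.
# summarize_pbsdsh is a hypothetical helper, not part of TORQUE.
summarize_pbsdsh() {
    awk '
        # One line of this form is printed per task pbsdsh spawns.
        /^pbsdsh: spawned task / { spawned++ }
        # One line of this form is printed per failed spawn event.
        /^error [0-9]+ on spawn/ { failed++ }
        END { printf "spawned=%d failed=%d\n", spawned, failed }
    '
}
```

Inside a job it could be used as `pbsdsh -v hostname 2>&1 | summarize_pbsdsh`; the `2>&1` is there because I have not checked which stream the verbose messages go to.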

What about Troy Baer's suggestion that $clienthost is required
in the MOM config file?
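
For concreteness, such an entry might look like the fragment below. This is a sketch, not a tested fix: the location of the TORQUE home directory varies by installation, and the host name is simply the server name that appears in the job IDs above, which I am assuming is the right value.

```
# <TORQUE home>/mom_priv/config on each compute node (location is an assumption)
$clienthost ymer.fysik.dtu.dk
```

pbs_mom would presumably need to be restarted on each node to pick up the change.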

Thanks,
Ole


