[torqueusers] Re: LAM-MPI won't boot with torque-1.2.0p6

garrick garrick at usc.edu
Thu Sep 15 12:15:38 MDT 2005


On Thu, Sep 15, 2005 at 07:01:18PM +0200, Ole Holm Nielsen alleged:
> Garrick Staples wrote:
> >>Question:  Is Torque's LAM-MPI "tm" boot schema supposed to be
> >>working correctly with torque-1.2.0p6?  I'd love to get it to
> >>work because of the performance improvements promised in the
> >>LAM-MPI documentation.
> > 
> >It absolutely should be working.  Can you try something really simple
> >like 'pbsdsh hostname' in your job?  Optionally, 'pbsdsh -v hostname'.
> >If it is failing, check the mom logs with an increased loglevel.
> 
> The result is very interesting, showing obvious errors:
> 
> $ pbsdsh -v hostname
> pbsdsh: spawned task 0
> pbsdsh: spawned task 1
> pbsdsh: spawned task 2
> pbsdsh: waiting on 3 spawned and 0 obits
> spawn event returned: 0
> error 17000 on spawn
> pbsdsh: waiting on 2 spawned and 0 obits
> spawn event returned: 1
> error 15010 on spawn
> pbsdsh: waiting on 1 spawned and 0 obits
> spawn event returned: 2
> error 15010 on spawn

Can you repeat that with a single-node, single-proc job please?

How is the job requested?  Any special resource limits such as mem,
vmem, or file?  Are qsub's -d or -D options used?
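For example, a minimal test job along these lines (a sketch; the job name
is hypothetical) requests one node and one processor, sets no mem, vmem,
or file limits, and does not use -d or -D:

```sh
#!/bin/sh
#PBS -N pbsdsh-test
#PBS -l nodes=1:ppn=1
#PBS -j oe
# Hypothetical minimal test job: one node, one processor,
# no mem/vmem/file limits and no -d or -D options.

# List the node slots TORQUE assigned to this job.
cat "$PBS_NODEFILE"

# Spawn hostname through the TM interface on the allocated slot;
# -v reports each spawn event and any error code.
pbsdsh -v hostname
```

Submit it with 'qsub pbsdsh-test.sh' (the filename is also just an
example) and compare its output with the three-task run above.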

-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California