[torqueusers] can't execute multi-processors run

Garrick Staples garrick at usc.edu
Mon Aug 29 09:53:39 MDT 2005


On Sat, Aug 27, 2005 at 09:00:30PM +0000, lorenzo118 at interfree.it alleged:
> 
> Hi,
> I installed Torque 1.2.p04 on my Linux cluster of 16 Pentium 4
> processors (with Fedora Core 3), on the master and on some nodes. It
> compiled with no problems, I started all daemons, and it marks all
> nodes as "free". The problem is that when I launch a one-processor
> job, I get only the error file (.e), which contains:
> 
> 
> -bash: line 1: /usr/spool/PBS/mom_priv/jobs/34.medusa.d.SC: No such file or directory:

This is usually a broken #! line at the top of the script.  Either the
interpreter path is wrong or the line ends with a DOS EOL character (a
carriage return before the newline).  Another common cause is a full
/tmp or /usr that prevents the script file from being copied correctly.
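
The checks above can be sketched as a small script. This is only an
illustration, not part of Torque: the function names check_script and
free_fraction are made up here, and the script assumes the interpreter
is the first token after "#!".

```python
import os
import shutil


def check_script(path):
    """Return a list of problems found on the first line of a job script."""
    problems = []
    with open(path, "rb") as f:
        first = f.readline()
    if not first.startswith(b"#!"):
        return ["no #! line"]
    # A DOS-format file ends each line with CR LF instead of LF.
    if first.endswith(b"\r\n"):
        problems.append("DOS line ending (CRLF) on #! line")
    # The interpreter is assumed to be the first token after "#!".
    tokens = first[2:].strip().split()
    if not tokens:
        problems.append("empty #! line")
    else:
        interp = tokens[0].decode(errors="replace")
        if not (os.path.isfile(interp) and os.access(interp, os.X_OK)):
            problems.append("interpreter not found or not executable: " + interp)
    return problems


def free_fraction(directory):
    """Return the fraction of free space on the filesystem holding directory."""
    usage = shutil.disk_usage(directory)
    if usage.total == 0:
        return 1.0  # pseudo-filesystem with no reported size
    return usage.free / usage.total
```

For example, check_script("./hello") returns an empty list when the
first line looks sane, and free_fraction("/tmp") or
free_fraction("/usr") close to 0.0 points at a full filesystem.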


> Moreover, if I try to launch a two-processor job (with qsub -l
> nodes=2 ./hello), it exits after a few seconds without writing
> anything (not even empty .o and .e files). While it's in the queue,
> if I run qstat -f, I get this result:

Are the output files stuck in the node's undelivered directory?  Are
there any errors in mom's log?
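
A rough way to answer both questions on a compute node is sketched
below. It assumes the PBS home directory is /usr/spool/PBS (taken from
the path in the error message), that undelivered output sits in its
"undelivered" subdirectory with names starting with the job id, and
that mom writes its logs under "mom_logs". The function names are
hypothetical; adjust the paths if your installation differs.

```python
import glob
import os

# Assumption: PBS home as it appears in the error message path.
PBS_HOME = "/usr/spool/PBS"


def undelivered_files(job_id, pbs_home=PBS_HOME):
    """List files in the undelivered directory whose names start with job_id."""
    pattern = os.path.join(glob.escape(pbs_home), "undelivered",
                           glob.escape(job_id) + "*")
    return sorted(glob.glob(pattern))


def mom_log_lines(job_id, pbs_home=PBS_HOME):
    """Return (log file name, line) pairs from mom's logs that mention job_id."""
    hits = []
    pattern = os.path.join(glob.escape(pbs_home), "mom_logs", "*")
    for log in sorted(glob.glob(pattern)):
        if not os.path.isfile(log):
            continue
        with open(log, errors="replace") as f:
            for line in f:
                if job_id in line:
                    hits.append((os.path.basename(log), line.rstrip("\n")))
    return hits
```

Calling undelivered_files("34.medusa") and mom_log_lines("34.medusa")
on each node that ran the job shows whether the output was left behind
and what mom logged about it.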


-- 
Garrick Staples, Linux/HPCC Administrator
University of Southern California

