[torqueusers] Compute nodes can not work

Greenseid, Joseph M (IS) Joseph.Greenseid at ngc.com
Wed Apr 29 07:50:37 MDT 2009


It sounds like the compute nodes are having trouble sending files back to the head node once the job is complete.  From a compute node, can you ssh back to the head node without being prompted for a password?
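
One way to run that check, as a minimal sketch: the helper name check_passwordless_ssh is made up for illustration and is not part of TORQUE, and node1 is the head node name taken from the original post. It should probably be run on each compute node as the user who submitted the job, since output files are normally copied back under that account.

```shell
#!/bin/sh
# Sketch: run on a compute node to test password-less ssh to the head node.
# check_passwordless_ssh is a hypothetical helper name, not a TORQUE tool.
check_passwordless_ssh() {
    host="$1"
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "OK: password-less ssh to $host works"
    else
        echo "FAIL: cannot ssh to $host without a password"
        return 1
    fi
}

# Example (node1 is the head node in the original post):
#   check_passwordless_ssh node1
```

If this prints FAIL on node2 or node3, fixing the ssh keys (or host-based authentication) for that user is the first thing to try.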
 
Do the .out and .err files end up in the "undelivered" directory of the PBS directory on the compute nodes?
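
A quick way to look, as a sketch only: list_undelivered is a hypothetical helper, and the default path assumes the /var/spool/torque prefix shown in the mom_priv/config path of the original post. Reading that directory may require root.

```shell
#!/bin/sh
# Sketch: list job output files that pbs_mom could not deliver.
# list_undelivered is a hypothetical helper; the default path assumes the
# /var/spool/torque prefix that appears in the original post.
list_undelivered() {
    dir="${1:-/var/spool/torque/undelivered}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir"
        return 1
    fi
    # Print one file name per line; prints nothing if the directory is empty.
    ls -1 "$dir"
}

# Example, run on each compute node:
#   list_undelivered
```

Any files listed there would suggest the job itself ran and only the copy back to the head node failed.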
 
--Joe

________________________________

From: torqueusers-bounces at supercluster.org on behalf of baibart
Sent: Wed 4/29/2009 8:07 AM
To: torqueusers
Subject: [torqueusers] Compute nodes can not work


Hi all,
      I have 3 nodes. The server node also takes part in computing. When I submit a job that uses the other two compute nodes, the job does not run.
      For example: qsub 1.job
      cat 1.job
      #!/bin/sh
      #PBS -l nodes=3:ppn=2
      #PBS -N pbs
       cat $PBS_NODEFILE>/home/pbs/3
 
       The job state is E.
       If I request only one node, it has to be the head node (if I turn off the head node, the job does not work either). This job runs:
       #!/bin/sh
       #PBS -l nodes=1:ppn=2
       #PBS -N pbs
       cat $PBS_NODEFILE>/home/pbs/3
       cat 3
       node1
       node1
       
       node1 is my server node.
       pbsnodes -a
       All nodes are free. They seem OK.
       cat /var/spool/torque/mom_priv/config
       $clienthost node1
       $logevent 255
       Each node has the same config.
 
       Thanks in advance!
    
      
 
2009-04-29 
________________________________

baibart 