[torqueusers] Compute nodes can not work
Greenseid, Joseph M (IS)
Joseph.Greenseid at ngc.com
Wed Apr 29 07:50:37 MDT 2009
It sounds like the compute nodes are having trouble sending files back to the head node once the job is complete. From a compute node, can you ssh back to the head node with no password?
Do the .out and .err files end up in the "undelivered" directory of the PBS directory on the compute nodes?
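Both checks can be scripted. The following is only a rough sketch, to be run on each compute node as the user who submitted the job. The function names are made up for illustration, and the spool directory /var/spool/torque is assumed from the mom_priv path quoted in the original message below; adjust it if your install differs.

```shell
#!/bin/sh
# Hypothetical helpers, not part of TORQUE.

# Return 0 if key-based (passwordless) ssh to HOST works.
# BatchMode=yes makes ssh fail instead of prompting for a password.
can_ssh_without_password() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}

# List job output files left under SPOOL/undelivered.
# SPOOL defaults to /var/spool/torque (an assumption, see above).
list_undelivered() {
    spool=${1:-/var/spool/torque}
    if [ -d "$spool/undelivered" ]; then
        ls -A "$spool/undelivered"
    fi
}
```

For example, with node1 as the head node, `can_ssh_without_password node1 || echo "ssh to node1 needs a password"` followed by `list_undelivered` should show quickly whether output delivery is the problem.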
--Joe
________________________________
From: torqueusers-bounces at supercluster.org on behalf of baibart
Sent: Wed 4/29/2009 8:07 AM
To: torqueusers
Subject: [torqueusers] Compute nodes can not work
Hi all
I have 3 nodes. The server node also takes part in computing. When I submit a job that uses the other two compute nodes, the job cannot run.
For example: qsub 1.job
cat 1.job
#!/bin/sh
#PBS -l nodes=3:ppn=2
#PBS -N pbs
cat $PBS_NODEFILE>/home/pbs/3
The job state is E.
If I request only one node, the job runs, but it has to be the head node (if I turn off the head node, this does not work either).
#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -N pbs
cat $PBS_NODEFILE>/home/pbs/3
cat 3
node1
node1
node1 is my server node
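The node file has one line per allocated processor slot, so counting repeated host names shows which nodes a job actually received. A minimal sketch, where summarize_nodefile is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: print "host slots" for each host in a node file.
summarize_nodefile() {
    # sort groups identical host names, uniq -c counts them,
    # awk swaps the columns so the host name comes first.
    sort "$1" | uniq -c | awk '{ print $2, $1 }'
}
```

Inside a job script this could be called as `summarize_nodefile "$PBS_NODEFILE" > /home/pbs/3`. For the single-node job above it would print "node1 2"; for a nodes=3:ppn=2 request one would expect three host names with 2 slots each.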
pbsnodes -a
All nodes are free. They seem OK.
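To see the state of every node at a glance, the pbsnodes output can be reduced to one line per node. This is only a sketch: node_states is a made-up name, and it assumes the usual layout where each node name starts in the first column and its attributes follow as indented "name = value" lines.

```shell
#!/bin/sh
# Hypothetical helper: read `pbsnodes -a` output on stdin and
# print "node state" for each node.
node_states() {
    awk '
        /^[^ \t]/ { node = $1 }                        # unindented line: node name
        $1 == "state" && $2 == "=" { print node, $3 }  # indented "state = ..." line
    '
}
```

Usage would be `pbsnodes -a | node_states`, which prints each node name followed by its reported state.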
cat /var/spool/torque/mom_priv/config
$clienthost node1
$logevent 255
Each node has the same config.
Thanks in advance!
2009-04-29
________________________________
baibart