[torqueusers] Post job file processing error
tracy_luofengji at 126.com
Thu Mar 12 19:55:55 MDT 2009
Hello, I did a fresh installation of TORQUE 2.3.0 on my cluster and ran into a strange post-job file processing problem. I followed the same installation procedure on all 5 compute nodes (node1, node2, node3, node4, node5), with node0 acting as the master. On the compute nodes, I only installed the packages:
Then, on the compute nodes, I ran pbs_mom.
The problem is that when I submit test jobs, only node1 can send the output files back to the master node; the other 4 compute nodes cannot. Running qstat -f, I saw the following:
    sched_hint = Post job file processing error; job 32.ciarlab11.cluster.net on host ciarlab14.cluster.net/0
        Unable to copy file /var/spool/torque/spool/32.ciarlab11.cluster.net.OU to ciarlab11.cluster.net:/usr/local/out
        Unable to copy file /var/spool/torque/spool/32.ciarlab11.cluster.net.ER to ciarlab11.cluster.net:/usr/local/err
    comment = Job started on Thu Mar 12 at 21:09
    etime = Thu Mar 12 21:09:18 2009
    exit_status = -1
    start_time = Thu Mar 12 21:09:18 2009
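The sched_hint above says that pbs_mom on ciarlab14 could not copy the .OU and .ER spool files back to ciarlab11. Below is a minimal shell sketch for narrowing this down. It assumes the moms perform the copy with scp (TORQUE's usual default, not confirmed by the output above). failed_copies and try_copy_back are hypothetical helper names, not TORQUE commands: the first extracts each failed source and destination from qstat -f text, and the second repeats a similar non-interactive copy by hand.

```shell
#!/bin/sh
# failed_copies: read `qstat -f` output on stdin and print one
# "SRC HOST:DEST" pair for every "Unable to copy file SRC to HOST:DEST" line.
failed_copies() {
    sed -n 's/.*Unable to copy file \([^ ]*\) to \([^ ]*\).*/\1 \2/p'
}

# try_copy_back (hypothetical helper): from a compute node, copy a small
# temporary file to "$1.copytest", where $1 is HOST:PATH.
# Assumption: pbs_mom uses scp. -B (batch mode) makes scp fail instead of
# prompting for a password, which is roughly how an unattended copy behaves.
try_copy_back() {
    tmp=$(mktemp) || return 1
    echo "copy test" > "$tmp"
    scp -B "$tmp" "$1.copytest"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For example, run qstat -f 32 | failed_copies on the master. Then, on a failing node such as ciarlab14 and logged in as the job owner, run try_copy_back ciarlab11.cluster.net:/usr/local/out. If this fails there but succeeds on node1, the difference may be in that user's ssh setup or in write permission on /usr/local. A successful run leaves an out.copytest file on the master, which should be deleted afterwards.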
My job script is:
#PBS -N exampleJob
#PBS -o /usr/local/out
#PBS -e /usr/local/err
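For reference, here is a sketch of a complete version of the test script. The #!/bin/sh line and the body are assumptions added for illustration and are not part of the original script. The body only prints the job ID and the execution host, which makes it easier to check whether the jobs with missing output all ran on node2 through node5.

```shell
#!/bin/sh
#PBS -N exampleJob
#PBS -o /usr/local/out
#PBS -e /usr/local/err

# Record which compute node ran the job, so each failed copy can be
# matched to a node. TORQUE sets PBS_JOBID inside a job; outside a job
# the message falls back to "unknown".
msg="Job ${PBS_JOBID:-unknown} ran on $(uname -n)"
echo "$msg"
```

Submitting it several times with qsub and comparing the hosts in the output files that do arrive against the sched_hint of the jobs that fail shows which nodes are affected.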
I have spent 2 days on this issue, and I hope I can get some help from this mailing list.
Any help would be appreciated.