[torqueusers] sporadic scp failures
jonah at eecs.berkeley.edu
Wed Feb 17 11:01:13 MST 2010
I'm getting sporadic failures when TORQUE tries to copy the .ER and .OU
result files back. It does not happen 100% of the time, nor is it 100% consistent
on which hosts have problems. Sometimes the same host will succeed for
one or both files and sometimes it will fail for both.
I'm wondering if this might have something to do with too many scp
requests showing up simultaneously and some sort of rate-limiting
happening. Any suggestions on where I might look? What I might tweak?
Is there some way to increase the default socket backlog, or that used
> PBS Job Id: 958.XXX.berkeley.edu
> Job Name: STDIN
> Exec host: s103/11
> An error has occurred processing your job, see below.
> Post job file processing error; job 958.XXX.berkeley.edu on host s103/11
> Unable to copy file /var/spool/torque/spool/958.XXX.berkeley.edu.OU to
> jonah at XXX.berkeley.edu:/home/cs/jonah/STDIN.o958
> *** error from copy
> ssh_exchange_identification: Connection closed by remote host
> lost connection
> *** end error output
> Output retained on that host in:
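
To check whether the failures really are spread across hosts rather than
clustered on a few, the notification mails could be tallied by exec host.
The following is only a minimal sketch in Python: the function name
failures_by_host and the idea of feeding it saved message texts are
assumptions for illustration, not anything TORQUE provides. It relies only
on the "Exec host:" label and the ssh_exchange_identification string that
appear in the notification quoted above.

```python
import re
from collections import Counter

# Matches the "Exec host:" line of a notification, with or without a
# leading "> " quote marker, and drops the trailing "/<cpu index>" part.
EXEC_HOST = re.compile(r"^>?\s*Exec host:\s*(\S+?)(?:/\d+)?\s*$", re.MULTILINE)

# Error string taken from the notification quoted above.
SSH_ERROR = "ssh_exchange_identification"


def failures_by_host(messages):
    """Count, per exec host, the messages that contain SSH_ERROR.

    `messages` is an iterable of notification texts (hypothetical input:
    for example, mails saved one per string).
    """
    counts = Counter()
    for text in messages:
        match = EXEC_HOST.search(text)
        if match and SSH_ERROR in text:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "> PBS Job Id: 958.XXX.berkeley.edu\n"
        "> Exec host: s103/11\n"
        "> *** error from copy\n"
        "> ssh_exchange_identification: Connection closed by remote host\n"
    )
    for host, count in failures_by_host([sample]).most_common():
        print(host, count)
```

If the counts come out roughly even across hosts, that would be consistent
with a limit on the receiving side rather than a problem on individual
nodes, though it would not prove it.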