[torqueusers] Job files not copied back at end of job
dbourque at weatherdata.com
Thu Oct 2 08:08:04 MDT 2008
scp should run with the job owner's privileges (mcdiypp2 in this case),
so as long as you have the proper ssh keys in place, you should be fine.
Is /home/mcdiypp2 not available to all nodes via NFS? Or does a
different copy of /home/mcdiypp2 exist on every node?
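
One way to check the key setup is a small test along the lines below. This is only a sketch: the function name check_batch_ssh and the SSH override variable are made up for illustration, and it assumes you run it on the compute node (node18 in the logs) as the job owner rather than as root. It uses ssh's BatchMode option, which refuses to prompt in the same way scp -B does.

```shell
# Hypothetical helper (not part of Torque): test whether a user can reach
# a host over ssh without being prompted, which is what scp -B requires.
check_batch_ssh() {
    user=$1    # job owner, e.g. mcdiypp2
    host=$2    # copy target, e.g. steel

    # BatchMode=yes makes ssh fail instead of asking for a password.
    # SSH may be set to another command for a dry run (illustrative only).
    if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
        echo "OK: $user can log in to $host without a password"
    else
        echo "FAIL: $user cannot log in to $host non-interactively"
        return 1
    fi
}
```

Calling "check_batch_ssh mcdiypp2 steel" as mcdiypp2 on node18 should print the OK line if the keys are in place. If it prints FAIL, the usual fix is to put that user's public key into ~/.ssh/authorized_keys on the target host. Running "df /home/mcdiypp2" on both node18 and steel should also show whether the home directory comes from the same NFS export.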
Sr. Systems Engineer
WeatherData Service Inc
An Accuweather Company
Philip Peartree wrote:
> I only have a short hostname in known_hosts, but the nodes
> themselves only have a short hostname specified in their /etc/hosts
> file. I tried adding the FQDN for the headnode (where it is copying
> to) into ssh_known_hosts, but I get the same error when running scp
> (permission denied). I tried running scp without the -B switch and it
> asks for my password, which makes me think that ssh is not passwordless
> for my user (it is for root). Any ideas?
> Quoting "Steve Young" <chemadm at hamilton.edu>:
>> Do a tracejob and find out what node this job went to. Then try
>> ssh'ing into that node and try doing a copy just like it showed you in
>> the error. You mentioned you have the IP and hostname in known_hosts.
>> Is this the fqdn or short hostname? Looks like here it is trying to use
>> short hostname. I'd make sure you have both short and long hostnames.
>> Hope this helps,
>> On Sep 30, 2008, at 9:15 AM, Philip Peartree wrote:
>>> My PBS output and error files are not being copied back at the end
>>> of jobs. I get the error:
>>> post job file processing error
>>> in my pbs log files, and in the syslogs on the appropriate nodes I
>>> get the following errors:
>>> Sep 30 14:04:05 node18 pbs_mom: sys_copy, command '/usr/bin/scp
>>> -rpB /var/spool/torque/spool/42.steel.mib.man.ac.uk.OU
>>> mcdiypp2@steel:/home/mcdiypp2/output' failed with status=1, giving
>>> up after 4 attempts
>>> Sep 30 14:04:05 node18 pbs_mom: req_cpyfile, Unable to copy file
>>> /var/spool/torque/spool/42.steel.mib.man.ac.uk.OU to
>>> mcdiypp2@steel:/home/mcdiypp2/output
>>> Sep 30 14:04:09 node18 pbs_mom: sys_copy, command '/usr/bin/scp
>>> -rpB /var/spool/torque/spool/42.steel.mib.man.ac.uk.ER
>>> mcdiypp2@steel:/home/mcdiypp2/error' failed with status=1, giving
>>> up after 4 attempts
>>> Sep 30 14:04:09 node18 pbs_mom: req_cpyfile, Unable to copy file
>>> /var/spool/torque/spool/42.steel.mib.man.ac.uk.ER to
>>> mcdiypp2@steel:/home/mcdiypp2/error
>>> I have checked the ssh_known_hosts as this has been noted in
>>> another posting, and I have both the IP and hostname in there.
>>> Could anyone shed any light?
>>> Phil Peartree
>>> University of Manchester
>>> torqueusers mailing list
>>> torqueusers at supercluster.org