[torqueusers] Job files not copied back at end of job

Philip Peartree P.Peartree at postgrad.manchester.ac.uk
Thu Oct 2 07:02:20 MDT 2008


I only have the short hostname in known_hosts, but the nodes
themselves only have short hostnames specified in their /etc/hosts
files anyway. I tried adding the FQDN for the head node (the copy
destination) to ssh_known_hosts, but I still get the same error
(permission denied) when running scp. When I run scp without the -B
switch it asks for my password, which makes me think that ssh is not
passwordless for my user (it is for root). Any ideas?
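
For anyone wanting to repeat that check without a password prompt
getting in the way, here is a minimal sketch. The helper name
check_batch_ssh is made up for illustration, the 5-second timeout is
an arbitrary choice, and the user and host in the example are the ones
from the log quoted below. BatchMode=yes makes ssh fail rather than
prompt, which is the same behaviour the -B switch gives scp.

```shell
#!/bin/sh
# Sketch only: check whether a user can reach the copy-back target
# without a password prompt, which is what scp -B requires.

# check_batch_ssh USER HOST   (hypothetical helper, not part of TORQUE)
# Prints "ok" and returns 0 if a non-interactive login works,
# prints "failed" and returns 1 otherwise.
check_batch_ssh() {
    # BatchMode=yes disables password prompts, like scp's -B switch
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" true; then
        echo "ok"
    else
        echo "failed"
        return 1
    fi
}

# Example, run on the compute node as the job owner (not as root):
#   check_batch_ssh mcdiypp2 steel
```

If this prints "failed" for the job owner but "ok" for root, the
problem is most likely with that user's keys or host-based
authentication setup rather than with known_hosts.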

Phil


Quoting "Steve Young" <chemadm at hamilton.edu>:

> Hi,
> 	do a tracejob and find out what node this job went to. Then try
> ssh'ing into that node and try doing a copy just like it showed you in
> the error. You mentioned you have the IP and hostname in known_hosts.
> Is this the fqdn or short hostname? Looks like here it is trying to use
> short hostname. I'd make sure you have both short and long hostnames.
> Hope this helps,
>
> -Steve
>
> On Sep 30, 2008, at 9:15 AM, Philip Peartree wrote:
>
>> Hi,
>>
>> My PBS output and error files are not being copied back at the end
>> of jobs. I get the error:
>>
>> post job file processing error
>>
>> in my pbs log files, and in the syslogs on the appropriate nodes I   
>> get the following errors:
>>
>> Sep 30 14:04:05 node18 pbs_mom: sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool/42.steel.mib.man.ac.uk.OU mcdiypp2@steel:/home/mcdiypp2/output' failed with status=1, giving up after 4 attempts
>> Sep 30 14:04:05 node18 pbs_mom: req_cpyfile, Unable to copy file /var/spool/torque/spool/42.steel.mib.man.ac.uk.OU to mcdiypp2@steel:/home/mcdiypp2/output
>> Sep 30 14:04:09 node18 pbs_mom: sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool/42.steel.mib.man.ac.uk.ER mcdiypp2@steel:/home/mcdiypp2/error' failed with status=1, giving up after 4 attempts
>> Sep 30 14:04:09 node18 pbs_mom: req_cpyfile, Unable to copy file /var/spool/torque/spool/42.steel.mib.man.ac.uk.ER to mcdiypp2@steel:/home/mcdiypp2/error
>>
>>
>> I have checked the ssh_known_hosts as this has been noted in   
>> another posting, and I have both the IP and hostname in there.
>>
>> Could anyone shed any light?
>>
>>
>> Phil Peartree
>> University of Manchester
>>
>> _______________________________________________
>> torqueusers mailing list
>> torqueusers at supercluster.org
>> http://www.supercluster.org/mailman/listinfo/torqueusers
>
>



