[torqueusers] Job files not copied back at end of job

Philip Peartree P.Peartree at postgrad.manchester.ac.uk
Thu Oct 2 08:15:25 MDT 2008


I found my error: I'd misspelled authori_z_ed (as authorised). Sometimes it's a pain
being English!!
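A minimal Python sketch for catching this kind of slip. It assumes the misspelling was in the name of the ~/.ssh/authorized_keys file, which is the likeliest place given the discussion of ssh keys below. It also assumes the stock OpenSSH setup where sshd reads keys from that file; a site that changes AuthorizedKeysFile in sshd_config would need different names. The function name check_ssh_dir and the list of misspellings are hypothetical and illustrative only. The sketch also flags a group- or world-writable ~/.ssh, which sshd refuses when StrictModes is on (its default).

```python
from pathlib import Path

# Hypothetical, non-exhaustive list of misspellings that sshd will not read by default.
SUSPECT_NAMES = ("authorised_keys", "authorised_keys2", "authorized_key", "authorizedkeys")


def check_ssh_dir(ssh_dir):
    """Return a list of problems found in a user's ~/.ssh directory."""
    ssh_dir = Path(ssh_dir)
    if not ssh_dir.is_dir():
        return [f"{ssh_dir} is not a directory"]
    problems = []
    # With StrictModes (the sshd default), a group- or world-writable
    # ~/.ssh causes public-key authentication to be refused.
    if ssh_dir.stat().st_mode & 0o022:
        problems.append(f"{ssh_dir} is group- or world-writable")
    # Assumes the default AuthorizedKeysFile name.
    if not (ssh_dir / "authorized_keys").is_file():
        problems.append("missing authorized_keys")
    for name in SUSPECT_NAMES:
        if (ssh_dir / name).exists():
            problems.append(f"found {name}: sshd does not read this name by default")
    return problems
```

Run it as the job owner on the machine being copied to, e.g. `print(check_ssh_dir(Path.home() / ".ssh"))`. An empty list only means none of these particular problems were found, not that key-based login is guaranteed to work.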

Quoting "Daniel Bourque" <dbourque at weatherdata.com>:

> scp should run with the job owner's privileges (mcdiypp2 in this case),
> so as long as you have the proper ssh keys in place, you should be fine.
>
> Is /home/mcdiypp2 not available to all nodes via NFS? Or does every
> node have its own separate copy of /home/mcdiypp2?
>
>
> Daniel Bourque
> Sr. Systems Engineer
> WeatherData Service Inc
> An Accuweather Company
>
>
>
> Philip Peartree wrote:
>> I only have the short hostname in known_hosts; then again, the nodes
>> themselves only have short hostnames in their /etc/hosts files.
>> I tried adding the FQDN of the head node (the copy destination) to
>> ssh_known_hosts, but I get the same error (permission denied) when
>> running scp. Running scp without the -B switch prompts for my
>> password, which makes me think that ssh is not passwordless for my
>> user (it is for root). Any ideas?
>>
>> Phil
>>
>>
>> Quoting "Steve Young" <chemadm at hamilton.edu>:
>>
>>> Hi,
>>>    Run tracejob to find out which node this job went to. Then try
>>> ssh'ing into that node and doing the copy exactly as shown in the
>>> error. You mentioned you have the IP and hostname in known_hosts.
>>> Is that the FQDN or the short hostname? It looks like it is trying
>>> to use the short hostname here, so I'd make sure you have both the
>>> short and long hostnames.
>>> Hope this helps,
>>>
>>> -Steve
>>>
>>> On Sep 30, 2008, at 9:15 AM, Philip Peartree wrote:
>>>
>>>> Hi,
>>>>
>>>> My PBS output and error files are not being copied back at the
>>>> end of jobs. I get the error:
>>>>
>>>> post job file processing error
>>>>
>>>> in my PBS log files, and the syslogs on the relevant nodes show
>>>> the following errors:
>>>>
>>>> Sep 30 14:04:05 node18 pbs_mom: sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool/42.steel.mib.man.ac.uk.OU mcdiypp2@steel:/home/mcdiypp2/output' failed with status=1, giving up after 4 attempts
>>>> Sep 30 14:04:05 node18 pbs_mom: req_cpyfile, Unable to copy file /var/spool/torque/spool/42.steel.mib.man.ac.uk.OU to mcdiypp2@steel:/home/mcdiypp2/output
>>>> Sep 30 14:04:09 node18 pbs_mom: sys_copy, command '/usr/bin/scp -rpB /var/spool/torque/spool/42.steel.mib.man.ac.uk.ER mcdiypp2@steel:/home/mcdiypp2/error' failed with status=1, giving up after 4 attempts
>>>> Sep 30 14:04:09 node18 pbs_mom: req_cpyfile, Unable to copy file /var/spool/torque/spool/42.steel.mib.man.ac.uk.ER to mcdiypp2@steel:/home/mcdiypp2/error
>>>>
>>>>
>>>> I have checked ssh_known_hosts, as this was mentioned in another
>>>> posting, and it contains both the IP address and the hostname.
>>>>
>>>> Could anyone shed any light?
>>>>
>>>>
>>>> Phil Peartree
>>>> University of Manchester
>>>>
>>>> _______________________________________________
>>>> torqueusers mailing list
>>>> torqueusers at supercluster.org
>>>> http://www.supercluster.org/mailman/listinfo/torqueusers
>>>
>>>
>>
>>
>>
>
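Following the advice in this thread, the quickest test is to re-run the exact failing copy by hand, on the compute node, as the job owner. The Python sketch below pulls that command out of a pbs_mom sys_copy line. It assumes the message format shown in the quoted logs, and the names parse_sys_copy and manual_command are hypothetical. Paste the line from the node's syslog rather than from this archive, which wraps long lines and appears to render "@" as " at ".

```python
import re
import shlex

# Matches pbs_mom sys_copy failures in the format quoted in this thread, e.g.
# "... pbs_mom: sys_copy, command '/usr/bin/scp -rpB SRC USER@HOST:DEST' failed with status=1, ..."
SYS_COPY_RE = re.compile(
    r"pbs_mom: sys_copy, command '(?P<cmd>[^']+)' failed with status=(?P<status>\d+)"
)


def parse_sys_copy(line):
    """Return (argv, status) for a sys_copy failure line, or None if it does not match."""
    match = SYS_COPY_RE.search(line)
    if match is None:
        return None
    return shlex.split(match.group("cmd")), int(match.group("status"))


def manual_command(line):
    """Return the failing copy as a shell-quoted command to re-run by hand, or None."""
    parsed = parse_sys_copy(line)
    if parsed is None:
        return None
    argv, _status = parsed
    # shlex.join (Python 3.8+) quotes each argument only if it needs quoting.
    return shlex.join(argv)
```

The command keeps scp's -B (batch mode) flag, so it fails instead of prompting for a password, which is what pbs_mom sees. The spool file it names may already be gone once the job has finished, so substituting a small test file as the source is a reasonable variation.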

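The known_hosts advice (have both the short and the fully qualified hostname) can be checked mechanically. This Python sketch reads known_hosts-format text and reports which required names have no plain entry. It assumes unhashed entries: patterns hashed by HashKnownHosts (starting with |1|) are only counted, and wildcard patterns are not expanded. The function names are hypothetical, and the head node's FQDN, steel.mib.man.ac.uk, is inferred from the job ID in the logs.

```python
def known_host_names(text):
    """Return (plain host patterns, count of hashed patterns) from known_hosts text."""
    names = set()
    hashed = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Optional leading marker such as @cert-authority or @revoked.
        if fields[0].startswith("@"):
            fields = fields[1:]
        if not fields:
            continue
        # The first field is a comma-separated list of host patterns.
        for pattern in fields[0].split(","):
            if pattern.startswith("|1|"):
                # Hashed entry: the host name cannot be recovered from it.
                hashed += 1
            else:
                names.add(pattern)
    return names, hashed


def missing_names(text, required):
    """Return the names in `required` that have no plain (unhashed) entry."""
    names, _hashed = known_host_names(text)
    return [name for name in required if name not in names]
```

For example, on each compute node: `missing_names(Path("/etc/ssh/ssh_known_hosts").read_text(), ["steel", "steel.mib.man.ac.uk"])`, where the path assumes a typical OpenSSH install. For hashed files, `ssh-keygen -F <hostname>` is the reliable lookup.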


