[torqueusers] pbs_mom: req_cpyfile, Unable to copy file

Arnau Bria arnaubria at pic.es
Tue Nov 6 04:51:54 MST 2007


Hi,

We're getting sporadic errors when a job finishes running on a WN and
has to copy its output back to the submitter host.

We've configured ssh on our submitter and executor hosts so that it
doesn't ask for a password. For example:

[root@td237 ~]# su - ops006
[ops006@td237 ~]$ ssh ce07 date
Scientific Linux CERN Release 3.0.8 (SL)
Tue Nov  6 12:09:05 CET 2007
[ops006@td237 ~]$

But looking at the job's log on the WN, we find:
Oct 24 02:20:48 td237 pbs_mom: req_cpyfile, Unable to copy file ops006@ce07.pic.es:/home/ops006/.lcgjm/globus-cache-export.Q18475/globus-cache-export.Q18475.gpg to globus-cache-export.Q18475.gpg

and on the PBS server:
[root@pbs01 root]# grep 3145425 /var/spool/pbs/server_priv/accounting/200710*
/var/spool/pbs/server_priv/accounting/20071024:10/24/2007 02:18:46;Q;3145425.pbs01.pic.es;queue=gshort
/var/spool/pbs/server_priv/accounting/20071024:10/24/2007 02:21:44;D;3145425.pbs01.pic.es;requestor=ops006@ce07.pic.es
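A slightly more targeted variant of the grep above, offered only as a sketch: it matches on the job-id field of each accounting record rather than anywhere in the line, so a longer job number such as 31454250 cannot produce false hits. It assumes the semicolon-separated layout visible in the output above ("date time;type;jobid.server;attributes"); the function name job_records is hypothetical.

```shell
#!/bin/sh
# job_records JOBID FILE...
# Hypothetical helper: print "timestamp record-type" for every accounting
# record whose third ';'-separated field starts with "JOBID." (job id plus
# the server suffix separator), assuming the record layout shown above.
job_records() {
    jobid=$1
    shift
    awk -F';' -v id="$jobid" 'index($3, id ".") == 1 { print $1, $2 }' "$@"
}
```

For example, `job_records 3145425 /var/spool/pbs/server_priv/accounting/200710*` would list the Q and D records above with their timestamps, without the file-name prefix that grep adds when given several files.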


Finally, Maui's log:

[root@pbs01 root]# grep 3145425 /var/log/maui.log*
/var/log/maui.log.1:10/24 02:20:44 INFO:     job '3145425' loaded:   1   ops006     ops  86400       Idle   0 1193185126   [NONE] [NONE] [NONE] >=      0 >=     0 [slc4] 1193185244
/var/log/maui.log.1:10/24 02:20:44 MRMJobStart(3145425,Msg,SC)
/var/log/maui.log.1:10/24 02:20:44 MPBSJobStart(3145425,base,Msg,SC)
/var/log/maui.log.1:10/24 02:20:44 MPBSJobModify(3145425,Resource_List,Resource,td237.pic.es)
/var/log/maui.log.1:10/24 02:20:44 MPBSJobModify(3145425,Resource_List,Resource,1)
/var/log/maui.log.1:10/24 02:20:44 WARNING:  cannot set job '3145425.pbs01.pic.es' attr 'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
/var/log/maui.log.1:10/24 02:20:44 INFO:     job '3145425' successfully started
/var/log/maui.log.1:10/24 02:22:45 INFO:     active PBS job 3145425 has been removed from the queue.  assuming successful completion


As I mentioned at the beginning of the mail, the errors are sporadic,
but on certain days we find lots of them on certain WNs. All WNs share
the same configuration, so there is no difference between them apart
from the jobs they are running.

Versions:
On the WN:
[root@td237 ~]# rpm -qa|grep torque
torque-devel-2.1.8-1cri_sl4_1st.i386
torque-mom-2.1.8-1cri_sl4_1st.i386
torque-2.1.8-1cri_sl4_1st.i386
torque-client-2.1.8-1cri_sl4_1st.i386
torque-docs-2.1.8-1cri_sl4_1st.i386

On the server:
[root@pbs01 root]# rpm -qa|grep torque
torque-gui-2.1.8-1cri_sl3_1st
torque-client-2.1.8-1cri_sl3_1st
torque-server-2.1.8-1cri_sl3_1st
torque-2.1.8-1cri_sl3_1st

TIA,
Arnau

