[torqueusers] pbs_mom: req_cpyfile, Unable to copy file
Arnau Bria
arnaubria at pic.es
Tue Nov 6 04:51:54 MST 2007
Hi,
we're getting sporadic errors when a job finishes running on a WN and
has to copy its output back to the submitter host.
We've configured ssh between our submitter and executor hosts so that
no password is requested, for example:
[root at td237 ~]# su - ops006
[ops006 at td237 ~]$ ssh ce07 date
Scientific Linux CERN Release 3.0.8 (SL)
Tue Nov 6 12:09:05 CET 2007
[ops006 at td237 ~]$
But looking at the job's log on the WN we find:
Oct 24 02:20:48 td237 pbs_mom: req_cpyfile, Unable to copy file ops006 at ce07.pic.es:/home/ops006/.lcgjm/globus-cache-export.Q18475/globus-cache-export.Q18475.gpg to globus-cache-export.Q18475.gpg
and on the PBS server:
[root at pbs01 root]# grep 3145425 /var/spool/pbs/server_priv/accounting/200710*
/var/spool/pbs/server_priv/accounting/20071024:10/24/2007 02:18:46;Q;3145425.pbs01.pic.es;queue=gshort
/var/spool/pbs/server_priv/accounting/20071024:10/24/2007 02:21:44;D;3145425.pbs01.pic.es;requestor=ops006 at ce07.pic.es
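For reference, the two accounting records above can be pulled apart with a few lines of Python. This is only a rough sketch: `parse_record` is a hypothetical helper name, the meaning given to each record-type letter is my reading of the TORQUE accounting format rather than something the logs show, and the records are pasted without the file-name prefix that grep adds.

```python
from datetime import datetime

# Record-type letters as I understand the TORQUE accounting format
# (an assumption, not something visible in the logs themselves).
RECORD_TYPES = {"Q": "queued", "S": "started", "E": "ended",
                "D": "deleted", "A": "aborted"}


def parse_record(line):
    """Split 'MM/DD/YYYY HH:MM:SS;type;jobid;message' into four fields."""
    stamp, rtype, job_id, message = line.strip().split(";", 3)
    when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
    return when, rtype, job_id, message


# The two records from the grep output, without the file-name prefix.
records = [
    "10/24/2007 02:18:46;Q;3145425.pbs01.pic.es;queue=gshort",
    "10/24/2007 02:21:44;D;3145425.pbs01.pic.es;requestor=ops006 at ce07.pic.es",
]

parsed = [parse_record(r) for r in records]
for when, rtype, job_id, message in parsed:
    print(when, RECORD_TYPES.get(rtype, rtype), job_id, message)

# Time between the first and the last record for this job.
elapsed = (parsed[-1][0] - parsed[0][0]).total_seconds()
print("seconds between first and last record:", elapsed)
```

Read this way, the only records for the job are a Q and a D, a little under three minutes apart.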
finally, maui's log:
[root at pbs01 root]# grep 3145425 /var/log/maui.log*
/var/log/maui.log.1:10/24 02:20:44 INFO: job '3145425' loaded: 1 ops006 ops 86400 Idle 0 1193185126 [NONE] [NONE] [NONE] >= 0 >= 0 [slc4] 1193185244
/var/log/maui.log.1:10/24 02:20:44 MRMJobStart(3145425,Msg,SC)
/var/log/maui.log.1:10/24 02:20:44 MPBSJobStart(3145425,base,Msg,SC)
/var/log/maui.log.1:10/24 02:20:44 MPBSJobModify(3145425,Resource_List,Resource,td237.pic.es)
/var/log/maui.log.1:10/24 02:20:44 MPBSJobModify(3145425,Resource_List,Resource,1)
/var/log/maui.log.1:10/24 02:20:44 WARNING: cannot set job '3145425.pbs01.pic.es' attr 'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
/var/log/maui.log.1:10/24 02:20:44 INFO: job '3145425' successfully started
/var/log/maui.log.1:10/24 02:22:45 INFO: active PBS job 3145425 has been removed from the queue. assuming successful completion
As I mentioned at the beginning of the mail, the errors are sporadic,
but on certain days we find lots of them, on certain WNs. All WNs share
the same configuration, so there is no difference between them apart
from the jobs they are running.
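To put numbers on "certain days, certain WNs", something like the following could tally the req_cpyfile failures per day and per node from the collected syslog lines. It is a minimal sketch, assuming the syslog prefix seen in the pbs_mom message above; `tally_copy_failures` is a hypothetical helper, and the sample lines (including the td238 host and the shortened paths) are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "Oct 24 02:20:48 td237 pbs_mom: req_cpyfile, Unable to copy file ..."
# An optional "[pid]" after pbs_mom is tolerated in case syslog adds one.
FAILURE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+\S+\s+(?P<host>\S+)\s+"
    r"pbs_mom(?:\[\d+\])?: req_cpyfile, Unable to copy file"
)


def tally_copy_failures(lines):
    """Count copy failures keyed by (month, day, host)."""
    counts = Counter()
    for line in lines:
        match = FAILURE.match(line)
        if match:
            key = (match.group("month"), int(match.group("day")),
                   match.group("host"))
            counts[key] += 1
    return counts


# Illustrative input only; real input would be the WNs' syslog files.
sample = [
    "Oct 24 02:20:48 td237 pbs_mom: req_cpyfile, Unable to copy file a to b",
    "Oct 24 03:11:02 td237 pbs_mom: req_cpyfile, Unable to copy file c to d",
    "Oct 25 10:00:00 td238 pbs_mom: req_cpyfile, Unable to copy file e to f",
    "Oct 25 10:00:05 td238 sshd[123]: some unrelated message",
]

counts = tally_copy_failures(sample)
for (month, day, host), n in sorted(counts.items()):
    print(month, day, host, n)
```

Grouping by day and host first should make it easier to line the bad periods up against whatever else was happening on those nodes at the time.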
Versions:
in WN:
[root at td237 ~]# rpm -qa|grep torque
torque-devel-2.1.8-1cri_sl4_1st.i386
torque-mom-2.1.8-1cri_sl4_1st.i386
torque-2.1.8-1cri_sl4_1st.i386
torque-client-2.1.8-1cri_sl4_1st.i386
torque-docs-2.1.8-1cri_sl4_1st.i386
in server:
[root at pbs01 root]# rpm -qa|grep torque
torque-gui-2.1.8-1cri_sl3_1st
torque-client-2.1.8-1cri_sl3_1st
torque-server-2.1.8-1cri_sl3_1st
torque-2.1.8-1cri_sl3_1st
TIA,
Arnau