[Mauiusers] pbs_mom: req_cpyfile, Unable to copy file

Valery Mitsyn vvm at mammoth.jinr.ru
Thu Nov 8 09:07:54 MST 2007


Hola Arnau,

this can be the result of a burst of simultaneous connections
from the WNs to the CE. On the CE, check "MaxStartups" in /etc/ssh/sshd_config
and try increasing it to 100; the OpenSSH default is 10, which can be
too low in some situations.
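
A minimal sketch for checking what is currently configured. The function
name max_startups and the config path are assumptions for illustration;
adjust the path for your installation. It only handles the plain
"Keyword value" form, and it prints nothing when the directive is absent
or commented out, in which case sshd falls back to its built-in default.

```shell
# Print the MaxStartups value set in an sshd_config-style file.
# Prints nothing if the directive is absent or commented out.
max_startups() {
    # sshd keywords are case-insensitive and the first occurrence wins,
    # so compare in lower case and stop at the first match.
    awk 'tolower($1) == "maxstartups" { print $2; exit }' "$1"
}

# Assumed location of the server config on the CE.
cfg=/etc/ssh/sshd_config
if [ -r "$cfg" ]; then
    max_startups "$cfg"
fi
```

After changing the value, sshd has to re-read its configuration before
the new limit takes effect.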

On Thu, 8 Nov 2007, Arnau Bria wrote:

> Hi,
>
>
> a couple of days ago I sent this e-mail to the torque list. I got no reply, so
> I decided to post here too; maybe someone has seen this error before.
>
> Sorry in advance for the cross-posting.
>
>
> we're getting sporadic errors when a job finishes running on a WN and
> has to copy its output to the submitter host.
>
> We've configured ssh between our submitter and execution hosts so that no
> password is requested, for example:
>
> [root at td237 ~]# su - ops006
> [ops006 at td237 ~]$ ssh ce07 date
> Scientific Linux CERN Release 3.0.8 (SL)
> Tue Nov  6 12:09:05 CET 2007
> [ops006 at td237 ~]$
>
> But looking at the job's log on the WN we find:
> Oct 24 02:20:48 td237 pbs_mom: req_cpyfile, Unable to copy file ops006 at ce07.pic.es:/home/ops006/.lcgjm/globus-cache-export.Q18475/globus-cache-export.Q18475.gpg to globus-cache-export.Q18475.gpg
>
> and on the pbs server:
> [root at pbs01 root]# grep 3145425 /var/spool/pbs/server_priv/accounting/200710*
> /var/spool/pbs/server_priv/accounting/20071024:10/24/2007 02:18:46;Q;3145425.pbs01.pic.es;queue=gshort
> /var/spool/pbs/server_priv/accounting/20071024:10/24/2007 02:21:44;D;3145425.pbs01.pic.es;requestor=ops006 at ce07.pic.es
>
>
> finally, maui's log:
>
> [root at pbs01 root]# grep 3145425 /var/log/maui.log*
> /var/log/maui.log.1:10/24 02:20:44 INFO:     job '3145425' loaded: 1   ops006 ops  86400       Idle   0 1193185126   [NONE] [NONE] [NONE] >=      0 >= 0 [slc4] 1193185244
> /var/log/maui.log.1:10/24 02:20:44 MRMJobStart(3145425,Msg,SC)
> /var/log/maui.log.1:10/24 02:20:44 MPBSJobStart(3145425,base,Msg,SC)
> /var/log/maui.log.1:10/24 02:20:44 MPBSJobModify(3145425,Resource_List,Resource,td237.pic.es)
> /var/log/maui.log.1:10/24 02:20:44 MPBSJobModify(3145425,Resource_List,Resource,1)
> /var/log/maui.log.1:10/24 02:20:44 WARNING:  cannot set job '3145425.pbs01.pic.es' attr 'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
> /var/log/maui.log.1:10/24 02:20:44 INFO:     job '3145425' successfully started
> /var/log/maui.log.1:10/24 02:22:45 INFO:     active PBS job 3145425 has been removed from the queue.  assuming successful completion
>
>
> As I mentioned at the beginning of the mail, the errors are sporadic, but on
> certain days we find lots of them, on certain WNs. All WNs share the same
> configuration, so the only difference between them is the jobs they are running.
>
> Versions:
> on the WN:
> [root at td237 ~]# rpm -qa|grep torque
> torque-devel-2.1.8-1cri_sl4_1st.i386
> torque-mom-2.1.8-1cri_sl4_1st.i386
> torque-2.1.8-1cri_sl4_1st.i386
> torque-client-2.1.8-1cri_sl4_1st.i386
> torque-docs-2.1.8-1cri_sl4_1st.i386
>
> on the server:
> [root at pbs01 root]# rpm -qa|grep torque
> torque-gui-2.1.8-1cri_sl3_1st
> torque-client-2.1.8-1cri_sl3_1st
> torque-server-2.1.8-1cri_sl3_1st
> torque-2.1.8-1cri_sl3_1st
>
> TIA,
> Arnau
> _______________________________________________
> mauiusers mailing list
> mauiusers at supercluster.org
> http://www.supercluster.org/mailman/listinfo/mauiusers
>

-- 
Best regards,
  Valery Mitsyn

