[torqueusers] scp -B trouble with torque

Patrick Guio Patrick.Guio at bccs.uib.no
Tue Dec 13 12:50:06 MST 2005


Dear Torque users,

I am installing the LCG middleware, currently on a CE (Computing Element) and a 
WN (Worker Node). I have installed Torque version 1.0.1p6, packaged as follows:
On the CEhost:
torque-1.0.1p6-11.SL30X.st
torque-clients-1.0.1p6-11.SL30X.st
torque-devel-1.0.1p6-11.SL30X.st
torque-server-1.0.1p6-11.SL30X.st
torque-resmom-1.0.1p6-11.SL30X.st
On the WNhost:
torque-1.0.1p6-11.SL30X.st
torque-clients-1.0.1p6-11.SL30X.st
torque-resmom-1.0.1p6-11.SL30X.st

I have set up several queues and can submit jobs manually, both as my own 
user and as a grid-related user (dteam001).

Now, when I submit a Globus job, something does not work properly. Looking 
at the mom log on the WNhost (/var/spool/pbs/mom_logs/<date>), I can see 
that the job is submitted but then something goes wrong:

First I get

pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=11, from PBS_Server@grid.local
pbs_mom;Fil;sys_copy;command: /usr/bin/scp -Br

I tracked this down in the maui log on the CEhost (/var/log/maui.log):
WARNING:  cannot set job '51.xx.xx.xx.xx' attr 
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')

Does anyone have any idea whether this warning is serious?

A more serious problem is the stage-in of the job, which I assume 
is run as the user dteam001. In the mom_logs I then read:

dteam001@xx.xx.xx.xx:/home/dteam001/.lcgjm/globus-cache-export.M6NIm3/globus-cache-export.M6NIm3.gpg 
globus-cache-export.M6NIm3.gpg status=1 (copy request failed), try=1
pbs_mom;Fil;sys_copy;command: /usr/sbin/pbs_rcp -r 
dteam001@xx.xx.xx.xx:/home/dteam001/.lcgjm/globus-cache-export.M6NIm3/globus-cache-export.M6NIm3.gpg 
globus-cache-export.M6NIm3.gpg status=1 (copy request failed), try=2

which is repeated twice.
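
To make the repeated failures easier to follow in a long mom log, the 
status and try counters could be pulled out with a small script. This is 
only a sketch: the pattern is based solely on the entries quoted above, the 
function name copy_attempts is hypothetical, and real log lines may be laid 
out differently in other Torque versions.

```python
import re

# Matches the tail of a copy entry as quoted above, e.g.
# "... status=1 (copy request failed), try=2".
# Assumption: this layout is taken from the quoted lines only.
COPY_RESULT = re.compile(r"status=(\d+) \(([^)]*)\), try=(\d+)")


def copy_attempts(lines):
    """Return (status, reason, attempt) tuples found in the given log lines."""
    attempts = []
    for line in lines:
        match = COPY_RESULT.search(line)
        if match:
            status, reason, attempt = match.groups()
            attempts.append((int(status), reason, int(attempt)))
    return attempts
```

Passing an open log file to copy_attempts() works as well as passing a list 
of strings, since both yield one line at a time.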

The message says nothing more than "copy request failed".
Since I am able to perform the same scp as user dteam001 on the WNhost, and 
judging from the sshd log on the CEhost, it does not seem to be a 
"permission denied" kind of error. Could it mean that the source file is missing?

When logged in on the WNhost as user dteam001, I can see that such a file is 
indeed created. Could there be a race condition or timing problem, so 
that the file is not yet accessible when the scp is run?
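
One way to test the timing hypothesis would be to poll for the file as user 
dteam001 right after submitting a job and record how long it takes to become 
readable. The following is a minimal sketch, not something Torque provides: 
the function name wait_until_readable and its default timeout and interval 
are hypothetical choices.

```python
import os
import time


def wait_until_readable(path, timeout=30.0, interval=0.5):
    """Poll until `path` is readable, or give up after `timeout` seconds.

    Returns the number of seconds waited, or None if the file never
    became readable within the timeout.
    """
    start = time.monotonic()
    while True:
        # os.access() checks readability for the user running the script.
        if os.access(path, os.R_OK):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Note that checking a path under an automounted directory also triggers the 
mount, so any delay measured this way includes the automount time.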

Automount is used on both the CEhost and the WNhost. /etc/auto.home contains:

dteam001        grid.local:/export/home/dteam001

In addition, /export is exported on the CEhost (/etc/exports):
/export 10.0.0.0/255.0.0.0(rw)

In the mom config on the WNhost (/var/spool/pbs/mom_priv/config),
there is currently the $usecp setting:

$usecp *:/home /home

Should there be something different?
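
To reason about which paths a given $usecp line would cover, here is a small 
model of how I understand the directive: if the remote host matches the 
wildcard and the remote path starts with the given directory, that prefix is 
replaced by the local directory and a plain cp is used. This is an assumption 
about the behaviour, not the actual pbs_mom code, and the function name 
usecp_target is hypothetical.

```python
from fnmatch import fnmatch


def usecp_target(rule_host, rule_prefix, local_prefix, host, path):
    """Return the local path to copy from, or None if the rule does not apply.

    Models a directive of the form:
        $usecp <rule_host>:<rule_prefix> <local_prefix>
    Assumption: host matching is shell-style wildcard matching and the
    path must lie inside the rule's directory.
    """
    if not fnmatch(host, rule_host):
        return None
    prefix = rule_prefix.rstrip("/")
    if path != prefix and not path.startswith(prefix + "/"):
        return None
    # Swap the remote directory prefix for the local one.
    return local_prefix.rstrip("/") + path[len(prefix):]
```

Under this model, "$usecp *:/home /home" would cover the path from the log, 
while a file referenced as /export/home/dteam001/... would need a rule such 
as "$usecp *:/export/home /home" to be mapped onto the automounted /home.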

Any help or tips for solving the problem, or instructions on how to 
debug it further, are welcome!

Sincerely,

Patrick





