[torqueusers] scp -B trouble with torque
Patrick Guio
Patrick.Guio at bccs.uib.no
Tue Dec 13 12:50:06 MST 2005
Dear Torque users,
I am installing LCG middleware, currently a CE and a WN. I have installed
torque version 1.0.1p6, packaged as follows:
On the CEhost:
torque-1.0.1p6-11.SL30X.st
torque-clients-1.0.1p6-11.SL30X.st
torque-devel-1.0.1p6-11.SL30X.st
torque-server-1.0.1p6-11.SL30X.st
torque-resmom-1.0.1p6-11.SL30X.st
On the WNhost:
torque-1.0.1p6-11.SL30X.st
torque-clients-1.0.1p6-11.SL30X.st
torque-resmom-1.0.1p6-11.SL30X.st
I have set up several queues and can submit jobs manually under my own
userid, as well as jobs run under a grid-related userid (dteam001).
Now when I submit a globus job, something is not working properly. Looking
at the mom log on the WNhost (/var/spool/pbs/mom_logs/<date>), I can see
that the job is submitted but something goes wrong:
First I get
pbs_mom;Req;req_reject;Reject reply code=15001, aux=0, type=11, from PBS_Server at grid.local
pbs_mom;Fil;sys_copy;command: /usr/bin/scp -Br
I tracked this down in the maui log on the CEhost (/var/log/maui.log):
WARNING: cannot set job '51.xx.xx.xx.xx' attr
'Resource_List:neednodes' to '1' (rc: 15001 'Unknown Job Id')
Does anyone have any idea whether this warning is serious?
A more serious problem is with the stage-in of the job, which I assume
runs as user dteam001. In the mom_logs I then read:
dteam001@xx.xx.xx.xx:/home/dteam001/.lcgjm/globus-cache-export.M6NIm3/globus-cache-export.M6NIm3.gpg
globus-cache-export.M6NIm3.gpg status=1 (copy request failed), try=1
pbs_mom;Fil;sys_copy;command: /usr/sbin/pbs_rcp -r
dteam001@xx.xx.xx.xx:/home/dteam001/.lcgjm/globus-cache-export.M6NIm3/globus-cache-export.M6NIm3.gpg
globus-cache-export.M6NIm3.gpg status=1 (copy request failed), try=2
which is repeated twice.
The message does not say more than "copy request failed".
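To make the retries easier to follow, here is a minimal sketch that pulls
the copy command and the status/try fields out of mom log lines. The two
patterns are based only on the fragments quoted above; the function name
summarize_copy_attempts is hypothetical, and real log lines may carry
extra fields that this does not handle.

```python
import re

# Status fragment as quoted above, e.g. "status=1 (copy request failed), try=2".
STATUS_RE = re.compile(r"status=(\d+) \(([^)]*)\), try=(\d+)")
# Copy command fragment, e.g. "sys_copy;command: /usr/bin/scp -Br".
COMMAND_RE = re.compile(r"sys_copy;command: (\S+)")


def summarize_copy_attempts(lines):
    """Return (command, status, message, attempt) for each reported try.

    Hypothetical helper: it pairs each status fragment with the most
    recent sys_copy command seen before it (or on the same line).
    """
    attempts = []
    current_command = None
    for line in lines:
        cmd = COMMAND_RE.search(line)
        if cmd:
            current_command = cmd.group(1)
        status = STATUS_RE.search(line)
        if status:
            attempts.append((
                current_command,
                int(status.group(1)),
                status.group(2),
                int(status.group(3)),
            ))
    return attempts
```

Run over the excerpt above, this should show which binary (scp or pbs_rcp)
was used for each try and that both ended with status 1.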
I am able to run the same scp by hand as user dteam001 on the WNhost, and
judging from the sshd log on the CEhost it does not seem to be a
"permission denied" kind of error.
Could it mean that the source file is missing?
If I am logged on the WNhost as user dteam001, I can see that indeed such
a file is created but could there be a race condition or timing problem so
that the file isn't accessible when the scp is run?
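One way to test the timing hypothesis would be to poll for the file and
record how long it takes to become readable. This is only a sketch:
wait_until_readable is a hypothetical helper, the timeout and interval
values are arbitrary, and it would have to be run as dteam001 on the host
where the copy is attempted.

```python
import os
import time


def wait_until_readable(path, timeout=10.0, interval=0.1):
    """Poll until `path` is readable by the current user.

    Returns the elapsed time in seconds, or None if `timeout` seconds
    pass without the path becoming readable.
    """
    start = time.monotonic()
    while True:
        if os.access(path, os.R_OK):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

A noticeable delay before the file under /home/dteam001/.lcgjm becomes
readable would point towards the automount or NFS side rather than scp
itself.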
On both CEhost and WNhost automount is used. In /etc/auto.home there is
dteam001 grid.local:/export/home/dteam001
In addition, /export is exported in /etc/exports on the CEhost:
/export 10.0.0.0/255.0.0.0(rw)
In the mom config on the WNhost (/var/spool/pbs/mom_priv/config),
there is currently the $usecp setting:
$usecp *:/home /home
Should there be something different?
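For reference, my understanding of $usecp is that a stage-in source of the
form host:path is matched against each rule, and on a match the remote
prefix is replaced by the local one so that a plain cp is used instead of
scp. The sketch below only approximates that behaviour: apply_usecp is a
hypothetical function, and using fnmatch for the host wildcard is my
assumption, not necessarily what pbs_mom does internally.

```python
from fnmatch import fnmatch


def apply_usecp(rules, host, path):
    """Return the local path to copy from, or None if no rule matches.

    rules: list of (host_pattern, remote_prefix, local_prefix) tuples,
    one per $usecp line. Matching is a plain string-prefix test, which
    is an assumption about how the mom compares paths.
    """
    for host_pattern, remote_prefix, local_prefix in rules:
        if fnmatch(host, host_pattern) and path.startswith(remote_prefix):
            return local_prefix + path[len(remote_prefix):]
    return None


# The single rule from the mom config above: $usecp *:/home /home
rules = [("*", "/home", "/home")]
local = apply_usecp(rules, "grid.local", "/home/dteam001/.lcgjm/example.gpg")
```

Under these assumptions a source under /home would be mapped to the same
local path and copied with cp, so the fact that the log shows scp and
pbs_rcp being called may mean the rule is not being applied.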
Any help/tips to solve the problem or instructions on how to proceed to
further debug are welcome!
Sincerely,
Patrick