[torqueusers] scp error
csamuel at vpac.org
Thu Nov 30 16:21:43 MST 2006
On Friday 01 December 2006 01:15, LEROY Christine wrote:
> We are using torque and maui alongside our grid middleware, and users are
> complaining that their jobs are sometimes failing with no output. We had a
> look in our logs and we can see those errors:
> Are the files “/var/spool/pbs/spool/87831.node0.OU” and
> “/var/spool/pbs/spool/87831.node0.ER” deleted too soon by the system on
> the pbs_mom node? Or is it possible to configure the number of attempts?
Actually, I think you'll find it's a race condition where Globus notices the
job has finished and deletes the destination directory before the pbs_mom
gets the chance to copy the files back.
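To make the ordering problem concrete, here is a minimal Python sketch of the two possible outcomes, assuming the stage-out is simply a file copy into a directory that the grid side may remove once it sees the job finish. It is not Torque or Globus code: `stage_out`, `simulate` and the `job_scratch` directory are hypothetical names for illustration only. In a real system the two events run concurrently; the sketch fixes their order so each outcome is reproducible.

```python
import os
import shutil
import tempfile


def stage_out(spool_file: str, dest_dir: str) -> bool:
    """Copy a spooled output file into dest_dir (hypothetical helper).

    Returns False if dest_dir no longer exists when the copy runs.
    """
    target = os.path.join(dest_dir, os.path.basename(spool_file))
    try:
        shutil.copyfile(spool_file, target)
    except FileNotFoundError:
        # The destination directory was removed before the copy ran.
        return False
    return True


def simulate(cleanup_first: bool) -> bool:
    """Run one job's stage-out with the two events in a fixed order."""
    with tempfile.TemporaryDirectory() as root:
        spool = os.path.join(root, "87831.node0.OU")
        with open(spool, "w") as f:
            f.write("job output\n")
        dest = os.path.join(root, "job_scratch")  # hypothetical grid-side directory
        os.mkdir(dest)
        if cleanup_first:
            # The gatekeeper notices the job is done and cleans up first.
            shutil.rmtree(dest)
            return stage_out(spool, dest)
        ok = stage_out(spool, dest)  # the copy wins the race
        shutil.rmtree(dest)
        return ok


print(simulate(cleanup_first=False))  # True: output copied back
print(simulate(cleanup_first=True))   # False: directory already deleted
```

Only the order of the two events differs between the runs, which is why the failure shows up intermittently rather than on every job.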
Is this Globus Toolkit (GT) 2, 3 or 4?
We've seen this happen occasionally for jobs that come in through our Globus
gatekeepers from the APAC Grid.
The other time I've seen this happen was when the user didn't have the
correct ssh keys configured, but then you'd see consistent, predictable
failures, so my guess would be the race condition.
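One way to rule out the ssh-key explanation is to check whether the job owner can reach the destination host without a password prompt, which a non-interactive copy requires. The sketch below assumes the output is copied back with scp over ssh, as the subject line suggests, and uses OpenSSH's `BatchMode=yes` option so ssh fails instead of prompting. The function names and the host `submit.example.org` are hypothetical; run it as the job owner on the pbs_mom node.

```python
import shutil
import subprocess


def ssh_batch_cmd(host: str) -> list:
    """Build an ssh command that fails rather than prompting for a password."""
    # BatchMode=yes disables interactive prompts, so missing or wrong keys
    # produce a non-zero exit status instead of a hang.
    return ["ssh", "-o", "BatchMode=yes", host, "true"]


def can_ssh_noninteractively(host: str, timeout: float = 10.0) -> bool:
    """Return True if `ssh host true` succeeds without user interaction."""
    if shutil.which("ssh") is None:
        return False  # no ssh client on this node
    try:
        result = subprocess.run(
            ssh_batch_cmd(host),
            stdin=subprocess.DEVNULL,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print(can_ssh_noninteractively("submit.example.org"))  # hypothetical host
```

If this check fails every time, the keys are the likely cause; if it passes but jobs still lose output only occasionally, that points back to the race condition.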
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia