[torqueusers] scp error

Chris Samuel csamuel at vpac.org
Thu Nov 30 16:21:43 MST 2006


On Friday 01 December 2006 01:15, LEROY Christine wrote:

> We are using torque and maui alongside our grid middleware, and users are
> complaining that their jobs are sometimes failing with no output. We had a
> look in our logs and we can see these errors:
>  
[...]
> Are the files “/var/spool/pbs/spool/87831.node0.OU” and
> “/var/spool/pbs/spool/87831.node0.ER” deleted too soon by the system on
> the pbs_mom node? Or is it possible to configure the number of attempts?

Actually I think you'll find it's a race condition where Globus notices the 
job has finished and deletes the destination directory before the pbs_mom 
gets the chance to copy the files back.
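
One quick way to check your "deleted too soon" theory is to look for the spool
files on the pbs_mom node straight after a failure. Below is a rough Python
sketch, assuming the spool directory and the <jobid>.OU / <jobid>.ER naming
from your mail; the function name spool_files is just something I made up for
illustration:

```python
from pathlib import Path


def spool_files(job_id, spool_dir="/var/spool/pbs/spool"):
    """Map each suffix (OU, ER) to (expected spool path, whether it exists).

    The default directory and the "<jobid>.OU" / "<jobid>.ER" naming are
    taken from the paths quoted in the original mail; adjust for your site.
    """
    base = Path(spool_dir)
    result = {}
    for suffix in ("OU", "ER"):
        path = base / f"{job_id}.{suffix}"
        result[suffix] = (path, path.exists())
    return result


if __name__ == "__main__":
    # Job ID taken from the example in the original mail.
    for suffix, (path, present) in spool_files("87831.node0").items():
        state = "still present" if present else "missing"
        print(f"{path}: {state}")
```

If the files are still present on the node after the copy has failed, that
would be consistent with the problem being at the destination end rather than
the mom removing them early.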

Is this Globus Toolkit (GT) 2, 3 or 4?

We've seen this happen occasionally for jobs that come in through our Globus 
gatekeepers from the APAC Grid.

The other time I've seen this happen has been when the user doesn't have the
correct ssh keys configured, but then you'd see the failures every time rather
than only sometimes, so my guess would be the race condition.
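
To rule out the ssh key case, you can test a non-interactive login as the
affected user from the compute node to the host the output is copied back to.
A minimal Python sketch follows; the host and user names are placeholders, the
helper names are my own, and it assumes OpenSSH's ssh client (BatchMode makes
ssh fail instead of prompting for a password):

```python
import shlex
import subprocess


def ssh_check_command(host, user=None, timeout=10):
    """Build an ssh command that only succeeds with key-based (non-interactive) login."""
    target = f"{user}@{host}" if user else host
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",   # give up quickly if the host is unreachable
        target,
        "true",                              # trivial remote command
    ]


def can_login_without_password(host, user=None, timeout=10):
    """Run the check; True means ssh exited with status 0."""
    result = subprocess.run(
        ssh_check_command(host, user, timeout),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Placeholder names; this only prints the command rather than running it.
    print(shlex.join(ssh_check_command("submit.example.org", "someuser")))
```

Run it as the job owner on the node where the job executed; if the login fails
every time, the keys are the more likely culprit.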

Good luck!
Chris
-- 
 Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
 Victorian Partnership for Advanced Computing http://www.vpac.org/
 Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia
