[torqueusers] torque tmpdir on Lustre filesystem

rishi pathak mailmaverick666 at gmail.com
Mon Feb 27 05:40:13 MST 2012


Hello Lukasz,

On Fri, Feb 24, 2012 at 3:54 PM, Lukasz Flis <l.flis at cyf-kr.edu.pl> wrote:

> Hello Christopher, Hi *
>
> >
> > We don't use Lustre (we have Panasas and GPFS), but just wondering
> > does this happen all the time, or only occasionally ?
>
> It happens occasionally. But as I said, this looks like a bug in the Lustre
> FS, and it has nothing to do with the Torque code. Torque uses an unlucky
> sequence of stat/mkdir calls which exposes the Lustre misbehaviour.
>
> > If occasionally, then if the job fails once, will it always fail, or
> > will it work if you try again?
>
> Another call to the mkdirtree() function should succeed after a few
> seconds of sleep.
>
> I believe this behaviour in the Lustre client appeared in the 1.8.x line and
> remains in 2.1.x. HP SFS, IIRC, is based on 1.4 and 1.6, so it's not affected.
>
The HP SFS installed here is based on Lustre 1.8.4.

>
> We have observed the bug on Lustre 1.8.(4,5,6) infrastructure. Then we
> moved to the 2.1 line, replacing all the components (servers, arrays, fabric),
> and the bug remained.
>
> The problem with Lustre is that a mkdir() call on an EXISTING directory
> returns an EPERM error instead of EEXIST once in a while, usually when
> stat() is called before mkdir().
>
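
To make the failure mode above concrete, here is a minimal Python sketch of a
directory-creation helper that tolerates it. This is not Torque code: the name
mkdir_tolerant, its parameters and its retry policy are all hypothetical. It
assumes the behaviour described above (EPERM returned although the directory
already exists) and the suggestion that a later attempt succeeds after a short
sleep.

```python
import errno
import os
import time


def mkdir_tolerant(path, retries=3, delay=2.0, mkdir=os.mkdir):
    """Create `path`, treating "already exists" as success.

    Hypothetical helper, not part of Torque. `mkdir` is injectable so the
    spurious-EPERM case can be simulated without a Lustre mount.
    Returns True if this call created the directory, False if it existed.
    """
    for attempt in range(retries):
        try:
            mkdir(path)
            return True              # directory was created by this call
        except FileExistsError:
            return False             # ordinary EEXIST: nothing to do
        except PermissionError as exc:
            # Reported Lustre behaviour: EPERM although the directory exists.
            if exc.errno == errno.EPERM and os.path.isdir(path):
                return False
            if attempt == retries - 1:
                raise                # out of attempts: report the real error
            time.sleep(delay)        # back off, then try again
    raise ValueError("retries must be at least 1")
```

The extra os.path.isdir() check only accepts EPERM when the directory really is
there, so a genuine permission problem is still reported once the retries run
out.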

The $tmpdir variable has the job ID appended, so it would be a new path every
time, unless the call behaves like the command
"mkdir -p /mnt/lustre/scratch/jobs/<job id>".
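
As a rough illustration of that point, the Python sketch below lists the
directories a "mkdir -p"-style walk would visit for a per-job path. Whether
Torque's mkdirtree() really walks the path this way is an assumption here; the
function name mkdir_calls_for and the job ID "1234.server" are made up for the
example.

```python
import posixpath


def mkdir_calls_for(path, existing):
    """Return (directory, already_exists) pairs for a top-down tree walk.

    `existing` is a set of directories assumed to be present already.
    Nothing is created; this only shows which components a walk would touch.
    """
    calls = []
    current = "/"
    for part in path.strip("/").split("/"):
        current = posixpath.join(current, part)
        calls.append((current, current in existing))
    return calls


# Assumed layout: the shared parents exist, only the per-job leaf is new.
existing = {
    "/mnt",
    "/mnt/lustre",
    "/mnt/lustre/scratch",
    "/mnt/lustre/scratch/jobs",
}
for directory, exists in mkdir_calls_for(
    "/mnt/lustre/scratch/jobs/1234.server", existing
):
    print(directory, "exists" if exists else "new")
```

Under these assumptions only the last component is new, so any walk that calls
mkdir() on each component would hit existing directories for every job.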


> I believe doing mkdir on an existing path is not a very common practice, and
> that's the reason the bug went unnoticed for a long time.
>
> Cheers,
> --
> Lukasz Flis



-- 
---
Rishi Pathak
National PARAM Supercomputing Facility
C-DAC, Pune, India