[torqueusers] torque tmpdir on Lustre filesystem
mailmaverick666 at gmail.com
Mon Feb 27 05:40:13 MST 2012
On Fri, Feb 24, 2012 at 3:54 PM, Lukasz Flis <l.flis at cyf-kr.edu.pl> wrote:
> Hello Christopher, Hi *
> > We don't use Lustre (we have Panasas and GPFS), but just wondering:
> > does this happen all the time, or only occasionally?
> It happens occasionally. But as I said, this seems like a bug in the Lustre
> FS, and it has nothing to do with the Torque code. Torque uses an unlucky
> sequence of stat/mkdir calls which exposes the Lustre misbehaviour.
> > If occasionally, then if the job fails once, will it always fail, or
> > will it work if you try again?
> Another call to the mkdirtree() function should succeed after a few
> seconds of sleep.
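
The retry described above could be sketched roughly as follows. This is only an illustration in Python, not Torque's C code: the helper name make_dir_with_retry, its parameters, and the default attempt count and delay are all hypothetical, and it assumes the spurious failure shows up as EPERM.

```python
import errno
import os
import time


def make_dir_with_retry(path, attempts=3, delay=2.0, mkdir=os.makedirs):
    """Create path, pausing and retrying when mkdir fails with EPERM.

    Hypothetical helper: attempts and delay are illustrative values,
    not settings taken from Torque.
    """
    for attempt in range(1, attempts + 1):
        try:
            mkdir(path, exist_ok=True)
            return attempt  # number of attempts that were needed
        except OSError as exc:
            # Only EPERM is retried; any other error propagates at once,
            # and so does EPERM on the final attempt.
            if exc.errno != errno.EPERM or attempt == attempts:
                raise
            time.sleep(delay)
```

The mkdir argument exists only so that a failing call can be simulated; in normal use the default os.makedirs is enough. A genuine permission problem still surfaces, just delayed by the retries.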
> I believe this behaviour in the Lustre client appeared in the 1.8.x line and
> remains in 2.1.x. HP SFS, IIRC, is based on 1.4 and 1.6, so it's not affected.
The HP SFS installed here is on Lustre 1.8.4.
> We have observed the bug in Lustre 1.8.(4,5,6) infrastructure. Then we
> moved to the 2.1 line, replacing all the components (servers, arrays, fabric),
> and the bug remained.
> The problem with Lustre is that a mkdir() call on an EXISTING directory
> returns an EPERM error instead of EEXIST once in a while, usually when
> stat() is called before mkdir().
The $tmpdir variable has the jobid appended, so it would be a new path
unless the call works in a way similar to the command "mkdir -p".
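
For illustration, a "mkdir -p"-like sequence that tolerates the misbehaviour might look like the sketch below. It is Python rather than the actual mkdirtree() code from Torque, it assumes POSIX-style paths, and the function name mkdir_p_tolerant is hypothetical. It walks the path one component at a time and treats EPERM the same as EEXIST whenever the component turns out to be an existing directory.

```python
import errno
import os


def mkdir_p_tolerant(path, mode=0o755, mkdir=os.mkdir):
    """Create path and any missing parents, like "mkdir -p".

    Hypothetical helper (POSIX paths assumed): EPERM on a component
    that already exists as a directory is handled like EEXIST.
    """
    current = os.sep
    for part in os.path.abspath(path).split(os.sep):
        if not part:
            continue  # skip the empty piece before the leading separator
        current = os.path.join(current, part)
        try:
            mkdir(current, mode)
        except OSError as exc:
            # Accept the error only if the directory is really there;
            # a genuine EPERM on a missing path is still raised.
            if exc.errno in (errno.EEXIST, errno.EPERM) and os.path.isdir(current):
                continue
            raise
```

Since the parent components (the filesystem mount point, the base tmpdir) already exist, every job start performs mkdir on existing directories even though the last component with the jobid is new.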
> I believe doing mkdir on an existing path is not a very common practice, and
> that's the reason the bug went unnoticed for a long time.
> Lukasz Flis
National PARAM Supercomputing Facility
C-DAC, Pune, India