[torqueusers] torque tmpdir on Lustre filesystem
l.flis at cyf-kr.edu.pl
Mon Feb 27 06:22:30 MST 2012
> HP SFS installed here is on Lustre 1.8.4
> We have observed the BUG in Lustre 1.8.(4,5,6) infrastructure. Then we
> moved to 2.1 line replacing all the components (servers,arrays,fabric)
> and the bug remained.
> The problem with lustre is that mkdir() call on EXISTING directory
> returns EPERM error instead of EEXIST once in a while, usually when
> stat() is called before mkdir.
> The $tmpdir variable is appended with jobid, so it would be a new path
> every time,
> unless the call is in a way similar to command "mkdir -p
> /mnt/lustre/scratch/jobs/<job id>"
The mkdir command from coreutils is a different case because it issues a
stat() call before calling mkdir(). The mkdirtree function in Torque
invokes mkdir() on every component along the path and expects an EEXIST
error when mkdir()ing existing directories. If stat() is not called before
mkdir() and some time has passed since the last access to the directory,
mkdir() will return EPERM instead of EEXIST. The next mkdir() call with the
same arguments will return EEXIST again.
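
To make the failure mode concrete, here is a minimal sketch in Python of a mkdir-along-the-path loop that tolerates the spurious EPERM. It is not Torque's actual C code. The name mkdir_tree and the injectable mkdir parameter are hypothetical, and treating EPERM as "already exists" after a stat() check is only one possible workaround, assumed here for illustration.

```python
import errno
import os


def mkdir_tree(path, mode=0o755, mkdir=os.mkdir):
    """Create each component of `path` in turn, similar to `mkdir -p`.

    `mkdir` is injectable only so that the spurious EPERM can be
    simulated without a Lustre mount (hypothetical helper parameter).
    """
    current = os.sep if os.path.isabs(path) else ""
    for part in path.split(os.sep):
        if not part:
            continue  # skip empty pieces from leading or doubled separators
        current = os.path.join(current, part)
        try:
            mkdir(current, mode)
        except OSError as exc:
            if exc.errno == errno.EEXIST:
                continue  # the normal answer for an existing directory
            if exc.errno == errno.EPERM and os.path.isdir(current):
                # Assumed workaround: the directory is really there
                # (confirmed via stat), so treat EPERM like EEXIST.
                continue
            raise  # any other error is a genuine failure
```

A call such as mkdir_tree("/mnt/lustre/scratch/jobs/<job id>") would then survive an EPERM on an existing parent directory, while a real permission problem on a missing directory is still reported.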
Today one of our users hit the bug when using QuantumEspresso software.
I'm waiting to see what Whamcloud can say about it.