[torqueusers] torque tmpdir on Lustre filesystem

Lukasz Flis l.flis at cyf-kr.edu.pl
Mon Feb 27 06:22:30 MST 2012


> HP SFS installed here is on Lustre 1.8.4
>     We have observed the bug in Lustre 1.8.(4,5,6) infrastructure. Then we
>     moved to the 2.1 line, replacing all the components (servers, arrays,
>     fabric), and the bug remained.
>     The problem with Lustre is that a mkdir() call on an EXISTING directory
>     returns an EPERM error instead of EEXIST once in a while, usually when
>     stat() is called before mkdir().
> The $tmpdir variable is appended with jobid, so it would be a new path
> every time,
> unless the call is in a way similar to command "mkdir -p
> /mnt/lustre/scratch/jobs/<job id>"

The mkdir command from coreutils is a different case because it issues a
stat() call before calling mkdir(). The mkdirtree function in torque
invokes mkdir() all along the path, expecting an EEXIST error when
mkdir()ing existing directories. If stat() is not called before mkdir() and
some time has passed since the last access to the directory, mkdir() will
return EPERM instead of EEXIST. The next mkdir() call with the same
arguments will return EEXIST again.
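
A possible workaround on the caller's side, as a minimal sketch only: this is
not torque's mkdirtree code, and the names mkdir_tolerant and mkdir_path are
made up for illustration. It assumes the spurious EPERM shows up only for
directories that already exist, so a stat() after the failed mkdir() is
enough to tell the Lustre case apart from a real permission problem.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: create one directory and treat "already exists" as
 * success. On EEXIST or EPERM, fall back to stat() and accept the path if
 * it is a directory. Any other error is reported unchanged. */
int mkdir_tolerant(const char *path, mode_t mode)
{
    struct stat sb;
    int saved;

    if (mkdir(path, mode) == 0)
        return 0;

    saved = errno;
    if (saved != EEXIST && saved != EPERM)
        return -1;              /* genuine failure, errno left as set by mkdir() */

    if (stat(path, &sb) == 0 && S_ISDIR(sb.st_mode))
        return 0;               /* the directory is there after all */

    errno = saved;              /* report the original mkdir() error */
    return -1;
}

/* Hypothetical helper: call mkdir_tolerant() on every component of the
 * path, from the top down, then on the full path itself. */
int mkdir_path(const char *path, mode_t mode)
{
    size_t len = strlen(path);
    char *buf, *p;
    int rc = 0;

    if (len == 0) {
        errno = ENOENT;
        return -1;
    }
    buf = malloc(len + 1);
    if (buf == NULL)
        return -1;              /* malloc() sets errno to ENOMEM on POSIX systems */
    memcpy(buf, path, len + 1);

    /* Start at buf + 1 so a leading '/' is never cut down to an empty string. */
    for (p = buf + 1; *p != '\0' && rc == 0; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        rc = mkdir_tolerant(buf, mode);
        *p = '/';
    }
    if (rc == 0)
        rc = mkdir_tolerant(buf, mode);

    free(buf);
    return rc;
}
```

Calling mkdir_path("/mnt/lustre/scratch/jobs/<job id>", 0700) twice in a row
should succeed both times with this approach. A real EPERM (for example on a
path that does not exist) is still reported, because the stat() check fails
in that case.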

Today one of our users hit the bug when using the QuantumEspresso software.

I'm waiting to see what Whamcloud has to say about it.

Lukasz Flis
