[torqueusers] -l file not working properly? (Torque 2.0.0p5, Maui 3.2.6p14)

garrick at speculation.org garrick at speculation.org
Fri Jun 2 00:47:12 MDT 2006


On Thu, Jun 01, 2006 at 06:51:53PM -0400, garrick at speculation.org alleged:
> On Thu, Jun 01, 2006 at 03:24:12PM -0500, Mike Renfro alleged:
> > I have a very disk-intensive user that bought a 300GB drive to put in 
> > one of our nodes. In an attempt to steer his jobs to the system I 
> > installed the drive into, I'm testing out jobs with the "-l file" 
> > directive. It's not working when I request several GB of disk space,
> > which is far less than the available space reported by checknode.
> > Further testing shows that the breaking point between jobs running and
> > jobs aborting is between 3gb and 4gb. Any ideas?
> 
> Looks like this is limited to ULONG_MAX, which is about 4 billion
> (4294967295) on 32-bit architectures (works fine on x86_64).
> 
> I don't see a clear fix here because converting a large chunk of
> that code to using unsigned long long doesn't look quite portable
> (ULLONG_MAX requires C99 mode).
> 
> I wonder if the best thing to do is to ignore the error and let the job
> run.  The "file" resource is being overloaded.  You are using it as a
> resource request to the scheduler, and pbs_mom is using it to set a max
> ulimit file size.  It is the latter that is failing and you don't care
> about that for your purposes.

I went through the motions of increasing the variable types in MOM to
use unsigned long long and using setrlimit64() and it turns out that
Linux doesn't even allow limits (at least for file) greater than ULONG_MAX.
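
To make the 3gb/4gb breaking point concrete, here is a small arithmetic
sketch. It is not code from pbs_mom; the helper names are made up for
illustration, and it assumes that "gb" means 2^30 bytes and that a 32-bit
unsigned long tops out at 4294967295. It uses unsigned long long, so it
needs a C99 compiler.

```c
#include <limits.h>
#include <stdio.h>

/* 2^32 - 1: the value of ULONG_MAX when unsigned long is 32 bits wide. */
#define ULONG_MAX_32BIT 4294967295ULL

/* Hypothetical helper: convert a size in gb (assumed 2^30 bytes) to bytes. */
unsigned long long gb_to_bytes(unsigned long long gb)
{
    return gb << 30;
}

/* Returns 1 if `bytes` can be stored in an unsigned long whose
 * maximum value is `cap`, and 0 otherwise. */
int fits_in_ulong(unsigned long long bytes, unsigned long long cap)
{
    return bytes <= cap;
}

/* Print whether a request fits on a 32-bit build and on this build. */
void report(unsigned long long gb)
{
    unsigned long long bytes = gb_to_bytes(gb);

    printf("%llugb = %llu bytes: 32-bit %s, this build %s\n",
           gb, bytes,
           fits_in_ulong(bytes, ULONG_MAX_32BIT) ? "fits" : "overflows",
           fits_in_ulong(bytes, (unsigned long long)ULONG_MAX) ? "fits" : "overflows");
}
```

Calling report(3) and report(4) shows that 3gb (3221225472 bytes) is
below the 32-bit cap, while 4gb (4294967296 bytes) is one byte past it,
which matches the breaking point Mike saw.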

So it would seem that ignoring this error for "file" makes sense.
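
Roughly, "ignoring the error" could look like the sketch below. This is
not the actual MOM code: the function name, the result codes, and the
explicit cap parameter are all hypothetical. The idea is only that a
"file" request too large to represent is skipped rather than treated as
fatal, so the scheduler can still use the value while the job runs
without a file size ulimit.

```c
#include <sys/resource.h>

/* Hypothetical result codes for this sketch. */
enum file_limit_result {
    FILE_LIMIT_ERROR   = -1,  /* getrlimit() or setrlimit() failed */
    FILE_LIMIT_SET     = 0,   /* soft limit was applied */
    FILE_LIMIT_SKIPPED = 1    /* request too large; limit left alone */
};

/* Try to set the soft RLIMIT_FSIZE to `bytes`.
 * `cap` is the largest value the caller considers representable
 * (for example ULONG_MAX). Requests above it are skipped, not failed. */
int set_file_limit_lenient(unsigned long long bytes, unsigned long long cap)
{
    struct rlimit rl;

    if (bytes > cap)
        return FILE_LIMIT_SKIPPED;  /* let the job run without the ulimit */

    /* Read the current limits so the hard limit is preserved. */
    if (getrlimit(RLIMIT_FSIZE, &rl) != 0)
        return FILE_LIMIT_ERROR;

    rl.rlim_cur = (rlim_t)bytes;

    /* setrlimit() fails if the new soft limit exceeds the hard limit. */
    if (setrlimit(RLIMIT_FSIZE, &rl) != 0)
        return FILE_LIMIT_ERROR;

    return FILE_LIMIT_SET;
}
```

With a cap of 4294967295, a 4gb request returns FILE_LIMIT_SKIPPED and
the process limits are untouched.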


