[torqueusers] -l file not working properly? (Torque 2.0.0p5, Maui 3.2.6p14)

garrick at speculation.org garrick at speculation.org
Thu Jun 1 18:09:02 MDT 2006


On Thu, Jun 01, 2006 at 06:51:53PM -0400, garrick at speculation.org alleged:
> On Thu, Jun 01, 2006 at 03:24:12PM -0500, Mike Renfro alleged:
> > I have a very disk-intensive user that bought a 300GB drive to put in 
> > one of our nodes. In an attempt to steer his jobs to the system I 
> > installed the drive into, I'm testing out jobs with the "-l file" 
> > directive. It's not working when I request several GB of disk space, an 
> > amount far less than the available space reported by checknode. Further 
> > testing shows that the breaking point between jobs running and aborting 
> > is between 3gb and 4gb. Any ideas?
> 
> Looks like this is limited to ULONG_MAX which is about 4 billion on 32bit
> arch (works fine on x86_64).
> 
> I don't see a clear fix here because converting a large chunk of
> that code to using unsigned long long doesn't look quite portable
> (ULLONG_MAX requires c99 mode).
> 
> I wonder if the best thing to do is to ignore the error and let the job
> run.  The "file" resource is being overloaded.  You are using it as a
> resource request to the scheduler, and pbs_mom is using it to set a max
> ulimit file size.  It is the latter that is failing and you don't care
> about that for your purposes.

The attached patch causes MOM to ignore errors with large file resource requests.
It should do what you actually want until we figure out a real solution.
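
For anyone wanting to see why the cutoff lands between 3gb and 4gb, here
is a rough sketch of the range check, not the actual mom_mach.c code. It
assumes "gb" means 2^30 bytes and emulates a 32-bit ULONG_MAX with a
fixed constant so it behaves the same on any host. The names
ULONG_MAX_32BIT, gb_to_bytes and fits_in_32bit_ulong are made up for
illustration, and it uses the C99 <stdint.h> types.

```c
#include <stdint.h>

/* Assumption: stand-in for ULONG_MAX on a 32-bit arch (2^32 - 1). */
#define ULONG_MAX_32BIT UINT32_C(4294967295)

/* Hypothetical helper: convert gigabytes to bytes, taking 1gb = 2^30 bytes. */
static uint64_t gb_to_bytes(uint64_t gb)
{
  return gb << 30;
}

/* Hypothetical helper: returns 1 if the byte count fits in a 32-bit
 * unsigned long, 0 if it would exceed the emulated ULONG_MAX. */
static int fits_in_32bit_ulong(uint64_t bytes)
{
  return bytes <= ULONG_MAX_32BIT;
}
```

Under those assumptions, fits_in_32bit_ulong(gb_to_bytes(3)) is true and
fits_in_32bit_ulong(gb_to_bytes(4)) is false, since 4gb is exactly one
byte past the 32-bit maximum.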

-------------- next part --------------
Index: src/resmom/linux/mom_mach.c
===================================================================
RCS file: /usr/local/nfs/src/cvs_repository/torque/src/resmom/linux/mom_mach.c,v
retrieving revision 1.54
diff -u -r1.54 mom_mach.c
--- src/resmom/linux/mom_mach.c	2 May 2006 02:13:03 -0000	1.54
+++ src/resmom/linux/mom_mach.c	2 Jun 2006 00:07:25 -0000
@@ -1182,10 +1182,8 @@
         {
         retval = getsize(pres,&value);
 
-        if (retval != PBSE_NONE)
+        if (retval == PBSE_NONE)
           {
-          return(error(pname,retval));
-          }
 
         if (value > ULONG_MAX)
           {
@@ -1217,6 +1215,7 @@
 
           return(error(pname,PBSE_SYSTEM));
           }
+          } /* (retval == PBSE_NONE) - poor indenting to keep patch size small */
         }
       } 
     else if (!strcmp(pname,"vmem")) 

