[Mauiusers] configuring disk space

Dave Jackson jacksond at clusterresources.com
Mon Feb 5 16:29:43 MST 2007


Kevin,

  Thanks.  I will get ddisk support into Maui this week.

Dave

On Mon, 2007-02-05 at 18:06 -0500, Kevin Van Workum wrote:
> Dave,
> 
> Yes, that is what I want. Thanks.
> 
> Also, if TORQUE is trying to set the file ulimit, then the following
> in pbs_mom.8 is rather misleading and should probably be updated.
> 
>        size[fs=<FS>]
>               Specifies  that  the  available and configured disk space in the
>               <FS> filesystem is to be reported to the pbs_server  and  sched-
>               uler.   NOTE:  To request disk space on a per job basis, specify
>               the file resource as in 'qsub -l nodes=1,file=1000kb'  For exam-
>               ple,   the   available   and   configured   disk  space  in  the
>               /localscratch filesystem will be reported:
> 
>               size[fs=/localscratch]
> 
> Kevin
> 
> On 2/5/07, Dave Jackson <jacksond at clusterresources.com> wrote:
> > Kevin,
> >
> >   The failure in TORQUE is occurring because TORQUE is trying to set the
> > file ulimit.  My guess is that this is not what you want.  You want Maui
> > to schedule diskspace as a consumable resource but not enforce anything
> > at the OS level?  Is this correct?  Moab supports a 'ddisk' (dedicated
> > disk) RM extension which is a disk constraint enforced by the scheduler
> > as a consumable resource but not enforced via ulimits, i.e.,
> >
> > > qsub -l ddisk=4096mb,walltime=1000 trestjob.cmd
> >
> >   If this is what you want, I think we could roll that into Maui without
> > any trouble.
> >
> > Dave
> >
> >
> >
> > On Mon, 2007-02-05 at 15:33 -0700, Dave Jackson wrote:
> > > Kevin,
> > >
> > >   I would guess you are requesting this correctly with qsub.  I am not
> > > certain why TORQUE is rejecting this but it looks like Maui is doing the
> > > right thing.  (Also, the mom 'size' option was correct; my fingers went
> > > on auto-pilot in writing the previous email.)
> > >
> > >   When I test this under TORQUE 2.2.0, my job fails with the following
> > > in the job's stderr file
> > >
> > > --------
> > > getsize() failed for file in mom_set_limits
> > > --------
> > >
> > >   It looks like we need to move this discussion over to torqueusers.  Do
> > > you want to re-post this?  I will get started looking at what we can do
> > > to make getsize work on larger values.
> > >
> > > Dave
> > >
> > >
> > > On Mon, 2007-02-05 at 17:17 -0500, Kevin Van Workum wrote:
> > > > I tried using:
> > > > NODECFG[amd24] CFGDISK=10000
> > > >
> > > > checknode amd24 returned:
> > > > Configured Resources: PROCS: 1  MEM: 503M  SWAP: 1484M  DISK: 10000M
> > > > Utilized   Resources: DISK: 9999M
> > > > Dedicated  Resources: [NONE]
> > > >
> > > > But my job doesn't start if I request more than 1mb, e.g. 'qsub -l
> > > > file=2mb'. So it looks like maui thinks there is only 1mb of disk
> > > > available.
> > > >
> > > > Also, I couldn't find any 'file' option for mom's config, but I did
> > > > find the 'size' option. So I tried 'size[fs=/tmp]'.
> > > >
> > > > In this case checknode amd24 returned:
> > > > Configured Resources: PROCS: 1  MEM: 503M  SWAP: 1483M  DISK: 12G
> > > > Utilized   Resources: DISK: 1855M
> > > > Dedicated  Resources: [NONE]
> > > >
> > > > Which is close to the correct disk space for /tmp. 'df -h /tmp' gives:
> > > > [root at amd24 ~]# df /tmp -h
> > > > Filesystem                       Size  Used Avail Use% Mounted on
> > > > /dev/mapper/VolGroup00-LogVol00   13G  1.9G   10G  16% /
> > > >
> > > > But I found that if I request file=4096mb or more, the job fails.
> > > > mom's log says:
> > > > "pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec
> > > > failure, after files staged, no retry"
> > > > and pbs_server says:
> > > > "PBS_Server: stream_eof, connection to amd24 is bad, remote service
> > > > may be down, message may be corrupt, or connection may have been
> > > > dropped remotely (End of File).  setting node state to down"
> > > >
> > > > Requesting less than file=4096mb works fine.
> > > >
> > > > Maybe I'm requesting the required disk space incorrectly on qsub's command line.
> > > >
> > > > Kevin
> > > >
> > > > On 2/5/07, Dave Jackson <jacksond at clusterresources.com> wrote:
> > > > > Kevin,
> > > > >
> > > > >   I think you can indicate the amount of diskspace available using the
> > > > > TORQUE 'file' option in mom config.  Otherwise, you should be able to
> > > > > populate this info directly via Maui using 'NODECFG[X] CFGDISK=<VAL>'
> > > > > where val is specified in MB.  If this does not work, let me know and we
> > > > > will get it fixed.
> > > > >
> > > > > Dave
> > > > >
> > > > > On Mon, 2007-02-05 at 16:16 -0500, Kevin Van Workum wrote:
> > > > > > My nodes have various amounts of local scratch disk space. How do I
> > > > > > tell maui how much local scratch disk space each node has, and how do
> > > > > > I request a certain amount of disk space on each node for a particular
> > > > > > job?
> > > > > >
> > > > > > I'm testing with maui-3.2.6p19 and torque-2.1.6.
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > > _______________________________________________
> > > mauiusers mailing list
> > > mauiusers at supercluster.org
> > > http://www.supercluster.org/mailman/listinfo/mauiusers
> >
> >
> 
> 
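The two checknode outputs quoted in the thread can be compared by subtracting the utilized DISK value from the configured one. The sketch below does that in Python. It assumes that Maui treats "configured minus utilized" as the schedulable disk and that the G suffix means 1024 M; neither is stated in the thread, and the helper names are made up for illustration.

```python
import re

# Multipliers to MB for the suffixes seen in the checknode output.
# Assumption: "G" means 1024 M.
_UNITS_MB = {"M": 1, "G": 1024}


def disk_mb(line):
    """Return the DISK value found on a checknode resource line, in MB."""
    match = re.search(r"DISK:\s*(\d+)([MG])", line)
    if match is None:
        return None
    return int(match.group(1)) * _UNITS_MB[match.group(2)]


def free_disk_mb(configured_line, utilized_line):
    """Configured DISK minus utilized DISK, in MB (assumed scheduling headroom)."""
    return disk_mb(configured_line) - disk_mb(utilized_line)


# Lines copied from the NODECFG[amd24] CFGDISK=10000 case.
cfgdisk_free = free_disk_mb(
    "Configured Resources: PROCS: 1  MEM: 503M  SWAP: 1484M  DISK: 10000M",
    "Utilized   Resources: DISK: 9999M",
)

# Lines copied from the size[fs=/tmp] case.
size_fs_free = free_disk_mb(
    "Configured Resources: PROCS: 1  MEM: 503M  SWAP: 1483M  DISK: 12G",
    "Utilized   Resources: DISK: 1855M",
)

print(cfgdisk_free, size_fs_free)
```

Under those assumptions the CFGDISK case leaves 1 MB of headroom, which would be consistent with jobs requesting more than file=1mb not starting, while the size[fs=/tmp] case leaves roughly 10 GB.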


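The thread does not say why requests of file=4096mb and above fail while smaller ones work. One possible explanation, offered here only as an assumption, is that the limit is converted to bytes and held in a 32-bit unsigned value somewhere on the TORQUE side: 4096 MB is exactly 2^32 bytes, the first size that no longer fits. The arithmetic can be checked with a short Python sketch (the function name is hypothetical and is not part of TORQUE):

```python
# Largest value a 32-bit unsigned integer can hold.
UINT32_MAX = 2**32 - 1


def fits_in_uint32(size_mb):
    """True if size_mb megabytes, expressed in bytes, fits in 32 unsigned bits.

    Assumption: the limit is stored as a byte count; the thread does not confirm this.
    """
    return size_mb * 1024 * 1024 <= UINT32_MAX


below = fits_in_uint32(4095)   # largest whole-MB request under the threshold
at = fits_in_uint32(4096)      # the request size reported to fail

print(below, at)
```

If the assumption holds, this matches the observed boundary: requests under 4096mb fit, and 4096mb itself does not.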
