[Mauiusers] configuring disk space

Dave Jackson jacksond at clusterresources.com
Mon Feb 5 15:43:59 MST 2007


Kevin,

  The failure in TORQUE is occurring because TORQUE is trying to set the
file-size ulimit.  My guess is that this is not what you want.  You want
Maui to schedule disk space as a consumable resource but not enforce
anything at the OS level.  Is this correct?  Moab supports a 'ddisk'
(dedicated disk) RM extension, which is a disk constraint enforced by the
scheduler as a consumable resource but not enforced via ulimits, e.g.,

> qsub -l ddisk=4096mb,walltime=1000 trestjob.cmd

  If this is what you want, I think we could roll that into Maui without
any trouble.
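
  To make the idea of a scheduler-enforced consumable resource concrete,
here is a minimal Python sketch.  It is not Maui or Moab code: the class
and method names are hypothetical, and the rule "available = configured -
utilized - dedicated" is an assumption chosen to match the checknode
figures quoted further down the thread.

```python
from dataclasses import dataclass, field


@dataclass
class NodeDisk:
    """Hypothetical scheduler-side view of one node's disk, in MB."""
    configured_mb: int
    utilized_mb: int = 0
    dedicated: dict = field(default_factory=dict)  # job id -> reserved MB

    def available_mb(self) -> int:
        # Assumption: free space is what is configured, minus what is
        # already in use, minus what running jobs have reserved.
        return self.configured_mb - self.utilized_mb - sum(self.dedicated.values())

    def try_start(self, job_id: str, request_mb: int) -> bool:
        """Reserve disk for a job if it fits.  No OS limit is set here."""
        if request_mb > self.available_mb():
            return False
        self.dedicated[job_id] = request_mb
        return True

    def finish(self, job_id: str) -> None:
        """Release the reservation when the job ends."""
        self.dedicated.pop(job_id, None)


# Figures from the first checknode output in this thread:
# 10000M configured, 9999M utilized.
node = NodeDisk(configured_mb=10000, utilized_mb=9999)
small = node.try_start("job1", 1)   # fits in the remaining 1 MB
large = node.try_start("job2", 2)   # rejected: nothing is left
```

  The point of the sketch is that the decision is made entirely in the
scheduler's bookkeeping; the job itself never has a ulimit applied.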

Dave
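
  A note on the 4096mb threshold reported below.  The thread does not say
why getsize() fails, so the following is only one reading that is
consistent with the numbers, not a diagnosis of TORQUE's code: 4096 MB is
exactly 2**32 bytes, which is one more than a 32-bit unsigned value can
hold.  The sketch assumes 1 MB = 1024 * 1024 bytes, and the function names
are made up for illustration.

```python
MB = 1024 * 1024          # assumption: "mb" means 2**20 bytes
UINT32_MAX = 2**32 - 1    # largest value a 32-bit unsigned integer holds


def fits_in_uint32(request_mb: int) -> bool:
    """True if the request, converted to bytes, fits in 32 unsigned bits."""
    return request_mb * MB <= UINT32_MAX


def wrapped_bytes(request_mb: int) -> int:
    """Byte count after wrapping modulo 2**32, as 32-bit unsigned math would."""
    return (request_mb * MB) % 2**32
```

  Under these assumptions a 4095mb request still fits, while 4096mb does
not and would wrap to 0 bytes, which matches the observation that requests
below 4096mb work and requests at or above it fail.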



On Mon, 2007-02-05 at 15:33 -0700, Dave Jackson wrote:
> Kevin,
> 
>   I would guess you are requesting this correctly with qsub.  I am not
> certain why TORQUE is rejecting this, but it looks like Maui is doing the
> right thing.  (Also, the mom 'size' option was correct; my fingers went
> on auto-pilot in writing the previous email.)
> 
>   When I test this under TORQUE 2.2.0, my job fails with the following
> in the job's stderr file
> 
> --------
> getsize() failed for file in mom_set_limits
> --------
> 
>   It looks like we need to move this discussion over to torqueusers.  Do
> you want to re-post this?  I will get started looking at what we can do
> to make getsize work on larger values.
> 
> Dave
> 
>   
> On Mon, 2007-02-05 at 17:17 -0500, Kevin Van Workum wrote:
> > I tried using:
> > NODECFG[amd24] CFGDISK=10000
> > 
> > checknode amd24 returned:
> > Configured Resources: PROCS: 1  MEM: 503M  SWAP: 1484M  DISK: 10000M
> > Utilized   Resources: DISK: 9999M
> > Dedicated  Resources: [NONE]
> > 
> > But my job doesn't start if I request more than 1mb, e.g. 'qsub -l
> > file=2mb'. So it looks like maui thinks there is only 1mb of disk
> > available.
> > 
> > Also, I couldn't find any 'file' option for mom's config, but I did
> > find the 'size' option. So I tried 'size[fs=/tmp]'.
> > 
> > In this case checknode amd24 returned:
> > Configured Resources: PROCS: 1  MEM: 503M  SWAP: 1483M  DISK: 12G
> > Utilized   Resources: DISK: 1855M
> > Dedicated  Resources: [NONE]
> > 
> > Which is close to the correct disk space for /tmp. 'df -h /tmp' gives:
> > [root at amd24 ~]# df /tmp -h
> > Filesystem                       Size  Used Avail Use% Mounted on
> > /dev/mapper/VolGroup00-LogVol00   13G  1.9G   10G  16% /
> > 
> > But I found that if I request file=4096mb or more, the job fails.
> > mom's log says:
> > "pbs_mom;Job;TMomFinalizeJob3;job not started, Failure job exec
> > failure, after files staged, no retry"
> > and pbs_server says:
> > "PBS_Server: stream_eof, connection to amd24 is bad, remote service
> > may be down, message may be corrupt, or connection may have been
> > dropped remotely (End of File).  setting node state to down"
> > 
> > Requesting less than file=4096mb works fine.
> > 
> > Maybe I'm requesting the required disk space incorrectly on qsub's command line.
> > 
> > Kevin
> > 
> > On 2/5/07, Dave Jackson <jacksond at clusterresources.com> wrote:
> > > Kevin,
> > >
> > >   I think you can indicate the amount of disk space available using the
> > > TORQUE 'file' option in mom config.  Otherwise, you should be able to
> > > populate this info directly via Maui using 'NODECFG[X] CFGDISK=<VAL>',
> > > where <VAL> is specified in MB.  If this does not work, let me know and we
> > > will get it fixed.
> > >
> > > Dave
> > >
> > > On Mon, 2007-02-05 at 16:16 -0500, Kevin Van Workum wrote:
> > > > My nodes have various amounts of local scratch disk space. How do I
> > > > tell maui how much local scratch disk space each node has, and how do
> > > > I request a certain amount of disk space on each node for a particular
> > > > job?
> > > >
> > > > I'm testing with maui-3.2.6p19 and torque-2.1.6.
> > > >
> > >
> > >
> > 
> > 
> 