[Mauiusers] [slurm-dev] Issues Running SLURM Testsuite with Maui

Rafael Folco rfolco at linux.vnet.ibm.com
Tue Jun 2 09:07:22 MDT 2009


Hmm, actually I already did:

cset max_job_delay 600


Even after changing the timeout, srun and salloc don't respond and many
tests still fail:

FAILURE: srun not responding
FAILURE: salloc not responding
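
To see which failure messages dominate, something like the following could be
used. This is only a rough sketch: it assumes each failure is reported on its
own line starting with "FAILURE:" (as in the messages above), and the function
name tally_failures is made up for illustration.

```shell
#!/bin/sh
# Tally the distinct FAILURE messages in a testsuite log, most frequent first.
# Assumption: every failure is logged on its own line beginning with "FAILURE:".
# tally_failures is a hypothetical helper name, not part of the SLURM testsuite.
tally_failures() {
    # $1 is the path to the log file
    grep '^FAILURE:' "$1" | sort | uniq -c | sort -rn
}
```

For example, "tally_failures slurm-maui-sles11.log" would print one line per
distinct failure message, prefixed by how many times it occurred.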


Is there any other configuration I could try on the SLURM and/or Maui side?

Would increasing the timeout to 20 minutes help?

Thanks,

Rafael


On Tue, 2009-06-02 at 07:42 -0700, jette1 at llnl.gov wrote:
> Maui and Moab dramatically slow down SLURM job scheduling.
> You'll need to set a much higher timeout in SLURM's
> testsuite for it to run successfully with Maui or Moab.
> Do this by adding a file called "globals.local" in the
> testsuite directory and add a line like this:
> 
> set max_job_delay 600
> 
> or edit the value in the "globals" file if you prefer.
> The default value is 120 seconds to run a small job,
> which isn't sufficient in your configuration. 10 minutes
> to launch a job should be sufficient, but the test
> suite will take a very long time to complete.
> 
> 
> At 11:08 AM -0300 6/2/09, Rafael Folco wrote:
> >Hi,
> >
> >First of all, sorry for cross-posting to both mailing lists.
> >
> >I am running the SLURM testsuite with Maui configured, but I see many
> >FAILURES and srun doesn't respond for most options. Is this expected, or
> >should I fix something to run the SLURM testsuite smoothly?
> >
> >Here is one example (I see many other failures like this):
> >
> >TEST: 1.23
> >spawn /usr/bin/srun -N1 -l --mincpus=999999 -t1 hostname
> >srun: Job is in held state, pending scheduler release
> >srun: job 38 queued and waiting for resources
> >
> >FAILURE: srun not responding
> >
> >When removing option --mincpus, "srun -N1 -l -t1 hostname" works fine.
> >
> ># cat slurm-maui-sles11.log |grep SUCCESS| wc -l
> >137
> >
> ># cat slurm-maui-sles11.log |grep FAIL| wc -l
> >102
> >
> >
> >I couldn't finish the testsuite; it ran for more than 10 hours
> >and kept producing errors:
> >
> >FAILURE: srun not responding
> >FAILURE: salloc not responding
> >
> >
> >Although Maui and SLURM seem to be working, I see this error in maui.log:
> >
> >06/02 08:07:36 MRMCheckEvents()
> >06/02 08:07:36 ALERT:    cannot query events on RM (RM 'cluster-ib-5' does not support function 'rmeventquery')
> >06/02 08:07:36 MSUAcceptClient(5,ClientSD,HostName,TCP)
> >06/02 08:07:36 INFO:     accept call failed, errno: 11 (Resource temporarily unavailable)
> >06/02 08:07:36 INFO:     all clients connected.  servicing requests
> >
> >
> ># showq
> >ACTIVE JOBS--------------------
> >JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
> >
> >
> >      0 Active Jobs       0 of    4 Processors Active (0.00%)
> >
> >
> >I would appreciate it if somebody could point me to the root of the
> >problem and clarify what is going on.
> >
> >Thanks in advance.
> >
> >Rafael
> >
> >
> >--
> >Rafael Folco
> >Linux on Power
> >IBM Linux Technology Center
> >E-Mail: rfolco at linux.vnet.ibm.com
> >
> 
> 

-- 
Rafael Folco
Linux on Power
IBM Linux Technology Center
E-Mail: rfolco at linux.vnet.ibm.com
