[Mauiusers] SLURM Reservation not revoked when job is finished

Wightman wightman at clusterresources.com
Fri Feb 11 12:13:54 MST 2005


The accepted job states (in WIKI format) are described here:

http://www.clusterresources.com/products/maui/docs/wiki/wikiinterface.shtml

You would have to trace the messages SLURM sends to Maui to determine
whether it adheres to this specification.

If SLURM is indeed reporting job data in the accepted format, then the
bug lies in Maui.

Hope that helps,

- Douglas
Cluster Resources, INC.


On Fri, 2005-02-11 at 11:14 -0700, Robert LeBlanc wrote:
> OK, I raised the log level to 8 to be safe, and this is what I found
> after I terminated a job:
> 
> 02/11 11:00:59 INFO:     received job list through WIKI RM
> 02/11 11:00:59 INFO:     loading 1 job(s)
> 02/11 11:00:59 MWikiGetAttr(job,Name,Status,Attr,Start)
> 02/11 11:00:59 WARNING:  job '2' detected with unexpected state '20'
> 02/11 11:00:59 INFO:     1 WIKI jobs detected on RM base
> 02/11 11:00:59 INFO:     jobs detected: 1
> 02/11 11:00:59 MStatClearUsage(node,Active)
> 
> I'm not sure whether SLURM is sending a non-compliant 'completed' code or
> Maui just doesn't know what state 20 is. This message repeats about
> every 5 seconds (the poll interval). The only time MResDestroy() is
> called is when I terminate Maui. I think this is because Maui doesn't
> know what to do with the '20'.
> 
> 
> On Thu, 2005-02-10 at 09:00 -0700, Wightman wrote:
> > I'm not exactly sure how things work with SLURM, but with PBS, job
> > completion can sometimes be tricky.  If Maui loses communication
> > with PBS and a job completes during that time, Maui sometimes cannot be
> > sure exactly what happened to the job or whether it will come back.
> > 
> > When Maui does know that a job has completed, it calls the MJobDestroy()
> > routine.  One of the first things that happens inside that routine is
> > this (in MJob.c):
> > 
> > 
> >   /* release job reservations */
> > 
> >   if (J->R != NULL)
> >     {
> >     MResDestroy(&J->R);
> >     }
> > 
> > 
> > which destroys the reservation.  If I were you, I would either run Maui
> > under gdb to see what happens at this point in the code on your system,
> > or look through the log files (with a log level of at least 7) to see
> > whether MJobDestroy() is ever called.
> > 
> > - Douglas
> > Cluster Resources, INC.
> > 
> > 
> > 
> > 
> > 
> > On Wed, 2005-02-09 at 10:38 -0700, Robert LeBlanc wrote:
> > > Maui experts;
> > > 	I have a cluster on which I am running SLURM and Maui, set up with
> > > WIKI. I find that when I submit a job to SLURM, resources are allocated
> > > and the job runs. I've noticed that a reservation is created for the
> > > job. Now, if the job completes before the reservation is over, the
> > > nodes are not scheduled again until the reservation expires. SLURM
> > > correctly shows that the nodes are free once the job is complete.
> > > Another thing I have found is that nodes do not show as idle in
> > > diagnose -n, even after the reservation is cleared. I think there may
> > > be a config option I am missing to get the node information right.
> > > Maui is able to tell when a node is down, so I'm not sure why it can't
> > > always tell when the job is over and the nodes are idle. When the
> > > reservation is over, it correctly shows that there are 2:2 procs
> > > available, but it still says the node is running. I've looked through
> > > the docs and can't find the option or solution to get this working
> > > correctly. Susanne Belle has been very helpful so far, and I appreciate it.
> > > 
> > 
> > 
> 