[Mauiusers] SLURM Reservation not revoked when job is finished

Robert LeBlanc leblanc at byu.edu
Fri Feb 11 11:14:14 MST 2005


OK, I raised the log level to 8 to be safe, and this is what I found
after I terminated a job:

02/11 11:00:59 INFO:     received job list through WIKI RM
02/11 11:00:59 INFO:     loading 1 job(s)
02/11 11:00:59 MWikiGetAttr(job,Name,Status,Attr,Start)
02/11 11:00:59 WARNING:  job '2' detected with unexpected state '20'
02/11 11:00:59 INFO:     1 WIKI jobs detected on RM base
02/11 11:00:59 INFO:     jobs detected: 1
02/11 11:00:59 MStatClearUsage(node,Active)

I'm not sure whether SLURM is sending a non-compliant 'completed' code or
whether Maui just doesn't recognize state 20. This message is repeated
about every 5 seconds (the poll interval). The only time MResDestroy() is
called is when I terminate Maui. I think the reservation is never released
because Maui doesn't know what to do with the '20'.
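
To illustrate the failure mode I suspect, here is a minimal C sketch. It is
not taken from the Maui source: every name in it (job_t, state_map_t,
map_rm_state, update_job) is hypothetical, and the codes 1 and 2 in the
usage note are placeholders, since I don't know what the real code table
looks like. It only shows how an unrecognized state code could leave a
reservation in place on every poll.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical internal job states; not Maui's real enum. */
typedef enum { JOB_IDLE, JOB_RUNNING, JOB_COMPLETED, JOB_UNKNOWN } job_state_t;

/* Hypothetical job record: only the fields needed for the illustration. */
typedef struct {
    int has_reservation;   /* 1 while nodes are held for this job */
    job_state_t state;
} job_t;

/* One entry of an assumed table from resource-manager codes to states. */
typedef struct {
    int code;
    job_state_t state;
} state_map_t;

/* Look up a numeric state code; anything not in the table is JOB_UNKNOWN. */
static job_state_t map_rm_state(const state_map_t *map, size_t n, int code)
{
    for (size_t i = 0; i < n; i++) {
        if (map[i].code == code)
            return map[i].state;
    }
    return JOB_UNKNOWN;
}

/* Apply a polled state to a job. An unknown state is skipped, so the
 * reservation is only released when a recognized completion arrives. */
static void update_job(job_t *job, job_state_t new_state)
{
    assert(job != NULL);

    if (new_state == JOB_UNKNOWN)
        return;                    /* warn-and-skip: reservation stays */

    job->state = new_state;
    if (new_state == JOB_COMPLETED)
        job->has_reservation = 0;  /* stands in for releasing the reservation */
}
```

With a table holding only the placeholder entries {1, JOB_RUNNING} and
{2, JOB_COMPLETED}, polling a code of 20 leaves has_reservation at 1 on every
pass, while polling 2 clears it. If something like this is happening, either
the sender would need to emit a code the table knows, or the table would need
an entry for the code being sent.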


On Thu, 2005-02-10 at 09:00 -0700, Wightman wrote:
> I'm not exactly sure how things work with SLURM, but with PBS, job
> completion can sometimes be a tricky thing.  If Maui loses communication
> with PBS and a job completes during that time, sometimes Maui cannot be
> sure exactly what happened to the job or whether it will come back.
> 
> When maui does know a job has completed it will call the MJobDestroy()
> routine.  One of the first things that happens inside that routine is
> this (in MJob.c):
> 
> 
>   /* release job reservations */
> 
>   if (J->R != NULL)
>     {
>     MResDestroy(&J->R);
>     }
> 
> 
> which destroys the reservation.  If I were you I would either gdb maui
> to figure out what happens at this point in the code on your system, or
> look through the log files (at least a log level of 7) to see if
> MJobDestroy() is ever called.
> 
> - Douglas
> Cluster Resources, INC.
> 
> 
> 
> 
> 
> On Wed, 2005-02-09 at 10:38 -0700, Robert LeBlanc wrote:
> > Maui experts;
> > 	I have a cluster running SLURM and Maui, set up with WIKI. I
> > find that when I submit a job to SLURM, resources are allocated and the
> > job runs. I've noticed that a reservation is created for the job. Now if
> > the job completes before the reservation is over, then the nodes are not
> > scheduled again until the reservation is over. SLURM correctly shows
> > that the nodes are open when the job is complete. Another thing that I
> > have found is that nodes (using diagnose -n) will not show idle even
> > after the reservation is cleared. I think that there may be a config
> > that I am missing to get the node information correctly. Maui is able to
> > tell when a node is down, I'm not sure why it can't always tell when the
> > job is over and the nodes are idle. When the reservation is over, it
> > correctly shows that there are 2:2 procs available, but it says that it
> > is running. I've looked through the docs and can't find the
> > option/solution to get this working correctly. Susanne Belle has been
> > very helpful to this point and I appreciate it.
> > 
> 
> 


