[Mauiusers] Backfill and node reservation

Arnau Bria arnaubria at pic.es
Mon Nov 15 10:49:34 MST 2010


On Mon, 15 Nov 2010 18:18:32 +0100
Bogdan Costescu wrote:

Hi Bogdan,

> On Mon, Nov 15, 2010 at 15:47, Arnau Bria <arnaubria at pic.es> wrote:
> > At some time we'd like to send a kind of job that reboots the host.
> 
> I wonder why people keep thinking that a job should reboot the node -
> this would involve making the job run as a privileged user or
> relaxing security measures. How about having the master node tell the
> node when to reboot, in a way that is only available to root?

I'm also asking how other people are doing what I'm trying to do. So
any explanation is welcome!
 
> And why do you think that this can only be achieved through Maui and
> not at a lower level ? And by the way what is the lower level: Torque,
> SGE, SLURM, something else ?

Because I use Torque/Maui and, IMO, Maui is not respecting my backfill
configuration. So I'm trying to work out whether it's a software
problem (Maui), a conceptual problem (me), or a mystical problem.
 

> > But before rebooting the host we'd like to "drain" the node and
> > don't lose any job while rebooting.
> 
> OK, but what does this have to do with backfilling ? 

My idea is: sending a high-priority job that requests all of a node's
CPUs will prevent other jobs from running on that node, unless backfill
is enabled. It's more or less a FIFO queue, except that priorities are
calculated by fairshare (FS).


> What kind of jobs run on the node ? Are nodes allocated exclusively
> (only one job can run on a node at a time) or can a job get parts of
> a node (and then several jobs could run simultaneously on a node) ?

Several jobs can run on a node. We allow as many jobs per node as it
has CPUs (#jobs=#cpus).

 
> > 2.-) Send a job that requests all node cpus using "special one" user
> > account (npp=$node_cpu
> 
> If npp is actually ppn, then maybe you are running Torque or some PBS
> derivative. In this case, this suggests that nodes are not exclusive
> and that several jobs can exist on the node at the same time. By
> asking for all of them, you want to make sure that there is nothing
> else running on the node at that time.

Exactly.
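
A minimal sketch of such a request, assuming Torque's
`qsub -l nodes=<host>:ppn=<n>` resource syntax. The helper name, the
node name and the script path are hypothetical; the submitting account
is whatever "special" user the site has set up.

```python
def drain_job_command(node, cpus, script):
    """Build a qsub command that asks for every CPU slot on one node."""
    # Assumption: Torque-style resource request, nodes=<host>:ppn=<slots>.
    return ["qsub", "-l", f"nodes={node}:ppn={cpus}", script]


if __name__ == "__main__":
    # Hypothetical node name and script path, for illustration only.
    print(" ".join(drain_job_command("node01", 8, "reboot_node.sh")))
```

The job can only start once every slot on the node is free, which is
what makes it usable as a "node is empty" signal.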
 
> > 3.-) Disable backfill so no short jobs will run while nodes are
> > drained.
> 
> Why ? What happens on the node to make it unusable for the backfilled
> short jobs that could run in the meantime ?

A queue? I mean, if my job is the first one, why does the second one
start before mine? So is only some kind of order taken into
consideration?
 
* Maybe I'm wrong, but in a few words, backfill allows resources to be
  used in the meantime.

http://www.clusterresources.com/products/maui/docs/8.2backfill.shtml

Backfill is a scheduling optimization which allows a scheduler to make
better use of available resources by running jobs out of order.
[...]
It starts the jobs one by one stepping through the priority list until
it reaches a job which it cannot start.
[...]
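
To make the excerpt concrete, here is a toy single-node model of one
scheduling pass. It is only a sketch of the idea, not Maui's actual
algorithm: the `Job` class, the `schedule_pass` function and all the
numbers are made up for illustration, and it assumes that the blocked
job fits on the node and that running jobs end exactly when their
remaining walltime expires.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cpus: int
    walltime: int  # requested walltime, in minutes


def schedule_pass(total_cpus, running, queue):
    """Toy scheduling pass for one node.

    running: list of (cpus, remaining_minutes) for jobs already running.
    queue:   idle jobs, highest priority first.
    Returns (started_names, blocked_name, window_minutes).
    """
    running = list(running)
    free = total_cpus - sum(cpus for cpus, _ in running)
    started, blocked, window = [], None, None

    for job in queue:
        if blocked is None:
            if job.cpus <= free:
                # Priority order: start jobs until one does not fit.
                started.append(job.name)
                running.append((job.cpus, job.walltime))
                free -= job.cpus
                continue
            # First job that cannot start: work out when enough CPUs
            # will have been released for it (its reservation start).
            blocked, window, available = job.name, 0, free
            for cpus, remaining in sorted(running, key=lambda r: r[1]):
                if available >= job.cpus:
                    break
                available += cpus
                window = remaining
        elif job.cpus <= free and job.walltime <= window:
            # Backfill: the job fits now and ends before the window closes.
            started.append(job.name)
            free -= job.cpus

    return started, blocked, window


if __name__ == "__main__":
    running = [(4, 120), (2, 30)]  # 6 of 8 CPUs busy
    queue = [Job("reboot", 8, 10), Job("short", 1, 60),
             Job("long", 1, 300), Job("wide", 2, 100)]
    print(schedule_pass(8, running, queue))
```

In this model the 8-CPU job at the top of the queue is blocked for 120
minutes, yet the 1-CPU, 60-minute job still starts, because it ends
before the blocked job could run anyway.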
 
> > But this scenario is not working.  It seems that backfill is not
> > disabled, because top-of-queue jobs "are not blocking" low-priority
> > jobs.
> 
> I don't understand why you are fixated on backfill.... Maybe answering
> the above question will clear this up.

Hope so, because it's clear in my mind :-) Now I'd like to understand
whether it's a software problem or a conceptual one.


> My suggestion is to use the lower level:
> 1. set the node offline - no new jobs could start on it, independent
> of whether Maui would believe there are short jobs that could be
> backfilled
> 2. poll asking for the state of the node (output of qstat) until the
> list of the jobs running on the node becomes empty

This is the point where my solution differs from yours (or from Rich's,
who already suggested "something" similar).
If my job starts, it means that the node is empty, so the reboot is
safe. The node goes offline and later comes back online when rc.local
runs (if the node restarts fine).

I don't know why my idea is so bad :-)

> 3. reboot the node; this can be done in a number of ways depending on
> what infrastructure you have in place, f.e. 'ssh nodeXXX /sbin/reboot'
> or 'ipmitool -I lan -H nodeXXX power [soft|reset]' or using func or
> using cfengine to create a file which would signify to a locally
> running cron job to run /sbin/reboot or...

Is it simpler than my idea?
We could use puppet or whatever to do so, but a single line in sudoers
that allows the special user to reboot, plus an rc.local script, is the
only configuration needed.

> 4. wait a bit or find some way to decide whether the node has begun
> the shutdown or maybe that the node has started up again; then remove
> the offline state of the node

My rc.local will do it.
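
For comparison, steps 1 to 3 of the lower-level procedure can be
sketched as below. This is a rough sketch under assumptions, not
Bogdan's unpublished script: it polls `pbsnodes <node>` rather than
qstat, it assumes the output contains a line of the form
"jobs = 0/123.server, 1/124.server" while jobs are running, and the
function names, node name and polling interval are all hypothetical.

```python
import re
import subprocess
import time


def run_cmd(args, check=True):
    """Run a command and return its stdout as text."""
    result = subprocess.run(args, check=check, capture_output=True, text=True)
    return result.stdout


def jobs_on_node(pbsnodes_output):
    """Extract job ids from the 'jobs = ...' line of pbsnodes output.

    Assumption: entries look like '<slot>/<jobid>', separated by commas.
    """
    match = re.search(r"^\s*jobs = (.*)$", pbsnodes_output, re.MULTILINE)
    if not match:
        return []
    entries = [e.strip() for e in match.group(1).split(",")]
    return [e.split("/", 1)[-1] for e in entries if e]


def drain_and_reboot(node, run=run_cmd, poll_seconds=60, sleep=time.sleep):
    """Mark the node offline, wait until it is empty, then reboot it."""
    run(["pbsnodes", "-o", node])                 # step 1: set offline
    while jobs_on_node(run(["pbsnodes", node])):  # step 2: poll until empty
        sleep(poll_seconds)
    # step 3: ssh may exit non-zero when the connection drops mid-reboot,
    # so the exit status is not checked here.
    run(["ssh", node, "/sbin/reboot"], check=False)
```

Step 4 is left out on purpose: clearing the offline state (for example
with `pbsnodes -c <node>`) could be done from rc.local once the node is
back up, as described above.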

 
> This can be automated in several ways, my solution was to use the very
> nice pbs_python. Unfortunately I can't publish it, I have changed
> workplaces in the meantime.

Yep, that's a pity.

I really appreciate your reply and the explanation on how you're
solving the issue.
 
> Cheers,
> Bogdan
Many thanks for your reply,
Cheers,
Arnau

