[Mauiusers] Backfill and node reservation

Bogdan Costescu bcostescu at gmail.com
Mon Nov 15 10:18:32 MST 2010

On Mon, Nov 15, 2010 at 15:47, Arnau Bria <arnaubria at pic.es> wrote:
> At some time we'd like to send a kind of job that reboots the host.

I wonder why people keep thinking that a job should reboot the node -
this would involve making the job run as a privileged user or
relaxing security measures. How about having the master node tell the
node when to reboot, in a way that is only available to root?
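One way to do this (a sketch only - the file name, path, and cron setup
are my assumptions, not necessarily what fits your site) is a root cron
job on each node that reboots when the master has dropped a flag file,
e.g. via scp or cfengine as root:

```shell
#!/bin/sh
# Hypothetical root-only reboot trigger: the master node creates a flag
# file on the compute node as root; a root cron job on the node calls
# this function periodically and reboots once the flag appears.
check_reboot_flag() {
    flag=$1
    reboot_cmd=$2          # /sbin/reboot in real use
    if [ -f "$flag" ]; then
        rm -f "$flag"      # clear the flag so the node reboots only once
        $reboot_cmd
    fi
}
# In the cron job: check_reboot_flag /var/run/reboot-requested /sbin/reboot
```

No job ever needs elevated rights this way; only root on the master and
on the node are involved.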

And why do you think that this can only be achieved through Maui and
not at a lower level? And, by the way, what is the lower level: Torque,
SGE, SLURM, something else?

> But before rebooting the host we'd like to "drain" the node and don't
> lose any job while rebooting.

OK, but what does this have to do with backfilling? What kind of jobs
run on the node? Are nodes allocated exclusively (only one job can
run on a node at a time), or can a job get part of a node (so that
several jobs could run simultaneously on a node)?

> 2.-) Send a job that requests all node cpus using "special one" user
> account (npp=$node_cpu

If npp is actually ppn, then you are probably running Torque or some
other PBS derivative. That would also suggest that nodes are not
exclusive and that several jobs can exist on a node at the same time.
By asking for all of the CPUs, you want to make sure that nothing
else is running on the node at that time.
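For Torque, such a draining job could be a trivial script like the
following (node name, core count, and duration are placeholders I made
up - adjust them to the actual node):

```shell
#!/bin/sh
#PBS -l nodes=node042:ppn=8
#PBS -N drain-node042
# Occupy every core of the node (ppn=8 assumes an 8-core node) so no
# other job can be scheduled there; the actual reboot is handled
# outside the job, by the admin or the master node.
sleep 3600
```

But as argued below, you do not really need such a job at all if you
drain the node at the resource-manager level.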

> 3.-) Dissable backfill so no short jobs will run while nodes are
> drained.

Why? What happens on the node to make it unusable for the backfilled
short jobs that could run in the meantime?

> But this scenario is not working.  Seems that backfill is not dissabled
> cause top queue jobs "are not blocking" low prio jobs.

I don't understand why you are fixated on backfill... Maybe answering
the questions above will clear this up.

My suggestion is to use the lower level:
1. set the node offline - no new jobs can start on it, regardless of
whether Maui believes there are short jobs that could be backfilled
2. poll the state of the node (output of qstat) until the list of
jobs running on the node becomes empty
3. reboot the node; this can be done in a number of ways depending on
what infrastructure you have in place, e.g. 'ssh nodeXXX /sbin/reboot'
or 'ipmitool -I lan -H nodeXXX power [soft|reset]', or using func, or
using cfengine to create a file which signals a locally running cron
job to run /sbin/reboot, or...
4. wait a bit, or find some way to decide whether the node has begun
the shutdown or maybe that the node has started up again; then remove
the offline state of the node
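Assuming Torque, the four steps above can be sketched with the standard
CLI tools (pbsnodes -o/-c set and clear the offline state); the node
name, the polling and wait intervals, and the ssh-based reboot are
assumptions about the local setup:

```shell
#!/bin/sh
# Sketch of the drain-and-reboot cycle using Torque's CLI.

# Succeeds while the node still has jobs, judging by the "jobs = ..."
# line in the `pbsnodes` output piped in on stdin.
node_busy() {
    grep -q '^ *jobs = '
}

drain_and_reboot() {
    node=$1
    pbsnodes -o "$node"              # 1. offline: no new jobs start
    while pbsnodes "$node" | node_busy; do
        sleep 60                     # 2. poll until running jobs end
    done
    ssh "$node" /sbin/reboot         # 3. reboot (one option of several)
    sleep 300                        # 4. wait for the shutdown to begin,
    pbsnodes -c "$node"              #    then clear the offline state
}
# Usage: drain_and_reboot node042
```

Step 4 is the fragile part: a fixed sleep is crude, and checking that
the node has actually gone down (e.g. by pinging it) would be safer.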

This can be automated in several ways; my solution was to use the very
nice pbs_python. Unfortunately I can't publish it, as I have changed
workplaces in the meantime.
