[Mauiusers] Backfill and node reservation

Rich rercola at acm.jhu.edu
Mon Nov 15 08:18:18 MST 2010


You could have a periodic check in crontab or similar: "if the node is
offline, the list of jobs running on the node is empty, and [a file
exists in some magic place], reboot". Have the file removed on startup.
Then submit a job as the special high-priority user to create the file,
and offline the node.

The node will eventually reboot once it is offline and out of jobs, so
you drop the human intervention without having to do much else.
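
A rough sketch of that periodic check in Python, in case it helps. It
assumes Torque's "pbsnodes <node>" prints attribute lines such as
"state = offline" and "jobs = 0/123.server, ...", and that the jobs line
is missing when the node is idle. The flag path and the function names
are made up for illustration; adjust them to your site.

```python
import subprocess
from pathlib import Path

# Hypothetical flag location; the "magic place" is whatever you choose.
REBOOT_FLAG = Path("/var/run/reboot-when-drained")


def parse_pbsnodes(text):
    """Extract (states, jobs) from `pbsnodes <node>` style output.

    Assumes lines like `state = job-exclusive,offline` and
    `jobs = 0/123.server, 1/124.server` (assumption, not verified
    against every Torque version).
    """
    attrs = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            attrs[key.strip()] = value.strip()
    states = set(attrs.get("state", "").split(","))
    jobs = [j.strip() for j in attrs.get("jobs", "").split(",") if j.strip()]
    return states, jobs


def should_reboot(states, jobs, flag_exists):
    """Reboot only if the node is offline, has no jobs, and the flag exists."""
    return "offline" in states and not jobs and flag_exists


def check_and_reboot(node, flag=REBOOT_FLAG):
    """Body of the periodic check; needs pbsnodes on PATH and root rights."""
    out = subprocess.run(["pbsnodes", node], capture_output=True,
                         text=True, check=True).stdout
    states, jobs = parse_pbsnodes(out)
    if should_reboot(states, jobs, flag.exists()):
        subprocess.run(["/sbin/reboot"], check=True)
```

Cron would call check_and_reboot() with the local node name every few
minutes, and the startup script would delete the flag file so the node
does not reboot in a loop.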

- Rich

On Mon, Nov 15, 2010 at 10:15 AM, Arnau Bria <arnaubria at pic.es> wrote:
> On Mon, 15 Nov 2010 09:03:22 -0600
> Charles Johnson wrote:
>
>> On Nov 15, 2010, at 8:47 AM, Arnau Bria wrote:
> Hi Charles,
>
>
>> > At some time we'd like to send a kind of job that reboots the host.
>> > But before rebooting the host we'd like to "drain" the node so
>> > that we don't lose any jobs while rebooting.
>>
>>
>> Why not just mark the node off-line, and when the jobs are finished
>> reboot the node?
>
> That's our current procedure.
>
> But with the reboot scenario I described before, we could eliminate
> human intervention, both for the reboot and for checking the node
> "drain".
>
> *I did not explain it, but nodes go online/offline automatically when
> rebooting, by means of the job and a local rc.local file.
> So it's interesting for us that a reboot (e.g. for a kernel update)
> could be done by sending as many jobs as we have nodes.
>
>
>> ~Charles~
> Many thanks for your replies,
>
> Cheers,
> Arnau

