[Mauiusers] Backfill and node reservation
rercola at acm.jhu.edu
Mon Nov 15 08:18:18 MST 2010
You could run a periodic check from crontab (or similar) on each node:
if the node is offline, no jobs are running on it, and a marker file
exists in some agreed-upon place, reboot. Have the marker file removed
on startup. To trigger a reboot, submit a job as the special
high-priority user that creates the file and then offlines the node.
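
A minimal sketch of what that job might run, assuming TORQUE's
`pbsnodes -o` is used to offline the node. The marker path and the
function name `request_reboot` are made up for illustration; the path
must be writable by the job's user, and that user needs operator or
manager rights for `pbsnodes -o` to succeed. It also assumes the node's
hostname matches its name in the server's node list.

```python
import socket
import subprocess
from pathlib import Path

# Hypothetical marker location (assumption; pick any agreed-upon path).
REBOOT_FLAG = Path("/var/run/reboot-requested")


def request_reboot(flag=REBOOT_FLAG, node=None, run=subprocess.run):
    """Create the marker file, then mark this node offline."""
    node = node or socket.gethostname()
    # The periodic check on the node looks for this file.
    flag.touch()
    # Offline the node so the scheduler stops placing new jobs on it.
    run(["pbsnodes", "-o", node], check=True)
    return node


if __name__ == "__main__":
    request_reboot()
```

The `run` parameter is only there so the command can be swapped out for
a dry run; in normal use the defaults are enough.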
Once the node is offline and its remaining jobs have finished, the
check reboots it, so you drop the human intervention without having to
do much else.
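
The periodic check itself could look roughly like the sketch below,
meant to be run from root's crontab on each node. It assumes the plain
text output of `pbsnodes <node>` has a `state = ...` line and a
`jobs = ...` line only while jobs are running; the marker path, the
`/sbin/reboot` path, and the function names are illustrative
assumptions, not anything prescribed by Maui or TORQUE.

```python
import os
import socket
import subprocess

# Hypothetical marker location (assumption; must match the job's path).
REBOOT_FLAG = "/var/run/reboot-requested"


def parse_node(pbsnodes_text):
    """Return (set of states, has_jobs) parsed from `pbsnodes <node>` text."""
    states, has_jobs = set(), False
    for line in pbsnodes_text.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if not sep:
            continue  # node-name line or blank line
        if key == "state":
            # State may be a comma-separated list such as "down,offline".
            states = {s.strip() for s in value.split(",")}
        elif key == "jobs" and value.strip():
            has_jobs = True
    return states, has_jobs


def should_reboot(pbsnodes_text, flag_exists):
    """Reboot only if flagged, offline, and no jobs are left on the node."""
    states, has_jobs = parse_node(pbsnodes_text)
    return bool(flag_exists and "offline" in states and not has_jobs)


def main():
    node = socket.gethostname()
    out = subprocess.run(["pbsnodes", node], capture_output=True,
                         text=True, check=True).stdout
    if should_reboot(out, os.path.exists(REBOOT_FLAG)):
        subprocess.run(["/sbin/reboot"], check=True)


if __name__ == "__main__":
    main()
```

The marker file still has to be deleted at boot (for example from
rc.local, before the node is set back online), otherwise a node that is
offlined later for another reason would reboot as soon as it is idle.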
On Mon, Nov 15, 2010 at 10:15 AM, Arnau Bria <arnaubria at pic.es> wrote:
> On Mon, 15 Nov 2010 09:03:22 -0600
> Charles Johnson wrote:
>> On Nov 15, 2010, at 8:47 AM, Arnau Bria wrote:
> Hi Charles,
>> > At some point we'd like to send a kind of job that reboots the host.
>> > But before rebooting the host we'd like to "drain" the node so that
>> > we don't lose any jobs while rebooting.
>> Why not just mark the node off-line, and when the jobs are finished
>> reboot the node?
> That's our current procedure.
> But with the reboot scenario I described earlier, we could eliminate
> human intervention both for the reboot and for checking the node "drain".
> *I did not explain it, but the nodes go offline/online automatically
> when rebooting, via the job and the local rc.local file.
> So it's interesting for us that a reboot (for a kernel update, e.g.)
> could be done by sending as many jobs as we have nodes.
> Many thanks for your replies,