> >It is actually difficult to do while avoiding possible race conditions.
> >First, you need to drain the nodes by marking them offline.  Then you need to
> >mark them for reboot using the node note.  Then a script can reboot nodes when
> >it finds them offline, without a job, and marked for reboot.
> Thanks Garrick! How about rebooting at least those nodes that find
> themselves without a job?

Then you run into a race condition: the scheduler may be about to start a
job on that node just as it reboots.
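
To make the procedure quoted at the top concrete, here is a rough sketch of
the check the reboot script would make. It assumes the nodes were already
drained and marked beforehand (in Torque that is typically done with
`pbsnodes -o` and `pbsnodes -N`), that the note text used as the marker is
the literal string "reboot" (my choice, nothing Torque defines), and that
`pbsnodes -a` prints each node name followed by indented `key = value`
lines. The function names are made up for illustration, and the sketch only
reports candidates; it does not reboot anything.

```python
def parse_pbsnodes(text):
    """Parse `pbsnodes -a`-style text into {node_name: {attribute: value}}.

    Assumed layout: an unindented node name, then indented "key = value"
    lines, with a blank line between nodes.
    """
    nodes = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None          # blank line ends the current node record
            continue
        if not line[0].isspace():
            current = line.strip()  # unindented line starts a new node
            nodes[current] = {}
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            nodes[current][key.strip()] = value.strip()
    return nodes


def reboot_candidates(text, marker="reboot"):
    """Return nodes that are offline, have no jobs, and carry the marker note.

    `marker` is a hypothetical convention: whatever note text you used when
    marking the node for reboot.
    """
    candidates = []
    for name, attrs in parse_pbsnodes(text).items():
        states = [s.strip() for s in attrs.get("state", "").split(",")]
        offline = "offline" in states
        idle = not attrs.get("jobs")            # no "jobs" line, or empty
        marked = attrs.get("note") == marker
        if offline and idle and marked:
            candidates.append(name)
    return candidates
```

All three conditions have to hold together. Marking the node offline first
is what closes the race: once it is offline the scheduler will not place
new jobs on it, so "no jobs" stays true between the check and the reboot.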

> Is there a provision so that I can tell pbs to exec a script when it
> finds itself job-free? (This might work better on my older nodes with
> only 2 cores / node.)
> The (shell) script could then check when the node was last rebooted
> and reboot it if that was too long ago. What do you think of this idea?

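You can't do it from pbs_mom alone, or you will run into the same race:
pbs_mom only knows about the jobs it has already been given, not what the
scheduler is about to send it.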
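
The "rebooted too long ago" part of the idea is easy enough on its own. A
minimal sketch, assuming a Linux node where the first field of /proc/uptime
is the number of seconds since boot; the 30-day threshold and the function
names are placeholders of mine. It is only safe as an extra condition on
top of the offline and marked-for-reboot checks, never as the sole trigger.

```python
SECONDS_PER_DAY = 86400.0


def uptime_days(proc_uptime_text):
    """Convert the contents of /proc/uptime into days since boot."""
    # The first whitespace-separated field is uptime in seconds.
    return float(proc_uptime_text.split()[0]) / SECONDS_PER_DAY


def due_for_reboot(proc_uptime_text, max_days=30):
    """True if the node has been up for at least `max_days` (hypothetical)."""
    return uptime_days(proc_uptime_text) >= max_days
```

On a node you would feed it the real file, e.g.
`due_for_reboot(open("/proc/uptime").read())`.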

> Idea 2: I'd have to submit dummy jobs from a cron job on the master node
> that are designed to run on specific nodes. But then again, Torque will
> not allow a job to execute a reboot command, will it? Maybe if
> submitted as the root user?

Torque doesn't know or care what commands you run.  But rebooting a node
from within an active job is asking for trouble.

