[Mauiusers] Jobs in Queue Forever

Lennart Karlsson Lennart.Karlsson at nsc.liu.se
Fri Nov 12 04:18:52 MST 2004

The 4th of November, Gabe Turner wrote:
> Also, if you don't want to bother looking at what the job was asking for,
> you can remove neednodes entirely by passing it no value:
> qalter -l neednodes= 503
> Unfortunately, I have this problem in PBSPro 5.4.1 and have always had this
> problem.  It does make sense to leave neednodes set when a node goes down,
> however, since it will ensure that those jobs get run at that node as soon
> as it comes back.  However, this assumes that the node will come back soon,
> i.e. that it wasn't a hardware failure that brought it down.  Unfortunately
> for me, I'm almost never in the situation that I can bring the node back up
> promptly so I have to manually go through all the jobs that were on the
> node and unset neednodes :\

I have the same problem on our PBSPro cluster, so I wrote a Perl script, run
by cron every 20 minutes, that does the 'qalter'. (I also configured Maui
to defer jobs for half an hour.) The script also performs a few other checks
and actions that match our local policies and environment, which makes it
unfit for use on other clusters, but the central check is to compare
"Resource_List.neednodes" with "Resource_List.nodes" in the "qstat -f" output
for all jobs whose "job_state" is "Q".
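
The central check can be sketched roughly as follows. This is an
illustrative Python sketch, not the Perl script mentioned above. It assumes
that "qstat -f" prints a "Job Id:" line per job followed by indented
"name = value" lines, with long values continued on tab-indented lines, and
that a queued job needs the 'qalter' when its neednodes value is set and
differs from its nodes value. The function names are hypothetical, and the
sketch only builds the command lines without running them.

```python
def parse_qstat_f(text):
    """Parse `qstat -f` style text into {job_id: {attribute: value}}.

    Assumed layout: a "Job Id: <id>" line per job, then indented
    "name = value" lines; tab-indented lines continue the previous value.
    """
    jobs = {}
    attrs = None
    last_key = None
    for line in text.splitlines():
        if line.startswith("Job Id:"):
            job_id = line.split(":", 1)[1].strip()
            attrs = jobs.setdefault(job_id, {})
            last_key = None
        elif attrs is None or not line.strip():
            continue  # text before the first job, or a blank separator line
        elif line.startswith("\t") and last_key is not None:
            attrs[last_key] += line.strip()  # continuation of a long value
        elif " = " in line:
            key, value = line.split(" = ", 1)
            last_key = key.strip()
            attrs[last_key] = value.strip()
    return jobs


def stale_neednodes_jobs(jobs):
    """Return ids of queued jobs whose neednodes is set and differs from nodes.

    Assumption: such a mismatch means the job is still pinned to specific
    nodes from an earlier scheduling attempt.
    """
    stale = []
    for job_id, attrs in jobs.items():
        if attrs.get("job_state") != "Q":
            continue
        neednodes = attrs.get("Resource_List.neednodes")
        nodes = attrs.get("Resource_List.nodes")
        if neednodes and neednodes != nodes:
            stale.append(job_id)
    return stale


def qalter_commands(job_ids):
    """Build one `qalter -l neednodes= <job>` argument list per job id."""
    return [["qalter", "-l", "neednodes=", job_id] for job_id in job_ids]
```

A cron job could feed the real "qstat -f" output to parse_qstat_f() and pass
each resulting argument list to subprocess.run(); it is safer to print the
commands first and review them before letting anything alter jobs.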

Stupid solutions to stupid problems... ;-)

-- Lennart Karlsson <Lennart.Karlsson at nsc.liu.se>
   System Expert at National Supercomputer Centre in Linkoping, Sweden
   +46 706 49 55 35
   +46 13 28 26 24

