[torqueusers] Question about prologue scripts and sending jobs back to queue.

John Hanks griznog at gmail.com
Tue Jan 22 23:10:47 MST 2008


I'm trying to write a health check script for my nodes, and I want it
to reject a job, set the node offline, and have the job immediately be
eligible to run again somewhere else when the health check fails. To
get a grip on what happens when the epilogue script exits with different
values, I have this in my epilogue script (a perl script):

exit (4);
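
For context, here is a minimal sketch of the sort of health check I have
in mind, written in shell rather than perl to keep it short. The scratch
directory, the 1 GB threshold, and the function names are made up for
illustration. It also assumes the node is permitted to run "pbsnodes -o"
against the server, and that an exit value greater than 1 is what causes
the job to be requeued.

```shell
#!/bin/sh
# Sketch of a node health check for a TORQUE prologue/epilogue.
# Assumptions: SCRATCH_DIR, MIN_FREE_KB and the function names are
# hypothetical; only "pbsnodes -o <node>" is a real TORQUE command.

SCRATCH_DIR="${SCRATCH_DIR:-/tmp}"         # hypothetical directory to check
MIN_FREE_KB="${MIN_FREE_KB:-1048576}"      # hypothetical threshold (1 GB)
OFFLINE_CMD="${OFFLINE_CMD:-pbsnodes -o}"  # marks a node offline

# Return 0 if the node looks healthy, non-zero otherwise.
node_is_healthy() {
    # -P forces one line per filesystem; field 4 is available space in KB.
    free_kb=$(df -Pk "$SCRATCH_DIR" 2>/dev/null | awk 'NR==2 {print $4}')
    [ -n "$free_kb" ] && [ "$free_kb" -ge "$MIN_FREE_KB" ]
}

# Print the exit status the script should finish with.
health_check_status() {
    if node_is_healthy; then
        echo 0
    else
        # Take this node out of service before handing the job back.
        $OFFLINE_CMD "$(uname -n)" >/dev/null 2>&1
        echo 4   # same value as the test above; assumed to mean "requeue"
    fi
}
```

The real script would finish with something like
exit "$(health_check_status)" so that the scheduler sees the status.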

I have also specified a fake resource associated with my test nodes,
so I submit my job with:

qsub -l feature=fakeresource testjob.sh

The job gets rejected correctly and sent back to the queue, but it is now
deferred. When I release the hold on it, it runs again, but this time it
ignores my fakeresource request and runs on the next available node.
How do I get this to return a job to the queue without a hold on it
and get my resource request to stick?

Actually, any pointers to some sample prologue/epilogue scripts or
more information about how they work would be appreciated.



John Hanks
Utah State University

