[torquedev] trunk: job arrays
Caird, Andrew J
acaird at umich.edu
Fri Aug 10 12:28:44 MDT 2007
> I think I will probably set up a server_priv/arrays directory where
> there would be a file for each array. The data in these files would
> allow me to rebuild the server list of arrays, and I could also track
> how many of the array's jobs have been successfully spawned. Since
> the array job "cloning" is done in batches through pbs_server work
> tasks, it would be possible for the server to get shut down after the
> array has been partially built. Upon restart pbs_server does not
> resume the job cloning process. If after every successful job clone we
> can update this array file (this would have to be pretty fast), then
> it would be possible to resume the job cloning process after a server
> restart.
> I would love to hear suggestions!
Would it make sense to have some simple transactional thing here? I
think this is an edge case, really. But if you wanted, the order could
be "write array file with a pre-run flag", "do the cloning", "update
the file to mark the jobs as not-pre-run".
If the server restarts after the writing and during the cloning, you'd
have to check for things that exist and have the pre-run flag from the
file, then update their flag. Create those that don't exist, and then
finish updating the file to mark the jobs as "not-pre-run". This
pre-supposes the ability of the server to understand cloned jobs, etc.
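To make the ordering concrete, here is a rough sketch in Python of the
write / clone / update sequence and the recovery pass described above.
Everything in it is an assumption for illustration only: the JSON file
layout, the function names, and the `existing_jobs` set standing in for
"jobs the server already knows about" are hypothetical and are not
taken from the pbs_server code.

```python
import json

PRE_RUN, NOT_PRE_RUN = "pre-run", "not-pre-run"


def write_array_file(path, array_id, count):
    """Step 1: record every planned job of the array with the pre-run flag."""
    state = {"array": array_id,
             "jobs": {str(i): PRE_RUN for i in range(count)}}
    with open(path, "w") as f:
        json.dump(state, f)
    return state


def clone_and_mark(path, state, existing_jobs, clone):
    """Steps 2 and 3; the same loop also serves as the recovery pass.

    existing_jobs: set of job indices that already exist on the server.
    clone: callable that creates the job for one index (hypothetical).
    """
    for idx in sorted(state["jobs"], key=int):
        if state["jobs"][idx] == NOT_PRE_RUN:
            continue                       # finished before the restart
        if int(idx) not in existing_jobs:
            clone(int(idx))                # create jobs that don't exist
            existing_jobs.add(int(idx))
        state["jobs"][idx] = NOT_PRE_RUN   # job exists, so update its flag
        with open(path, "w") as f:         # persist after every clone
            json.dump(state, f)
    return state
```

On restart, the server would reload the file, rebuild `existing_jobs`
from whatever jobs it recovered, and call `clone_and_mark` again. Jobs
that were cloned but not yet flagged only get their flag updated, and
the missing ones are created.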
If the server dies during the initial writing of the file, on restart
it could ignore the partial file and start over, since nothing will have
been cloned yet.
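One way to make "ignore the partial file" cheap to implement is to
write the array file to a temporary name and rename it into place, so
a crash leaves either no file or a complete one. The sketch below is
Python and purely illustrative: the JSON format, the ".tmp" suffix, and
both function names are assumptions, not anything pbs_server does
today.

```python
import json
import os


def write_array_file_atomically(path, state):
    """Write the whole file under a temporary name, then rename it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # make sure the data is on disk first
    os.replace(tmp, path)      # atomic rename on POSIX filesystems


def load_array_file(path):
    """Return the saved state, or None if the file is missing or partial."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None            # caller starts the array over
```

A `None` result tells the caller to treat the array as never started,
which is safe here because cloning only begins after the file has been
written in full.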
I'm not sure this makes a lot of sense, but at the very least this is a
suggestion (good or bad).
We're looking forward to the functionality, thanks for taking such care
with it, we appreciate it.