[torquedev] trunk: job arrays

Caird, Andrew J acaird at umich.edu
Fri Aug 10 12:28:44 MDT 2007

> I think I will probably set up a server_priv/arrays directory where
> there would be a file for each array.  The data in these files would
> allow me to rebuild the server list of arrays, and I could also track
> how many of the array's jobs have been successfully spawned.  Since
> the array job "cloning" is done in batches through pbs_server work
> tasks, it would be possible for the server to get shut down after the
> array has been partially built.  Upon restart, pbs_server does not
> resume the job cloning process.  If after every successful job clone we
> can update this array file (this would have to be pretty fast), then
> it would be possible to resume the job cloning process after a server
> restart.
> I would love to hear suggestions!


Would it make sense to have a simple transactional scheme here?  I
think this is an edge case, really.  But if you wanted, the order could
be "write array file with a pre-run flag", "do the cloning", "update
array file".

If the server restarts after the file is written but during the cloning,
you'd have to check for jobs that exist and carry the pre-run flag from
the file, then update their flag.  Create the jobs that don't exist yet,
and then finish updating the file to mark the jobs as "not-pre-run".
This presupposes that the server can recognize cloned jobs, etc.
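
As a rough sketch of that ordering and the restart check (not actual
pbs_server code): the JSON file layout, the function names, and the
dict standing in for the server's job list are all made up for
illustration, and fail_after only exists to simulate a shutdown in the
middle of cloning.

```python
import json
import os

PRE_RUN = "pre-run"
READY = "not-pre-run"


def write_array_file(path, state):
    """Persist the array state (hypothetical JSON layout)."""
    with open(path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())


def finish(path, state, jobs):
    """Clear the pre-run flag on every job, then update the array file."""
    for i in range(state["count"]):
        jobs[f"{state['array_id']}-{i}"] = READY
    state["flag"] = READY
    write_array_file(path, state)


def submit_array(path, array_id, count, jobs, fail_after=None):
    """Write the file with a pre-run flag, do the cloning, update the file.

    jobs is a dict (job id -> flag) standing in for the server's job list.
    fail_after simulates a server shutdown after that many clones.
    """
    state = {"array_id": array_id, "count": count, "flag": PRE_RUN}
    write_array_file(path, state)
    for i in range(count):
        if fail_after is not None and i >= fail_after:
            return False  # simulated shutdown; file still says pre-run
        jobs[f"{array_id}-{i}"] = PRE_RUN
    finish(path, state, jobs)
    return True


def recover(path, jobs):
    """On restart: create the clones that are missing, then finish."""
    with open(path) as f:
        state = json.load(f)
    if state["flag"] == READY:
        return state  # cloning completed before the shutdown
    for i in range(state["count"]):
        job_id = f"{state['array_id']}-{i}"
        if job_id not in jobs:
            jobs[job_id] = PRE_RUN
    finish(path, state, jobs)
    return state
```

Calling submit_array(path, "42", 5, jobs, fail_after=2) leaves two
pre-run jobs behind; a later recover(path, jobs) creates the other
three and marks all five "not-pre-run".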

If the server dies during the initial writing of the file, on restart
it could ignore the partial file and start over, since nothing will have
been launched.
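
One way to make a partial file easy to spot is to write to a temporary
name and rename it into place once it is complete.  That technique is
my own assumption here, not something proposed earlier in this thread,
and the ".tmp" suffix, JSON layout, and function names below are
hypothetical.  A minimal sketch:

```python
import json
import os


def write_atomically(path, state):
    """Write to path + '.tmp', then rename over the final name."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # the final name only ever holds a full file


def load_arrays(directory):
    """On restart, drop leftover partial files and load the rest."""
    arrays = {}
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)
        if name.endswith(".tmp"):
            os.remove(full)  # partial write: nothing was launched
            continue
        with open(full) as f:
            arrays[name] = json.load(f)
    return arrays
```

With this, a crash during the initial write leaves only a ".tmp" file,
which the restart pass deletes before starting over.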

I'm not sure this makes a lot of sense, but at the very least this is a
suggestion (good or bad).

We're looking forward to the functionality.  Thanks for taking such care
with it; we appreciate it.

