[torquedev] Auto delete job when another finishes

"Mgr. Šimon Tóth" SimonT at mail.muni.cz
Tue Sep 21 16:57:59 MDT 2010


On 22.9.2010 00:23, Gareth.Williams at csiro.au wrote:
>> -----Original Message-----
>> From: torquedev-bounces at supercluster.org [mailto:torquedev-
>> bounces at supercluster.org] On Behalf Of "Mgr. Šimon Tóth"
>> Sent: Wednesday, 22 September 2010 12:22 AM
>> To: Torque Dev. Mailing List
>> Subject: [torquedev] Auto delete job when another finishes
>>
>> I have been facing a dilemma about how best to approach this problem.
>>
>> I have two related jobs: one is an outer job (builds a virtual machine) and
>> one is an inner job (runs in the virtual machine). I want to delete the
>> outer job as soon as the inner one is completed. I was thinking about adding
>> a special job dependency for this.
>>
>> What do you think?
>>
>> --
>> Mgr. Šimon Tóth
> 
> Interesting thought.
> 
> Why can't you just (find and) delete the outer job as the last command in the inner job?
> 
> Also, why have separate jobs at all?  Why not just build the VM and start work in it in the same job?
> 
> Off-hand I don't think that adding a feature to allow a special dependency for this is a good idea.

I should probably explain this further. We have a nontrivial piece of
code handling the creation, destruction and maintenance of virtual
clusters in Torque.

Now the problem is that some jobs require custom images, but their
owners are not interested in building clusters or VPNs; they just want
to run the job inside a specific image.

We automatically generate the job that actually builds the machine,
but the job that runs inside the built machine is provided by the user.

The external job (handling the virtual cluster) cannot end before the
internal job is fully finished (this includes its epilogue).
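
For illustration only, the ordering constraint above could be approximated
outside of Torque by a small watcher that polls the inner job and deletes the
outer job once the inner one is done. This is a rough sketch, not the
dependency feature under discussion. It assumes that `qstat -f` prints a
`job_state = X` line, and that state `C` (or the job no longer being known to
the server) means the inner job, including its epilogue, has finished. The
function names, the job IDs and the polling interval are hypothetical.

```python
import subprocess
import time


def job_state(job_id, run=subprocess.run):
    """Return the one-letter state from `qstat -f`, or None if the job is unknown."""
    result = run(["qstat", "-f", job_id], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # assumption: a failing qstat means the job has left the server
    for line in result.stdout.splitlines():
        key, sep, value = line.strip().partition(" = ")
        if sep and key == "job_state":
            return value.strip()
    return None


def qdel(job_id):
    """Delete a job with the standard `qdel` command."""
    subprocess.run(["qdel", job_id], check=True)


def delete_outer_when_inner_done(inner_id, outer_id, get_state=job_state,
                                 delete=qdel, poll_seconds=30, sleep=time.sleep):
    """Poll the inner job; delete the outer job only after the inner one is gone.

    Assumption: state "C" (completed) or an unknown job means the inner job
    has fully finished, epilogue included.
    """
    while get_state(inner_id) not in (None, "C"):
        sleep(poll_seconds)
    delete(outer_id)
```

A call such as `delete_outer_when_inner_done("1234.server", "1233.server")`
(both IDs made up) would block until the inner job is done and then run
`qdel` on the outer job. Polling is simple but wasteful; a real dependency
type in the server would avoid both the delay and the extra `qstat` load.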

Having only one job would require huge code changes. The problem is that
creating a virtual machine can take a long time (fetching the image over
NFS, writing it to disk, booting the machine), and there are custom hacks
that speed up the confirmation of success on both the server and the node
to prevent the server from hanging. I can't even think of a solution for
interactive jobs.

-- 
Mgr. Šimon Tóth
