[torquedev] New TORQUE job state

Wesley Emeneker Wesley.Emeneker at asu.edu
Wed Jul 11 12:48:48 MDT 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

Garrick,
  I am interning at CRI this summer, and I'm working on integrating
virtual machine (VM) deployment into Moab.
I would like to ask about adding some functionality to TORQUE, but I
need to explain what I'm doing.
Here goes...

The plan is to create and destroy VMs dynamically so that we can switch
cluster software environments on demand for different jobs.
We are currently able to dynamically create nodes in TORQUE and assign
jobs to them without TORQUE knowing that the node is actually a virtual
machine.
Each virtual machine runs a MOM of its own and coordinates with the
TORQUE server.
When a job wants to run inside a VM, Moab provisions the nodes (i.e.,
boots the VMs) and then gives PBS the VM nodes as the nodelist.
The job then executes inside the VMs, and when the job is done Moab
destroys the VMs.
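
To make that flow concrete, here is a rough sketch in Python.
It is only an illustration: VirtualCluster, provision, destroy, and
run_job_in_vms are made-up names, not Moab or TORQUE interfaces, and the
"boot" and "destroy" steps are just recorded in a log.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualCluster:
    """Hypothetical stand-in for the scheduler-side VM provisioning logic."""
    booted: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def provision(self, names):
        # "Provisioning" here just means booting one VM per requested node.
        for name in names:
            self.booted.add(name)
            self.log.append(f"boot {name}")
        return sorted(names)

    def destroy(self, names):
        for name in names:
            self.booted.discard(name)
            self.log.append(f"destroy {name}")


def run_job_in_vms(cluster, job_id, vm_names, execute):
    """Boot the VMs, hand them over as the nodelist, destroy them afterwards."""
    nodelist = cluster.provision(vm_names)
    try:
        # In the real setup the resource manager runs the job on these nodes;
        # `execute` is a placeholder callable for that step.
        return execute(job_id, nodelist)
    finally:
        # The VMs go away even if the job fails.
        cluster.destroy(nodelist)
```

The try/finally is a choice made for the sketch so that VMs are never
left behind; the real teardown policy may differ.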

The problem I am facing occurs when I try to preserve the VMs.
One of the great features of many VMs is that we can save the state of
the entire VM to disk and restore it later.
This will let us do transparent checkpointing and preemption/restoration
of any job we desire (what I call preservation).

I'm able to make Moab preserve and restore the job (i.e., its VM), but a
problem arises because PBS still sees the node as job-exclusive even
though it is down (which it is, because the entire VM was saved to disk).
Because PBS sees the job as active, Moab gets confused and puts the job
into the Running state (instead of the Idle state that the preservation
step set).
Dave and Josh suggested that a new TORQUE job state would be the best
way to handle this since we must have some kind of coordination between
Moab job state and PBS job state.
What I would like is some way to say that the job is "frozen" or
"preserved", which would correspond to some state other than running
(Queued maybe?).
We should also be able to "disassociate" a frozen job from a node so
that the node isn't job-exclusive once the job is frozen.
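
As a toy model of what I am asking for, the sketch below adds a
hypothetical FROZEN state and a freeze step that detaches the job from
its nodes.
None of this is existing TORQUE or Moab code: the class and function
names, the "F" state letter, and the mapping in scheduler_state are all
assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class JobState(Enum):
    QUEUED = "Q"
    RUNNING = "R"
    FROZEN = "F"  # hypothetical new state; the letter is a placeholder


@dataclass
class Node:
    name: str
    up: bool = True
    jobs: set = field(default_factory=set)

    @property
    def job_exclusive(self) -> bool:
        # A node counts as job-exclusive while any job is still attached.
        return bool(self.jobs)


@dataclass
class Job:
    job_id: str
    state: JobState = JobState.QUEUED
    nodes: list = field(default_factory=list)


def start(job, nodes):
    """Attach the job to its (VM) nodes and mark it running."""
    for node in nodes:
        node.up = True
        node.jobs.add(job.job_id)
    job.nodes = list(nodes)
    job.state = JobState.RUNNING


def freeze(job):
    """Save-to-disk case: mark the job frozen and release its nodes."""
    for node in job.nodes:
        node.jobs.discard(job.job_id)  # the "disassociate" step
        node.up = False                # the VM is now only an image on disk
    job.nodes = []
    job.state = JobState.FROZEN


def scheduler_state(job):
    """Toy mapping from resource-manager job state to scheduler job state."""
    return "Running" if job.state is JobState.RUNNING else "Idle"
```

With this model, a frozen job maps to Idle on the scheduler side and its
node is no longer job-exclusive; restoring the job would amount to
calling start again with freshly booted nodes.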

Hopefully my explanation is clear (probably not).
Let me know if you have any questions about what I'm doing.
I look forward to hearing if this functionality will be possible.

Thanks,
   Wesley

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iQEVAwUBRpUmDx3KlQrAVWnHAQhMTggAnKRO0QlPWV9WUBoy9oXb5WBnUbyDtb8c
KXTIMvKbJz4gTUpbvZN7dy4yW9z7nJ1XqbU958svS7TxnUFClSbyhoGkpbvISAIz
pXdNfXO/ppqjorhvVRY/w9T2EwKBJ2BJuOReGoEts8MmTwenLI5ooYWNGi/7fjGA
YFFHK8yycFhnfutQKO0OnWn91pcksqCGfevx7Oa+UhMvSles6bsJ99QqW8hLXM4u
xlujsfmlRCpfbvPh+Lky15HlsJsRtPIiNT7uodJxlI6terhINx7GqhZiQYlgVEpU
dhMbQQU9mgByYcORMxMyfbQIGm6sv7hXY3tC2zT6n4S7FN9A/C43yg==
=2ASl
-----END PGP SIGNATURE-----