[torquedev] New TORQUE job state

Wesley Emeneker Wesley.Emeneker at asu.edu
Wed Jul 11 12:48:48 MDT 2007


  I am interning at CRI this summer, and I'm working on integrating
virtual machine (VM) deployment into Moab.
I would like to ask about adding some functionality to TORQUE, but I
need to explain what I'm doing.
Here goes...

The plan is to create and destroy VMs dynamically so that we can switch
cluster software environments on demand for different jobs.
We are currently able to dynamically create nodes in TORQUE and assign
jobs to them without TORQUE knowing that the node is actually a virtual
machine.
Each virtual machine runs a MOM of its own and coordinates with the
TORQUE server.
When a job wants to run inside a VM, Moab provisions the nodes (aka
boots the VMs), and then gives PBS the VM nodes as the nodelist.
The job then executes inside the VMs, and when the job is done Moab
destroys the VMs.
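
As a rough sketch of that lifecycle (boot the VMs, hand them over as the
nodelist, run the job, destroy the VMs), assuming three hypothetical
callables that stand in for the real provisioning and submission steps.
None of these names are Moab or TORQUE APIs:

```python
def run_job_in_vms(job_id, vm_names, boot_vm, run_on_nodes, destroy_vm):
    """Boot VMs, run the job on them, then always tear the VMs down.

    boot_vm, run_on_nodes and destroy_vm are hypothetical stand-ins
    supplied by the caller; they are not Moab or TORQUE functions.
    """
    booted = []
    try:
        for name in vm_names:
            boot_vm(name)          # provisioning step
            booted.append(name)
        # The booted VMs are handed over as the job's nodelist.
        return run_on_nodes(job_id, booted)
    finally:
        # Destroy only the VMs that actually booted, in reverse order.
        for name in reversed(booted):
            destroy_vm(name)
```

The try/finally is only there so that a failed boot or a failed job
still cleans up whatever VMs were created.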

The problem I am facing occurs when I try to preserve the VMs.
One of the great features of many VMs is that we can save the state of
the entire VM to disk and restore it later.
This will let us do transparent checkpointing and preemption/restoration
of any job we desire (what I call preservation).

I'm able to make Moab preserve and restore the job (aka VM), but a
problem arises because PBS sees the node as job-exclusive even if it is
down (which it is because the entire VM was saved to disk).
Because PBS sees the job as active, Moab gets confused and puts the job
into the Running state (instead of the Idle state that the Preservation
Dave and Josh suggested that a new TORQUE job state would be the best
way to handle this since we must have some kind of coordination between
Moab job state and PBS job state.
What I would like is some way to mark the job as "frozen" or
"preserved", corresponding to some state other than Running (Queued,
maybe?).
We should also be able to "disassociate" a frozen job from a node so
that the node isn't job-exclusive once the job is frozen.
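
To make the request concrete, here is a minimal toy model of the
behaviour I have in mind. It is not TORQUE code: the FROZEN state, the
class names, and the functions are all hypothetical, and real TORQUE
tracks jobs and nodes quite differently. The point is only that freezing
changes the job state and releases the nodes in a single step:

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class JobState(Enum):
    QUEUED = auto()
    RUNNING = auto()
    FROZEN = auto()  # hypothetical new state; does not exist in TORQUE


@dataclass
class Node:
    name: str
    job_ids: set = field(default_factory=set)

    @property
    def job_exclusive(self):
        # In this toy model a node is job-exclusive while any job holds it.
        return bool(self.job_ids)


@dataclass
class Job:
    job_id: str
    state: JobState = JobState.QUEUED
    nodes: list = field(default_factory=list)


def start(job, nodes):
    """Start a queued job, or restore a frozen one, on the given nodes."""
    if job.state is JobState.RUNNING:
        raise ValueError("job is already running")
    for node in nodes:
        node.job_ids.add(job.job_id)
    job.nodes = list(nodes)
    job.state = JobState.RUNNING


def freeze(job):
    """Mark a running job as frozen and disassociate it from its nodes."""
    if job.state is not JobState.RUNNING:
        raise ValueError("only a running job can be frozen")
    for node in job.nodes:
        node.job_ids.discard(job.job_id)  # node is no longer job-exclusive
    job.nodes = []
    job.state = JobState.FROZEN
```

With something like this, a scheduler looking at a frozen job would see
neither a Running job nor a job-exclusive node, and could later restore
the job on the same or different nodes by calling start() again.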

Hopefully my explanation is clear (probably not).
Let me know if you have any questions about what I'm doing.
I look forward to hearing if this functionality will be possible.



