[torqueusers] [Mauiusers] Moving jobs from one node to another

Gus Correa gus at ldeo.columbia.edu
Mon Aug 13 12:48:13 MDT 2012


On 08/13/2012 02:17 PM, Denis wrote:
> 2012/8/13 Fernando Caba<fcaba at uns.edu.ar>:
>> Hi, I want to know something about moving jobs from one node to another.
>> Suppose I need to do some maintenance on a node that has a certain number
>> of running jobs (they cannot be killed).
>> Can I move all of those jobs (or specific ones) to another node (free or
>> not)? If yes, how?
>>
>> Sorry for asking the same thing again. Is it a dumb question?
> Hello, Fernando.
>
> You cannot move a running job to another node. That would be possible
> with Condor if you link your code against its libraries when
> compiling.
>
> D.
Hi Fernando

The best approach is to use algorithms and programs that can be restarted
from a saved state/configuration, and to run them for a relatively short
time [hours, not days, weeks, or months], restarting as needed.
Not all programs are written this way, but they often have this capability
and users simply don't know about it or how to use it.

This way, if the user loses one job, [s]he doesn't lose too much, and
can restart from the state/configuration saved by the previous job in
the sequence.
Also, you won't feel too guilty for killing a job that has been running
for only a few hours, but your user may become very upset if you kill
her/his job that has been running for three weeks.
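
To make the restart idea concrete, here is a minimal sketch in Python of a
program that saves its state periodically and picks up from the last saved
state when it is run again. Everything in it is an assumption for
illustration only: the file name state.json, the function names, and the
"work" itself (a running sum standing in for a real computation). Real
applications usually have their own restart files and options.

```python
import json
import os
import tempfile

CHECKPOINT = "state.json"  # hypothetical checkpoint file name


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_state(state, path=CHECKPOINT):
    """Write to a temporary file, then rename it over the checkpoint,
    so a job killed mid-write does not leave a truncated file behind."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run(n_steps, max_steps_this_job, path=CHECKPOINT, save_every=10):
    """Advance the computation by at most max_steps_this_job steps,
    starting from whatever the previous job saved."""
    state = load_state(path)
    done_here = 0
    while state["step"] < n_steps and done_here < max_steps_this_job:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        done_here += 1
        if state["step"] % save_every == 0:
            save_state(state, path)
    save_state(state, path)
    return state
```

With this layout, a job that is killed loses at most the steps done since
the last save, and the next job in the sequence simply calls run() again.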

Our queues here have a maximum walltime of 12h, but 6h is common
in many public computers.
A modest job runtime also improves the overall throughput of the cluster,
and prevents hogging of the cluster nodes by one or a few users.
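
A long run can then be split into a sequence of short jobs, each one
starting only after the previous one finished successfully. The sketch
below builds such a chain with Torque's qsub, using the
"-W depend=afterok:<jobid>" dependency and a "-l walltime=..." request.
The script name restart.sh, the helper names, and the 12h default are
assumptions for illustration; please check the qsub man page on your system
before relying on it.

```python
import subprocess


def qsub_command(script, walltime="12:00:00", after=None):
    """Build the qsub argument list for one job in the chain."""
    cmd = ["qsub", "-l", f"walltime={walltime}"]
    if after is not None:
        # Hold this job until job 'after' has exited without error.
        cmd += ["-W", f"depend=afterok:{after}"]
    cmd.append(script)
    return cmd


def submit_chain(script, n_jobs, walltime="12:00:00", submit=None):
    """Submit n_jobs copies of script, each depending on the previous one.

    'submit' takes an argument list and returns the job id as a string.
    By default it runs qsub, which prints the job id on standard output.
    """
    if submit is None:
        def submit(cmd):
            return subprocess.check_output(cmd, text=True).strip()
    job_ids = []
    previous = None
    for _ in range(n_jobs):
        previous = submit(qsub_command(script, walltime, after=previous))
        job_ids.append(previous)
    return job_ids
```

For example, submit_chain("restart.sh", 5) would queue five jobs of up to
12h each, assuming restart.sh (a hypothetical job script) resumes from the
state saved by the job before it.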

I hope this helps,
Gus Correa

>> Regards
>>
>> Fernando
>>
>> --
>> ----------------------------------------------------
>> Ing. Fernando Caba
>> Director General de Telecomunicaciones
>> Universidad Nacional del Sur
>> http://www.dgt.uns.edu.ar
>> Tel/Fax: (54)-291-4595166
>> Tel: (54)-291-4595101 int. 2050
>> Avda. Alem 1253, (B8000CPB) Bahía Blanca - Argentina
>> ----------------------------------------------------