[Mauiusers] suspend / resume

Wightman wightman at supercluster.org
Tue Aug 24 11:57:41 MDT 2004


You may be able to use runjob instead of qrun.  If you run runjob -c, it
will clear any 'stale' attributes from the job (including the hostlist),
which will allow the job to be run on other nodes as well (see the
documentation for runjob).
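
For example, something like this (the job id is hypothetical; see the
runjob documentation for the exact flags in your maui build):

    runjob -c 805    # clear stale job attributes (hostlist etc.) so the
                     # job can be started on any eligible nodes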

Moab is able to pick up an altered hostlist from an administrator qrun
command.  We are looking into what it would take to port that to maui.


Douglas
Cluster Resources, Inc.

On Tue, 2004-08-03 at 02:48, Bernward Platz wrote:
> The problem with using qsig is that qsig does not delete the reservation
> made by maui. If you use a PREEMPTEE/PREEMPTOR configuration in maui, maui
> suspends the job itself and it should work. Or use 
> 
> mjobctl -s <jobid>
> 
> to suspend the job manually. mjobctl deletes the reservation of the suspended 
> job, so maui can start the new job.
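> 
> A minimal maui.cfg sketch for such a preemption setup could look like
> this (parameter names are from the maui documentation; the QOS names are
> only examples):
> 
>     PREEMPTPOLICY  SUSPEND
>     QOSCFG[hi]     QFLAGS=PREEMPTOR
>     QOSCFG[low]    QFLAGS=PREEMPTEE
> 
> A job suspended with mjobctl -s can later be resumed with
> mjobctl -r <jobid>.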
> 
> For better understanding, here is the output of a job suspended by qsig
> (checkjob <jobid>):
> ----------------------------------------
> checking job 805
> 
> State: Suspended  (User: platz  Group: users)
> WallTime: 0:00:30 of 99:23:59:59
> Suspended Wall Time: 0:00:20
> SubmitTime: Tue Aug  3 08:07:28
>   (Time Queued  Total: 0:00:53  Eligible: 0:00:01)
> 
> Total Tasks: 2
> 
> Req[0]  TaskCount: 2  Partition: DEFAULT
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Class: [tp_pri_A 1]  Features: [NONE]
> 
> 
> IWD: [NONE]  Executable:  [NONE]
> QOS: hi  Bypass: 0  StartCount: 1
> PartitionMask: [ALL]
> Flags:       RESTARTABLE PREEMPTOR
> 
> EState 'Running' does not match current state 'Suspended'
> !!!!!!!!!!!!!!!!  Reservation '805' (-0:00:52 -> 99:23:59:07  Duration: 99:23:59:59) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
> PE:  2.00  StartPriority:  1000
> cannot select job 805 for partition DEFAULT (non-idle expected state 'Running')
> ---------------------------
> 
> 
> 
> 
> And this is the output of a job suspended by maui:
> 
> checking job 803
> 
> State: Suspended
> Creds:  user:platz  group:users  class:tp_pri_A  qos:hi
> WallTime: 00:00:31 of 99:23:59:59
> Suspended Wall Time: 00:00:10
> SubmitTime: Tue Aug  3 10:43:04
>   (Time Queued  Total: 00:01:13  Eligible: 00:00:00)
> 
> StartDate: 00:00:49  Tue Aug  3 10:45:06
> Total Tasks: 2
> 
> Req[0]  TaskCount: 2  Partition: DEFAULT
> Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> Opsys: [NONE]  Arch: [NONE]  Features: [linux]
> Allocated Nodes:
> [node02:1][gordon:1]
> 
> 
> IWD: [NONE]  Executable:  [NONE]
> Bypass: 0  StartCount: 0
> PartitionMask: [ALL]
> Flags:       RESTARTABLE PREEMPTOR
> 
> PE:  2.00  StartPriority:  1001
> cannot select job 810 for partition DEFAULT (startdate in '00:00:49')
> 
> 
> 
> 
> 
> On Tuesday 03 August 2004 02:17, you wrote:
> > Thanks for replying Bernward.
> >
> > We still haven't gotten maui to automatically suspend and resume jobs.
> > What we do on our cluster is manually suspend jobs (qsig -s
> > suspend <jobid>) and then check whether the next job in the queue really
> > gets run.
> >
> > Using your patch, I submitted a parallel job (which uses 6
> > processors) on our 6-node/processor cluster and suspended it after it
> > had run for a minute. I then submitted another job which also requests 6
> > processors. The new job just stays in the queue, goes into a
> > deferred state, and is later put back into an idle state. "pbsnodes -a"
> > tells me that the nodes' state is already free, but checkjob <jobid>
> > tells me that the job has been put into an idle state because there are no
> > idle processors which can satisfy the job requirements. Although the job
> > doesn't get executed automatically, I can still force it to run by using
> > the qrun command on that particular job.
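> >
> > The full manual sequence looks roughly like this (the job ids and the
> > script name are just examples):
> >
> >     qsig -s suspend 100   # suspend the running parallel job via pbs
> >     qsub bigjob.sh        # submit the new 6-processor job
> >     qrun 101              # force-start it, since maui won't schedule it
> >     qsig -s resume 100    # later, resume the suspended job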
> >
> > Have you applied any other patch (for the suspend and resume to work) in
> > addition to the one that you sent to the mailing list? At the moment, I
> > am using two patches: the torque and mpiexec patch which properly
> > suspends parallel jobs and the one that you sent which frees up the
> > nodes with the suspended jobs. I got the first patch from Sebastien.
> >
> > Can you also tell me the architecture and OS of the machine you are
> > running your jobs on? We are trying to test maui on an alpha-linux
> > cluster and we've already tried everything we can think of to get
> > suspend and resume to work, but we are still unsuccessful. We suspect
> > that the reason it is not working may be that we are running our jobs
> > on alpha-linux. I also emailed my complete configuration to the mailing
> > list before, and I was told that there is nothing wrong with it.
> >
> > I will really appreciate your help on this matter.
> >
> > Thanks very much,
> > Gerson
> >
> > Bernward Platz wrote:
> > > Hi Gerson,
> > >
> > > I don't know if I understand you right, so I tried to reproduce your
> > > scenario: I built two linux nodes with one processor each.
> > > I have two queues, tp_pri_A and tp_pri_B, where tp_pri_A is a high-priority
> > > queue and tp_pri_B is a low-priority queue. Then
> > >
> > > qsub -q tp_pri_B -l nodes=1:linux
> > >
> > > Job id           Name             User             Time Use S Queue
> > > ---------------- ---------------- ---------------- -------- - -----
> > > 781.gordon         STDIN            platz                   0 R tp_pri_B
> > >
> > >
> > > qsub -q tp_pri_A -l nodes=2:linux
> > >
> > > Job id           Name             User             Time Use S Queue
> > > ---------------- ---------------- ---------------- -------- - -----
> > > 781.gordon         STDIN            platz            00:00:00 S tp_pri_B
> > > 782.gordon         STDIN            platz                   0 Q tp_pri_A
> > >
> > > After a short time:
> > >
> > > Job id           Name             User             Time Use S Queue
> > > ---------------- ---------------- ---------------- -------- - -----
> > > 781.gordon         STDIN            platz            00:00:00 S tp_pri_B
> > > 782.gordon         STDIN            platz            00:00:00 R tp_pri_A
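> > >
> > > (For reference, the priority difference between the two queues can be
> > > expressed in maui.cfg roughly like this; the values are only examples:
> > >
> > >     CLASSCFG[tp_pri_A] PRIORITY=1000
> > >     CLASSCFG[tp_pri_B] PRIORITY=10
> > >
> > > together with a PREEMPTEE/PREEMPTOR setup as described above.)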
> > >
> > > In your example the second job is not dispatched, right?
> > >
> > >
> > > Regards,
> > >
> > > Bernward
> > >
> > > On Monday 02 August 2004 07:54, Gerson Galang wrote:
> > >>Hi,
> > >>
> > >>I tried your patch and it worked on our test cluster. However, I need to
> > >>run the job manually using the qrun command because, even though the
> > >>server already frees up the nodes with suspended jobs on them, the next
> > >>job in the queue still doesn't get executed. This only happens when the
> > >>number of requested nodes is more than TOTAL_NUM_OF_COMPUTE_NODES -
> > >>NODES_WITH_SUSPENDED_JOBS. Here's the result of running "checkjob
> > >><jobid>" on the next job in the queue that doesn't automatically get
> > >>executed.
> > >>
> > >>...
> > >>Reservation '815' (00:58:44 -> 1:58:44  Duration: 1:00:00)
> > >>PE:  6.00  StartPriority:  1
> > >>job cannot run in partition DEFAULT (idle procs do not meet
> > >>requirements: 0 of 6 procs found)
> > >>
> > >>idle procs:   6  feasible procs:   0
> > >>Rejection Reasons: [ReserveTime  :    6]
> > >>
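> > >>To see which reservation is blocking the job, standard maui
> > >>diagnostics such as the following should help:
> > >>
> > >>    showres        # list active reservations
> > >>    diagnose -r    # per-reservation diagnostics
> > >>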
> > >>Does anybody else have a patch to set the state of the processors to idle?
> > >>
> > >>Another thing we have noticed when we suspend jobs is that a
> > >>job's walltime still continues to decrease even after the job has
> > >>been suspended. Is there a way of stopping the wall-clock time of a
> > >>suspended job?
> > >>
> > >>Thanks,
> > >>Gerson
> > >>
> > >>Bernward Platz wrote:
> > >>>I think this is a problem in req_signal.c, because
> > >>>when a job is suspended, the nodes allocated to the job are not released.
> > >>>I wrote a short patch to solve this problem. The important call in
> > >>>req_signal.c is "free_nodes".
> > >>>The patch is not well tested yet, but I have used it several times
> > >>>without problems.
> > >>>
> > >>>Regards
> > >>>
> > >>>Bernward
> > >>>
> > >>>
> > >>>
> > >>>diff -urN -X exclude torque-1.0.1.org/src/server/req_signal.c torque-1.0.1/src/server/req_signal.c
> > >>>--- torque-1.0.1.org/src/server/req_signal.c    2004-02-13 20:01:00.000000000 +0100
> > >>>+++ torque-1.0.1/src/server/req_signal.c        2004-03-20 10:01:13.000000000 +0100
> > >>>@@ -206,8 +206,10 @@
> > >>>                        pjob->ji_qs.ji_svrflags |= JOB_SVFLG_Suspend;
> > >>>                        set_statechar(pjob);
> > >>>                        job_save(pjob, SAVEJOB_QUICK);
> > >>>+                        free_nodes(pjob);
> > >>>                } else if (strcmp(preq->rq_ind.rq_signal.rq_signame,
> > >>>                           SIG_RESUME) == 0) {
> > >>>+                        set_old_nodes(pjob);
> > >>>                        pjob->ji_qs.ji_svrflags &= ~JOB_SVFLG_Suspend;
> > >>>                        set_statechar(pjob);
> > >>>                        job_save(pjob, SAVEJOB_QUICK);
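> > >>>
> > >>>To try it, apply the patch at the top of the torque-1.0.1 source tree
> > >>>and rebuild the server (the patch file name is just an example):
> > >>>
> > >>>    cd torque-1.0.1
> > >>>    patch -p1 < suspend-free-nodes.patch
> > >>>    make && make install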
> > >>>
> > >>>On Wednesday 28 July 2004 10:50, Sébastien Georget wrote:
> > >>>>Hi,
> > >>>>
> > >>>>  I am trying to use the maui/torque suspend feature. Right now I can
> > >>>>suspend/resume jobs using qsig -s suspend/resume JOBID or mjobctl -s/-r
> > >>>>JOBID.
> > >>>>The problem is that the nodes where the suspended job runs are still in
> > >>>>the state 'job-exclusive' and cannot be used for new jobs. I
> > >>>>wonder which one of maui or torque has faulty behaviour here.
> > >>>>Should torque change the state of the node to free when the job is
> > >>>>suspended, or should maui do it? Can it be configured somewhere?
> > >>>>
> > >>>>thx,
> > >>>>Sébastien
> > >>
> > >>_______________________________________________
> > >>mauiusers mailing list
> > >>mauiusers at supercluster.org
> > >>http://supercluster.org/mailman/listinfo/mauiusers
-- 
Douglas
Cluster Resources, Inc.


