[Mauiusers] suspend / resume

Bernward Platz Bernward.Platz at clucon.de
Tue Aug 3 02:48:23 MDT 2004


The problem with using qsig is that qsig does not delete the reservation made 
by maui. If you use a PREEMPTEE/PREEMPTOR configuration in maui, maui suspends 
the job itself and preemption should
work. Or use 

mjobctl -s <jobid>

to suspend the job manually. mjobctl deletes the reservation of the suspended 
job, so maui can start the new job.
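For reference, a minimal PREEMPTEE/PREEMPTOR setup in maui.cfg could look like the sketch below. The QOS names, priority values, and class-to-QOS mapping here are illustrative assumptions (only the QOS 'hi' and the classes tp_pri_A/tp_pri_B appear in the outputs in this thread); check the Maui documentation for your version.

```
# maui.cfg -- preemption sketch (names/values are examples)
PREEMPTPOLICY        SUSPEND

# 'hi' jobs may preempt; 'low' jobs may be preempted
QOSCFG[hi]   QFLAGS=PREEMPTOR  PRIORITY=1000
QOSCFG[low]  QFLAGS=PREEMPTEE

# map the queues from the example below onto the QOS levels
CLASSCFG[tp_pri_A]  QDEF=hi
CLASSCFG[tp_pri_B]  QDEF=low
```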

For better understanding, here is the output of a job suspended by qsig
(checkjob <jobid>):
----------------------------------------
checking job 805

State: Suspended  (User: platz  Group: users)
WallTime: 0:00:30 of 99:23:59:59
Suspended Wall Time: 0:00:20
SubmitTime: Tue Aug  3 08:07:28
  (Time Queued  Total: 0:00:53  Eligible: 0:00:01)

Total Tasks: 2

Req[0]  TaskCount: 2  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Class: [tp_pri_A 1]  Features: [NONE]


IWD: [NONE]  Executable:  [NONE]
QOS: hi  Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE PREEMPTOR

EState 'Running' does not match current state 'Suspended'
!!!!!!!!!!!!!!!!  Reservation '805' (-0:00:52 -> 99:23:59:07  Duration: 99:23:59:59) !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
PE:  2.00  StartPriority:  1000
cannot select job 805 for partition DEFAULT (non-idle expected state 'Running')
---------------------------




And this is the output of a job suspended by maui:

checking job 803

State: Suspended
Creds:  user:platz  group:users  class:tp_pri_A  qos:hi
WallTime: 00:00:31 of 99:23:59:59
Suspended Wall Time: 00:00:10
SubmitTime: Tue Aug  3 10:43:04
  (Time Queued  Total: 00:01:13  Eligible: 00:00:00)

StartDate: 00:00:49  Tue Aug  3 10:45:06
Total Tasks: 2

Req[0]  TaskCount: 2  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [linux]
Allocated Nodes:
[node02:1][gordon:1]


IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 0
PartitionMask: [ALL]
Flags:       RESTARTABLE PREEMPTOR

PE:  2.00  StartPriority:  1001
cannot select job 810 for partition DEFAULT (startdate in '00:00:49')





On Tuesday 03 August 2004 02:17, you wrote:
> Thanks for replying Bernward.
>
> We still haven't gotten maui to automatically suspend and resume jobs.
> How we do it on our cluster is that we just manually suspend jobs (qsig -s
> suspend <jobid>) and check whether the next job in the queue really
> runs.
>
> Using your patch, I tried to submit a parallel job (which uses 6
> processors) on our 6-node/processor cluster, which I suspended after the
> job had run for a minute. I then submitted another job which also requests
> 6 processors. The new job just stays in the queue, goes into a
> deferred state, and later on is put back into an idle state. "pbsnodes -a"
> tells me that the nodes' state is already free, but checkjob <jobid>
> tells me that the job has been put in an idle state because there are no
> idle processors which can satisfy the job requirements. Although the job
> doesn't get executed automatically, I can still force it to run by using
> the qrun command on that particular job.
>
> Have you applied any other patch (for the suspend and resume to work) in
> addition to the one that you sent to the mailing list? At the moment, I
> am using two patches: the torque and mpiexec patch which properly
> suspends parallel jobs and the one that you sent which frees up the
> nodes with the suspended jobs. I got the first patch from Sebastien.
>
> Can you also tell me the architecture and OS of the machine you are
> running your jobs on? We are trying to test maui on an alpha-linux
> cluster and we've already tried everything that we can think of to get
> suspend and resume to work, but we are still unsuccessful. We are
> thinking that it might be because we are running our jobs on an alpha-linux
> cluster that it is not working. I also emailed my complete configuration
> to the mailing list before, and I was told that there is nothing wrong
> with my configuration.
>
> I will really appreciate your help on this matter.
>
> Thanks very much,
> Gerson
>
> Bernward Platz wrote:
> > Hi Gerson,
> >
> > I don't know if I understood you right, so I tried to reproduce your
> > scenario: I built two Linux nodes with one processor each.
> > I have two queues, tp_pri_A and tp_pri_B, where tp_pri_A is a high-priority
> > queue and tp_pri_B is a low-priority queue. Then
> >
> > qsub -q tp_pri_B -l nodes=1:linux
> >
> > Job id           Name             User             Time Use S Queue
> > ---------------- ---------------- ---------------- -------- - -----
> > 781.gordon         STDIN            platz                   0 R tp_pri_B
> >
> >
> > qsub -q tp_pri_A -l nodes=2:linux
> >
> > Job id           Name             User             Time Use S Queue
> > ---------------- ---------------- ---------------- -------- - -----
> > 781.gordon         STDIN            platz            00:00:00 S tp_pri_B
> > 782.gordon         STDIN            platz                   0 Q tp_pri_A
> >
> > After a short time:
> >
> > Job id           Name             User             Time Use S Queue
> > ---------------- ---------------- ---------------- -------- - -----
> > 781.gordon         STDIN            platz            00:00:00 S tp_pri_B
> > 782.gordon         STDIN            platz            00:00:00 R tp_pri_A
> >
> > In your example the second job is not dispatched, right?
> >
> >
> > Regards,
> >
> > Bernward
> >
> > On Monday 02 August 2004 07:54, Gerson Galang wrote:
> >>Hi,
> >>
> >>I tried your patch and it worked on our test cluster. However, I need to
> >>manually run the job using the qrun command because even if the server
> >>already frees up the nodes with suspended jobs in them, the next job in
> >>the queue still doesn't get executed. This only happens when the number
> >>of requested nodes is more than TOTAL_NUM_OF_COMPUTE_NODES -
> >>NODES_WITH_SUSPENDED_JOBS. Here's the result of doing a "checkjob
> >><jobid>" on the next job in the queue that doesn't automatically get
> >>executed.
> >>
> >>...
> >>Reservation '815' (00:58:44 -> 1:58:44  Duration: 1:00:00)
> >>PE:  6.00  StartPriority:  1
> >>job cannot run in partition DEFAULT (idle procs do not meet requirements
> >>: 0 of 6 procs found)
> >>
> >>idle procs:   6  feasible procs:   0
> >>Rejection Reasons: [ReserveTime  :    6]
> >>
> >>Does anybody else have a patch to set the state of the processors to idle?
> >>
> >>Another thing that we have noticed when we suspend jobs is that a
> >>job's remaining walltime still continues to decrease even if that job
> >>has already been suspended. Is there a way of stopping the wall clock
> >>time of a suspended job?
> >>
> >>Thanks,
> >>Gerson
> >>
> >>Bernward Platz wrote:
> >>>I think this is a problem in req_signal.c, because
> >>>when a job is suspended, the nodes allocated to the job are not released.
> >>>I wrote a short patch to solve this problem. The important call in
> >>>req_signal.c is "free_nodes".
> >>>The patch is not well tested yet, but I have used it several times
> >>>without problems.
> >>>
> >>>Regards
> >>>
> >>>Bernward
> >>>
> >>>
> >>>
> >>>diff -urN -X exclude torque-1.0.1.org/src/server/req_signal.c torque-1.0.1/src/server/req_signal.c
> >>>--- torque-1.0.1.org/src/server/req_signal.c    2004-02-13 20:01:00.000000000 +0100
> >>>+++ torque-1.0.1/src/server/req_signal.c        2004-03-20 10:01:13.000000000 +0100
> >>>@@ -206,8 +206,10 @@
> >>>                        pjob->ji_qs.ji_svrflags |= JOB_SVFLG_Suspend;
> >>>                        set_statechar(pjob);
> >>>                        job_save(pjob, SAVEJOB_QUICK);
> >>>+                        free_nodes(pjob);
> >>>                } else if (strcmp(preq->rq_ind.rq_signal.rq_signame,
> >>>                           SIG_RESUME) == 0) {
> >>>+                        set_old_nodes(pjob);
> >>>                        pjob->ji_qs.ji_svrflags &= ~JOB_SVFLG_Suspend;
> >>>                        set_statechar(pjob);
> >>>                        job_save(pjob, SAVEJOB_QUICK);
> >>>
> >>>On Wednesday 28 July 2004 10:50, Sébastien Georget wrote:
> >>>>Hi,
> >>>>
> >>>>  I am trying to use the maui/torque suspend feature. Right now I can
> >>>>suspend/resume jobs using "qsig -s suspend/resume JOBID" or "mjobctl
> >>>>-s/-r JOBID".
> >>>>The problem is that the nodes where the suspended job runs are still in
> >>>>the state 'job-exclusive' and cannot be used for new jobs. I
> >>>>wonder which one of maui or torque has faulty behaviour here.
> >>>>Should torque change the state of the node to free when the job is
> >>>>suspended, or should it be maui? Can it be configured somewhere?
> >>>>
> >>>>thx,
> >>>>Sébastien
> >>
> >>_______________________________________________
> >>mauiusers mailing list
> >>mauiusers at supercluster.org
> >>http://supercluster.org/mailman/listinfo/mauiusers

-- 
------------------------------------------------------------
clucon - cluster concepts 
Bernward Platz

Dipl.-Inform. Bernward Platz
Geisenbrunner Str. 72a
81475 Munich 
Phone:  +49 89 7593838
Fax:    +49 89 75201462 
Mobile: +49 175 5247883
Mail:   Bernward.Platz at clucon.de
Web:    http://www.clucon.de



