[Mauiusers] Re: [torqueusers] Problems with suspend/resume
Fedele Stabile
fedele at fis.unical.it
Thu Apr 14 11:03:08 MDT 2005
My req_signal.c appears to differ from the version the patch expects.
I have TORQUE 1.2.0 installed.
Can you help me?
Fedele
On Wed, 2005-04-13 at 23:57, Gerson Galang wrote:
> Benward Platz has written a patch to req_signal.c which fixes this problem.
>
> http://www.supercluster.org/pipermail/mauiusers/2004-July/001284.html
>
>
>
> Fedele Stabile wrote:
> > Hello,
> > my cluster runs torque-1.2.0p2 and maui-3.2.6p11.
> > In maui.cfg I have:
> > # maui.cfg 3.2p8
> > SERVERHOST linuxlab.fis.unical.it
> > ADMIN1 root
> > RMCFG[base] TYPE=PBS
> > RMCFG[base] SUSPENDSIG=20
> > AMCFG[bank] TYPE=NONE
> > RMPOLLINTERVAL 00:00:30
> > SERVERPORT 42559
> > SERVERMODE NORMAL
> > LOGFILE maui.log
> > LOGFILEMAXSIZE 10000000
> > LOGLEVEL 3
> > QUEUETIMEWEIGHT 1
> > BACKFILLPOLICY FIRSTFIT
> > RESERVATIONPOLICY CURRENTHIGHEST
> > NODEALLOCATIONPOLICY MINRESOURCE
> > PREEMPTIONPOLICY SUSPEND
> > QOSCFG[DEFAULT] QFLAGS=PREEMPTOR
> > QOSCFG[DEFAULT] QFLAGS=PREEMPTEE
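A note on RMCFG[base] SUSPENDSIG=20: the value is a numeric signal, and signal numbers are platform-specific. On Linux x86, 19 is SIGSTOP and 20 is SIGTSTP, and the difference matters because a process may catch or ignore SIGTSTP but never SIGSTOP. The short Python sketch below reports what a given number means on the machine it runs on; describe_signal is a hypothetical helper, not part of Maui or TORQUE.

```python
import signal

# Signals the kernel never lets a process catch, block, or ignore.
UNCATCHABLE = {signal.SIGKILL, signal.SIGSTOP}


def describe_signal(signum: int) -> str:
    """Return the symbolic name of a numeric signal on this platform (hypothetical helper)."""
    sig = signal.Signals(signum)  # raises ValueError for numbers this platform does not define
    note = "cannot be caught" if sig in UNCATCHABLE else "can be caught or ignored"
    return f"{signum} = {sig.name} ({note})"


if __name__ == "__main__":
    # The value used in "RMCFG[base] SUSPENDSIG=20" above.
    print(describe_signal(20))
```

On a typical Linux x86 node this should print "20 = SIGTSTP (can be caught or ignored)"; running it on the compute nodes themselves is the safest check, since the mapping can differ between architectures and operating systems.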
> >
> > When I run an MPI job and suspend it with qsig -s suspend, the job is
> > suspended, but pbsnodes -a still shows the nodes in the job-exclusive state.
> > The output of checkjob is:
> > checking job 20
> >
> > State: Suspended EState: Running
> > Creds: user:fedele group:users class:batch qos:DEFAULT
> > WallTime: 00:00:31 of 1:00:00
> > Suspended Wall Time: 00:01:32
> > SubmitTime: Tue Apr 12 11:08:29
> > (Time Queued Total: 1:00:08 Eligible: 00:58:04)
> >
> > Total Tasks: 16
> >
> > Req[0] TaskCount: 16 Partition: DEFAULT
> > Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> > Opsys: [NONE] Arch: [NONE] Features: [NONE]
> > NodeCount: 16
> > Allocated Nodes:
> > [pc16:1][pc15:1][pc14:1][pc13:1]
> > [pc12:1][pc11:1][pc10:1][pc9:1]
> > [pc8:1][pc7:1][pc6:1][pc5:1]
> > [pc4:1][pc3:1][pc2:1][pc1:1]
> >
> >
> >
> > IWD: [NONE] Executable: [NONE]
> > Bypass: 0 StartCount: 1
> > PartitionMask: [ALL]
> > Flags: RESTARTABLE PREEMPTEE
> > Attr: PREEMPTEE
> >
> > EState 'Running' does not match current state 'Suspended'
> > Reservation '20' (-00:02:04 -> 00:57:56 Duration: 1:00:00)
> > PE: 16.00 StartPriority: 58
> > cannot select job 20 for partition DEFAULT (non-idle expected state 'Running')
> >
> > So the job is not suspended. Can you help me?
> > Thank you, Fedele
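The key symptom in the checkjob output is the disagreement between State (Suspended) and EState (Running). As a hedged diagnostic sketch, the Python below pulls both fields out of saved checkjob text and flags a mismatch. parse_states is a hypothetical helper, SAMPLE is an excerpt copied from the output quoted in this thread, and the regular expression assumes the "State: X EState: Y" layout shown there, which other Maui versions may format differently.

```python
import re

# Excerpt of the checkjob output quoted in this thread.
SAMPLE = """\
checking job 20

State: Suspended EState: Running
Creds: user:fedele group:users class:batch qos:DEFAULT
"""


def parse_states(checkjob_text: str) -> tuple[str, str]:
    """Return (current state, expected state) from checkjob text (hypothetical helper)."""
    # Search anywhere in the text so mail quoting such as "> > " before the field is tolerated.
    m = re.search(r"State:\s*(\S+)\s+EState:\s*(\S+)", checkjob_text)
    if m is None:
        raise ValueError("no 'State: ... EState: ...' line found")
    return m.group(1), m.group(2)


if __name__ == "__main__":
    state, estate = parse_states(SAMPLE)
    if state != estate:
        print(f"mismatch: current state {state!r}, expected state {estate!r}")
    else:
        print(f"consistent: {state!r}")
```

Run against the excerpt above, it reports the same Suspended/Running mismatch that checkjob itself warns about; feeding it checkjob output captured over time would be one way to see whether the two states ever converge after a qsig suspend.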