[torqueusers] Problems with suspend/resume
Gerson Galang
gerson.sapac at gawab.com
Wed Apr 13 17:57:11 MDT 2005
Benward Platz has written a patch to req_signal.c which fixes this problem.
http://www.supercluster.org/pipermail/mauiusers/2004-July/001284.html
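For reference, the maui.cfg quoted below sets RMCFG[base] SUSPENDSIG=20. As we understand it, this makes Maui suspend jobs by having the signal numbered 20 delivered to them. On x86 Linux that number is SIGTSTP. The numbering differs on some other platforms: on macOS, for example, 20 is SIGCHLD. Unlike SIGSTOP, SIGTSTP can be caught or ignored by the program that receives it. The Python sketch below is not TORQUE code and says nothing about what the req_signal.c patch changes. It only shows, for a single stand-in process on Linux, what a suspend with signal 20 followed by a resume with SIGCONT looks like from the parent's side. The helper name suspend_resume_cycle is hypothetical.

```python
import os
import signal

# Taken from "RMCFG[base] SUSPENDSIG=20" in the quoted maui.cfg.
SUSPEND_SIG = 20

# Signals whose default action is to stop the receiving process.
_STOP_SIGNALS = {signal.SIGSTOP, signal.SIGTSTP, signal.SIGTTIN, signal.SIGTTOU}


def suspend_resume_cycle(sig: int = SUSPEND_SIG) -> dict:
    """Hypothetical helper: fork a stand-in job process, stop it with `sig`,
    resume it with SIGCONT, kill it, and report what waitpid() saw."""
    if signal.Signals(sig) not in _STOP_SIGNALS:
        # e.g. on macOS signal 20 is SIGCHLD, which would never stop the child
        raise ValueError(f"signal {sig} is {signal.Signals(sig).name}, not a stop signal")

    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: the stand-in "job"
        try:
            os.close(ready_r)
            # Own process group: Linux discards SIGTSTP sent to a process in
            # an orphaned process group, which some process launchers produce.
            os.setpgid(0, 0)
            if sig != signal.SIGSTOP:
                # Undo an inherited SIG_IGN so the default (stop) action applies.
                signal.signal(sig, signal.SIG_DFL)
            # Make sure the signal is not blocked by an inherited signal mask.
            signal.pthread_sigmask(signal.SIG_UNBLOCK, {sig})
            os.write(ready_w, b"x")  # tell the parent we are ready
            os.close(ready_w)
            while True:
                signal.pause()
        finally:
            os._exit(1)  # never fall back into the parent's code path

    os.close(ready_w)
    os.read(ready_r, 1)  # wait until the child has finished its setup
    os.close(ready_r)

    result = {}
    os.kill(pid, sig)  # "suspend"
    _, status = os.waitpid(pid, os.WUNTRACED)
    result["stopped"] = os.WIFSTOPPED(status)
    result["stop_signal"] = os.WSTOPSIG(status) if result["stopped"] else None

    os.kill(pid, signal.SIGCONT)  # "resume"
    _, status = os.waitpid(pid, os.WCONTINUED)
    result["continued"] = os.WIFCONTINUED(status)

    os.kill(pid, signal.SIGKILL)  # clean up the stand-in job
    _, status = os.waitpid(pid, 0)
    result["killed"] = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
    return result


if __name__ == "__main__":
    print(suspend_resume_cycle())
```

A real MPI job runs one such process per rank, so a program that catches or ignores SIGTSTP will not stop the way this stand-in does.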
Fedele Stabile wrote:
> Hello,
> torque-1.2.0p2 and maui-3.2.6p11 are installed on my cluster.
> In maui.cfg I have:
> # maui.cfg 3.2p8
> SERVERHOST linuxlab.fis.unical.it
> ADMIN1 root
> RMCFG[base] TYPE=PBS
> RMCFG[base] SUSPENDSIG=20
> AMCFG[bank] TYPE=NONE
> RMPOLLINTERVAL 00:00:30
> SERVERPORT 42559
> SERVERMODE NORMAL
> LOGFILE maui.log
> LOGFILEMAXSIZE 10000000
> LOGLEVEL 3
> QUEUETIMEWEIGHT 1
> BACKFILLPOLICY FIRSTFIT
> RESERVATIONPOLICY CURRENTHIGHEST
> NODEALLOCATIONPOLICY MINRESOURCE
> PREEMPTIONPOLICY SUSPEND
> QOSCFG[DEFAULT] QFLAGS=PREEMPTOR
> QOSCFG[DEFAULT] QFLAGS=PREEMPTEE
>
> When I suspend a running MPI job with qsig -s suspend, the job is reported
> as suspended, but pbsnodes -a still shows its nodes in the job-exclusive state.
> The output of checkjob is:
> checking job 20
>
> State: Suspended EState: Running
> Creds: user:fedele group:users class:batch qos:DEFAULT
> WallTime: 00:00:31 of 1:00:00
> Suspended Wall Time: 00:01:32
> SubmitTime: Tue Apr 12 11:08:29
> (Time Queued Total: 1:00:08 Eligible: 00:58:04)
>
> Total Tasks: 16
>
> Req[0] TaskCount: 16 Partition: DEFAULT
> Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
> Opsys: [NONE] Arch: [NONE] Features: [NONE]
> NodeCount: 16
> Allocated Nodes:
> [pc16:1][pc15:1][pc14:1][pc13:1]
> [pc12:1][pc11:1][pc10:1][pc9:1]
> [pc8:1][pc7:1][pc6:1][pc5:1]
> [pc4:1][pc3:1][pc2:1][pc1:1]
>
>
>
> IWD: [NONE] Executable: [NONE]
> Bypass: 0 StartCount: 1
> PartitionMask: [ALL]
> Flags: RESTARTABLE PREEMPTEE
> Attr: PREEMPTEE
>
> EState 'Running' does not match current state 'Suspended'
> Reservation '20' (-00:02:04 -> 00:57:56 Duration: 1:00:00)
> PE: 16.00 StartPriority: 58
> cannot select job 20 for partition DEFAULT (non-idle expected state
> 'Running')
>
> So the job is not actually suspended. Can you help me?
> Thank you, Fedele
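The symptom in the quoted checkjob output is that the State and EState fields disagree ("State: Suspended EState: Running"). The following Python sketch is a monitoring aid, not a fix. It pulls those two fields out of captured checkjob text so a script can flag jobs in that condition. It assumes the field layout shown in the quoted output, which other Maui versions may not share, and the helper names job_states and state_mismatch are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a line such as "State: Suspended EState: Running" from checkjob.
# The layout is taken from the quoted output; other Maui versions may
# format this line differently.
_STATE_LINE = re.compile(r"^\s*State:\s*(\S+)(?:\s+EState:\s*(\S+))?")


def job_states(checkjob_output: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (State, EState) from checkjob text, with None for missing fields."""
    for line in checkjob_output.splitlines():
        match = _STATE_LINE.match(line)
        if match:
            return match.group(1), match.group(2)
    return None, None


def state_mismatch(checkjob_output: str) -> bool:
    """True when the current and expected states differ, i.e. the case
    checkjob reports as "EState '...' does not match current state '...'"."""
    state, estate = job_states(checkjob_output)
    return state is not None and estate is not None and state != estate


if __name__ == "__main__":
    sample = "checking job 20\n\nState: Suspended EState: Running\n"
    print(job_states(sample))      # ('Suspended', 'Running')
    print(state_mismatch(sample))  # True
```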