[torqueusers] Problems with suspend/resume

Fedele Stabile fedele at fis.unical.it
Wed Apr 13 04:24:23 MDT 2005


Hello,
My cluster runs torque-1.2.0p2 and maui-3.2.6p11.
In maui.cfg I have:
# maui.cfg 3.2p8
SERVERHOST            linuxlab.fis.unical.it
ADMIN1                root
RMCFG[base]  TYPE=PBS
RMCFG[base] SUSPENDSIG=20
AMCFG[bank]  TYPE=NONE
RMPOLLINTERVAL        00:00:30 
SERVERPORT            42559
SERVERMODE            NORMAL
LOGFILE               maui.log
LOGFILEMAXSIZE        10000000
LOGLEVEL              3
QUEUETIMEWEIGHT       1
BACKFILLPOLICY        FIRSTFIT
RESERVATIONPOLICY     CURRENTHIGHEST
NODEALLOCATIONPOLICY  MINRESOURCE
PREEMPTIONPOLICY SUSPEND
QOSCFG[DEFAULT]  QFLAGS=PREEMPTOR
QOSCFG[DEFAULT]  QFLAGS=PREEMPTEE

When I suspend a running MPI job with qsig -s suspend, the job is
suspended, but pbsnodes -a still shows its nodes in state job-exclusive.
The output of checkjob is:
checking job 20
 
State: Suspended  EState: Running
Creds:  user:fedele  group:users  class:batch  qos:DEFAULT
WallTime: 00:00:31 of 1:00:00
Suspended Wall Time: 00:01:32
SubmitTime: Tue Apr 12 11:08:29
  (Time Queued  Total: 1:00:08  Eligible: 00:58:04)
 
Total Tasks: 16
 
Req[0]  TaskCount: 16  Partition: DEFAULT
Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
NodeCount: 16
Allocated Nodes:
[pc16:1][pc15:1][pc14:1][pc13:1]
[pc12:1][pc11:1][pc10:1][pc9:1]
[pc8:1][pc7:1][pc6:1][pc5:1]
[pc4:1][pc3:1][pc2:1][pc1:1]
 
 
 
IWD: [NONE]  Executable:  [NONE]
Bypass: 0  StartCount: 1
PartitionMask: [ALL]
Flags:       RESTARTABLE PREEMPTEE
Attr:        PREEMPTEE
 
EState 'Running' does not match current state 'Suspended'
Reservation '20' (-00:02:04 -> 00:57:56  Duration: 1:00:00)
PE:  16.00  StartPriority:  58
cannot select job 20 for partition DEFAULT (non-idle expected state 'Running')
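
To spot this mismatch from a script, the State line of the checkjob
output can be parsed. The following is only a minimal Python sketch: the
function names are hypothetical, and the pattern assumes the
"State: ...  EState: ..." layout shown in the output above.

```python
import re


def parse_states(checkjob_output):
    """Return (state, expected_state) from checkjob text, or None if not found."""
    # \b keeps the first "State:" from matching inside "EState:".
    match = re.search(r"\bState:\s*(\S+)\s+EState:\s*(\S+)", checkjob_output)
    if match is None:
        return None
    return match.group(1), match.group(2)


def states_mismatch(checkjob_output):
    """True when the reported state differs from the expected state."""
    states = parse_states(checkjob_output)
    return states is not None and states[0] != states[1]


if __name__ == "__main__":
    # Line copied from the checkjob output above.
    sample = "State: Suspended  EState: Running"
    print(parse_states(sample), states_mismatch(sample))
```

In practice the text would come from running checkjob on the job id and
capturing its output. The sketch does not explain why the states differ.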
 
So the job does not seem to be properly suspended. Can you help me?
Thank you, Fedele



