[Mauiusers] Re: [torqueusers] Problems with suspend/resume

Fedele Stabile fedele at fis.unical.it
Thu Apr 14 11:03:08 MDT 2005


My req_signal.c appears to differ from the version the patch expects.
I have torque 1.2.0 installed.
Can you help me?
Fedele

On Wed, 2005-04-13 at 23:57, Gerson Galang wrote:
> Benward Platz has written a patch to req_signal.c which fixes this problem.
> 
> http://www.supercluster.org/pipermail/mauiusers/2004-July/001284.html
> 
> 
> 
> Fedele Stabile wrote:
> > Hello,
> > my cluster has torque-1.2.0p2 and maui-3.2.6p11 installed.
> > In maui.cfg I have:
> > # maui.cfg 3.2p8
> > SERVERHOST            linuxlab.fis.unical.it
> > ADMIN1                root
> > RMCFG[base]  TYPE=PBS
> > RMCFG[base] SUSPENDSIG=20
> > AMCFG[bank]  TYPE=NONE
> > RMPOLLINTERVAL        00:00:30 
> > SERVERPORT            42559
> > SERVERMODE            NORMAL
> > LOGFILE               maui.log
> > LOGFILEMAXSIZE        10000000
> > LOGLEVEL              3
> > QUEUETIMEWEIGHT       1
> > BACKFILLPOLICY        FIRSTFIT
> > RESERVATIONPOLICY     CURRENTHIGHEST
> > NODEALLOCATIONPOLICY  MINRESOURCE
> > PREEMPTIONPOLICY SUSPEND
> > QOSCFG[DEFAULT]  QFLAGS=PREEMPTOR
> > QOSCFG[DEFAULT]  QFLAGS=PREEMPTEE
> > 
> > When I suspend a running MPI job with qsig -s suspend, it is reported as
> > suspended, but pbsnodes -a still shows the nodes in state job-exclusive.
> > The output of checkjob is:
> > checking job 20
> >  
> > State: Suspended  EState: Running
> > Creds:  user:fedele  group:users  class:batch  qos:DEFAULT
> > WallTime: 00:00:31 of 1:00:00
> > Suspended Wall Time: 00:01:32
> > SubmitTime: Tue Apr 12 11:08:29
> >   (Time Queued  Total: 1:00:08  Eligible: 00:58:04)
> >  
> > Total Tasks: 16
> >  
> > Req[0]  TaskCount: 16  Partition: DEFAULT
> > Network: [NONE]  Memory >= 0  Disk >= 0  Swap >= 0
> > Opsys: [NONE]  Arch: [NONE]  Features: [NONE]
> > NodeCount: 16
> > Allocated Nodes:
> > [pc16:1][pc15:1][pc14:1][pc13:1]
> > [pc12:1][pc11:1][pc10:1][pc9:1]
> > [pc8:1][pc7:1][pc6:1][pc5:1]
> > [pc4:1][pc3:1][pc2:1][pc1:1]
> >  
> >  
> >  
> > IWD: [NONE]  Executable:  [NONE]
> > Bypass: 0  StartCount: 1
> > PartitionMask: [ALL]
> > Flags:       RESTARTABLE PREEMPTEE
> > Attr:        PREEMPTEE
> >  
> > EState 'Running' does not match current state 'Suspended'
> > Reservation '20' (-00:02:04 -> 00:57:56  Duration: 1:00:00)
> > PE:  16.00  StartPriority:  58
> > cannot select job 20 for partition DEFAULT (non-idle expected state
> > 'Running')
> >  
> > So the job is not suspended. Can you help me?
> > Thank you, Fedele
> > 
> > 
> > _______________________________________________
> > torqueusers mailing list
> > torqueusers at supercluster.org
> > http://supercluster.org/mailman/listinfo/torqueusers
> > 
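
A side check that may help with the SUSPENDSIG=20 line in the quoted maui.cfg: signal numbers are platform-specific, so it can be worth confirming which signal the number 20 names on the compute nodes. On Linux x86 it is SIGTSTP, which a process can catch or ignore, whereas SIGSTOP (19 there) cannot be caught. The following is only a minimal sketch, assuming Python 3 is available on the node; the helper name signal_name is made up for illustration and is not part of torque or maui.

```python
import signal


def signal_name(number: int) -> str:
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # The number does not correspond to any signal known here.
        return f"unknown signal {number}"


if __name__ == "__main__":
    # 20 is the SUSPENDSIG value from the quoted maui.cfg.
    print(20, "->", signal_name(20))
```

Run it on a compute node rather than on the server if the two differ in OS or architecture.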

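
The symptom in the quoted checkjob output is the pair "State: Suspended  EState: Running". To watch for that mismatch across several jobs, the line can be picked out of saved checkjob output with a short script. This is a rough sketch that assumes the output format shown in this thread; the function name state_mismatch and the sample text are illustrative only.

```python
import re

# Matches the line "State: <state>  EState: <expected state>".
STATE_LINE = re.compile(r"State:\s*(\S+)\s+EState:\s*(\S+)")


def state_mismatch(checkjob_output: str):
    """Return (state, expected_state) if they differ, otherwise None."""
    match = STATE_LINE.search(checkjob_output)
    if match is None:
        return None  # no State/EState line found in the text
    state, expected = match.groups()
    return (state, expected) if state != expected else None


if __name__ == "__main__":
    sample = "checking job 20\n\nState: Suspended  EState: Running\n"
    print(state_mismatch(sample))
```

A None result means either the two states agree or no State line was found, so empty input should be checked for separately.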

