[Mauiusers] setres - problem draining batch queues

Paul Szczypka paul.szczypka at gmail.com
Mon Nov 10 13:05:50 MST 2008


Hello,

I'm having a problem draining my cluster's batch queues using setres.
Adapting one of the examples given in the clusterresources documentation, I
attempt to reserve every node and processor in my cluster during our
scheduled downtime:

# setres  -s 18:00:00_11/18 -e 08:00:00_11/19 -n electricityDowntime ALL


I then check the reservations and see that the downtime reservation is there
and applies to all processors, alongside a few reservations for pre-existing
jobs.

To test the reservation, I submit a job (Job 918) that requests an excessive
amount of wallclock time. It is scheduled to start when the downtime finishes,
which is what I expect:

[root@lphesrv1 spool]# showres
Reservations

ReservationID          Type S       Start         End    Duration    N/P    StartTime

909                     Job R    -1:21:00 83:06:39:00 83:08:00:00    1/1    Mon Nov 10 15:40:32
912                     Job R    -1:09:05  4:12:50:55  4:14:00:00    1/1    Mon Nov 10 15:52:27
913                     Job R   -00:49:57  4:13:10:03  4:14:00:00    1/1    Mon Nov 10 16:11:35
918                     Job I  8:14:58:28 91:22:58:28 83:08:00:00    1/1    Wed Nov 19 08:00:00
electricityDowntime.0  User -  8:00:58:28  8:14:58:28    14:00:00 60/480    Tue Nov 18 18:00:00

17 reservations located


Unfortunately, upon checking the reservations about 10 minutes later, I find
that Job 918 has started even though it overlaps the downtime reservation:


[root@lphesrv1 spool]# showres
Reservations

ReservationID          Type S       Start         End    Duration    N/P    StartTime

909                     Job R    -1:27:25 83:06:32:35 83:08:00:00    1/1    Mon Nov 10 15:40:32
912                     Job R    -1:15:30  4:12:44:30  4:14:00:00    1/1    Mon Nov 10 15:52:27
913                     Job R   -00:56:22  4:13:03:38  4:14:00:00    1/1    Mon Nov 10 16:11:35
918                     Job R   -00:06:15 83:07:53:45 83:08:00:00    1/1    Mon Nov 10 17:01:42
electricityDowntime.0  User -  8:00:52:03  8:14:52:03    14:00:00 60/480    Tue Nov 18 18:00:00

17 reservations located
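
To make the overlap concrete, here is a minimal Python sketch of the arithmetic, using only the values from the two showres listings above. The year 2008 is an assumption taken from the message date, and the helper name `overlaps` is hypothetical. This only illustrates why the two time windows conflict; it is not a description of how Maui itself evaluates reservations.

```python
from datetime import datetime, timedelta


def overlaps(start_a, end_a, start_b, end_b):
    """Return True if half-open intervals [start_a, end_a) and [start_b, end_b) intersect."""
    return start_a < end_b and start_b < end_a


# Job 918, from the second showres listing (year 2008 assumed from the message date).
job_start = datetime(2008, 11, 10, 17, 1, 42)
job_walltime = timedelta(days=83, hours=8)  # Duration column: 83:08:00:00
job_end = job_start + job_walltime

# electricityDowntime.0, as requested with setres -s 18:00:00_11/18 -e 08:00:00_11/19.
res_start = datetime(2008, 11, 18, 18, 0, 0)
res_end = datetime(2008, 11, 19, 8, 0, 0)

print("job window:        ", job_start, "->", job_end)
print("downtime window:   ", res_start, "->", res_end)
print("downtime length:   ", res_end - res_start)
print("windows overlap:   ", overlaps(job_start, job_end, res_start, res_end))
```

With an 83-day wallclock request starting on Nov 10, the job window runs well past the 14-hour downtime window, so the two intervals intersect.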


Can anyone help me debug or explain this behaviour? I can't find anything
relevant in the logs in my maui directory, and the pbs logs show only the
following:

11/10/2008 17:01:42;0100;PBS_Server;Req;;Type StatusJob request received from root@lphesrv1.epfl.ch, sock=14
11/10/2008 17:01:42;0100;PBS_Server;Req;;Type ModifyJob request received from root@lphesrv1.epfl.ch, sock=14
11/10/2008 17:01:42;0008;PBS_Server;Job;918.lphesrv1.epfl.ch;Job Modified at request of root@lphesrv1.epfl.ch
11/10/2008 17:01:42;0100;PBS_Server;Req;;Type RunJob request received from root@lphesrv1.epfl.ch, sock=14
11/10/2008 17:01:42;0008;PBS_Server;Job;918.lphesrv1.epfl.ch;Job Run at request of root@lphesrv1.epfl.ch

I'm using:

[root@lphesrv1 spool]# qmgr -c "p s"|grep pbs_ver
set server pbs_version = 2.3.0-snap.200801151629
[root@lphesrv1 spool]# setres -v
maui client version 3.2.6p20

I can post my maui.cfg if it is relevant.

Thanks,

Paul.

-- 
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
 Paul Szczypka, EPFL SB IPEP LPHE1, BSP 614, CH-1015 Lausanne
 paul.szczypka at cern.ch                   Tel: +41 21 69 30495
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
   Please avoid sending me Word or PowerPoint attachments.
  See http://www.gnu.org/philosophy/no-word-attachments.html