[gold-users] Error message about AM in moab

Scott Jackson scottmo at clusterresources.com
Wed Apr 15 16:47:39 MDT 2009


Brock,

As I mentioned earlier, I am catching up on old emails that had been 
filtered out.

Brock Palen wrote:
> We are running a moab 5.3 beta, jobs get charged but when a job was 
> preempted i see the following error in the moab logs:
>
> 09/25 16:44:49  INFO:     preempting jobs to allow job 1532527 to 
> start - required resources  T: 2  N: 0  P: 2
> 09/25 16:44:49  INFO:     job 1532527 preempting job 1529489 
> (statemtime: 0) (preempted this iteration: 1)
> 09/25 16:44:55  INFO:     packet sent (351 bytes of 351)
> 09/25 16:44:55  INFO:     command sent to server
> 09/25 16:44:55  INFO:     message sent: '<XML>'
> 09/25 16:44:55  INFO:     S3 status element reports request failed - 
> root is not authorized to perform this function (Reservation Delete)
> 09/25 16:44:55  ALERT:    request failed with status code 444 (root is 
> not authorized to perform this function (Reservation Delete))
> 09/25 16:44:55  ERROR:    cannot receive response from 
> allocation-manager server 'cac-admin01.engin.umich.edu':7112
> 09/25 16:44:55  INFO:     command response 'NULL'
> 09/25 16:44:55  ALERT:    no job data available
> 09/25 16:44:55  ALERT:    unexpected AM error - server rejected 
> request with status code 444 - root is not authorized to perform this 
> function (Reservation Delete)
> 09/25 16:44:55  ALERT:    cannot destroy reservation allocation for 
> rsv 1529489 - request refused
> 09/25 16:44:55  ERROR:    cannot cancel account reservation for job 
> '1529489'
> 09/25 16:44:55  INFO:     tasks located for job 1532527:  2 of 2 
> required (6 feasible)
> 09/25 16:44:55  INFO:     tasks located for job 1532527:  2 of 2 
> required (6 feasible)
> 09/25 16:44:55  INFO:     packet sent (629 bytes of 629)
> 09/25 16:44:55  INFO:     command sent to server
> 09/25 16:44:55  INFO:     message sent: '<XML>'
> 09/25 16:44:57  INFO:     response received from server
> 09/25 16:44:57  INFO:     response received: '<?xml version="1.0" 
> encoding="UTF-8"?>
>
> What is the side effect of this?  Looks like a jobs reserved 
> allocation sits around for forever? If so how do we clean those up?
>

The side effect is that these reservations continue to hold part of the 
allocation until they expire (run past their end time), at which point 
they cease to count against the allocation.

The remedy for this is to add Reservation Delete as one of the Scheduler 
actions:

$ goldsh RoleAction Create Role=Scheduler Object=Reservation Name=Delete
Successfully created 1 RoleAction

I will fix this in the Gold version 2.1.8.0 release.

After the reservations pass their end time they will become inactive (or 
stale). You can list all inactive reservations with the command `glsres 
-I` and delete them all with `grmres -I`. Until it expires, however, a 
reservation that Moab failed to remove will continue to count against 
the allocation. This can create problems if the user is near the end of 
their allocation. To clear such reservations sooner, you would have to 
manually compare the list of supposedly active reservations (`glsres 
-A`) with the running jobs (`showq -r`), and then delete any 
reservations that are not associated with a running job (`grmres 
<jobid>`).
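
The manual comparison described above amounts to a set difference. Here 
is a minimal Python sketch of that step. It assumes you have already 
copied the reservation names out of the `glsres -A` listing and the job 
IDs out of the `showq -r` listing by hand, and that each reservation is 
named after its job ID (as the "rsv 1529489" line in the log suggests). 
The function name and the third job ID in the sample data are made up 
for illustration; nothing here parses the output of either command.

```python
def find_orphaned_reservations(active_reservations, running_jobs):
    """Return reservation names that have no matching running job.

    Both arguments are iterables of job IDs (strings or ints).
    Assumes a reservation's name is the ID of the job it was made for.
    """
    running = {str(job).strip() for job in running_jobs}
    reserved = {str(name).strip() for name in active_reservations}
    # Anything still reserved but no longer running is a cleanup candidate.
    return sorted(reserved - running)


if __name__ == "__main__":
    # Illustrative values only: two IDs from the log above, one invented.
    active = ["1529489", "1532527", "1532600"]
    running = ["1532527", "1532600"]
    for name in find_orphaned_reservations(active, running):
        print(name)
```

Each name it prints is only a candidate; check it against the queue 
once more before deleting anything, since a job may have started 
between the two listings.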

Thanks,

Scott

$ svn diff
Index: bank.gold.in
===================================================================
--- bank.gold.in        (revision 108)
+++ bank.gold.in        (working copy)
@@ -303,6 +303,7 @@
 RoleAction Create Role=Scheduler Name=Charge Object=Job NoRefresh:=True
 RoleAction Create Role=Scheduler Name=Quote Object=Job NoRefresh:=True
 RoleAction Create Role=Scheduler Name=Reserve Object=Job NoRefresh:=True
+RoleAction Create Role=Scheduler Name=Delete Object=Reservation NoRefresh:=True
 RoleUser Create Role=Scheduler Name=root NoRefresh:=True
 RoleUser Create Role=OVERRIDE Name=ANY NoRefresh:=True
 RoleAction Create Role=OVERRIDE Name=Balance Object=Account NoRefresh:=True
Index: CHANGES
===================================================================
--- CHANGES     (revision 108)
+++ CHANGES     (working copy)
@@ -23,6 +23,7 @@
     to 1/50th the time for systems with many deleted reservations.
     create index  g_reservation_acct_where_idx ON g_reservation_allocation
      (g_account) WHERE g_deleted!='True';
+  Added Reservation Delete to the default Scheduler role.
 
 Fix Release 2.1.7.1
 
Index: bank.sql.in
===================================================================
--- bank.sql.in (revision 108)
+++ bank.sql.in (working copy)
@@ -1738,6 +1738,7 @@
 INSERT INTO g_role_action VALUES ('Scheduler', 'Job', 'Charge', 'ANY', 'False', @NOW@, @NOW@, 254, 254);
 INSERT INTO g_role_action VALUES ('Scheduler', 'Job', 'Quote', 'ANY', 'False', @NOW@, @NOW@, 255, 255);
 INSERT INTO g_role_action VALUES ('Scheduler', 'Job', 'Reserve', 'ANY', 'False', @NOW@, @NOW@, 256, 256);
+INSERT INTO g_role_action VALUES ('Scheduler', 'Reservation', 'Delete', 'ANY', 'False', @NOW@, @NOW@, 257, 257);
 INSERT INTO g_role_action VALUES ('OVERRIDE', 'Account', 'Balance', 'ANY', 'False', @NOW@, @NOW@, 258, 258);
 
 INSERT INTO g_role_user VALUES ('SystemAdmin', '@USER@', 'False', @NOW@, @NOW@, 0, 0);

> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> brockp at umich.edu
> (734)936-1985


