[gold-users] are stale reservations normal

Brock Palen brockp at umich.edu
Thu Mar 18 11:13:29 MDT 2010


It looks like Moab is getting into an odd state and not issuing the charge:

glsres -I
4380349 3282258   345600 2010-03-17 11:39:13 2010-03-18 11:49:13  4427406 aducoin  ylyoung nyx     2

In the Moab logs I only see this alert over and over:

03/17 11:44:59  ALERT:    job '3282258' has been in state 'Running' for 306 seconds.  node 'nyx0900' is in state 'Running'  (job '3282258' will be cancelled)
03/17 11:44:59  MSysRegEvent(JOBCORRUPTION:  job '3282258' (user aducoin) has been in state 'Running' for 306 seconds.  node 'nyx0900' is in state 'Running'  (job '3282258' will be cancelled)

03/17 15:01:23  MJobProcessCompleted(3282258)
03/17 15:01:23  MJobProcessTVariables(3282258)
03/17 15:01:23  MAMAllocJDebit(A,3282258,SC,EMsg)
03/17 15:01:23  MJobSendFB(3282258)
03/17 15:01:23  MSysLaunchAction(ASList,)
03/17 15:01:23  INFO:     job usage sent for job '3282258'
03/17 15:01:23  ALERT:    job '           3282258' has invalid system queue time (SQ: 1268852118 > ST: 1268840355)
03/17 15:01:23  INFO:     job '           3282258' completed.  QueueTime:      0  RunTime:  11960  Accuracy: 13.84  XFactor:  0.14
03/17 15:01:23  INFO:     overall statistics.  Accuracy:   nan  XFactor:   inf
03/17 15:01:23  INFO:     job '3282258' completed  X: 0.138426  T: 11960  PS: 47840  A: 0.138426 (RM: nyx/nyx)
03/17 15:01:23  MReqCreate(3282258,SrcRQ,DstRQ,TRUE)
03/17 15:01:23  INFO:     added completed job '3282258', Job Completion Time Wed Mar 17 14:58:35

03/17 15:01:23  INFO:     node 'nyx0900' released from job 3282258
03/17 15:01:23  MJobRemove(3282258)
03/17 15:01:23  MJobDestroyVM(3282258,EMsg)
03/17 15:01:23  MRsvDestroy(3282258,TRUE,TRUE)
03/17 15:01:23  MRsvDestroyCredLock(3282258)
03/17 15:01:23  MJobDestroy(3282258)

03/17 15:06:07  MReqCreate(3282258,SrcRQ,DstRQ,TRUE)
03/17 15:06:07  INFO:     added completed job '3282258', Job Completion Time Wed Mar 17 14:58:35
03/17 15:06:07  MJobDestroy(3282258)
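
The "invalid system queue time" alert in the excerpt above compares two Unix epoch values. As a rough aid for reading it, the sketch below decodes them in UTC and checks them against the reported RunTime. Reading "ST" as the job start time is an assumption made for illustration, not something the log states; the helper names are hypothetical.

```python
from datetime import datetime, timezone

# Values copied from the log excerpt above.
SQ = 1268852118   # "SQ" from the ALERT line (system queue time)
ST = 1268840355   # "ST" from the same line (ASSUMED to be the job start time)
RUNTIME = 11960   # RunTime reported when the job completed, in seconds


def as_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


def describe():
    """Collect the decoded timestamps and the differences between them."""
    return {
        "sq": as_utc(SQ),
        "st": as_utc(ST),
        "sq_minus_st": SQ - ST,                    # seconds by which SQ exceeds ST
        "st_plus_runtime": as_utc(ST + RUNTIME),   # where the job would have ended
    }


if __name__ == "__main__":
    for name, value in describe().items():
        print(f"{name:16} {value}")
```

SQ comes out 11763 seconds later than ST, which is why the alert fires. ST plus RunTime lands at 18:58:35 UTC; if the Moab log is written in US Eastern Daylight Time (UTC-4, an assumption), that is 14:58:35 and agrees with the "Job Completion Time" line, which is at least consistent with ST being the start time.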


We run thousands of jobs a day, so most jobs are not showing this  
behavior and are charged normally.
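
The cross-check done by hand here (take a stale reservation from glsres -I, then look for its job in the logs) could be scripted roughly as below. This is only a sketch under stated assumptions: it takes the second column of glsres -I output to be a purely numeric job ID, as in the sample above, and it treats a log line containing "MAMAllocJDebit(" as evidence that Moab attempted a debit, because that call appears in the excerpt above. The function names are hypothetical, and the format of goldd.log is not assumed at all; lines_for_job only returns raw matching lines for a human to read.

```python
import re


def stale_job_ids(glsres_output):
    """Extract job IDs from `glsres -I` text.

    ASSUMPTION: each record starts with a numeric reservation ID followed
    by a numeric job ID, as in the sample in this post. Headers and
    wrapped continuation lines fail that test and are skipped.
    """
    ids = []
    for line in glsres_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            ids.append(fields[1])
    return ids


def lines_for_job(job_id, log_lines):
    """Return log lines that mention job_id as a whole number.

    The digit lookarounds keep '3282258' from matching '13282258'.
    """
    pattern = re.compile(r"(?<!\d)" + re.escape(job_id) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


def debit_attempted(job_id, moab_log_lines):
    """True if any Moab log line for this job contains a MAMAllocJDebit call."""
    return any("MAMAllocJDebit(" in line
               for line in lines_for_job(job_id, moab_log_lines))


if __name__ == "__main__":
    # Sample data copied from this post; real use would read command
    # output and log files instead.
    glsres_sample = (
        "4380349 3282258   345600 2010-03-17 11:39:13 2010-03-18 11:49:13\n"
        "4427406 aducoin  ylyoung nyx     2\n"
    )
    moab_sample = [
        "03/17 15:01:23  MJobProcessCompleted(3282258)",
        "03/17 15:01:23  MAMAllocJDebit(A,3282258,SC,EMsg)",
    ]
    for job in stale_job_ids(glsres_sample):
        print(job, "debit attempted:", debit_attempted(job, moab_sample))
```

Running lines_for_job against goldd.log for the same job IDs would then show whether a charge request ever reached Gold for each stale reservation.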


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985



On Mar 17, 2010, at 2:11 PM, Scott Jackson wrote:

> Brock,
>
> I might expect a few here and there, but on this scale I would say  
> there is something pretty wrong.
>
> I would recommend using glsres -I to get a list of ones that have  
> expired but were not removed. Then look for these in the goldd.log  
> to see if Charges were issued for them. You may find that Errors  
> occurred, or you may find that Moab never sent the charge request,  
> or you may find that there is a bug in Gold where it is charging but  
> the reservation is not getting removed (naturally, this is doubtful:).
>
> Scott
>
>
> Brock Palen wrote:
>> We tend to accumulate stale reservations (the ones that get deleted  
>> with grmres -I).
>>
>> We have set up a cron job to run grmres -I every night, and it deletes  
>> between 100 and 500 every day.  Should this be happening?  What  
>> would be causing this?
>>
>> Thanks
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985


