[gold-users] are stale reservations normal
Brock Palen
brockp at umich.edu
Thu Mar 18 11:13:29 MDT 2010
It looks like moab is getting into an odd state and not issuing the charge:
glsres -I
4380349 3282258 345600 2010-03-17 11:39:13 2010-03-18 11:49:13
4427406 aducoin ylyoung nyx 2
In the moab logs I only see this Alert over and over:
03/17 11:44:59 ALERT: job '3282258' has been in state 'Running'
for 306 seconds. node 'nyx0900' is in state 'Running' (job '3282258'
will be cancelled)
03/17 11:44:59 MSysRegEvent(JOBCORRUPTION: job '3282258' (user
aducoin) has been in state 'Running' for 306 seconds. node 'nyx0900'
is in state 'Running' (job '3282258' will be cancelled)
03/17 15:01:23 MJobProcessCompleted(3282258)
03/17 15:01:23 MJobProcessTVariables(3282258)
03/17 15:01:23 MAMAllocJDebit(A,3282258,SC,EMsg)
03/17 15:01:23 MJobSendFB(3282258)
03/17 15:01:23 MSysLaunchAction(ASList,)
03/17 15:01:23 INFO: job usage sent for job '3282258'
03/17 15:01:23 ALERT: job ' 3282258' has invalid system
queue time (SQ: 1268852118 > ST: 1268840355)
03/17 15:01:23 INFO: job ' 3282258' completed.
QueueTime: 0 RunTime: 11960 Accuracy: 13.84 XFactor: 0.14
03/17 15:01:23 INFO: overall statistics. Accuracy: nan
XFactor: inf
03/17 15:01:23 INFO: job '3282258' completed X: 0.138426 T:
11960 PS: 47840 A: 0.138426 (RM: nyx/nyx)
03/17 15:01:23 MReqCreate(3282258,SrcRQ,DstRQ,TRUE)
03/17 15:01:23 INFO: added completed job '3282258', Job
Completion Time Wed Mar 17 14:58:35
03/17 15:01:23 INFO: node 'nyx0900' released from job 3282258
03/17 15:01:23 MJobRemove(3282258)
03/17 15:01:23 MJobDestroyVM(3282258,EMsg)
03/17 15:01:23 MRsvDestroy(3282258,TRUE,TRUE)
03/17 15:01:23 MRsvDestroyCredLock(3282258)
03/17 15:01:23 MJobDestroy(3282258)
03/17 15:06:07 MReqCreate(3282258,SrcRQ,DstRQ,TRUE)
03/17 15:06:07 INFO: added completed job '3282258', Job
Completion Time Wed Mar 17 14:58:35
03/17 15:06:07 MJobDestroy(3282258)
We run thousands of jobs a day, so most jobs are not showing this
behavior and do get charged.
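
For anyone wanting to repeat this check across many stale reservations, here is a minimal sketch in Python. It assumes the moab log has one event per line (unwrapped, unlike the excerpt above) and that the job ID appears literally on each relevant line. The marker strings are copied from the excerpt; treating their presence or absence as diagnostic is my assumption, not documented Moab behavior, and summarize_job is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Marker strings copied from the log excerpt in this message. Whether they
# are the right things to look for is an assumption.
MARKERS = ("MJobProcessCompleted", "MAMAllocJDebit", "will be cancelled")


def summarize_job(log_lines, job_id):
    """Count how often each marker appears on lines that mention job_id."""
    counts = defaultdict(int)
    # Match the job ID only when it is not part of a longer number.
    id_pattern = re.compile(r"(?<!\d)" + re.escape(str(job_id)) + r"(?!\d)")
    for line in log_lines:
        if not id_pattern.search(line):
            continue
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return dict(counts)
```

Feeding it the lines of the moab log and each job ID taken from the glsres -I output gives a per-job count. A job with no MAMAllocJDebit line would be a candidate for "moab never sent the charge request".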
Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
brockp at umich.edu
(734)936-1985
On Mar 17, 2010, at 2:11 PM, Scott Jackson wrote:
> Brock,
>
> I might expect a few here and there, but on this scale I would say
> there is something pretty wrong.
>
> I would recommend using glsres -I to get a list of ones that have
> expired but were not removed. Then look for these in the goldd.log
> to see if Charges were issued for them. You may find that Errors
> occurred, or you may find that Moab never sent the charge request,
> or you may find that there is a bug in Gold where it is charging but
> the reservation is not getting removed (naturally, this is doubtful:).
>
> Scott
>
>
> Brock Palen wrote:
>> We tend to accumulate stale reservations (things that get deleted
>> with grmres -I)
>>
>> We have set up a cron job to run grmres -I every night, and it deletes
>> between 100 and 500 every day. Should this be happening? What
>> would be causing this?
>>
>> Thanks
>>
>> Brock Palen
>> www.umich.edu/~brockp
>> Center for Advanced Computing
>> brockp at umich.edu
>> (734)936-1985