[gold-users] Can Gold be told not to make a reservation for a job without sufficient quota ?

Scott Jackson scottmo at adaptivecomputing.com
Wed Oct 27 10:25:18 MDT 2010


Chris,

I have an idea, but it will probably still take some digging. I recall from some of my recent testing a situation where a failed transaction would not roll back, which would explain what you are seeing. In practice, a reservation request starts making reservations against its list of available allocations. If, after running out of active allocations, it finds that it has not fulfilled the full reservation amount, it fails with "Insufficient funds" and rolls back the transaction. In one of my environments the transaction rollbacks were silently failing, and in that case we could be left with partial reservations.
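
To make that failure mode concrete, here is a minimal Python sketch. It is not Gold's actual code (Gold is written in Perl); the table layout, the make_db/reserve/total_reserved names and the autocommit switch are all hypothetical, and SQLite's autocommit mode merely stands in for any setup in which a rollback has no effect. The numbers follow Chris's invented example: 10,000 hours allocated, 8,000 already reserved, 3,000 requested.

```python
import sqlite3


class InsufficientFunds(Exception):
    """Raised when the allocations cannot cover the full request."""


def make_db(autocommit):
    # isolation_level=None puts sqlite3 in autocommit mode: every statement
    # is committed immediately, so a later rollback() has nothing to undo.
    conn = sqlite3.connect(
        ":memory:", isolation_level=None if autocommit else "DEFERRED"
    )
    conn.execute(
        "CREATE TABLE allocation ("
        "id INTEGER PRIMARY KEY, project TEXT, amount REAL, reserved REAL)"
    )
    # Hypothetical data: two allocations with 1,000 hours free in each.
    conn.executemany(
        "INSERT INTO allocation VALUES (?, ?, ?, ?)",
        [(1, "VR0018", 6000, 5000), (2, "VR0018", 4000, 3000)],
    )
    conn.commit()
    return conn


def total_reserved(conn, project="VR0018"):
    row = conn.execute(
        "SELECT SUM(reserved) FROM allocation WHERE project = ?", (project,)
    ).fetchone()
    return row[0]


def reserve(conn, project, amount):
    """Reserve `amount` hours across the allocations, intended all or nothing."""
    remaining = amount
    try:
        rows = conn.execute(
            "SELECT id, amount - reserved FROM allocation "
            "WHERE project = ? ORDER BY id",
            (project,),
        ).fetchall()
        for alloc_id, available in rows:
            if remaining <= 0:
                break
            take = min(available, remaining)
            if take <= 0:
                continue
            conn.execute(
                "UPDATE allocation SET reserved = reserved + ? WHERE id = ?",
                (take, alloc_id),
            )
            remaining -= take
        if remaining > 0:
            raise InsufficientFunds(f"short by {remaining} hours")
        conn.commit()
    except Exception:
        # With a working transaction this undoes the partial updates;
        # in autocommit mode it is silently a no-op.
        conn.rollback()
        raise


if __name__ == "__main__":
    for autocommit in (False, True):
        conn = make_db(autocommit)
        try:
            reserve(conn, "VR0018", 3000)
        except InsufficientFunds as exc:
            print(f"autocommit={autocommit}: failed ({exc}), "
                  f"reserved now {total_reserved(conn)}")
```

With a real transaction the failed request leaves the reserved total untouched. In the autocommit case the request still fails, but the 2,000 hours it took along the way stay reserved, which is the kind of partial reservation described above.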

So, what database are you using? What version of DBD::* are you using?

Thanks,

Scott


----- Original Message -----
> From: "Christopher Samuel" <samuel at unimelb.edu.au>
> To: gold-users at supercluster.org
> Sent: Tuesday, October 26, 2010 6:28:35 PM
> Subject: Re: [gold-users] Can Gold be told not to make a reservation for a job without sufficient quota ?
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 27/10/10 09:58, Scott Jackson wrote:
> 
> > Huh?
> 
> That was my response when I was investigating why we were
> seeing jobs banking up for people for no apparent reason.
> 
> > Gold should either succeed or fail for the entire
> > reservation request.
> 
> That's what we were hoping for. :-)
> 
> > It should not result in a "partial" reservation.
> 
> Just to clarify here's an (invented) example of what we
> believe we are seeing.
> 
> A project has 10,000 hours left and submits a job that is
> using 8,000 hours. They then submit a job that is going to
> use 3,000 hours. That job gets a reservation of 2,000 hours
> and defers in Moab.
> 
> > Please pass on the corroborating evidence of the
> > problem and I'll see if I can comment further.
> 
> A cursory glance shows I've got 1 user with 55 reservations
> for jobs that are currently blocked by Moab. All jobs are
> 48 CPU hours (1 core for 2 days) and Gold has given one job
> 32 hours and all the other reservations are for 0 hours.
> 
> Here's a quick list of those reservations.
> 
> [root@bruce-m ~]# showq -b | fgrep aooi | awk '{print $1}' | xargs -n1 glsres --quiet -h -n
> 366619 367727 32.00 2010-10-27 10:03:17 2010-10-29 10:13:17 368075 aooi VR0018 bruce-m 489
> 366620 367728 0.00 2010-10-27 10:03:18 2010-10-29 10:13:18 368076 aooi VR0018 bruce-m
> 366621 367729 0.00 2010-10-27 10:03:18 2010-10-29 10:13:18 368077 aooi VR0018 bruce-m
> 366622 367730 0.00 2010-10-27 10:03:49 2010-10-29 10:13:49 368078 aooi VR0018 bruce-m
> 366030 367731 0.00 2010-10-26 12:36:35 2010-10-28 12:46:35 367479 aooi VR0018 bruce-m
> 366031 367732 0.00 2010-10-26 12:36:36 2010-10-28 12:46:36 367480 aooi VR0018 bruce-m
> 366032 367733 0.00 2010-10-26 12:36:36 2010-10-28 12:46:36 367481 aooi VR0018 bruce-m
> 366033 367734 0.00 2010-10-26 12:36:36 2010-10-28 12:46:36 367482 aooi VR0018 bruce-m
> 366034 367735 0.00 2010-10-26 12:36:36 2010-10-28 12:46:36 367483 aooi VR0018 bruce-m
> 366035 367736 0.00 2010-10-26 12:37:07 2010-10-28 12:47:07 367484 aooi VR0018 bruce-m
> 366036 367737 0.00 2010-10-26 12:37:08 2010-10-28 12:47:08 367485 aooi VR0018 bruce-m
> 366037 367738 0.00 2010-10-26 12:37:08 2010-10-28 12:47:08 367486 aooi VR0018 bruce-m
> 366038 367739 0.00 2010-10-26 12:37:08 2010-10-28 12:47:08 367487 aooi VR0018 bruce-m
> 366039 367740 0.00 2010-10-26 12:37:08 2010-10-28 12:47:08 367488 aooi VR0018 bruce-m
> 366041 367741 0.00 2010-10-26 12:37:39 2010-10-28 12:47:39 367490 aooi VR0018 bruce-m
> 366042 367742 0.00 2010-10-26 12:37:40 2010-10-28 12:47:40 367491 aooi VR0018 bruce-m
> 366043 367743 0.00 2010-10-26 12:37:40 2010-10-28 12:47:40 367492 aooi VR0018 bruce-m
> 366044 367744 0.00 2010-10-26 12:37:40 2010-10-28 12:47:40 367493 aooi VR0018 bruce-m
> 366045 367745 0.00 2010-10-26 12:37:40 2010-10-28 12:47:40 367494 aooi VR0018 bruce-m
> 366046 367746 0.00 2010-10-26 12:38:11 2010-10-28 12:48:11 367495 aooi VR0018 bruce-m
> 366047 367747 0.00 2010-10-26 12:38:12 2010-10-28 12:48:12 367496 aooi VR0018 bruce-m
> 366048 367748 0.00 2010-10-26 12:38:12 2010-10-28 12:48:12 367497 aooi VR0018 bruce-m
> 366049 367749 0.00 2010-10-26 12:38:12 2010-10-28 12:48:12 367498 aooi VR0018 bruce-m
> 366050 367750 0.00 2010-10-26 12:38:12 2010-10-28 12:48:12 367499 aooi VR0018 bruce-m
> 366051 367751 0.00 2010-10-26 12:38:43 2010-10-28 12:48:43 367500 aooi VR0018 bruce-m
> 366052 367752 0.00 2010-10-26 12:38:44 2010-10-28 12:48:44 367501 aooi VR0018 bruce-m
> 366053 367753 0.00 2010-10-26 12:38:44 2010-10-28 12:48:44 367502 aooi VR0018 bruce-m
> 366054 367754 0.00 2010-10-26 12:38:44 2010-10-28 12:48:44 367503 aooi VR0018 bruce-m
> 366055 367755 0.00 2010-10-26 12:38:44 2010-10-28 12:48:44 367504 aooi VR0018 bruce-m
> 366056 367756 0.00 2010-10-26 12:39:15 2010-10-28 12:49:15 367505 aooi VR0018 bruce-m
> 366057 367757 0.00 2010-10-26 12:39:16 2010-10-28 12:49:16 367506 aooi VR0018 bruce-m
> 366058 367758 0.00 2010-10-26 12:39:16 2010-10-28 12:49:16 367507 aooi VR0018 bruce-m
> 366059 367759 0.00 2010-10-26 12:39:16 2010-10-28 12:49:16 367508 aooi VR0018 bruce-m
> 366060 367760 0.00 2010-10-26 12:39:16 2010-10-28 12:49:16 367509 aooi VR0018 bruce-m
> 366061 367761 0.00 2010-10-26 12:39:47 2010-10-28 12:49:47 367510 aooi VR0018 bruce-m
> 366062 367762 0.00 2010-10-26 12:39:48 2010-10-28 12:49:48 367511 aooi VR0018 bruce-m
> 366063 367763 0.00 2010-10-26 12:39:48 2010-10-28 12:49:48 367512 aooi VR0018 bruce-m
> 366064 367764 0.00 2010-10-26 12:39:48 2010-10-28 12:49:48 367513 aooi VR0018 bruce-m
> 366065 367765 0.00 2010-10-26 12:39:48 2010-10-28 12:49:48 367514 aooi VR0018 bruce-m
> 366066 367766 0.00 2010-10-26 12:40:19 2010-10-28 12:50:19 367515 aooi VR0018 bruce-m
> 366067 367767 0.00 2010-10-26 12:40:20 2010-10-28 12:50:20 367516 aooi VR0018 bruce-m
> 366068 367768 0.00 2010-10-26 12:40:20 2010-10-28 12:50:20 367517 aooi VR0018 bruce-m
> 366069 367769 0.00 2010-10-26 12:40:20 2010-10-28 12:50:20 367518 aooi VR0018 bruce-m
> 366070 367770 0.00 2010-10-26 12:40:20 2010-10-28 12:50:20 367519 aooi VR0018 bruce-m
> 366071 367771 0.00 2010-10-26 12:40:51 2010-10-28 12:50:51 367520 aooi VR0018 bruce-m
> 366072 367772 0.00 2010-10-26 12:40:52 2010-10-28 12:50:52 367521 aooi VR0018 bruce-m
> 366073 367773 0.00 2010-10-26 12:40:52 2010-10-28 12:50:52 367522 aooi VR0018 bruce-m
> 366074 367774 0.00 2010-10-26 12:40:52 2010-10-28 12:50:52 367523 aooi VR0018 bruce-m
> 366075 367775 0.00 2010-10-26 12:40:52 2010-10-28 12:50:52 367524 aooi VR0018 bruce-m
> 366076 367776 0.00 2010-10-26 12:41:23 2010-10-28 12:51:23 367525 aooi VR0018 bruce-m
> 366077 367777 0.00 2010-10-26 12:41:24 2010-10-28 12:51:24 367526 aooi VR0018 bruce-m
> 366078 367778 0.00 2010-10-26 12:41:24 2010-10-28 12:51:24 367527 aooi VR0018 bruce-m
> 366079 367779 0.00 2010-10-26 12:41:24 2010-10-28 12:51:24 367528 aooi VR0018 bruce-m
> 366080 367780 0.00 2010-10-26 12:41:24 2010-10-28 12:51:24 367529 aooi VR0018 bruce-m
> 366081 367781 0.00 2010-10-26 12:41:55 2010-10-28 12:51:55 367530 aooi VR0018 bruce-m
> 
> This is what gbalance says for this project:
> 
> [root@bruce-m ~]# gbalance -h -p VR0018 -m bruce-m
> Id  Name              Amount  Reserved Balance CreditLimit Available
> --- ----------------- ------- -------- ------- ----------- ---------
> 325 VR0018            0.00    0.00     0.00    0.00        0.00
> 350 VR0018            0.00    0.00     0.00    0.00        0.00
> 489 VR0018 on bruce-m 2848.80 2848.80  0.00    0.00        0.00
> 
> Is that useful ?
> 
> cheers,
> Chris
> - --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computational Initiative
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
> 
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.10 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAkzHcjMACgkQO2KABBYQAh/TUgCfeBNVpwsSQYx8UpNCXwe8r3Mt
> EYcAmgKYxxnq5Au0j/oeF8F5AXL9ie/m
> =+9aR
> -----END PGP SIGNATURE-----
> _______________________________________________
> gold-users mailing list
> gold-users at supercluster.org
> http://www.supercluster.org/mailman/listinfo/gold-users

