BBO Discussion Forums: System performance metrics

System performance metrics

#21 helene_t

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,270
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-March-04, 02:44

awm, on 2025-March-03, 16:13, said:

I feel like there's something weird about the way you're evaluating this, such that opening lighter is virtually always better (to the degree that a one point difference in 1NT opening massively moves the needle and EHAA looks like by far the best of the systems).

I think it is OK that EHAA works well on these metrics. It does get a lot of information across in the first bid, and if playing with a partner who always makes insufficient bids and bars me from the rest of the auction, I think I would rather play EHAA than a more reasonable system.

EHAA may be a bad system overall because the 2-level openings leave too little space to sort anything out, but the metrics I have looked at so far don't address this (other than the aggressiveness in the scatter plot).

On the other hand, the fact that it made such a big difference to take the balanced 11-counts out of the IMPrecision 1 opening is indeed weird. I think part of it is down to the walrus mentality of my scoring system, but I will try to look a bit deeper into it.
The world would be such a happy place, if only everyone played Acol :) --- TramTicket

#22 awm

  • Group: Advanced Members
  • Posts: 8,461
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2025-March-04, 05:43

Part of the problem is the emphasis on “know you have game.”

If opponents preempt and I have say 14 points, it’s really helpful to me if partner opened because now I know we have game values, whereas if partner passed it’s tougher.

But if I have say 19, I expect a game even opposite a pass. If partner opened it is not so much help to me (actually it might make me think we have slam, so a marginal opening by partner could actually hurt here).

I think you are giving credit for opening the balanced 11 any time partner has 14+, whereas I think it really only helps when partner is in the borderline range of like 14–16. Against that, you are behind when partner has 12-13 because partner could’ve made a better decision if you had a higher minimum.
Adam W. Meyerson
a.k.a. Appeal Without Merit

#23 DavidKok

  • Group: Advanced Members
  • Posts: 2,715
  • Joined: 2020-March-30
  • Gender:Male
  • Location:Netherlands

Posted 2025-March-04, 06:17

I think a better metric is 'uncertainty that we have game', i.e. the entropy of the binary yes/no question, conditional on our hand, partner's first call and the preempt.

#24 helene_t

Posted 2025-March-05, 13:09

Something I find perplexing, when looking at entropy of the opening bid versus what we can call "information effectiveness", i.e. entropy divided by the maximum entropy that is possible given the system's level of aggressiveness (average opening bid height):
[Image: scatter plot of opening-bid entropy vs. information effectiveness for each system]
There is a clear negative correlation. In other words, the more aggressive systems such as EHAA do transmit more information with the opening bid but not as much as they "ought" to do. It doesn't have to be that way - one could in principle design an aggressive system that also transmitted lots of information with the opening bid, but it would have to be something different from the kind of systems I have explored. Maybe something like Todd & Atul's Dejeuner system.
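A minimal sketch of the "information effectiveness" ratio described above. It assumes (our assumption, not necessarily the exact method used for the plot) that opening calls are indexed by height, pass = 0 and one step per bid, and that the maximum-entropy reference for a given mean height is the geometric distribution on 0, 1, 2, ..., which is the maximum-entropy distribution on the non-negative integers with a fixed mean. The frequencies in the example are made up.

```python
import math

def entropy_bits(freqs):
    """Shannon entropy (bits) of a {call_index: frequency} mapping."""
    total = sum(freqs.values())
    return -sum(f / total * math.log2(f / total) for f in freqs.values() if f > 0)

def mean_height(freqs):
    """Average opening height, with pass = 0 and one step per bid (assumed indexing)."""
    total = sum(freqs.values())
    return sum(k * f for k, f in freqs.items()) / total

def geometric_max_entropy(mean):
    """Entropy (bits) of the geometric distribution P(k) = p * (1 - p)**k with this mean.

    p = 1 / (1 + mean); this is the maximum-entropy distribution on 0, 1, 2, ...
    for a fixed mean.
    """
    if mean <= 0:
        return 0.0
    p = 1.0 / (1.0 + mean)
    return (-(1 - p) * math.log2(1 - p) - p * math.log2(p)) / p

def information_effectiveness(freqs):
    """Entropy of the opening distribution divided by the maximum for its mean height."""
    h_max = geometric_max_entropy(mean_height(freqs))
    return entropy_bits(freqs) / h_max if h_max > 0 else 0.0

# Hypothetical frequencies: pass, then the three cheapest bids.
toy = {0: 0.5, 1: 0.25, 2: 0.125, 3: 0.125}
print(round(entropy_bits(toy), 3), round(information_effectiveness(toy), 3))
```

A real opening structure lives on a finite set of calls, so its entropy can only fall short of the geometric reference; the ratio therefore stays at or below 1.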

We see that Moscito strikes a good trade-off here, which is maybe not so surprising given that Marston had a bit of the same obsession as I have, namely to design a system that is "optimal" in a very theoretical sense. But it is also funny to see Norwegian Standard doing well here.

It is related to another scatterplot, namely entropy versus the probability that the opening bid already takes us beyond the safety level from responder's point of view:
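[Image: scatter plot of opening-bid entropy vs. probability that the opening bid already takes the auction beyond the safety level from responder's point of view]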
Norwegian Standard and Cottontail Club do well here, both 4-card major systems with strong NT.

This is of course still quite crude. I would like to develop something closer to the Useful Space Principle. That a weak 2 opening often leaves no space below the safety level is not so bad, since responder can usually just pass. That a Polish 2 opening often doesn't leave any space below the safety level is a bigger problem.

Also, with respect to Adam's comment about bidding game with 19 points: maybe, instead of defining safety as 100% safety, I could define it as e.g. 95% safety in uncontested auctions and 75% safety after an enemy preempt. This also has the advantage that the 25th percentile is statistically more stable than the minimum, so I would need fewer sims.
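To illustrate the stability argument: defining the safe level at 75% safety means reading off the 25th percentile of a simulated quantity instead of its minimum. A sketch, assuming the quantity is something like the partnership's combined HCP; the normal distribution with a hypothetical mean of 26 and standard deviation of 3 is only a stand-in for real simulation output. Repeating many small simulations shows how much each estimate jumps around.

```python
import random
import statistics

def safety_threshold(samples, safety):
    """Value exceeded in roughly `safety` of the samples (1.0 -> minimum, 0.75 -> 25th percentile)."""
    ordered = sorted(samples)
    index = int((1.0 - safety) * (len(ordered) - 1))
    return ordered[index]

def spread_across_runs(safety, runs=500, sims_per_run=50, seed=7):
    """Standard deviation of the threshold estimate over repeated small simulations."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(runs):
        # Stand-in for simulated combined HCP (hypothetical mean 26, sd 3).
        samples = [rng.gauss(26, 3) for _ in range(sims_per_run)]
        estimates.append(safety_threshold(samples, safety))
    return statistics.stdev(estimates)

print("minimum (100% safety):", round(spread_across_runs(1.0), 2))
print("25th percentile (75% safety):", round(spread_across_runs(0.75), 2))
```

The minimum depends on the single most extreme simulated deal, whereas the percentile averages over many, which is why it needs fewer sims for the same precision.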

#25 DavidKok

Posted 2025-March-05, 13:22

Maximum entropy up to a given level is attained with a uniform distribution. I think that is why EHAA scores high - the entropy should be strongly correlated with using high opening bids relatively often, even conditional on the mean opening level.

If you were guaranteed to have the auction to yourself, you wouldn't go too far wrong by maximising information density (though there are tradeoffs with safety levels and information leakage, so this is by no means a safe assumption!). Traditional thinking has it that natural systems under this condition want to reduce the frequency of each subsequent call by approximately 50% compared to the one before it, e.g. half of all hands pass, a quarter open 1, an eighth open 1, etc., while relay systems have a theoretical (less information-dense) limit of a factor of 1.618.
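The two rules of thumb can be written as geometric frequency schedules in which each call is used 1/r times as often as the one below it. A sketch, assuming an unbounded ladder of calls so that the frequencies sum to 1; r = 2 gives the halving rule and r = 1.618 (the golden ratio) the relay limit quoted above.

```python
import math

def frequency_schedule(ratio, n_calls):
    """First n_calls frequencies of f_k = (1 - 1/ratio) * ratio**(-k); k = 0 is pass."""
    first = 1.0 - 1.0 / ratio
    return [first * ratio ** (-k) for k in range(n_calls)]

natural = frequency_schedule(2.0, 4)
relay = frequency_schedule((1 + math.sqrt(5)) / 2, 4)
print(natural)  # [0.5, 0.25, 0.125, 0.0625]
print([round(f, 3) for f in relay])
```

With the smaller ratio, pass takes a smaller share and the higher calls a larger one, i.e. the frequencies fall off more slowly.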

Conversely, if I have it in enforceable legal writing that my LHO is about to bid 3 over my opening, regardless of what I do, I maximise the information shared by picking a uniform frequency distribution from pass to 2NT inclusive (and some smidgen assigned to 3 and up).
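A quick check of the guaranteed-preempt case, assuming for illustration that only the 11 calls from pass to 2NT inclusive are usable. A uniform distribution over them carries log2(11) ≈ 3.46 bits, while the halving schedule squeezed into the same 11 calls carries just under 2 bits.

```python
import math

CALLS = ["Pass", "1C", "1D", "1H", "1S", "1NT", "2C", "2D", "2H", "2S", "2NT"]

def entropy_bits(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [1 / len(CALLS)] * len(CALLS)
weights = [2.0 ** -(k + 1) for k in range(len(CALLS))]  # halving rule, truncated at 2NT
halving = [w / sum(weights) for w in weights]

print(round(entropy_bits(uniform), 3))  # 3.459, i.e. log2(11)
print(round(entropy_bits(halving), 3))
```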

Put differently, if the opponents don't jump the auction, we have more space after cheaper bids, so we want more hands in them (to disentangle later).

In practice, not only are the frequency arguments too simple to be of much use for system design, but not knowing which type of auction we are about to enter also suggests something between these extremes. I am not convinced that the entropy of the opening distribution conditional on the mean opening level measures much other than the level of aggression. Instead, the uncertainty in partner's decisions conditional on our information is probably of more interest.

helene_t, on 2025-March-05, 13:09, said:

Also, with respect to Adam's comment about bidding game with 19 points: maybe, instead of defining safety as 100% safety, I could define it as e.g. 95% safety in uncontested auctions and 75% safety after an enemy preempt. This also has the advantage that the 25th percentile is statistically more stable than the minimum, so I would need fewer sims.
This is where entropy is useful. Responder can look at their hand, at the preempt, at the opening, and reason something like "there is an 85% chance that we have game" (e.g. based on a double dummy simulation, or on a less expensive evaluation metric such as "either we have 25 HCP, or a major suit fit and something that re-evaluates to 25 HCP"). The entropy of that yes/no question is a good quantitative criterion of the difficulty of deciding whether or not to go to game.
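The yes/no entropy described here is the binary entropy function. A minimal sketch in bits (base-2 logarithms): the 85% figure is the example above, 0.5 gives the maximum of exactly 1 bit (responder has no idea either way), and a near-certain answer gives almost 0.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a yes/no question answered 'yes' with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # the answer is already known
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p_game in (0.5, 0.85, 0.99):
    print(f"P(game) = {p_game:.2f}: {binary_entropy(p_game):.3f} bits")
```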

#26 helene_t

Posted 2025-March-05, 14:48

DavidKok, on 2025-March-05, 13:22, said:

Conversely, if I have it in enforceable legal writing that my LHO is about to bid 3 over my opening, regardless of what I do, I maximise the information shared by picking a uniform frequency distribution from pass to 2NT inclusive (and some smidgen assigned to 3 and up).

Yes, EHAA is still not quite that extreme but indeed, this very high entropy relates to its good performance when opps preempt.

The relatively poor performance of the more aggressive systems in these two scatterplots is, I thought, partly related to too low a frequency of the 1-of-a-suit openings relative to pass. But the numbers don't support this idea. EHAA passes 33% of hands, while the geometric distribution corresponding to its aggressiveness would pass 23% of hands. The numbers for Cottontail are 50% versus 36% (Cottontail is the only system in my basket that opens all balanced 11-counts and also has a weak two in diamonds, so it passes slightly less than other normal systems). So the ratios are about the same.
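Checking "the ratios are about the same" using only the percentages quoted above, i.e. the observed pass rate divided by the pass rate of the geometric distribution with the same aggressiveness:

```python
# (observed pass rate, geometric-reference pass rate), as quoted in the post
pass_rates = {"EHAA": (0.33, 0.23), "Cottontail": (0.50, 0.36)}

for system, (observed, geometric) in pass_rates.items():
    print(f"{system}: {observed / geometric:.2f}x the geometric pass rate")
```

Both systems pass roughly 1.4 times as often as their geometric reference.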

#27 DavidKok

Posted 2025-March-05, 15:39

helene_t, on 2025-March-05, 14:48, said:

(Cottontail is the only system in my basket that opens all balanced 11-counts and also has a weak two in diamonds so it passes slightly less than other normal systems)
Some systems with a Kamikaze or Chicken (i.e. variable) notrump also have these properties. Notably Dutch Doubleton or Swedish/Polish club with a 10-13 or 9-12 1NT opening, even if it depends on vulnerability. Ironically these systems regularly don't have room to open a bunch of unbalanced 11-counts, even though they regularly open balanced 10- or even 9-counts. Though I think your basket of systems already contains a lot of options, I just wanted to mention it as a curiosity.

#28 helene_t

Posted 2025-March-07, 12:53

Entropy of game vs not game depends mostly on the HCP resolution, so it may be more interesting to show it against the entropy of the trinary spadefit/heartfit/nomajorfit variable (double fits I have attributed to the more likely fit from responder's POV):
[Image: scatter plot of game entropy vs. major-fit entropy for each system]
Tarzan passes 11- and 12-counts if it has a diamond doubleton, but I have also included a 12-point version of it. IMPrecision and Cottontail have an 11-point minimum for balanced hands, but I also have a 12-point IMPrecision version. Berkowitz has a 12-point balanced minimum and Wei generally 13, but 11 with 4-card diamonds. Generally 12 points is the minimum for balanced hands in natural systems (previously I mistakenly made it 11 for Norwegian, which explains why it performed better than other natural systems on some metrics).

Anyway, we see that the game entropy is particularly poor for Vienna as it doesn't have a natural 1NT opening. Cottontail Club is a Canape system which probably isn't that great for the major suit entropy.

We can also break the major fit entropy down by opening bid for each system. The overall major fit entropy for each system is a weighted average (weighted by the frequency of the opening bids) of the opening-specific entropies. Here we see that Cottontail has very low entropy for the 2 opening, as that opening denies 4+ cards in a major suit, but otherwise it has high major fit entropy.
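The system-level figure is a frequency-weighted average of the per-opening entropies. A sketch with hypothetical numbers; the opening labels, frequencies and entropies are invented for illustration and not read off the plots.

```python
def system_entropy(per_opening):
    """Frequency-weighted average of opening-specific entropies.

    per_opening maps an opening to (frequency, entropy_in_bits).
    """
    total_freq = sum(freq for freq, _ in per_opening.values())
    return sum(freq * h for freq, h in per_opening.values()) / total_freq

# Hypothetical: an opening that denies a 4-card major contributes a low entropy.
example = {
    "Pass": (0.50, 1.20),
    "1-level suit": (0.35, 1.40),
    "1NT": (0.10, 1.50),
    "2-level denying a major": (0.05, 0.30),
}
print(round(system_entropy(example), 3))
```

Because of the weighting, a rare opening with very low entropy moves the system total only slightly, which is why the per-opening breakdown is informative.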

[Image: major-fit entropy broken down by opening bid for each system]

#29 helene_t

Posted 2025-March-07, 14:49

It is a bit strange that the game entropy is so poor for Cottontail Club. Part of it could be a spin-off from the major fit entropy (for game purposes I require 25 points without a major suit fit, 24 with an 8-card fit and 22 with a 9-card fit), but on the other hand, several of the other strong club systems don't have a weak two in diamonds.
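The game criterion in parentheses can be written as a small predicate. A sketch, assuming "points" means the partnership's combined count, the fit length is that of the longest major-suit fit, and "a 9-card fit" includes longer fits; the function and parameter names are ours.

```python
def game_values(combined_points, major_fit_length):
    """Game threshold as stated: 25 points without a major fit,
    24 with an 8-card major fit, 22 with a 9+-card major fit (assumed)."""
    if major_fit_length >= 9:
        needed = 22
    elif major_fit_length == 8:
        needed = 24
    else:
        needed = 25
    return combined_points >= needed

print(game_values(23, 9), game_values(23, 8), game_values(25, 7))
```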

#30 DavidKok

Posted 2025-March-08, 06:38

For my understanding, how do you compute the major suit fit variable? I'd have expected e.g. the Cottontail 1 opening to score well, as it denies exactly a 4cM and is only infrequently a 5(+)cM, especially conditional on responder having major suit length. Similarly the 1 opening denying a 4c suit is an upside in my mind for identifying major suit fits - but based on the graph, it does worse than the 1 opening (which does not deny a 4c suit, and is otherwise symmetrical).

#31 helene_t

Posted 2025-March-08, 09:13

DavidKok, on 2025-March-08, 06:38, said:

For my understanding, how do you compute the major suit fit variable? I'd have expected e.g. the Cottontail 1 opening to score well, as it denies exactly a 4cM and is only infrequently a 5(+)cM, especially conditional on responder having major suit length. Similarly the 1 opening denying a 4c suit is an upside in my mind for identifying major suit fits - but based on the graph, it does worse than the 1 opening (which does not deny a 4c suit, and is otherwise symmetrical).

Major suit fit, from responder's point of view, is the percentage of opener hands that would give an 8+ card fit in hearts or spades, respectively. Double fits are allocated to the most likely of the two fits.
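A sketch of the three-way major-fit variable as described: each sampled opener hand counts as a spade fit, a heart fit, or no major fit from responder's point of view, and double fits are credited to whichever fit is more frequent in the sample (our reading of "the most likely of the two fits"). Hands are reduced to (hearts, spades) lengths and the sample is hypothetical. The maximum for three outcomes is log2(3) ≈ 1.585 bits.

```python
import math
from collections import Counter

def major_fit_entropy(opener_lengths, resp_hearts, resp_spades):
    """Entropy (bits) of the spade-fit / heart-fit / no-major-fit variable.

    opener_lengths: list of (hearts, spades) lengths for sampled opener hands.
    """
    heart_fit = [h + resp_hearts >= 8 for h, _ in opener_lengths]
    spade_fit = [s + resp_spades >= 8 for _, s in opener_lengths]
    # Double fits go to the fit that occurs more often in the sample (ties -> spades).
    prefer_spades = sum(spade_fit) >= sum(heart_fit)
    labels = []
    for hf, sf in zip(heart_fit, spade_fit):
        if hf and sf:
            labels.append("spades" if prefer_spades else "hearts")
        elif sf:
            labels.append("spades")
        elif hf:
            labels.append("hearts")
        else:
            labels.append("none")
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Hypothetical opener (hearts, spades) lengths; responder holds 4 hearts and 4 spades.
sample = [(4, 3), (3, 4), (2, 2), (5, 2), (3, 3), (4, 4)]
print(round(major_fit_entropy(sample, 4, 4), 3))
```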

It's a good point that the probability of a fit decreases when one conditions on responder's hand. I didn't do this; I just used the overall length distribution for the given opening bid. Same with points. This will give a negative bias of the entropy in some situations, such as when the opening bid is a nebulous minor suit opening and responder has a six-card major. I didn't think that it would give different biases for different systems, but the bias is probably positive for Cottontail's 1 opening.

I suppose I could sample conditionally on responder's hand, but I would need a better computer for that. What I could do is calculate the entropies the correct way specifically for Cottontail 1 and e.g. Tarzan 1, to see if there are substantial differences in the biases.

#32 DavidKok

Posted 2025-March-08, 09:30

I understand. This is not the entropy I was thinking of. My thoughts were:

  • Take an opening system.
  • Generate N pairs of hands (so that automatically the hand held by responder is conditioned on being in the same deal as opener). For each of them, determine the opening call.
  • From the perspective of responder, for each of the N hands responder can see, generate M hands compatible with the opening conditional on responder's hand. Figure out what fraction of them have a major suit fit (or game, or any other selection criterion we might wish to investigate). Calculate the entropy of that question, conditional on the opening call and responder's hand.
  • Average this across all N generated hand pairs, to get the system value.

If we want, we could also condition on some overcall or jump overcall in between. But for now I think that would be excessive. Not conditioning on responder's hand will always favour systems that have longer major suit openings, I think. Without it we lose a lot of inferences in precisely the situations where responder is not sure yet.
The scheme above should, I think, not require extensive re-coding. Generating hands compatible with the opening call given responder's hand is hopefully similar to defining the openings in the first place - perhaps it is possible to simply generate random hands compatible with responder's hand (the dealer script has a 'predeal' keyword that does just this) and then we discard all that are not the requisite opening. My thinking is this does not need to be computationally expensive.
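A rough sketch of the scheme above, under heavy assumptions: a hypothetical toy opening rule stands in for a real system, the yes/no criterion is "do we have an 8+-card major fit", and responder's hand is held fixed while opener's hand is resampled from the 39 unseen cards and rejected unless it makes the same opening (similar in spirit to fixing a hand with dealer's predeal and discarding non-matching deals). Sample sizes are tiny so it runs quickly.

```python
import math
import random
from collections import Counter

def hcp(hand):
    """Milton Work points; cards are 0..51 with rank = card % 13 (12 = ace)."""
    return sum(max(0, card % 13 - 8) for card in hand)

def suit_lengths(hand):
    """Lengths in clubs, diamonds, hearts, spades (suit = card // 13)."""
    counts = Counter(card // 13 for card in hand)
    return [counts[suit] for suit in range(4)]

def toy_opening(hand):
    """Hypothetical toy opening rules, not any system discussed in the thread."""
    points, (clubs, diamonds, hearts, spades) = hcp(hand), suit_lengths(hand)
    shape = sorted((clubs, diamonds, hearts, spades), reverse=True)
    balanced = shape in ([4, 3, 3, 3], [4, 4, 3, 2], [5, 3, 3, 2])
    if balanced and 15 <= points <= 17:
        return "1NT"
    if points >= 12 and spades >= 5 and spades >= hearts:
        return "1S"
    if points >= 12 and hearts >= 5:
        return "1H"
    if points >= 12:
        return "1m"
    return "Pass"

def has_major_fit(opener, responder):
    o, r = suit_lengths(opener), suit_lengths(responder)
    return o[2] + r[2] >= 8 or o[3] + r[3] >= 8

def binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def average_fit_entropy(opening_rule, n_deals=30, m_samples=50, max_tries=20000, seed=1):
    """Average over N deals of the major-fit entropy responder faces after the opening."""
    rng = random.Random(seed)
    entropies = []
    for _ in range(n_deals):
        deck = list(range(52))
        rng.shuffle(deck)
        opener, responder = deck[:13], deck[13:26]
        call = opening_rule(opener)
        unseen = deck[:13] + deck[26:]  # the 39 cards responder cannot see
        fits = accepted = tries = 0
        while accepted < m_samples and tries < max_tries:
            tries += 1
            candidate = rng.sample(unseen, 13)
            if opening_rule(candidate) != call:
                continue  # rejection step: keep only hands that make the same call
            accepted += 1
            fits += has_major_fit(candidate, responder)
        if accepted:
            entropies.append(binary_entropy(fits / accepted))
    return sum(entropies) / len(entropies) if entropies else 0.0

print(round(average_fit_entropy(toy_opening), 3))
```

The same loop works for the game question by swapping `has_major_fit` for another predicate; the cost is dominated by the rejection step for rare openings.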

#33 helene_t

Posted 2025-March-08, 11:24

Yes, this is what I have done (you explain it more clearly than I do, thanks :) ) except that I did not sample opener's hand conditional on responder's hand.

I agree that it doesn't require a lot of recoding. The problem is that I have to generate a separate sampling frame for each responder hand, so if there are on average 1000 responder hands for each of the openings, I need to apply the opening filter to 1000 times as many hands as I do now.

It is not impossible; I think with a somewhat better computer and better planning of my workflow I could do it even without better coding. And there are a lot of things I could do more efficiently; for example, I don't need to deal and evaluate all 52 cards when for most analyses only opener's and responder's HCP and distribution matter. Maybe we can find time to talk about it when I am in the Netherlands after next week :)

What I have found from this tour de force is that it is surprisingly easy to implement a bunch of bidding systems, at least as long as I only implement the opening bid. Defining meaningful metrics is more challenging, and QA of the code and managing memory and CPU resources are more difficult.

But maybe some of the simple well-known metrics, such as the frequency of immediate identification of strain and/or level, aggressiveness, and the entropy of the opening bid itself, are of some interest. These things are sometimes discussed, and then it would be nice to have evidence to back them up. One thing I found really annoying here is that most of the metrics are very sensitive to the way balanced 11-counts are handled. This is maybe of some interest in itself, but it is as if the rest of the trends are obscured by this weak-balanced-hands issue.

I wonder what (if any) form of this would be of wider interest. What about a web app where people can specify their own system and then get various metrics for it, compared to a bunch of well-known systems?

#34 DavidKok

Posted 2025-March-08, 11:33

It might be a feature, that so much is drowned out by the decision of what to do with balanced 11-counts. By frequency, that has a huge impact on system design. Not coincidentally, opening 11's is approximately the current frontier of aggressive bidding systems.

On a separate note, I expected the entropy for a yes/no question to be bounded above by -0.5*ln(0.5) + -0.5*ln(0.5) = 0.693 (and analogously, a trinary decision to have an entropy of at most ln(3) = 1.10). Are you using the 2-log, or scaling the results?
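The two sets of bounds differ only in the base of the logarithm: the maximum is reached by the uniform distribution, giving ln 2 ≈ 0.693 and ln 3 ≈ 1.099 in nats, or 1 bit and log2(3) ≈ 1.585 bits with log2 (bits = nats / ln 2). A quick check:

```python
import math

for outcomes in (2, 3):
    uniform = [1 / outcomes] * outcomes
    nats = -sum(p * math.log(p) for p in uniform)
    bits = -sum(p * math.log2(p) for p in uniform)
    print(f"{outcomes} outcomes: max {nats:.3f} nats = {bits:.3f} bits")
```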

#35 awm

Posted 2025-March-08, 11:48

So I have to admit that my vague recollection of college physics is that entropy is a rather complex thermodynamic principle (and a quick check on Wikipedia seems to confirm this). I’m not really sure how to interpret results expressed in that form, nor does it really seem connected to the game we are trying to play.

I’d much prefer something expressed in terms of realistic auctions where we could see sample hands.

#36 DavidKok

Posted 2025-March-08, 11:53

The entropy of a probability distribution is a measure of its failure to be a single point value - in particular, the amount of information that would be required to know it completely, instead of requiring the use of probability theory. It is applied in thermodynamics, and later also other fields of physics, to capture the loss of information in forwards (and backwards! But that's a bit nuanced) time by using a simple representation of a multi-component system. For contrast, I like to consider the properties of entropy in physics next to Liouville's theorem on conservation of phase space.

However, here it is to be interpreted as 'how many more questions would you need to ask, on average, before you know the answer that you are looking for?'. For example, the entropy of the question 'do we have an 8(+)-card heart fit', tells us how regularly a system will solve this problem for us right off the bat, or in the cases where it doesn't, how much uncertainty remains.

You can use other statistical measures of information density and information transmission, but entropy is particularly useful. See, for example, https://en.wikipedia...rmation_theory).
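One way to make "how many more questions" concrete: by Shannon's source-coding theorem, the best possible strategy of yes/no questions needs on average at least H and fewer than H + 1 questions, where H is the entropy in bits, and a Huffman code achieves this. A sketch with a made-up four-way distribution, where the optimal questioning strategy needs exactly the entropy, 1.75 questions on average:

```python
import heapq
import math

def entropy_bits(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_expected_questions(probs):
    """Average number of yes/no questions for an optimal (Huffman) questioning strategy."""
    if len(probs) <= 1:
        return 0.0
    heap = list(probs)
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # Every outcome under the merged node needs one more question.
        expected += a + b
        heapq.heappush(heap, a + b)
    return expected

answers = [0.5, 0.25, 0.125, 0.125]  # hypothetical probabilities of four possible answers
print(entropy_bits(answers), huffman_expected_questions(answers))
```

For a single yes/no question the optimal strategy always needs exactly one question, so entropy is best read as an average over many decisions rather than a count for one hand.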

#37 helene_t

Posted 2025-March-08, 12:23

awm, on 2025-March-08, 11:48, said:

I’d much prefer something expressed in terms of realistic auctions where we could see sample hands.

I would need to implement more than just the opening bid, then. I suppose I could do that. Maybe all uncontested auctions until either a GF or a sign-off is reached, and then see how good the systems are at stopping below 2NT, avoiding 6-card fits or worse, and finding major suit fits?

#38 helene_t

Posted 2025-March-08, 12:24

DavidKok, on 2025-March-08, 11:33, said:

Are you using the 2-log, or scaling the results?

Yes, log2.

#39 awm

Posted 2025-March-08, 14:34

 DavidKok, on 2025-March-08, 11:53, said:

The entropy of a probability distribution is a measure of its failure to be a single point value - in particular, the amount of information that would be required to know it completely, instead of requiring the use of probability theory. It is applied in thermodynamics, and later also other fields of physics, to capture the loss of information in forwards (and backwards! But that's a bit nuanced) time by using a simple representation of a multi-component system. For contrast, I like to consider the properties of entropy in physics next to Liouville's theorem on conservation of phase space.

However, here it is to be interpreted as 'how many more questions would you need to ask, on average, before you know the answer that you are looking for?'. For example, the entropy of the question 'do we have an 8(+)-card heart fit', tells us how regularly a system will solve this problem for us right off the bat, or in the cases where it doesn't, how much uncertainty remains.

You can use other statistical measures of information density and information transmission, but entropy is particularly useful. See, for example, https://en.wikipedia...rmation_theory).


This is not particularly useful. Telling me that, on average, I will need to “ask 0.4 more questions” to find out if we have a major suit fit does not seem meaningful. What are “questions” in a bridge context? I’d like to know how often I will be able to make good decisions, rather than being given some information-theoretic quantity about “how much more information I need” that I can’t translate.

#40 helene_t

Posted 2025-March-09, 00:07

I tried to make a few analyses of responder's ability to make immediate decisions, but obviously this rewards hyperaggressive systems. Then I thought I could calculate probabilities of getting certain decisions right if you are allowed to take the push when, e.g., game has a 75% chance. But that has all kinds of limitations; e.g., a semipositive opposite a 1NT opening is very specific, while a semipositive opposite an SA 1m opening still leaves room for opener to guess wrong.

Maybe, to make the simple metrics more relevant, I could calculate cross-IMPs for a manageable subproblem and then see if there is some combination of metrics that predicts performance. But this would be a huge amount of work and would probably just lead to the conclusion that the best system is Moscito because it is simple enough for me to implement without bugs. And a system like Acol, which relies heavily on texture, will not do well in a testing environment that only classifies hands by points and shape.

And I am not even talking about opps bidding.
