EBU National Grading Scheme


EBU National Grading Scheme: How accurate is it likely to be?

#21 Vampyr

Group: Advanced Members
Posts: 10,611
Joined: 2009-September-15
Gender:Female
Location:London

Posted 2012-March-06, 00:28

phil_20686, on 2012-March-05, 19:31, said:

I think it will take a bit more time for the rankings of insular clubs to stabilise.

I think it will take forever. For clubs with few points of contact with the wider EBU, the occasional point of contact will cause wild fluctuations one way or another. Suppose a top pair pop into a little club somewhere in Cumbria whose members rarely, if ever, play outside that club. If the top pair have an off night and score poorly, this little club will be deemed a very strong field and the members' ratings will be inflated in perpetuity. Obviously the opposite effect is also possible.

I realise that this is an oversimplification, but I think that the effect is real and will be long-lasting. I wrote at length to John Carter and, after his death, to Sally Bugden about a number of concerns I had about the system. Other important (to me anyway) ones were rating individuals instead of partnerships, and the bell-shaped curve -- I think that the people rated "2" and "3" should be in larger groups so they don't feel stigmatised. Perhaps the ratings should even be concealed until a person reaches a certain level.

But mainly I worry that this scheme, were it to be taken seriously, would discourage casual and one-time partnerships, mentoring and the like, and would discourage people from showing up without a partner to clubs that have hosts or guaranteed partners. If/when team games begin to be included, people might not wish to team up with friends, in order to protect their ratings -- and for many, this will mean not participating at all. (This might be different if the Butler scores and not the team results were counted, of course.)

So in general, I think that if people start to care about their ratings, it will have negative effects on bridge in England. I hope that I am wrong.

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones -- Albert Einstein

#22 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-March-06, 03:11

NGS Guide said:

Calculating event grading values for teams of four events

There are no plans to include within the NGS results from head-to-head teams-of-four matches, where boards are played at just two tables, as the scheme would be unable to differentiate the relative performances of the two partnerships within the team.

That's a strange argument. Exactly the same problem exists with differentiating the members of a partnership. In a pairs event they deal with this by adjusting your result by the difference between your grade and your partner's; in a teams event an equally good (or bad) solution is to adjust by the difference between your grade and your teammates' average grade.
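gnasher's suggestion can be sketched as one rule covering both cases: credit each player with the group's result plus the amount by which their grade differs from the group's average grade. For a pair that works out as half the gap to partner; this is an assumption about how the NGS pairs adjustment behaves, not its documented formula, and the function name is hypothetical.

```python
from statistics import mean


def individual_performances(group_score, grades):
    """Split a group's result (a percentage) into per-player performances.

    Hypothetical rule, not the documented NGS formula: each player is credited
    with the group's score plus the difference between their own grade and the
    group's average grade. For a pair this is half the gap to partner; for a
    team of four it is three quarters of the gap to the teammates' average.
    """
    group_average = mean(grades)
    return [group_score + (g - group_average) for g in grades]


# A pair scoring 55% where one player is graded 60 and the other 50.
print(individual_performances(55.0, [60.0, 50.0]))  # [60.0, 50.0]
# A team scoring 52% whose four members are graded 60, 50, 50 and 40.
print(individual_performances(52.0, [60.0, 50.0, 50.0, 40.0]))  # [62.0, 52.0, 52.0, 42.0]
```

A convenient property of this rule is that the individual figures always average back to the group's own score, so nothing is created or lost by the split.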

Quote

It is intended, though, to include results from most other types of teams events as it becomes possible to analyse teams games as IMPed Pairs. This is because we are then able to reduce significantly the impact of the actions of our team mates from the calculation of our gradings.

Such analysis will become possible for Multiple Teams-of four events and Swiss Teams events provided that Butler IMP scores can be obtained from the scoring program. For Teams-of-eight matches, which are common in County Leagues, we would need to analyse the Butler IMP scores, and indeed, for a long time, Butler IMP scores have commonly been calculated for matches between large teams to assess relative performance by the various pairs within a team.

That addresses one of Vampyr's concerns, but I'm not terribly keen on it.

In a Swiss Teams event, Butlers (or cross-IMPs) simply won't produce the right answers, because the results are strongly dependent on the strength of the team you are playing.

In an event with seating rights, Butler or cross-IMP scores are also of questionable value, because the stronger pairs tend to pick each other. And it creates a perverse incentive to pick the weaker opposing pair so as to inflate your grade.

The only situations where I can see this working are round-robin events and leagues, where either you don't pick your opponents or you play against both opposing pairs.
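For readers unfamiliar with the two methods mentioned above, here is a minimal sketch of each for a single board where every table's North-South score is known. The IMP scale is the standard one; using a plain mean rounded to the nearest 10 as the Butler datum, and averaging cross-IMPs over the other tables, are simplifying assumptions, since scoring programs vary in both. Note that neither calculation knows who you were actually playing against.

```python
from bisect import bisect_right

# Lower bounds of the standard IMP scale: a difference of 20-40 points is 1 IMP,
# 50-80 is 2 IMPs, and so on up to 24 IMPs for 4000 or more.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]


def imps(diff):
    """Convert a point difference into signed IMPs."""
    sign = 1 if diff >= 0 else -1
    return sign * bisect_right(IMP_THRESHOLDS, abs(diff))


def butler_imps(scores):
    """IMPs for each NS score against a datum (the mean of all scores, rounded to 10).

    Assumption: many scorers first discard the top and bottom scores; this does not.
    """
    datum = round(sum(scores) / len(scores), -1)
    return [imps(s - datum) for s in scores]


def cross_imps(scores):
    """For each NS score, the average IMPs won or lost against every other table."""
    n = len(scores)
    return [sum(imps(s - t) for j, t in enumerate(scores) if j != i) / (n - 1)
            for i, s in enumerate(scores)]


# Four tables in 4H: two make exactly, one makes an overtrick, one goes down one.
board = [420, 420, 450, -50]
print(butler_imps(board))                         # [3, 3, 4, -8]
print([round(x, 2) for x in cross_imps(board)])   # [3.0, 3.0, 4.33, -10.33]
```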

... that would still not be conclusive proof, before someone wants to explain that to me as well as if I was a 5 year-old. - gwnn

#23 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-March-06, 03:31

gnasher, on 2012-March-05, 14:48, said:

Two matchpoint games at the YC in January appear on my record but didn't affect my grade.

Those two games were played with a visitor from overseas who had never played in England before. From the NGS Guide: "Only pairs where both players can be identified are included in the NGS processing of a session."


#24 Cyberyeti

Group: Advanced Members
Posts: 14,908
Joined: 2009-July-13
Location:England

Posted 2012-March-06, 05:36

I'm not even on there as I rarely play pairs. My partner is in there at 62.10.

#25 helene_t

The Abbess

Group: Advanced Members
Posts: 17,394
Joined: 2004-April-22
Gender:Female
Location:Odense, Denmark
Interests:History, languages

Posted 2012-March-06, 05:47

haha I have the same ranking as Andy

How is this possible? I get on average some 50% on club nights

The world would be such a happy place, if only everyone played Acol :) --- TramTicket

#26 RMB1

Group: Advanced Members
Posts: 1,841
Joined: 2007-January-18
Gender:Male
Location:Exeter, UK
Interests:EBU/EBL TD
Bridge, Cinema, Theatre, Food,
[Walking - not so much]

Posted 2012-March-06, 05:56

Cyberyeti, on 2012-March-06, 05:36, said:

I'm not even on there as I rarely play pairs.

Ditto: I rarely play at all.

You can still find your own grade by logging in to the member area.

If anyone has looked, I am not from WOR (Worcestershire).

Robin

"Robin Barker is a mathematician. ... All highly skilled in their respective fields and clearly accomplished bridge players."

#27 Zelandakh

Group: Advanced Members
Posts: 10,764
Joined: 2006-May-18
Gender:Not Telling

Posted 2012-March-06, 06:19

I think we should all congratulate Frances and Jeffrey for being the third best pair in the whole of England! One wonders if these ratings might one day be used as a basis for international selections...

One thing I did notice flicking through this is that some top players are completely absent. Is there some reason for this? Are you allowed to opt out? To mgoetze: do you have a link to the German version, please?

(-: Zel :-)

#28 WellSpyder

Group: Advanced Members
Posts: 1,627
Joined: 2009-November-30
Location:Oxfordshire, England

Posted 2012-March-06, 06:53

Zelandakh, on 2012-March-06, 06:19, said:

One thing I did notice flicking through this is that some top players are completely absent. Is there some reason for this? Are you allowed to opt out?

Yes, you can opt out if you want to - a grade will still be calculated for you in order to provide data for other calculations, but the grade will not be published.

I suspect that a more likely reason for omissions, however, is that some top players simply aren't playing enough pairs games to earn a grade.

#29 daveharty

Group: Full Members
Posts: 694
Joined: 2010-October-21
Gender:Male
Location:Ann Arbor, MI
Interests:Bridge, juggling, disc sports, Jane Austen, writing, cosmology, and Mexican food

Posted 2012-March-06, 09:13

One thing that I found very interesting was that on the "Top Partnership" list, five of the top ten partnerships (and all of the top three) are mixed pairs. If this sort of list were generated in ACBL-land, I suspect this would not be the case. I know nothing about the EBU, could anyone explain why this is? Are there just a lot more mixed pairs events?

Revised Bridge Personality: 44 43 33 44

Dianne, I'm holding in my hand a small box of chocolate bunnies... --Agent Dale Cooper

#30 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-March-06, 09:50

daveharty, on 2012-March-06, 09:13, said:

Of that list of the top ten pairs by grade, only two pairs play in the EBU Premier League, which contains almost all of the best players in the country.

I think it's just that they don't have enough data for most of the top players playing in serious partnerships. Many top players play in their serious partnerships only in the major teams events; in pairs events they will often play with sponsors or friends, or stay at home. And most of them don't play any club bridge together.


#31 mgoetze

Group: Advanced Members
Posts: 4,942
Joined: 2005-January-28
Gender:Male
Location:Cologne, Germany
Interests:Sleeping, Eating

Posted 2012-March-06, 10:41

Zelandakh, on 2012-March-06, 06:19, said:

To mgoetze: do you have a link to the German version, please?

http://vu2109-rails....hoster.de/frame

"One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision"
-- Bertrand Russell

#32 FrancesHinden

Limit bidder

Group: Advanced Members
Posts: 8,482
Joined: 2004-November-02
Gender:Female
Location:England
Interests:Bridge, classical music, skiing... but I spend more time earning a living than doing any of those

Posted 2012-March-06, 11:43

daveharty, on 2012-March-06, 09:13, said:

No, there's just not enough data. For example, Jeffrey and I (one of your top three pairs) only have about 380 boards together counted (basically 2x Brighton Swiss Pairs + 2x National Pairs), which is a long way short of the 1000 that gives you a 'proper' grade. If you 'hide' evolving grades, so you only include pairs with 1000 boards of matchpoints, there is exactly one player who is a regular in the Premier League (and he's playing with his wife, not his usual partner).

Because you are looking mainly at club matchpoint results (+ some national pairs events) you will get a lot of couples rather than the country's top partnerships.

#33 gordontd

Group: Advanced Members
Posts: 4,485
Joined: 2009-July-14
Gender:Male
Location:London

Posted 2012-March-06, 12:49

gordontd, on 2012-March-05, 15:45, said:

It was a Swiss Pairs with IMP scoring, so I suspect that's what confused the scoring system. I did check the original XML file, and it does show your 103 VPs, so I've forwarded it to the EBU and deleted the session from the club's record. Hopefully it'll all get corrected soon.

I was correct that the unusual form of scoring wasn't recognised, but they are correcting it and we can expect to see Andy in his rightful place once they have.

Gordon Rainsford
London UK

#34 awm

Group: Advanced Members
Posts: 8,624
Joined: 2005-February-09
Gender:Male
Location:Zurich, Switzerland

Posted 2012-March-06, 13:29

This seems to suffer from three of the major problems that the "power rating" system has as well:

(1) Presumption of linearity in expected result. It's assumed that if you play in a field that is different from the norm, your expected percentage will change by the same amount regardless of what your starting percentage was. I don't think this is true. For example, suppose Frances has a rating of 68% in a national "average" field and shows up to a weak club game with a partner of comparable skill. The club game might be 10% below national average, and Frances and partner are very likely to win... but will they score 78%? Even in a weak field, it is very hard to consistently score at that level. Similarly, if you take two very weak pairs and put them in a national championship event, their final scores will be quite bad. But there will be much more luck as to who has the better score than there would be for them in an event where the standard is weaker. My point is that when a pair is much better (or much worse) than the field, their expected MP score should tail off. I don't think expected scores in excess of 75% are really reasonable or accurate regardless of the caliber of field (the same could be said of scores below 25%).
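Point (1) can be made concrete by comparing the linear adjustment described above with one possible non-linear alternative that combines rating and field on the log-odds scale (a Bradley-Terry-style model, swapped in here purely for illustration; nothing in the thread suggests the NGS uses it). The function names are hypothetical. Both models agree for an average pair, but the non-linear one tails off for pairs far from the field instead of running past 100%.

```python
import math


def _logit(pct):
    """Log-odds of a percentage strictly between 0 and 100."""
    return math.log(pct / (100 - pct))


def linear_expected(rating, field_offset):
    """Linear model: add the field offset (percentage points) to the rating."""
    return rating + field_offset


def log_odds_expected(rating, field_offset):
    """Hypothetical non-linear model.

    `rating` is the percentage expected against an average national field;
    `field_offset` is how far an average national pair would score above 50%
    in this field. The two effects are added as log-odds, then mapped back.
    """
    combined = _logit(rating) + _logit(50 + field_offset)
    return 100 / (1 + math.exp(-combined))


print(linear_expected(68, 10))               # 78
print(round(log_odds_expected(68, 10), 1))   # 76.1
print(linear_expected(90, 30))               # 120
print(round(log_odds_expected(90, 30), 1))   # 97.3
```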

(2) Presumption of linearity in partnership caliber. Carrying a very weak partner to a good result in a mediocre event is really a very different skill from getting a good result with a comparable partner in a top event. I know a lot of people who are much better at one of these skills than the other. It doesn't seem reasonable to presume that two strong players who obtain comparable results when playing with their regular partners will necessarily do comparably well when partnering a beginner in the pro-am, yet the rating system seems to presume precisely this.

(3) Partnership. Frances is likely to do a lot better playing with a regular partner of comparable caliber than playing with a random pickup of comparable caliber (this is the same for everyone, I am just using her as an example). So her rating benefits from playing mostly with regular partners. There are ways to adjust for this: for example, weighting the effect on rating by the number of prior boards with this partner, so that boards in a "first time partnership" count less than those in a long-term pair. This also has the nice effect of encouraging people to play in "one off" partnerships without worrying much about how it will affect their rating. Just looking at the discussion on this thread, while I don't know how to measure the relative caliber of Gnasher and Frances (who I'm sure are both fine players), it sounds like Gnasher plays a lot more "one off" partnerships than Frances does, which would partly explain why his rating is so much lower.
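The adjustment suggested in point (3) might look like the sketch below. It assumes a rating is updated by moving a fraction `alpha` of the way towards each session grade (the 1-in-20 figure is borrowed from later in the thread and may not match the NGS), and scales that step by a weight that grows with boards already played together. All names and constants are hypothetical.

```python
def partnership_weight(prior_boards, boards_to_halfway=200, floor=0.25):
    """Hypothetical weight for a session, growing with experience together.

    A brand-new partnership counts `floor`; the weight approaches 1 as the
    number of boards previously played together grows, reaching the midpoint
    between `floor` and 1 after `boards_to_halfway` boards.
    """
    experience = prior_boards / (prior_boards + boards_to_halfway)
    return floor + (1 - floor) * experience


def update_rating(rating, session_grade, prior_boards, alpha=1 / 20):
    """Move the rating towards the session grade, scaled by the partnership weight."""
    step = alpha * partnership_weight(prior_boards)
    return rating + step * (session_grade - rating)


# A bad session (45%) for a 55% player costs far less in a first-time partnership.
print(update_rating(55.0, 45.0, prior_boards=0))
print(update_rating(55.0, 45.0, prior_boards=10_000))
```

The floor keeps one-off sessions from being ignored entirely, so they still contribute some information without dominating the grade.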

Adam W. Meyerson
a.k.a. Appeal Without Merit

#35 dcrc2

Group: Full Members
Posts: 68
Joined: 2010-October-20

Posted 2012-March-06, 15:41

Vampyr, on 2012-March-06, 00:28, said:

I think you are wrong about the "wild fluctuations". Indeed there's an entire section of the document on "diffusion" which explains that the problem is pretty much the opposite: an isolated club with occasional visitors will have an average rating very close to 50%, and it will take a long time for this to shift. I've just worked through the maths myself and assuming I understand the document correctly -

Say that the visitors do about 10% worse than expected, and that the club has 5 tables; then each of the regular club members' session grades is about 1% higher than normal. Then the following week, assuming that the same regulars turn out at the club, they will find everyone's rating has increased by just under 0.05% (the most recent session is weighted by about 1 in 20), and the average session score that week will therefore be 0.05% more than before the visitors arrived. The week after that, the average rating has actually increased a little more (the weighting of the week the visitors played has decreased, but this is offset by the small increase in strength last week), but only very slightly. Following the calculations through, I'm finding that in the long run the average rating converges to about 0.07% more than before the visitors arrived.

So, yes, the regular members' ratings have increased permanently, but only by a tiny amount. Indeed, the change in average rating will be (almost) proportional to the total number of matchpoints above average won by all visitors to the club. The more visitors you have, the more the club's rating changes. A single visiting pair has almost no effect.
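The argument above can be checked with a deliberately simplified model. It assumes each regular's rating is an exponential moving average of session grades with weight 1/20 on the newest session, and that each session grade is corrected by the field's current average rating; both are assumptions standing in for the NGS's actual weighting. In this model a visit that lifts the regulars' session grades by 1% leaves a permanent shift of 0.05%, a little below the 0.07% worked out above, which may reflect differences between this simple weighting and the one in the NGS document. The shift is proportional to the total of all visitor effects, as the post says.

```python
def club_offset_history(bumps, alpha=1 / 20):
    """Regulars' average rating offset (percentage points) after each week.

    bumps[w] is the extra session-grade offset the regulars get in week w because
    visitors under-performed (0.0 in weeks without visitors). Assumed model:
    ratings are an exponential moving average of session grades, and each session
    grade is corrected by the field's current average rating offset.
    """
    rating = 0.0
    history = []
    for bump in bumps:
        session = rating + bump  # field-strength correction plus visitor effect
        rating = (1 - alpha) * rating + alpha * session
        history.append(rating)
    return history


one_visit = club_offset_history([1.0] + [0.0] * 52)
two_visits = club_offset_history([1.0] + [0.0] * 25 + [1.0] + [0.0] * 26)
print(round(one_visit[-1], 4))   # 0.05
print(round(two_visits[-1], 4))  # 0.1
```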

#36 Vampyr

Group: Advanced Members
Posts: 10,611
Joined: 2009-September-15
Gender:Female
Location:London

Posted 2012-March-07, 15:48

gnasher, on 2012-March-06, 03:11, said:

In a Swiss Teams event, Butlers (or cross-IMPs) simply won't produce the right answers, because the results are strongly dependent on the strength of the team you are playing.

I suppose that in a Swiss Teams, they would note who your opponents are and then cross-IMP the results. This would presumably be possible where Bridgemate II was available; I don't know how it would be managed if there was only Bridgemate I. For privately-played matches, I doubt anyone would go to the trouble of sending in the necessary information.

Quote

In an event with seating rights, Butler or cross-IMP scores are also of questionable value, because the stronger pairs tend to pick each other. And it creates a perverse incentive to pick the weaker opposing pair so as to inflate your grade.

Do you think people would be that bothered, though? And in fact, they might prefer to deflate their grade, because...

Zelandakh, on 2012-March-06, 06:19, said:

One wonders if these ratings might one day be used as a basis for international selections...

My understanding is that these ratings are primarily for use in handicapping.


#37 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-March-07, 16:57

Vampyr, on 2012-March-07, 15:48, said:

Do you think people would be that bothered, though?

Probably not, because the only events which have both duplicated boards and seating rights are things like the Spring Fours, the Crockfords final, and the final of the Brighton teams, where there are strong incentives to do well as a team.


#38 FrancesHinden

Limit bidder

Group: Advanced Members
Posts: 8,482
Joined: 2004-November-02
Gender:Female
Location:England
Interests:Bridge, classical music, skiing... but I spend more time earning a living than doing any of those

Posted 2012-March-08, 02:04

gnasher, on 2012-March-06, 03:11, said:

In a Swiss Teams event, Butlers (or cross-IMPs) simply won't produce the right answers, because the results are strongly dependent on the strength of the team you are playing.

Isn't this true of Swiss Pairs events as well? The NGS data already includes matchpointed Swiss Pairs (e.g. about 1/7 of my boards are from the Brighton Swiss Pairs, when in theory we were playing the strongest pairs in the field for most of the event).

#39 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-March-08, 02:24

FrancesHinden, on 2012-March-08, 02:04, said:

I was wrong about this. For Swiss pairs events they adjust for the strength of the people you play:

NGS Guide said:

Things are different for a Swiss Pairs Movement. Here we take each match as a separate stanza within the event as a whole. For each match, we are playing against a single pair of opponents. Here, as in Bridge Club Live, it is by far best to take the grading values of just your two opponents. The SOpp factor for one match is just the average current grade of your two opponents.
Your SOpp for the event as a whole is the average current grade of your opponents in each of your matches.

("SOpp" = "Strength of opponents")

As Stefanie says, they could do the same for Swiss Teams. They'd have to know which half of the opposing team you were playing.
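The SOpp calculation in the quoted guide is simple enough to sketch directly. The first function follows the guide's description; the second is a hypothetical extension to Swiss Teams along the lines suggested above, and assumes the scorer records which opposing pair sat at your table.

```python
from statistics import mean


def swiss_sopp(matches):
    """Strength of opponents for a Swiss Pairs event, as the quoted guide describes it.

    `matches` holds one (grade, grade) tuple per match: the current grades of the
    two opponents faced in that match. Each match's SOpp is the average of the two
    grades; the event SOpp is the average of the per-match values.
    """
    return mean(mean(pair) for pair in matches)


def swiss_teams_sopp(matches):
    """Hypothetical Swiss Teams version.

    Each match is ((first_pair_grades, second_pair_grades), at_your_table), where
    `at_your_table` (0 or 1) says which opposing pair you actually played.
    """
    return swiss_sopp(pairs[at_your_table] for pairs, at_your_table in matches)


print(round(swiss_sopp([(60.0, 56.0), (52.0, 50.0), (48.0, 54.0)]), 2))  # 53.33
print(swiss_teams_sopp([(((60.0, 56.0), (45.0, 47.0)), 1)]))              # 46.0
```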


#40 hatchett

Group: Full Members
Posts: 589
Joined: 2005-November-02
Location:Moldova

Posted 2012-March-10, 09:09

I agree with much of AWM's post. There are many flaws in this rating system. It seems to me that a very good player playing with a very weak player may have an advantage. Imagine a 70% player playing with a 30% player in a 50% field. The 70% player hogs the bidding so that he declares more than his fair share of hands, and when he declares he will often get a good score (I know this example is oversimplified, because there might be issues bidding to the right contract, but the point stands). If he plays with another 70% player with the same declarer skill, this tactic is not available. Effectively the partnership is playing as a 52/53% partnership but is rated 50% by the system.
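The arithmetic above can be sketched with a crude model (an assumption, not one given in the post): treat the pair's expected score as a blend of the two players' levels, weighted by how often each declares, ignoring bidding and defence. The function name is hypothetical. If the strong player declares half the hands the blend is the 50% the system predicts; if he steers the play into his hand 9 times in 16, it rises to 52.5%.

```python
def declarer_share_expected(strong, weak, strong_declares):
    """Hypothetical blend of two players' levels, weighted by who declares.

    `strong` and `weak` are the players' levels as percentages;
    `strong_declares` is the fraction of hands the stronger player declares.
    """
    return strong_declares * strong + (1 - strong_declares) * weak


print(declarer_share_expected(70, 30, 0.5))     # 50.0
print(declarer_share_expected(70, 30, 0.5625))  # 52.5
```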


BBO Discussion Forums: EBU National Grading Scheme - BBO Discussion Forums
