Ratings and Probabilities

Pymander · July 6, 2012

This topic in applied mathematics concerns the rating of chess players so that rating differences will imply odds, convergent with increasing data at the mathematically deduced best rate possible without causing anomalies. Applied to Swiss Tournaments, computer simulations verify perfect ranking and unprecedented convergence for odds calculation. The calculation of odds is a unique feature, and allows objective analysis of simulation results. The formulae can be applied to any endeavour between pairs of competitors, where success may be gauged as two parts of the whole. In chess 0, ½, or 1 are the only results used.

studiot · July 6, 2012

unprecedented convergence for odds calculation.

If it can be applied to the football pools, dogs or gee gees you will have many gullible takers.

Bignose · July 7, 2012

computer simulations verify perfect ranking

I very much doubt this. Even in chess, there is some luck. Luck of your opponent having a cold. And even the GMs make blunders once in a great while. Nothing can be truly perfectly predictive.

ecoli · July 7, 2012

You can build that kind of uncertainty into a predictive/probabilistic model though, which should properly reflect the odds, point spread, or whatever.

Bignose · July 7, 2012

Maybe it is just a semantics thing. The word 'perfect' to me meant that it will make perfect predictions of the outcomes, which again I contend is impossible. Really, in this case we need the word perfect to be defined. Actually, there are a lot of things the OP needs to define. And actually write about. I am not really sure what the point in starting the thread was, considering so little was actually written in the first post.

Pymander · July 7, 2012

This topic in applied mathematics concerns the rating of chess players so that rating differences will imply odds, convergent with increasing data at the mathematically deduced best rate possible without causing anomalies. Applied to Swiss Tournaments, computer simulations verify perfect ranking and unprecedented convergence for odds calculation. The calculation of odds is a unique feature, and allows objective analysis of simulation results. The formulae can be applied to any endeavour between pairs of competitors, where success may be gauged as two parts of the whole. In chess 0, ½, or 1 are the only results used.

Probabilistic Rating Theory

(Odds Ratings)

A = ½ ( Ao + Bo + R )

B = ½ ( Ao + Bo – R )

P = 1 / (1 + e ^ ( –KR ) )

R = (log ( (e ^ ( KRo( Ro> 0 ) ) + S ) / ( e ^ ( –KRo( Ro< 0 ) ) + 1 – S ) ) ) / K

K = –( log ( 1 / Ps – 1 ) ) / Rs

Table of Contents

1. Introduction

2. Point Sequences

3. The Rating Function

4. Concatenation of Probabilities

5. Concatenation Laws

6. Practical Ratings

7. Fundamental Theorem of Games of Skill

8. Individual Games

9. Probability Transformations

10. Rating Changes

11. Rating Tolerance

12. Rating Change Formulae

13. Formulae Summary

14. Playing Strength Distribution

15. Non-chronological Processing

1. Introduction

Chess is one of many games of skill at which the novice finds the accomplished master invincible. This theory is designed to ascribe a RATING to the playing strength manifested by sequences of game results. ODDS RATINGS are numbers of mathematical significance. The ratings of two members of the chess playing fraternity allow the calculation of winning odds. Ratings (and odds) will change with each game result unless equal players draw. It is a moot point whether actual playing strength (and associated rating) is effectively constant. Only playing strength, as manifested in game results, can be measured. We refer to calculated ratings, which oscillate between fixed limits (TOLERANCE). Within tolerance a player will perceive his recent PERFORMANCE, but beyond that a change in playing strength is indicated. Performance variations are a result of POINT SEQUENCES associated with playing strength, and the interaction of multiple point sequences when playing several opponents, as in tournaments.

This theory is applicable to any game (situation) where the fraction of absolute success is measurable. The probability associated with ratings can be interpreted as the expected fraction of success per game. It provides the most current and accurate measure that can be calculated from the available data. A few generations of programmable calculators have permitted the evolution of both textbook theory and program code (a Swiss Tournament Manager). The STM doubles as a simulator, which has verified the efficacy of the system. Randomized results generated according to implied odds have been used to reconstruct assumed ratings. Accuracy is achieved as predicted by theoretical tolerance expectations.

The system can be parameterized. For ease of mental odds estimation, as well as a comparable rating range, these seem best: 0 to 3000 should cover all assumed 10^9 players, considered Normal with mean 1500. The following table of odds against rating differences implies a tolerance of plus or minus 50 points. Rapid convergence allows mean = provisional rating. Zero sum rating changes maintain the mean and prevent rating inflation. As such, Fischer at 2760 would win 6200 games per loss (or 3100 per draw) against the average player.

TABLE 1

Wins per Loss    Rating Difference
            1                    0
            2                  100
            4                  200
            8                  300
           16                  400
           32                  500
           64                  600
          128                  700
          256                  800
          512                  900
         1024                 1000

This table (or its implied function) forms one statement of the fundamental hypothesis upon which Odds Ratings rests, allowing the deduction of all (seven) necessary formulae and (five) protocols for non-chronological processing.
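The implied function can be read as a doubling of the wins-per-loss ratio for every 100 rating points. The following is a minimal Python sketch under that reading; the function name `wins_per_loss` and its default arguments are chosen here for illustration and do not come from the paper. It regenerates the rows of TABLE 1 and the Fischer figure quoted in the introduction.

```python
def wins_per_loss(rating_difference, qs=2.0, rs=100.0):
    """Expected wins per loss, assuming qs wins per loss at a difference of rs."""
    return qs ** (rating_difference / rs)

# Rebuild TABLE 1 as (wins per loss, rating difference) pairs.
table = [(round(wins_per_loss(d)), d) for d in range(0, 1100, 100)]
for q, d in table:
    print(f"{q:>13} {d:>20}")

# Fischer at 2760 against the assumed system mean of 1500.
fischer = wins_per_loss(2760 - 1500)
print(round(fischer))  # roughly 6200 wins per loss
```

Since a draw counts as half a loss, the same ratio corresponds to roughly 3100 wins per draw.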

Formulae Summary

Ratings

Ao and Bo will denote the old ratings of two players, A and B the new ratings after player A scores S in { 0, ½, 1 }, the result of one game, against player B.

Rating Difference

Ro will denote the old rating difference

Ro = Ao – Bo

and R will denote the new rating difference

R = A – B

after player A scores S in { 0, ½, 1 }, the result of one game, against player B.

System Constant

If Rs is the rating difference used to denote Qs wins per loss, then the system constant K will be

K = ( log Qs ) / Rs

Qs = 2 and Rs = 100 is suggested as the best parameterization (see TABLE 1).

Probability

Given a rating difference R, the expected wins per loss Q is given by

Q = e ^ ( KR )

or probability P by

P = Q / ( Q + 1 ) = 1 / (1 + e ^ ( –KR ) ) = ½ + ½ tanh ( ½ KR )
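As a quick numerical check of the definitions so far, here is a small Python sketch, assuming the suggested parameterization Qs = 2 and Rs = 100. The function names are illustrative only. It also assumes that Ps in the header form K = –( log ( 1 / Ps – 1 ) ) / Rs denotes the probability corresponding to Qs, which the text does not state explicitly.

```python
import math

QS, RS = 2.0, 100.0          # suggested parameterization (TABLE 1)
K = math.log(QS) / RS        # system constant

def odds(r):
    """Expected wins per loss Q for rating difference r."""
    return math.exp(K * r)

def probability(r):
    """Expected score per game P for rating difference r."""
    return 1.0 / (1.0 + math.exp(-K * r))

# Assumption: Ps = Qs / (Qs + 1), so the header form of K matches log(Qs) / Rs.
PS = QS / (QS + 1.0)
K_FROM_PS = -math.log(1.0 / PS - 1.0) / RS
assert math.isclose(K, K_FROM_PS)

for r in (0, 100, 200, 400):
    q, p = odds(r), probability(r)
    # The three expressions for P given in the text should agree.
    assert math.isclose(p, q / (q + 1.0))
    assert math.isclose(p, 0.5 + 0.5 * math.tanh(0.5 * K * r))
    print(r, round(q, 3), round(p, 4))
```

With these parameters a 100 point advantage corresponds to an expected score of two thirds per game.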

Rating Difference Transformation

An old rating difference Ro and a game score S in {0, ½, 1} will produce a new rating difference R given by

R = (log ( (e ^ ( KRo( Ro> 0 ) ) + S ) / ( e ^ ( –KRo( Ro< 0 ) ) + 1 – S ) ) ) / K

where true = 1 and false = 0
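Reading the bracketed conditions as 0/1 flags, the transformation splits into two cases. The restatement below only unpacks the formula above; the shorthand Q_o for the old wins per loss is introduced here and is not the paper's notation.

```latex
Q_o = e^{K R_o}, \qquad
Q =
\begin{cases}
\dfrac{Q_o + S}{2 - S}, & R_o \ge 0,\\[2ex]
\dfrac{1 + S}{1/Q_o + 1 - S}, & R_o < 0,
\end{cases}
\qquad
R = \frac{\log Q}{K}.
```

One way to read this is that the favourite's old odds are treated as a tally of Q_o wins to one loss, and the new game is added to that tally. For example, equal players (Ro = 0) with S = 1 give Q = 2, which is R = 100 under the suggested parameters, while S = ½ gives Q = 1, so R stays at 0.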

Rating Change

A and B are the new ratings after a rating difference change from Ro to R

A = ½ ( Ao + Bo + R )

B = ½ ( Ao + Bo – R )

Note: These formulae are sufficient to process chronological rating changes (as for instance while managing a Swiss tournament).
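The chronological update can be put together in a few lines. This is a sketch in Python rather than the author's Pascal, assuming Qs = 2 and Rs = 100; the names `new_difference` and `update` are hypothetical.

```python
import math

K = math.log(2.0) / 100.0    # Qs = 2, Rs = 100, as suggested in the text

def new_difference(ro, s):
    """Rating Difference Transformation; comparisons act as 1 (true) or 0 (false)."""
    num = math.exp(K * ro * (ro > 0)) + s
    den = math.exp(-K * ro * (ro < 0)) + 1.0 - s
    return math.log(num / den) / K

def update(ao, bo, s):
    """Return the new ratings (A, B) after player A scores s in {0, 0.5, 1}."""
    r = new_difference(ao - bo, s)
    return 0.5 * (ao + bo + r), 0.5 * (ao + bo - r)

a, b = update(1500.0, 1500.0, 1.0)    # equal players, A wins
print(round(a), round(b))             # 1550 1450
```

Because A and B are both built from the same sum Ao + Bo, whatever one player gains the other loses, which is the zero-sum property mentioned in the introduction.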

Non-Chronological Processing

Non-chronological processing requires special protocols. With Odds Ratings a chronology must be assumed to allow processing as normal. These protocols optimize the assumptions and prevent anomalous ones, as explained below. They effectively average the performance of each player. In fact, non-chronological processing is the only average of any kind produced by Odds Ratings.

The protocols follow:

1. Maximize Draws – individual pairs of players are assumed to have drawn the maximum number of games achieving their overall score. Draw tolerances are smaller than win tolerances, and the corresponding point sequences converge more rapidly (see below for formula).

2. Score Based Point Sequences – by normalizing the point sequences so that the wins per loss ratio remains as constant throughout as possible, the accumulation of losses or wins at one end of the sequence does not occur (by design or accident). Such would produce the effect of an increasing or decreasing rating throughout the games. The i’th game result Si in a normal point sequence, maximizing draws, is given by

Si = ½ ( int ( ½ + 2iPab ) – int ( ½ + 2 ( i – 1 ) Pab ) )

where:

int X is the largest whole number less than or equal to X,

Pab = Sab / Gab,

Sab is the overall score,

and Gab is the total number of games,

by player A against player B.

3. Completed Rounds – if the maximum games by any individual pair of players is Gm, then processing must be executed as Gm rounds, where no pair of players play more than one game per round. Furthermore, pairs with one game play in the last round, those with 2 games in the last 2 rounds, those with Gm - 1 in all but the first. That way the players with fewer games are processed against more accurate ratings.

4. Hierarchical Processing – before each round, players are sorted according to their progressively recalculated ratings. They are then processed from the lowest to the highest, each against the lowest to the highest remaining. First-in, best-dressed effects are thus avoided, the weaker players getting first bite of the cherry, rather than the field being plundered of points by the strongest first.

5. Two Pass Processing – by using two passes to reprocess a completed Swiss tournament, the method most effectively rates an entire field of unrated players (set to the system average = provisional rating). However, a small number of unrated players among several rated players need no special treatment. This same method also provides the perfect tie-break, finely grading relative play on the basis of a single event.
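Protocols 1 to 3 can be sketched together as follows. This is only an illustration in Python: the function names, the dictionary layout of `totals` and the sample data are assumptions, and the ordering within a round (protocol 4) and the second pass (protocol 5) are not attempted. Exact fractions are used so that the int (floor) steps are not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import floor

def normal_sequence(sab, gab):
    """Protocols 1-2: draw-maximizing results for A, given score sab in gab games."""
    p = Fraction(sab) / gab
    half = Fraction(1, 2)
    return [half * (floor(half + 2 * i * p) - floor(half + 2 * (i - 1) * p))
            for i in range(1, gab + 1)]

def schedule(totals):
    """Protocol 3: place each pair's games in the last gab of gm rounds.

    totals maps (a, b) -> (sab, gab); the result is a list of rounds, each a
    list of (a, b, score_for_a) tuples.
    """
    gm = max(gab for _, gab in totals.values())
    rounds = [[] for _ in range(gm)]
    for (a, b), (sab, gab) in totals.items():
        for offset, s in enumerate(normal_sequence(sab, gab)):
            rounds[gm - gab + offset].append((a, b, s))
    return rounds

# Hypothetical data: A scored 2.5 out of 3 against B and 1 out of 1 against C.
rounds = schedule({("A", "B"): (Fraction(5, 2), 3), ("A", "C"): (1, 1)})
for number, games in enumerate(rounds, start=1):
    print(number, [(a, b, str(s)) for a, b, s in games])
```

In this example the single A against C game lands in the third round, after A's rating has already been adjusted by two games against B.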

Conclusion

The information here is sufficient to implement Odds Ratings either on a site or worldwide. The theoretical proof or derivation from the fundamental hypothesis is available, if involved. This hypothesis was, in fact, that the relationship between probabilities P and rating differences R is, in essence

P = ½ + ½ tanh R

which, when graphed, will be seen to be intuitively obvious. But the shape came to me in 1974, and I’m ashamed to say that the exact function came to me only ten years ago, though others had been tried. The correct function precipitated some surprisingly simplified mathematics and vastly superior results. That the exponential nature didn’t occur to me sooner I can only blame on the fact that I never did my homework. I should have noticed the similarity to differential equations for capstans. But here is the reason for the invincibility of masters of the game, and the many grades in any form of mastery. TABLE 1 is then a relatively trivial mathematical implication of this (with added parameterization for the more familiar scaling).

Point sequences form an integral and necessary part of the theory of Odds Ratings. For the chess player who is also a mathematician (such as Arpad Elo, Max Euwe and Jose Capablanca) this theory does a lot to provide him with more realistic expectations of his abilities. The interaction of these sequences during tournaments also goes far to demonstrate possible outcomes that can be expected. Point sequences are idealized ordered sets of game results. They are not very far from the reality, certainly not as far as Professor Arpad Elo imagined. “The measurement of the rating of an individual might be compared with the measurement of the position of a cork bobbing up and down on the surface of agitated water with a yard stick tied to a rope swaying in the wind.” This is an illusion, not a fact. Point sequences don’t stray so far as the bobbing cork, and the linear functions used are the yard stick, and way off. But Elo Ratings and its offspring are not based on probability, but on statistics. Averages take much data even when nothing is changing overall. What is more changeable than man?

Arpad Elo was commissioned by the USCF in 1959 to replace dysfunctional systems, and provided it on request in one year. FIDE struggled on 10 years more without Elo Ratings. Presumably they were adopted in desperation because nothing better was forthcoming. Remember that Einstein didn’t produce Relativity on demand in one year, rather in ten, and driven by curiosity. He said “It is a wonder that the tender plant of a young child’s curiosity isn’t entirely strangled by modern methods of instruction.” Well, that’s the best excuse I can come up with for not doing my homework. But Arpad did the best he could in the time he had. The inductive leap often required is usually not available on demand. Like the tender plant it must be nurtured and loved. And beginning with an incorrect hypothesis can leave you flogging a dead horse, just like placing faith in a flawed chess opening system.

Pymander · July 9, 2012

Maybe it is just a semantics thing. The word 'perfect' to me meant that it will make perfect predictions of the outcomes, which again I contend is impossible. Really, in this case we need the word perfect to be defined. Actually, there are a lot of things the OP needs to define. And actually write about. I am not really sure what the point in starting the thread was, considering so little was actually written in the first post.

A definite integral produces a perfect result. A Newton–Cotes rule (Simpson's rule is the three-point case) may approximate by considering a ninth degree polynomial passing through 10 points on a curve, but will not be perfect except for equal or lesser degree polynomials. If we cannot solve the indefinite integral, or use numbers precise enough for higher degree polynomials, we must use approximate methods, and results will not be perfect.

Odds Ratings rely, like all mathematics, on defining axioms. Odds Ratings needs only one. Average wins per loss are directly proportional to the natural exponential of the rating difference (at any time). The rest is deduction. Elo Ratings are approximate, with severe limitations: ad hoc functions with replaceable constants depending on rating differences, and inflation, like a bad economy run by lunatics. Rating changes are forced to be very small to prevent severe anomalies. This can be very demoralising to new players, and favours frequent and seasoned players (same analogy).

Here is the next chapter of the theory, to give a clue how the theory was developed. I won't continue further than to say that what is already given is sufficient to apply Odds Ratings to software and interactive game sites.

2. Point Sequences

A game of skill is one at which absolute mastery is unobtainable. This mathematics can accordingly never deliver, nor process, a 'probability' of one or zero. It measures the relative degree of mastery of players of a particular game.

Chess is a game of skill, although, for a while, Capablanca was thought (even by himself) to have attained such complete mastery. For six years he did not lose a single game. While computers are on the verge of invincibility against human 'masters', there will be no certainty of outcome amongst themselves. The number of electrons in the visible universe is less than the number of games possible.

The relative playing strength of any two players A and B is evident in the proportion of points scored by each. Thus our measurable degree of relative mastery consists of the probability Pab of a win (absolute success) by player A. If A is the stronger player and will risk losing rather than draw, his m'th to his n'th game results Si will consist of the point sequence

Si = int ( ½ + iPab ) – int ( ½ + ( i – 1 ) Pab ) …(i)

where i = m, m + 1, … , n – 1, n

For instance, Pab = ¾, m = 1, n = 4 is the sequence {1,1,0,1}.

On the other hand, were A a cautious player, who will draw before risking a loss, never losing against a weaker player

Si = ½ ( int ( ½ + 2iPab ) – int ( ½ + 2 ( i – 1 ) Pab ) ) …(ii)

where the same Pab, m and n give the sequence {1, ½,1, ½}.

These idealizations lose no generality, since the reality deviates but little. The deviations can be interpreted as fluctuations in playing strength. We need to infer Pab from these normal point sequences. However, for a given point sequence, Pab is not unique.

Given Pab = ¾, m = 1, n = 8 sequence (i) is {1,1,0,1,1,1,0,1} but Pab = 4/5 produces the sequence {1,1,0,1,1,1,1,0} so that n < 7 cannot distinguish Pab to this precision. On the contrary note, too long a sequence implies too long a rating period, where Pab is not likely to have remained constant for these two players. Further, we are only finding the relative strength of players A and B, and other players need to be integrated into our calculations. This integration requires Odds Ratings, which we shall deduce.
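The worked sequences in this chapter can be reproduced mechanically. The sketch below is in Python, with function names chosen for illustration, and uses exact fractions so that values such as ½ + 5 × 4/5 are floored without rounding surprises.

```python
from fractions import Fraction
from math import floor

HALF = Fraction(1, 2)

def sequence_i(p, m, n):
    """Sequence (i): the player who risks losing, so wins and losses only."""
    return [floor(HALF + i * p) - floor(HALF + (i - 1) * p)
            for i in range(m, n + 1)]

def sequence_ii(p, m, n):
    """Sequence (ii): the cautious player, who draws rather than loses."""
    return [HALF * (floor(HALF + 2 * i * p) - floor(HALF + 2 * (i - 1) * p))
            for i in range(m, n + 1)]

three_quarters, four_fifths = Fraction(3, 4), Fraction(4, 5)
print(sequence_i(three_quarters, 1, 4))                     # [1, 1, 0, 1]
print([str(s) for s in sequence_ii(three_quarters, 1, 4)])  # ['1', '1/2', '1', '1/2']

# Find the first game at which Pab = 3/4 and Pab = 4/5 give different results.
a = sequence_i(three_quarters, 1, 8)
b = sequence_i(four_fifths, 1, 8)
first_difference = next(i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y)
print(first_difference)                                     # 7
```

The two eight-game sequences agree for the first six games and part company at the seventh, which matches the remark that n < 7 cannot separate the two probabilities.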

Edited July 9, 2012 by Pymander

hypervalent_iodine · July 10, 2012

Moderator Note

Pymander,

Plagiarism is absolutely not acceptable here. If you intend to use the work of somebody else in your posts, 1.) do not make it the sole content of your post and 2.) you have to credit your sources.

Please review our rules and ensure that you observe them in future. I will be checking your other posts for plagiarism.

And for the benefit of those who would like to check the citations: PDF file.

Pymander · July 13, 2012

This identifies me, which I had hoped to avoid. The work on SwissImmaculate is my own. Pretty rough shot. Its inclusion was for two purposes, one to give it recognition, the other to lend credibility to the rest of my stuff.

Software to test the efficacy of Odds Ratings is written in Pascal, keyboard input. It proves the system works without doubt, but mastery won't be a snack. Worth it though, if you run a chess club. Got the feeling "The Big Bang Theory" is sacred, and alternatives are blasphemy, so I'm done. Thank you all for your participation in the Expansion Tectonics argument. I was not just trolled but totally balrogged. It was fun.

In case the point has escaped anyone, I have been accused of plagiarising my own creation, the cited PDF. But thanks, some may find this document enlightening. Personally, I reckon it's worth an honorary degree in maths.

Edited July 13, 2012 by Pymander

swansont · July 13, 2012

Moderator Note

If this is your own work then obviously it is not plagiarism, but we do pay close attention to copyright-related issues. To present copyrighted material here without any mention of ownership is not the wisest course of action.

Pymander · July 13, 2012

Thank you. Combining 'science' and 'religion' is apparently also unwise, at least today. That, though, is what is meant by occult, and the only explanation for astrology, and other 'ancient wisdoms'. I'm sure you got some help locating the PDF. Didn't know it was there myself, outside of the cited site, under Probabilistic Rating Theory.
