paper: Weeks 1-2 Elo Analysis
Thread created automatically to discuss a document in CD-Media.
Weeks 1-2 Elo Analysis by connor.worley
Here is the Python code used to calculate the ratings:
Code:
import csv
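The code block above is cut off after its first line. As a stand-in for readers following along, here is a minimal sketch of the core Elo update the paper describes; the 400-point logistic scale and K = 32 are the usual chess defaults, not necessarily the paper's exact constants.
Code:
# Minimal sketch of a standard Elo update (NOT the paper's exact code,
# which is truncated above). The 400-point logistic scale and K = 32
# are the usual chess defaults; the paper's constants may differ.

def expected_score(rating_a, rating_b):
    # Probability that A beats B under the logistic Elo model.
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating_a, rating_b, score_a, k=32):
    # score_a is 1.0 for a win, 0.5 for a tie, 0.0 for a loss.
    e_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - e_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Example: a 1500-rated alliance upsets a 1600-rated alliance,
# so it gains more than K/2 points.
print(update(1500, 1600, 1.0))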
Re: paper: Weeks 1-2 Elo Analysis
The usual Twitter data caveats apply :)
Re: paper: Weeks 1-2 Elo Analysis
How do the results change if you don't include eliminations?
Re: paper: Weeks 1-2 Elo Analysis
Teams that were upset in elims move up a bit (most notably 1114, with a 35-spot jump). Here's a quick look at the top 25 without elims:
Code:
Team  Elo  Net change
Re: paper: Weeks 1-2 Elo Analysis
I took a look at this last year just for giggles. Included is the Perl code I used. Also note that I used a K-factor of 32. I normally don't concern myself with rankings/OPR/etc., but I was bored one day.
Re: paper: Weeks 1-2 Elo Analysis
Free good rep to the first person to post FORTRAN or COBOL code of this.
Re: paper: Weeks 1-2 Elo Analysis
I was curious how the data looked, so I threw it onto Google Docs (after converting it into an Excel file, because Docs doesn't like that many characters being pasted in directly). Here it is.
Edit: Realized this isn't terribly relevant after rereading the thread title. Still a good visual, I think.
Re: paper: Weeks 1-2 Elo Analysis
Here is a visual of the first two weeks, using the data from the paper.
Re: paper: Weeks 1-2 Elo Analysis
I was somewhat bored this evening and decided to try something different: the Microsoft TrueSkill algorithm for ranking teams. If you're unfamiliar with TrueSkill, it's the algorithm Xbox Live uses to determine fair matchups. It's based on the Glicko rating system, which was developed as an enhancement of Elo. Glicko introduces Bayesian statistics to the Elo system, and TrueSkill takes it a step further by handling team-based play. The results were... interesting. The first column is the team number, the second is the team's skill rating (mu), and the third is sigma (the standard deviation).
Code:
Team  Mu (skill)  Sigma
3478  32.1901547  3.20363857
Code:
from trueskill import *
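The script above is also cut off. For anyone who wants to reproduce this, here is a sketch of how one 3v3 qualification match can be rated with the trueskill package ('pip install trueskill'); the team numbers and match outcome are made up for illustration, and this is my guess at the structure of the original script, not the original itself.
Code:
from trueskill import Rating, rate

# Start every team at the TrueSkill default (mu = 25, sigma = 25/3).
# Team numbers here are just for illustration.
ratings = {team: Rating() for team in (3478, 1114, 254, 469, 2056, 1503)}

red = (3478, 1114, 254)
blue = (469, 2056, 1503)

# ranks: lower is better, so [0, 1] means the red alliance won.
new_red, new_blue = rate(
    [tuple(ratings[t] for t in red), tuple(ratings[t] for t in blue)],
    ranks=[0, 1],
)

for team, r in zip(red + blue, new_red + new_blue):
    ratings[team] = r
    print(team, round(r.mu, 4), round(r.sigma, 4))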
Re: paper: Weeks 1-2 Elo Analysis
Very interesting results, thanks for reviving this!
Re: paper: Weeks 1-2 Elo Analysis
I think there are quite a few biases still because of the way we do regionals/districts. Teams that only compete in a week 1 or 2 regional and do well (but not well enough to go to championship) will be rated artificially high, because the only teams they're compared against are the other teams at that regional, who still have the default rating. There is no opportunity to mark them down later if they don't compete again (but other teams from that regional do).

What would be possible is to use results from prior years as well, but shift them toward the mean at the beginning of each season. This would account for good teams tending to be consistently good (to paraphrase Jim Zondag). This sort of thing is done in Nate Silver's Elo rankings of NFL teams (http://fivethirtyeight.com/datalab/i...l-elo-ratings/).

There are some other things we can take away from Silver as well. For example, we could modify the algorithm to weigh eliminations more heavily than qualifications, something that's also done in the World Football Elo Ratings; see this wiki article: http://en.wikipedia.org/wiki/World_F...n_principles. Basically, the K value (the factor that determines how much a participating team's Elo rating changes) is scaled by how important a match is.
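To make those two ideas concrete, here is a small sketch; the importance weights and the carryover fraction are illustrative assumptions (the 2/3 carryover is roughly what FiveThirtyEight uses for NFL teams, and FRC would need its own tuning).
Code:
BASE_K = 32
# Hypothetical importance weights in the spirit of the World Football
# Elo Ratings; the right values for FRC would need tuning.
IMPORTANCE = {"qualification": 1.0, "elimination": 1.5}

def k_for(match_type):
    # Scale K by how important the match is.
    return BASE_K * IMPORTANCE[match_type]

def regress_preseason(rating, mean=1500.0, carryover=2.0 / 3.0):
    # Carry part of last season's deviation from the mean into the new
    # season (FiveThirtyEight keeps about 2/3 for NFL teams).
    return mean + carryover * (rating - mean)

print(k_for("elimination"))     # 48.0
print(regress_preseason(1800))  # 1700.0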
Re: paper: Weeks 1-2 Elo Analysis
https://dl.dropboxusercontent.com/u/5193107/scores.zip
You'll need to install the trueskill package ('pip install trueskill') to use it.
Re: paper: Weeks 1-2 Elo Analysis
So I decided to take this data a little further. I took all of these rating calculations, ran them over this year's data, and tried to predict matches. I also included a modified Elo system with diminishing returns for large margins of victory (calling this "Elo Mod"). I got some rather surprising results.
My baseline was using OPR to predict match outcomes: adding up the OPRs of each alliance and comparing the sums against the result of the match. That predicted about 77.1% of the matches this year.

TrueSkill predicted 79.0% of the matches, a pretty good improvement. I need to develop the prediction model a bit better, because it currently doesn't take the standard deviation into account as a measure of certainty. The modified Elo system predicted 79.5% of matches, an improvement over TrueSkill. The baseline, unadulterated Elo system as used in this thread predicted a whopping 81.4% of matches, by far the best of these models.

There is still room for improvement with TrueSkill and the modified Elo. With the modified Elo, there are some constants that can be tuned for better results. Overall, though, the results are interesting: it seems that no matter the ranking model used, about 1 in 5 qualification matches results in an upset.

Here is the spreadsheet I used: https://dl.dropboxusercontent.com/u/...Trueskill.xlsx
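The post doesn't spell out the "Elo Mod" formula, but one common way to get diminishing returns on margin of victory is a logarithmic multiplier on the K-factor, roughly like this sketch (the constants are placeholders, not the ones used above).
Code:
import math

def mov_multiplier(margin):
    # Logarithmic curve: each extra point of margin is worth less
    # than the one before it.
    return math.log(abs(margin) + 1.0)

def update(winner_elo, loser_elo, margin, k=32):
    expected = 1.0 / (1.0 + 10 ** ((loser_elo - winner_elo) / 400.0))
    delta = k * mov_multiplier(margin) * (1.0 - expected)
    return winner_elo + delta, loser_elo - delta

# Tripling the margin moves the ratings well under 3x as much:
print(update(1500, 1500, 20))  # ~+48.7 / -48.7
print(update(1500, 1500, 60))  # ~+65.8 / -65.8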
Re: paper: Weeks 1-2 Elo Analysis
I've tried the L1 optimization problem, but l1-magic is giving me fits. For some reason, it blows up after 10 iterations.
However, I have done the analysis I wanted: I calculated all the rating systems using data from events prior to CMP, then used that data to predict CMP matches. In summary:
OPR: 72.46% correct
TrueSkill: 69.31% correct
Elo: 72.90% correct
Elo Mod: 71.71% correct
TL;DR: We're okay, but not great, at predicting matches. OPR is okay at it, but Elo is better. I'm still somewhat surprised that Elo comes out slightly ahead.
Updated spreadsheet: https://dl.dropboxusercontent.com/u/...skill%202.xlsx
Re: paper: Weeks 1-2 Elo Analysis
Secondly, what you want to find is the min L1 norm of the residuals, not of the solution vector itself. For the set of overdetermined linear equations Ax ≈ b, x is the solution vector. The residuals are b-Ax. So you want to find a solution vector x which minimizes the L1 norm of b-Ax.
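A quick numerical illustration of that distinction, using made-up data: the quantity being minimized is a norm of the residuals b-Ax, not of x.
Code:
import numpy as np

rng = np.random.default_rng(0)
A = (rng.random((40, 12)) < 0.25).astype(float)  # toy binary design matrix
x_true = rng.uniform(0.0, 30.0, 12)              # "true" team contributions
b = A @ x_true + rng.normal(0.0, 5.0, 40)        # noisy alliance scores

x, *_ = np.linalg.lstsq(A, b, rcond=None)        # min L2 norm of residuals
residuals = b - A @ x
print("L2 norm of residuals:", np.linalg.norm(residuals, 2))
print("L1 norm of residuals:", np.linalg.norm(residuals, 1))
print("L1 norm of x (not the quantity to minimize):", np.linalg.norm(x, 1))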
Re: paper: Weeks 1-2 Elo Analysis
Attached is a comparison of b-Ax residuals for L2 and L1 OPR. Alliance scores computed from L1 OPR are within +/-10 points of the actual scores 33.5% of the time; alliance scores computed from L2 OPR are within +/-10 points only 22.4% of the time. It is on that basis that I postulate that L1 OPR might be a better predictor of match outcome.
[EDIT] Cannot add attachments to threads associated with papers. Brandon: can you please change this setting to allow attachments? Thank you. [/EDIT]
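For anyone wanting to reproduce that metric, the "+/-10 points" fraction can be computed with a one-liner like this sketch (the function name and arguments are assumptions, not from the attachment).
Code:
import numpy as np

def fraction_within(A, b, x, tol=10.0):
    # Fraction of alliance scores predicted within +/- tol points.
    return float(np.mean(np.abs(b - A @ x) <= tol))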
Re: paper: Weeks 1-2 Elo Analysis
Our numbers are very close, but I had expected them to be identical. Here's a link to an XLS spreadsheet.
Re: paper: Weeks 1-2 Elo Analysis
77.11% correct (counting ties as incorrect)
77.93% correct (counting ties as correct)
Re: paper: Weeks 1-2 Elo Analysis
Ax ≈ b, where A is the (binary) design matrix of alliances, b is a column vector of alliance scores, and x is what you are trying to find: a column vector of team "OPR" scores. There is no exact solution for x, since the system is overdetermined. So the idea is to find the "best" solution (in some sense of the word "best").

Notice that the left-hand side (Ax) is a column vector of alliance scores computed from whatever solution x you come up with. The residuals are b-Ax: a column vector of the differences between the actual alliance scores (b) and the computed alliance scores (Ax). Looking at it that way, it becomes clear that what you are trying to do is find a solution x which minimizes the residuals (in some sense of the word "minimize").

The most common way to do this is to find the x which minimizes the L2 norm of the residuals. The L2 norm of a vector is the square root of the sum of the squares of the vector's elements. The L2 norm solution is also known as the "least squares" solution (for obvious reasons). It turns out that finding the x which minimizes the L2 norm of b-Ax is computationally straightforward. In Octave, it's one line of code: x = A\b. The backslash in this context is known as "left division". The syntax is simple, but under the hood there's a lot going on.

For the Ax ≈ b overdetermined linear systems we are dealing with in FRC to compute OPR scores, it turns out that there is a computationally faster way to compute the least squares solution for x. Here's how: multiply both sides of Ax ≈ b by the transpose of A to get A'Ax = A'b, or Nx = d, where N = A'A and d = A'b.

But "least squares" (min L2 norm of residuals) is not the only possible "best fit" solution to the overdetermined system Ax ≈ b. For example, there's the "Least Absolute Deviations" (LAD) solution (min L1 norm of residuals). The L1 norm of a vector is the sum of the absolute values of the vector's elements. Finding an LAD solution for Ax ≈ b is more computationally intensive than least squares. Perhaps the best way to proceed is to convert the problem to a "Linear Program" (LP) and then use one of the many LP solvers. For example, here's the AMPL code I used to compute the LAD OPR for your data:
Code:
param m;
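The AMPL listing above is truncated after its first line. For reference, the same LAD idea can be written as a linear program in Python with scipy; this is a sketch of the standard min ||b-Ax||_1 reformulation (auxiliary variables t with -t <= b-Ax <= t), not the author's original code.
Code:
import numpy as np
from scipy.optimize import linprog

def lad_opr(A, b):
    # Least Absolute Deviations: minimize ||b - Ax||_1 by introducing
    # t >= |b - Ax| elementwise and minimizing sum(t).
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])   # objective: sum of t
    I = np.eye(m)
    #  Ax - t <= b   and   -Ax - t <= -b   together give |b - Ax| <= t
    A_ub = np.block([[A, -I], [-A, -I]])
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m   # x free, t nonnegative
    result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return result.x[:n]

# Compare against the least-squares (L2) OPR on toy data:
rng = np.random.default_rng(0)
A = (rng.random((40, 12)) < 0.25).astype(float)
b = A @ rng.uniform(0.0, 30.0, 12) + rng.normal(0.0, 5.0, 40)
x_l1 = lad_opr(A, b)
x_l2, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(x_l1, 2))
print(np.round(x_l2, 2))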