paper: FRC Elo 2008-2016
Caleb Sykes
22-12-2016, 18:49
Thread created automatically to discuss a document in CD-Media.
FRC Elo 2008-2016 (http://www.chiefdelphi.com/media/papers/3306?) by Caleb Sykes
Caleb Sykes
22-12-2016, 19:03
Sorry, I found a bug about 15 seconds after posting, I am uploading a revised version now.
Caleb Sykes
22-12-2016, 19:19
This workbook describes the Elo ratings of every team in FRC since 2008. Every match since 2008 is used, and the model predictions and results can be found in the year sheets. Team Elo ratings at the end of each season can be found in the "End of Season Elos" sheet. Average team Elo ratings for each season can be found in the "Average Elos" sheet. Detailed information about each team can be found in the "Team Lookup" sheet. To use the "Team Lookup" sheet, simply enter a team number into cell B2 and press the "Update" button.
My biggest takeaway from this whole endeavor was how incredibly dominant 1114 was during the period 2008-2011. After their first event in 2008, this model predicts them to win every single remaining match in 2008, every match in 2009, every match in 2010, and every match at Pittsburgh in 2011. Their end of season Elo in 2008 was 200 points higher than the next highest rated team.
I will be following up soon with a comparison of predictive models.
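For readers new to the method, the core Elo loop behind a workbook like this can be sketched in a few lines. Alliance strengths are summed (as discussed later in this thread); the 400-point scale and K-factor of 32 below are generic chess-style defaults for illustration, not necessarily the parameters tuned for the paper.

```python
def win_probability(red_elo_sum, blue_elo_sum, scale=400):
    """Expected red-win probability from the difference in summed
    alliance Elos (the standard logistic Elo curve)."""
    return 1 / (1 + 10 ** ((blue_elo_sum - red_elo_sum) / scale))

def update(team_elos, red, blue, red_won, k=32):
    """Shift each team's rating toward the observed result.
    k=32 is an illustrative K-factor, not the paper's tuned value."""
    p_red = win_probability(sum(team_elos[t] for t in red),
                            sum(team_elos[t] for t in blue))
    actual = 1.0 if red_won else 0.0
    for t in red:
        team_elos[t] += k * (actual - p_red)
    for t in blue:
        team_elos[t] -= k * (actual - p_red)
```

Running this over every match in chronological order, season by season, produces ratings of the kind shown in the year sheets.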
Very nice tool you have here. 1114 is a fun team to watch the Elo for; it got quite high in 2010 until they threw the match.
Is it possible to adjust some of the parameters? The end-of-season reversion to the mean seems too small. 538's models for basketball (http://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/) and football have at least a 25% reversion to the mean, and 25% seems like a lower bound here, since each team loses a class of seniors and builds a new robot each season, and the latter should really drive this model.
As an example of this, the highest Elo from 2016 was at a 254 qual match at their first event, which is really a carryover from their 2015 season. But this might be inevitable in some cases; as 538 notes in their NBA model, teams with superstars like the Bulls and Cavs maintained high Elos for a while after Jordan and LeBron left.
Caleb Sykes
22-12-2016, 20:32
Very nice tool you have here. 1114 is a fun team to watch the Elo for; it got quite high in 2010 until they threw the match.
Is it possible to adjust some of the parameters? The end-of-season reversion to the mean seems too small. 538's models for basketball (http://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/) and football have at least a 25% reversion to the mean, and 25% seems like a lower bound here, since each team loses a class of seniors and builds a new robot each season, and the latter should really drive this model.
As an example of this, the highest Elo from 2016 was at a 254 qual match at their first event, which is really a carryover from their 2015 season. But this might be inevitable in some cases; as 538 notes in their NBA model, teams with superstars like the Bulls and Cavs maintained high Elos for a while after Jordan and LeBron left.
I chose the parameters I did based on what was the most predictive. I would have expected the mean reversion to be stronger, but 20% seemed to work the best. Here are the Brier scores for 2012-2014 for various mean reversion parameters.
Reversion   Brier score
100%        0.213986028
 90%        0.210139727
 80%        0.206626172
 70%        0.203459364
 60%        0.200667838
 50%        0.198303146
 40%        0.196450541
 30%        0.195241340
 20%        0.194865801
 10%        0.195581409
  0%        0.197702345
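For reference, the Brier score being compared here is just the mean squared error between predicted win probabilities and actual outcomes; a minimal sketch:

```python
def brier_score(predictions, outcomes):
    """Mean squared difference between predicted red-win probabilities
    and actual results (1 = red won, 0 = blue won). Lower is better."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(predictions)
```

A constant 0.5 prediction for every match scores exactly 0.25, so every value in the table above beats coin-flipping.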
I chose the parameters I did based on what was the most predictive. I would have expected the mean reversion to be stronger, but 20% seemed to work the best. Here are the Brier scores for 2012-2014 for various mean reversion parameters.
What about the Brier score for longer windows? Between 2012 and 2014, the mean only reverts twice, and the relative error (|a-b|/|b|) between the Brier scores for different parameters is less than 0.1 in even the most extreme cases. With more reversions, that parameter should affect the accuracy more strongly, giving a better parameter estimate.
Caleb Sykes
22-12-2016, 22:09
What about the Brier score for longer windows? Between 2012 and 2014, the mean only reverts twice, and the relative error (|a-b|/|b|) between the Brier scores for different parameters is less than 0.1 in even the most extreme cases. With more reversions, that parameter should affect the accuracy more strongly, giving a better parameter estimate.
2008-2016 quals and playoffs:
Reversion   Brier score
100%        0.201437771
 90%        0.198323411
 80%        0.195629645
 70%        0.193352899
 60%        0.191478127
 50%        0.189986095
 40%        0.188863874
 30%        0.188124089
 20%        0.187848762
 10%        0.188296258
  0%        0.190149139
Caleb Sykes
23-12-2016, 12:10
I ran the model for 2008-2016, but only took the Brier score for 2016.
Reversion   Brier score
100%        0.203023179
 90%        0.199892169
 80%        0.197203494
 70%        0.194949910
 60%        0.193105915
 50%        0.191632998
 40%        0.190483555
 30%        0.189609862
 20%        0.189008209
 10%        0.188894837
  0%        0.190274481
Interestingly, these results imply that 10% mean reversion would have been ever so slightly better than 20% for the 2015-2016 offseason. I don't want to draw any larger conclusions, though, since 2015 was a weird year.
MARS_James
23-12-2016, 12:32
This is honestly one of the coolest documents to look at, especially for looking at a team's Elo when they have gained or lost key mentors and seeing whether it had a short- or long-term impact relative to the field.
remulasce
23-12-2016, 22:43
Hey, I appreciate you spending the time to put this together.
Question: Is there any way to use the team lookup without having to purchase Excel? Google Docs obviously won't run the program. Neither will LibreOffice. The free Windows Modern (née Metro) app won't run the function, and I don't have access to Dreamspark any more.
Obviously, Excel is a powerful tool which is standard in many environments. But you're really limiting who can actually use your work if we need to pay MS a $150 entry fee to do so.
Mark McLeod
24-12-2016, 00:32
OpenOffice works with it.
Sorry, it was a later version of Excel.
Caleb Sykes
24-12-2016, 00:49
Okay, I spent a bunch of time looking at the mean-reversion parameter and the results are extremely interesting. First, I tried running every 2-year period individually and found the best mean reversion just for that period. Here were the results:
2008-2009 35%
2009-2010 40%
2010-2011 40%
2011-2012 30%
2012-2013 30%
2013-2014 35%
2014-2015 35%
2015-2016 35%
The mean reversion was pretty high and relatively constant for all years.
Next, I found the best mean reversion for 2009 given 2008. Then I found the best mean reversion for 2010 given 2008 and 2009, and so on. In this way, each year would have a distinct mean reversion that builds off of the previous mean reversions. Here were the results:
2008-2009 35%
2009-2010 35%
2010-2011 30%
2011-2012 20%
2012-2013 20%
2013-2014 25%
2014-2015 30%
2015-2016 25%
These values start high, as in the previous case, but they seem to drop after a while as the model learns more about the teams.
Finally, I compared how predictive the previous model was in comparison to my original 20% for all years; the results are attached.
Interestingly, adjusting the mean reversion every year actually fares worse overall than just using 20% every year, even if you throw out 2015 and 2016 (2015 was an outlier year in many respects). I think the reason is that team performance two years in the future can still be reasonably well predicted by the current season's performance. The constantly updating model seems to set the mean reversion parameter too high to fully account for this two-year explanatory effect.
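The reversion being tuned in these experiments can be sketched as a single blend toward the population mean at each season boundary (the mean of 1500 below is an illustrative choice, not necessarily the paper's):

```python
def revert_to_mean(elo, fraction=0.20, mean=1500):
    """Start-of-season rating: keep (1 - fraction) of last season's
    deviation from the mean. fraction=1.0 resets everyone to the mean;
    fraction=0.0 carries ratings over unchanged."""
    return mean + (1 - fraction) * (elo - mean)
```

So at 20% reversion, a team ending one season at 1700 would open the next at 1660.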
Caleb Sykes
24-12-2016, 00:52
The results are attached.
Well, they would be if I could figure out how to attach them. I don't seem to have permission to add attachments, even though this is my thread.
Well, they would be if I could figure out how to attach them. I don't seem to have permission to add attachments, even though this is my thread.
Put 'em in the whitepaper's slot--you can attach multiple documents to one whitepaper. The Extra Discussion forum doesn't allow attachments (or deletions, generally speaking) due to some logic that I don't remember but makes sense from when I was informed about it.
SoftwareBug2.0
24-12-2016, 00:54
Hey, I appreciate you spending the time to put this together.
Question: Is there any way to use the team lookup without having to purchase Excel? Google Docs obviously won't run the program. Neither will LibreOffice. The free Windows Modern (née Metro) app won't run the function, and I don't have access to Dreamspark any more.
Obviously, Excel is a powerful tool which is standard in many environments. But you're really limiting who can actually use your work if we need to pay MS a $150 entry fee to do so.
I don't know if this will make you feel any better, but I opened it with Excel and it didn't work and brought up a VB debugging window. :p
Caleb Sykes
24-12-2016, 00:58
I don't know if this will make you feel any better, but I opened it with Excel and it didn't work and brought up a VB debugging window. :p
Have you downloaded it recently? My very original upload had a bug which I have since corrected, you might have been one of the 6 people who downloaded it before I deleted it.
Caleb Sykes
24-12-2016, 01:05
Put 'em in the whitepaper's slot--you can attach multiple documents to one whitepaper. The Extra Discussion forum doesn't allow attachments (or deletions, generally speaking) due to some logic that I don't remember but makes sense from when I was informed about it.
I have put them with the whitepaper. I really don't like doing that though since I prefer my whitepapers to be fully self-explanatory, and for this I really just wanted to post some data which had context provided by my post.
I would love to hear the reasoning for this restriction sometime if anyone knows it.
Joe Ross
24-12-2016, 01:27
OpenOffice works with it.
The team lookup tab didn't work for me in OpenOffice 4.1.3 or Excel 2007. It did work in Excel 2013; it looks like some of the features it uses were added in Excel 2013.
SoftwareBug2.0
24-12-2016, 01:40
Have you downloaded it recently? My very original upload had a bug which I have since corrected, you might have been one of the 6 people who downloaded it before I deleted it.
I was one of the six, but then it didn't seem to get fixed when I re-downloaded it. I'm also having no luck with LibreOffice.
The other parts of the spreadsheet are interesting to look at though. It's fun to get some numbers to see both how horrible my team was in 2008 and how good we were in 2013.
Mark McLeod
24-12-2016, 09:00
The team lookup tab didn't work for me in OpenOffice 4.1.3 or Excel 2007. It did work in Excel 2013; it looks like some of the features it uses were added in Excel 2013.
My mistake, I mixed my machines and which one had OpenOffice and which had Excel.
Anteprefix
24-12-2016, 11:09
The team lookup tab didn't work for me in OpenOffice 4.1.3 or Excel 2007. It did work in Excel 2013; it looks like some of the features it uses were added in Excel 2013.
It seems to work fine if you replace all instances of FullSeriesCollection with SeriesCollection and save the changes in the debugger.
Cothron Theiss
24-12-2016, 14:04
The Extra Discussion forum doesn't allow attachments (or deletions, generally speaking) due to some logic that I don't remember but makes sense from when I was informed about it.
Was it because of all the spam going on in the Rumor Mill, Chit-Chat, and Extra Discussion?
Was it because of all the spam going on in the Rumor Mill, Chit-Chat, and Extra Discussion?
No, it was something to do with the risk of stuff happening with the original paper/picture. This was before the spam influx.
Michael Hill
25-12-2016, 02:44
A couple of things. It appears you are just using the raw Elo differences in calculating red win likelihood, that is, (red1 + red2 + red3) - (blue1 + blue2 + blue3).
I'm thinking that if you're going to calculate win chance, you want to average out the Elo on each side. However, it seems FRC Elo win percentages don't quite follow chess win percentages based on Elo. I went ahead and generated a cumulative distribution plot based on 2016 match data (and the Elo ratings given in the spreadsheet). I got what is shown in the plot below. The blue line is the "standard" chess Elo win probability CDF (a logistic distribution CDF), while the orange is from match data. I fit both a logistic CDF (gray) and a Gaussian CDF (yellow).
The fitted logistic distribution had a mean of 0 and a st. dev. of 55, while the Gaussian distribution had a mean of 0 and a st. dev. of 93.
http://i.imgur.com/VOWQVNn.png
What does this mean? Potentially, a difference in Elo rating could be a better predictor of winning FRC matches than of winning chess matches. That is, a small difference in average alliance Elo rating has a larger effect on win % in FRC (2016) than in chess.
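This kind of fit can be reproduced in spirit with a small grid search. Since the real match data isn't included in this thread, the sketch below simulates outcomes from a known logistic curve (scale 55, echoing the fitted value above) and then recovers the scale by minimizing the Brier score; all data here is synthetic and purely illustrative.

```python
import numpy as np

def logistic_cdf(d, s):
    """Red win probability as a function of Elo difference d, scale s."""
    return 1 / (1 + np.exp(-d / s))

# Synthetic stand-in for real match data: Elo differences, plus win/loss
# outcomes drawn from a logistic curve with a true scale of 55.
rng = np.random.default_rng(0)
diffs = rng.normal(0, 60, size=20000)
wins = (rng.random(20000) < logistic_cdf(diffs, 55)).astype(float)

def fit_scale(diffs, wins, candidates):
    """Pick the scale minimizing mean squared error between predicted
    probabilities and observed outcomes (i.e. the Brier score)."""
    return min(candidates,
               key=lambda s: np.mean((logistic_cdf(diffs, s) - wins) ** 2))

best = fit_scale(diffs, wins, range(30, 90))
```

Because the Brier score is a proper scoring rule, the grid search lands near the true scale when the model family matches the data-generating curve.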
Michael Hill
25-12-2016, 02:51
Another thing to consider, however, is the distribution of Elo differences. It's potentially a bit less useful than I made it out to be in the previous post, because a huge number of matches have a fairly small Elo difference.
http://i.imgur.com/bJRlgqu.png
Caleb Sykes
25-12-2016, 12:55
A couple of things. It appears you are just using the raw Elo differences in calculating red win likelihood, that is, (red1 + red2 + red3) - (blue1 + blue2 + blue3).
I'm thinking that if you're going to calculate win chance, you want to average out the Elo on each side. However, it seems FRC Elo win percentages don't quite follow chess win percentages based on Elo. I went ahead and generated a cumulative distribution plot based on 2016 match data (and the Elo ratings given in the spreadsheet). I got what is shown in the plot below. The blue line is the "standard" chess Elo win probability CDF (a logistic distribution CDF), while the orange is from match data. I fit both a logistic CDF (gray) and a Gaussian CDF (yellow).
The fitted logistic distribution had a mean of 0 and a st. dev. of 55, while the Gaussian distribution had a mean of 0 and a st. dev. of 93.
http://i.imgur.com/VOWQVNn.png
What does this mean? Potentially, a difference in Elo rating could be a better predictor of winning FRC matches than of winning chess matches. That is, a small difference in average alliance Elo rating has a larger effect on win % in FRC (2016) than in chess.
Looking at Elo averages instead of sums should be equivalent to changing the x-scale on the cdf by a factor of 3, and that looks like what you have posted. It doesn't really change anything, because all you are doing is changing the scale. I used the sums in my calculations, which should provide a cdf similar to those found in things like chess.
Caleb Sykes
25-12-2016, 13:28
Looking at Elo averages instead of sums should be equivalent to changing the x-scale on the cdf by a factor of 3, and that looks like what you have posted. It doesn't really change anything, because all you are doing is changing the scale. I used the sums in my calculations, which should provide a cdf similar to those found in things like chess.
Other methods of combining alliance Elos, such as taking each alliance's max Elo or doing some kind of weighted average would make a difference. I just haven't investigated these alternatives.
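Those alternative combiners are easy to prototype side by side. In the sketch below, only "sum" reflects what the paper actually does; "max" and "weighted" are the untested alternatives mentioned above, and the weights are purely illustrative (the factor of 3 just keeps the weighted result on the same scale as the sum).

```python
def alliance_elo(elos, method="sum", weights=(0.5, 0.3, 0.2)):
    """Collapse three team Elos into one alliance rating.
    'sum' matches the paper; 'max' and 'weighted' are hypothetical
    alternatives with illustrative weights (strongest robot counts most)."""
    if method == "sum":
        return sum(elos)
    if method == "max":
        return max(elos)
    if method == "weighted":
        ranked = sorted(elos, reverse=True)
        return 3 * sum(w * e for w, e in zip(weights, ranked))
    raise ValueError(f"unknown method: {method}")
```

Swapping the combiner into the prediction step and re-checking the Brier score would show whether any of these beats the plain sum.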
We played around with TrueSkill last year...
https://github.com/thedropbears/TrueSkill
TrueSkill is the natural successor to Elo. It was created at Microsoft for online matchmaking, and as such is able to deal with alliances of players.
A good explanation of the algorithm is here:
http://www.moserware.com/2010/03/computing-your-skill.html
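For the curious, the analytic two-player, no-draw update described in that article fits in a few lines of pure Python. Full team-vs-team TrueSkill runs message passing on a factor graph (which the trueskill library handles for you); this 1v1 sketch just shows the core Gaussian update, with the library's default beta as an assumption.

```python
import math

def _pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def _cdf(x):
    return (1 + math.erf(x / math.sqrt(2))) / 2

def trueskill_update(winner, loser, beta=25 / 6):
    """One TrueSkill update for a 1v1 match with no draws.
    winner/loser are (mu, sigma) pairs; beta is the performance noise.
    Returns the updated (mu, sigma) pairs."""
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # mean-shift factor
    w = v * (v + t)         # variance-shrink factor
    mu_w2 = mu_w + sig_w ** 2 / c * v
    mu_l2 = mu_l - sig_l ** 2 / c * v
    sig_w2 = sig_w * math.sqrt(max(1 - sig_w ** 2 / c ** 2 * w, 1e-9))
    sig_l2 = sig_l * math.sqrt(max(1 - sig_l ** 2 / c ** 2 * w, 1e-9))
    return (mu_w2, sig_w2), (mu_l2, sig_l2)
```

Unlike Elo, each rating carries its own uncertainty, so an upset by an unknown team moves that team's mean a lot while barely touching an established one.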