
2016 Pre-Champs ELO Ratings


wjordan
22-04-2016, 15:29
I wrote a small Python script (similar to this thread from 2014 (http://www.chiefdelphi.com/forums/showthread.php?t=127825)) to calculate season-long Elo ratings for all 3,000+ FRC teams that competed in the 2016 season. Here's the top 100 (keep in mind this is a fairly untuned model):
Rank,Team,Elo Rating,# Played,Win %
1,frc2056,2564.8708516684,32,0.84375
2,frc987,2556.7395845976,31,0.8709677419
3,frc148,2513.3702482568,32,0.8125
4,frc2771,2509.4705400282,48,0.75
5,frc1241,2437.9107936136,35,0.8571428571
6,frc1519,2435.0658342396,36,0.9444444444
7,frc133,2432.4950118635,36,0.8888888889
8,frc118,2410.6631929387,28,0.8571428571
9,frc27,2408.0850729022,45,0.8
10,frc359,2393.7669444743,32,0.90625
11,frc1678,2365.1427556916,26,0.8846153846
12,frc1983,2352.9683563234,48,0.7708333333
13,frc1023,2327.1563415467,46,0.8260869565
14,frc1501,2314.8914797357,48,0.7916666667
15,frc1540,2308.9583510968,36,0.8333333333
16,frc4564,2299.9814293025,36,0.8611111111
17,frc225,2286.8436656747,36,0.9166666667
18,frc319,2267.1466845432,48,0.7291666667
19,frc2046,2262.0104039536,48,0.7291666667
20,frc3620,2258.2489588419,36,0.8055555556
21,frc2767,2231.1523749957,36,0.8888888889
22,frc125,2203.9042618114,58,0.6724137931
23,frc217,2193.7623220698,45,0.7111111111
24,frc67,2193.5383205599,36,0.8611111111
25,frc254,2190.187307243,19,0.9473684211
26,frc195,2174.4289095983,36,0.8055555556
27,frc179,2164.4363863313,27,0.9259259259
28,frc2013,2162.9140605587,33,0.7272727273
29,frc33,2155.8558874617,36,0.7777777778
30,frc971,2150.359414525,16,0.9375
31,frc4188,2147.4291381874,46,0.7608695652
32,frc910,2143.8934792287,33,0.7878787879
33,frc2590,2142.9579313632,45,0.6888888889
34,frc25,2142.4412456836,36,0.8055555556
35,frc4450,2139.5744792782,36,0.8055555556
36,frc4967,2138.8294065088,36,0.7222222222
37,frc2468,2133.6054741145,32,0.78125
38,frc1024,2132.5886298519,48,0.7083333333
39,frc2974,2126.6132833861,36,0.8055555556
40,frc1746,2122.4741745816,36,0.8611111111
41,frc1986,2105.5705944366,20,0.95
42,frc4334,2090.8271976938,23,0.8695652174
43,frc1261,2087.3494490483,48,0.7083333333
44,frc3314,2076.4327449113,57,0.701754386
45,frc4488,2076.2542185479,36,0.7222222222
46,frc1533,2072.1356193236,48,0.6458333333
47,frc3230,2071.4881103811,35,0.7142857143
48,frc2481,2068.7886898013,21,0.9523809524
49,frc494,2065.1162537725,36,0.8055555556
50,frc869,2062.6485747098,48,0.6875
51,frc230,2061.1864101023,48,0.6458333333
52,frc5279,2060.1866212576,36,0.7777777778
53,frc238,2057.8208919517,48,0.6875
54,frc5172,2055.2630353144,18,0.8888888889
55,frc2067,2054.8484971032,48,0.6875
56,frc1318,2048.8236076466,48,0.7291666667
57,frc3250,2036.8233126059,30,0.7333333333
58,frc5050,2034.4232765669,36,0.7777777778
59,frc4469,2033.0203537629,36,0.8055555556
60,frc3310,2031.833668903,20,0.95
61,frc1058,2029.9012706186,48,0.6458333333
62,frc330,2023.303386855,20,0.85
63,frc4468,2020.8796496031,36,0.7777777778
64,frc365,2015.2776518452,36,0.75
65,frc868,2013.0139180101,46,0.7173913043
66,frc1418,2004.4858845658,34,0.7352941176
67,frc2415,2002.7041652616,36,0.6944444444
68,frc16,2001.6149194784,19,0.8947368421
69,frc3688,2000.4276164977,36,0.7222222222
70,frc3990,1993.4366845031,21,0.9047619048
71,frc1747,1989.0741465466,48,0.6875
72,frc3357,1987.188575724,48,0.6875
73,frc368,1982.42880364,20,0.95
74,frc5460,1981.5616311026,36,0.6944444444
75,frc3238,1976.6527065089,36,0.7777777778
76,frc525,1976.440984525,18,0.8888888889
77,frc4384,1973.7651648588,36,0.5833333333
78,frc836,1972.003555699,34,0.7352941176
79,frc71,1970.1239091472,36,0.6944444444
80,frc56,1969.5271280769,36,0.6944444444
81,frc1731,1966.9417414622,36,0.75
82,frc1305,1964.9589066452,31,0.7096774194
83,frc1918,1961.9294395633,36,0.75
84,frc3604,1960.9763143508,36,0.7777777778
85,frc41,1959.5598450238,36,0.5833333333
86,frc2474,1957.2471714871,36,0.6666666667
87,frc4063,1956.7118381526,29,0.724137931
88,frc1425,1953.6830324716,36,0.75
89,frc1806,1951.7294052903,19,0.8947368421
90,frc2471,1950.9437092534,36,0.7222222222
91,frc3683,1947.6203673269,23,0.7826086957
92,frc1114,1944.3669483275,23,0.7826086957
93,frc5895,1941.0093857167,36,0.7222222222
94,frc3618,1941.0017737099,48,0.7083333333
95,frc3309,1934.657254773,32,0.6875
96,frc107,1931.3958797577,36,0.6944444444
97,frc1124,1921.5875314542,48,0.6666666667
98,frc4103,1919.8324175315,36,0.6944444444
99,frc3021,1917.1307148459,21,0.8571428571
100,frc85,1914.7890005898,36,0.7222222222
Full ratings can be found here: http://wesj.org/documents/elos_2016.csv

Methodology: I initialized all teams at the beginning of the season at 1500, and had ratings persist between competitions. The only matches considered by the model were qualification matches at official events. I decided to discount playoff matches because I wanted the ratings to reflect the best robots at an event, not necessarily the best alliances. Adding elimination matches massively inflated the ratings of 2nd picks on very strong alliances, often making them the third-highest rated robot at an event despite that usually not being the case. (I can post ratings with eliminations if people really want them, however)
I also used a margin-of-victory multiplier similar to the one used for FiveThirtyEight's NBA Elo ratings (http://fivethirtyeight.com/features/how-we-calculate-nba-elo-ratings/#fn-2), which rewards underdogs generously for upsetting higher-rated alliances, but gives a stronger alliance only a small reward for beating a weaker one. Most of the tuning values I used for these rankings were taken directly from the 538 NBA values, largely because of the rough similarity in scores between the NBA and Stronghold.

Here's the script I used. I added parameters for the k-value, the margin-of-victory multiplier function, and the match level. (The 'tba.event_get()' method comes from the TBA wrapper script I use (http://wesj.org/documents/bluealliance.py); it essentially just gets the event matches and teams from the TBA API and parses them into a dict using json.loads().)

import statistics

def elos(event_key, k=20,
         mov=lambda elodiff, scorediff: ((scorediff + 3) ** 0.8) / (7.5 + 0.006 * elodiff),
         level='qm'):
    event = tba.event_get(event_key)
    elos = {}
    played = {}  # matches played per team (in the full-season script this persists across events)
    for team in event.teams:
        elos[team['key']] = 1500
        played[team['key']] = 0

    for match in event.matches:
        # Only score matches at the requested level (default: qualification only)
        if level is not None and match['comp_level'] != level:
            continue
        red = match['alliances']['red']
        blue = match['alliances']['blue']
        red_elo = statistics.mean(elos[team] for team in red['teams'])
        blue_elo = statistics.mean(elos[team] for team in blue['teams'])

        # Standard Elo expectation: E_A = 1 / (1 + 10^((R_B - R_A) / 400))
        expected_score_red = 1. / (1 + 10 ** ((blue_elo - red_elo) / 400.0))
        expected_score_blue = 1. / (1 + 10 ** ((red_elo - blue_elo) / 400.0))

        red_score = red['score']
        blue_score = blue['score']

        if red_score > blue_score:
            actual_score = 1.0
            margin_mult = mov(red_elo - blue_elo, red_score - blue_score)
        elif red_score == blue_score:
            actual_score = 0.5
            margin_mult = mov(0, 0)
        else:
            actual_score = 0.0
            margin_mult = mov(blue_elo - red_elo, blue_score - red_score)

        for team in red['teams']:
            elos[team] += k * margin_mult * (actual_score - expected_score_red)
            played[team] += 1

        for team in blue['teams']:
            elos[team] += k * margin_mult * ((1 - actual_score) - expected_score_blue)
            played[team] += 1

    return elos
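For anyone who wants to experiment without the TBA wrapper, the per-match update can be condensed into a self-contained sketch using the standard Elo expectation. The team keys and scores below are made up purely for illustration:

```python
import statistics

def update(elos, red, blue, red_score, blue_score, k=20,
           mov=lambda elodiff, scorediff: ((scorediff + 3) ** 0.8) / (7.5 + 0.006 * elodiff)):
    """Apply one match result to the ratings dict in place."""
    red_elo = statistics.mean(elos[t] for t in red)
    blue_elo = statistics.mean(elos[t] for t in blue)
    expected_red = 1.0 / (1 + 10 ** ((blue_elo - red_elo) / 400.0))
    if red_score > blue_score:
        actual, margin = 1.0, mov(red_elo - blue_elo, red_score - blue_score)
    elif red_score == blue_score:
        actual, margin = 0.5, mov(0, 0)
    else:
        actual, margin = 0.0, mov(blue_elo - red_elo, blue_score - red_score)
    delta = k * margin * (actual - expected_red)
    for t in red:
        elos[t] += delta   # every alliance member gets the full adjustment
    for t in blue:
        elos[t] -= delta   # zero-sum: blue loses exactly what red gains

# Hypothetical teams, all seeded at 1500; red wins 100-60
elos = {t: 1500 for t in ['frc1', 'frc2', 'frc3', 'frc4', 'frc5', 'frc6']}
update(elos, ['frc1', 'frc2', 'frc3'], ['frc4', 'frc5', 'frc6'], 100, 60)
```

Because alliance members share one alliance rating, all three teammates move by the same amount each match.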

Please let me know if you have any suggestions or questions about these ratings.

logank013
22-04-2016, 15:37
What exactly does ELO stand for? I've never heard of this statistic.

Citrus Dad
22-04-2016, 15:50
How do you account for the lack of scoring for breach and capture during quals? That's critical to comparisons for elims.

logank013
22-04-2016, 15:53
How do you account for the lack of scoring for breach and capture during quals? That's critical to comparisons for elims.

Are you talking about the missed 25 and 20 point bonuses in the eliminations by not including the elims? Thanks.

Peter Matteson
22-04-2016, 15:55
What exactly does ELO stand for? I've never heard of this statistic.

It's a name, not an acronym:
https://en.wikipedia.org/wiki/Elo_rating_system

Fivethirtyeight.com uses it for a lot of their analyses, and I'm growing to like it as a ranking method.

Brian Maher
22-04-2016, 15:56
What exactly does ELO stand for? I've never heard of this statistic.

It's actually Elo rating, named after its creator, Arpad Elo.

Chris is me
22-04-2016, 15:57
How do you account for the lack of scoring for breach and capture during quals? That's critical to comparisons for elims.

As stated in the OP, elims results are not accounted for in the model.

wjordan
22-04-2016, 16:03
How do you account for the lack of scoring for breach and capture during quals? That's critical to comparisons for elims.
It doesn't account for elimination matches, but I've thought about adding the point bonuses for quals matches. I need to write some sort of prediction accuracy code to see if it's any more predictive.
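For reference, one common accuracy measure for probabilistic forecasts (and the one quoted later in this thread) is the Brier score: the mean squared difference between the predicted win probability and the actual 0/1 outcome. Always predicting 50% scores 0.25; lower is better. A minimal sketch, with made-up forecasts:

```python
def brier_score(predictions):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - outcome) ** 2 for p, outcome in predictions) / len(predictions)

# Hypothetical forecasts: (predicted probability red wins, 1 if red won else 0)
matches = [(0.9, 1), (0.7, 1), (0.3, 0), (0.6, 0)]
```

Here `brier_score(matches)` works out to 0.1375, i.e., better than a coin flip on this (tiny, fabricated) sample.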

Richard Wallace
22-04-2016, 16:40
One source of error in this system arises because non-district teams play fewer qualifying matches than district teams. More capable teams, such as 16, 254, 330, 971, 2481, etc. would need a few more matches for their Elo ratings to converge from the initial seed (1500) toward a figure that better represents their performance.

wjordan
22-04-2016, 17:10
One source of error in this system arises because non-district teams play fewer qualifying matches than district teams. More capable teams, such as 16, 254, 330, 971, 2481, etc. would need a few more matches for their Elo ratings to converge from the initial seed (1500) toward a figure that better represents their performance.
This is a major flaw in the Elo system (at least when applied to FRC), and it, along with normalizing between events in general, is something I've been trying to figure out for a while. I was thinking about applying a straightforward multiplier to the amount of Elo added per match depending on the type of event (e.g. district qualifiers would add half as much Elo per match as a regional / DCMP), but that would still massively favor teams that go to 3 regionals or 3+ district events. I also thought about normalizing the amount of Elo added per match by how many matches the team has played previously, but that would then disproportionately favor teams that go to a single 8-play regional.

Richard Wallace
22-04-2016, 17:31
This is a major flaw to the Elo system, and it's (as well as general normalizing between events) been something I've been trying to figure out for awhile. I was thinking about doing a straightforward multiplier to amount of Elo added for each match depending on the type of the event (i.e. District Qualifiers would add half as much Elo per match as a regional / DCMP would), but that would still massively favor teams that go to 3 regionals or 3+ district events. I also thought about normalizing the amount of Elo added per match by how many matches the team has played previously, but that would then disproportionately favor the teams that go to a single 8-play regional.

What about running the Elo twice, using the first-pass results to seed the second pass?
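The two-pass idea might look something like the following sketch, where `run_season` stands in for a full pass over a season's matches. It uses simple per-team Elo on (winner, loser) pairs with no margin term, and the team keys and results are fabricated, purely for illustration:

```python
def run_season(matches, seeds):
    """One Elo pass over (winner, loser) pairs, starting from the given seeds."""
    elos = dict(seeds)
    k = 20
    for winner, loser in matches:
        expected = 1.0 / (1 + 10 ** ((elos[loser] - elos[winner]) / 400.0))
        delta = k * (1 - expected)
        elos[winner] += delta
        elos[loser] -= delta
    return elos

teams = ['frc1', 'frc2', 'frc3']
matches = [('frc1', 'frc2'), ('frc1', 'frc3'), ('frc2', 'frc3')] * 5
first_pass = run_season(matches, {t: 1500 for t in teams})
# Replay the same season, seeded by the first-pass results instead of a flat 1500
second_pass = run_season(matches, first_pass)
```

The second pass starts every team near its converged rating, so teams with short schedules lose less to the flat initial seed.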

Carl C
22-04-2016, 18:04
I have been developing my own Elo rating system for FRC over the past few years. The way it works differs from the one in this post enough that I thought it might be interesting to compare.

The data in my ratings are based on the history of each team dating back to 2002. At the end of each season, ratings are regressed 20% of the way back toward the starting rating, so each team retains 80% of its rating. Since 0 is the starting rating, a team with a rating of 100 at the end of one season will begin the next season at 80. As this system uses matches from different games, it does not use win margins. A K factor of 32 is used in each match except playoff matches, where the K factor is 16 (I too found that playoff matches seemed less predictive of future matches than qualification matches).
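Taking the worked example (100 at season's end becomes 80) as the intended behavior, the season-to-season carryover can be sketched as a simple regression toward the baseline (the `carry_over` name and team key are made up for illustration):

```python
def carry_over(ratings, retain=0.8, baseline=0.0):
    """Regress each rating toward the baseline, keeping `retain` of the offset."""
    return {team: baseline + retain * (r - baseline) for team, r in ratings.items()}

# The worked example from the post: a team ending at 100 starts the next season at 80
next_season = carry_over({'frc254': 100.0})
```

Ratings below the baseline are pulled up toward it by the same rule.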

At the 2014 FIRST Championship, this system had a 0.190 Brier score, so it at least performs better than flipping a coin!

Anyways, enough methodology. Here are the current rankings:


Rank  Team     Rating
1. frc254 448
2. frc1519 421
3. frc225 388
4. frc1678 381
5. frc195 373
6. frc2481 368
7. frc118 364
8. frc987 345
9. frc2056 344
10. frc359 340
11. frc1023 336
12. frc1241 333
13. frc1114 333
14. frc525 329
15. frc148 327
16. frc1986 325
17. frc330 323
18. frc67 321
19. frc133 312
20. frc2767 310
21. frc33 306
22. frc3310 300
23. frc16 299
24. frc1806 299
25. frc4564 295
26. frc1501 292
27. frc125 290
28. frc3238 284
29. frc5172 278
30. frc971 278
31. frc368 276
32. frc494 275
33. frc5254 270
34. frc2122 269
35. frc2771 268
36. frc3130 267
37. frc1619 266
38. frc1024 266
39. frc4334 264
40. frc4469 263
41. frc3683 263
42. frc2974 262
43. frc179 259
44. frc3314 259
45. frc27 259
46. frc3230 256
47. frc1540 254
48. frc1318 254
49. frc1983 254
50. frc3990 253
51. frc3339 253
52. frc126 249
53. frc2451 247
54. frc107 245
55. frc1730 245
56. frc3604 244
57. frc2067 243
58. frc3824 243
59. frc2468 243
60. frc233 242
61. frc2590 242
62. frc4103 239
63. frc180 238
64. frc1918 237
65. frc973 236
66. frc1418 236
67. frc4488 236
68. frc25 236
69. frc3688 235
70. frc5050 235
71. frc70 234
72. frc2883 232
73. frc1261 232
74. frc2848 232
75. frc341 229
76. frc1296 228
77. frc1746 227
78. frc177 227
79. frc868 226
80. frc4039 226
81. frc3937 226
82. frc744 225
83. frc1717 222
84. frc2614 222
85. frc4188 221
86. frc85 220
87. frc2137 220
88. frc3309 219
89. frc217 219
90. frc1065 219
91. frc1425 218
92. frc4967 218
93. frc836 218
94. frc1126 217
95. frc1836 216
96. frc2471 216
97. frc3255 215
98. frc2338 215
99. frc231 214
100. frc4003 214


I will try to post the source code and some of the research I used to develop this system at some point after finals.

EDIT: Here (https://gist.github.com/CarlColglazier/0d09855e5d39003807d34bf0e86f06c9) are the full rankings if anyone is interested.

wjordan
22-04-2016, 19:13
Someone pointed out an error in the script I posted, which made the ratings slightly incorrect. I fixed the error and updated the rankings in the OP.

As for predictive power, this set of rankings had a Brier score of 0.155.

Richard Wallace
22-04-2016, 19:16
...ratings are based on the history of each team dating back to 2002 ...

83. frc1717 222

...

I can see at least one reason that "past performance does not guarantee future results", as the investment prospectus always says.