2016 Pre-Champs ELO Ratings

I wrote a small Python script (similar to this thread from 2014) to calculate season-long Elo ratings for all 3,000+ FRC teams that competed in the 2016 season. Here's the top 100 (keep in mind this is a fairly untuned model):

Rank,Team,Elo Rating,# Played,Win %
1,frc2056,2564.8708516684,32,0.84375
2,frc987,2556.7395845976,31,0.8709677419
3,frc148,2513.3702482568,32,0.8125
4,frc2771,2509.4705400282,48,0.75
5,frc1241,2437.9107936136,35,0.8571428571
6,frc1519,2435.0658342396,36,0.9444444444
7,frc133,2432.4950118635,36,0.8888888889
8,frc118,2410.6631929387,28,0.8571428571
9,frc27,2408.0850729022,45,0.8
10,frc359,2393.7669444743,32,0.90625
11,frc1678,2365.1427556916,26,0.8846153846
12,frc1983,2352.9683563234,48,0.7708333333
13,frc1023,2327.1563415467,46,0.8260869565
14,frc1501,2314.8914797357,48,0.7916666667
15,frc1540,2308.9583510968,36,0.8333333333
16,frc4564,2299.9814293025,36,0.8611111111
17,frc225,2286.8436656747,36,0.9166666667
18,frc319,2267.1466845432,48,0.7291666667
19,frc2046,2262.0104039536,48,0.7291666667
20,frc3620,2258.2489588419,36,0.8055555556
21,frc2767,2231.1523749957,36,0.8888888889
22,frc125,2203.9042618114,58,0.6724137931
23,frc217,2193.7623220698,45,0.7111111111
24,frc67,2193.5383205599,36,0.8611111111
25,frc254,2190.187307243,19,0.9473684211
26,frc195,2174.4289095983,36,0.8055555556
27,frc179,2164.4363863313,27,0.9259259259
28,frc2013,2162.9140605587,33,0.7272727273
29,frc33,2155.8558874617,36,0.7777777778
30,frc971,2150.359414525,16,0.9375
31,frc4188,2147.4291381874,46,0.7608695652
32,frc910,2143.8934792287,33,0.7878787879
33,frc2590,2142.9579313632,45,0.6888888889
34,frc25,2142.4412456836,36,0.8055555556
35,frc4450,2139.5744792782,36,0.8055555556
36,frc4967,2138.8294065088,36,0.7222222222
37,frc2468,2133.6054741145,32,0.78125
38,frc1024,2132.5886298519,48,0.7083333333
39,frc2974,2126.6132833861,36,0.8055555556
40,frc1746,2122.4741745816,36,0.8611111111
41,frc1986,2105.5705944366,20,0.95
42,frc4334,2090.8271976938,23,0.8695652174
43,frc1261,2087.3494490483,48,0.7083333333
44,frc3314,2076.4327449113,57,0.701754386
45,frc4488,2076.2542185479,36,0.7222222222
46,frc1533,2072.1356193236,48,0.6458333333
47,frc3230,2071.4881103811,35,0.7142857143
48,frc2481,2068.7886898013,21,0.9523809524
49,frc494,2065.1162537725,36,0.8055555556
50,frc869,2062.6485747098,48,0.6875
51,frc230,2061.1864101023,48,0.6458333333
52,frc5279,2060.1866212576,36,0.7777777778
53,frc238,2057.8208919517,48,0.6875
54,frc5172,2055.2630353144,18,0.8888888889
55,frc2067,2054.8484971032,48,0.6875
56,frc1318,2048.8236076466,48,0.7291666667
57,frc3250,2036.8233126059,30,0.7333333333
58,frc5050,2034.4232765669,36,0.7777777778
59,frc4469,2033.0203537629,36,0.8055555556
60,frc3310,2031.833668903,20,0.95
61,frc1058,2029.9012706186,48,0.6458333333
62,frc330,2023.303386855,20,0.85
63,frc4468,2020.8796496031,36,0.7777777778
64,frc365,2015.2776518452,36,0.75
65,frc868,2013.0139180101,46,0.7173913043
66,frc1418,2004.4858845658,34,0.7352941176
67,frc2415,2002.7041652616,36,0.6944444444
68,frc16,2001.6149194784,19,0.8947368421
69,frc3688,2000.4276164977,36,0.7222222222
70,frc3990,1993.4366845031,21,0.9047619048
71,frc1747,1989.0741465466,48,0.6875
72,frc3357,1987.188575724,48,0.6875
73,frc368,1982.42880364,20,0.95
74,frc5460,1981.5616311026,36,0.6944444444
75,frc3238,1976.6527065089,36,0.7777777778
76,frc525,1976.440984525,18,0.8888888889
77,frc4384,1973.7651648588,36,0.5833333333
78,frc836,1972.003555699,34,0.7352941176
79,frc71,1970.1239091472,36,0.6944444444
80,frc56,1969.5271280769,36,0.6944444444
81,frc1731,1966.9417414622,36,0.75
82,frc1305,1964.9589066452,31,0.7096774194
83,frc1918,1961.9294395633,36,0.75
84,frc3604,1960.9763143508,36,0.7777777778
85,frc41,1959.5598450238,36,0.5833333333
86,frc2474,1957.2471714871,36,0.6666666667
87,frc4063,1956.7118381526,29,0.724137931
88,frc1425,1953.6830324716,36,0.75
89,frc1806,1951.7294052903,19,0.8947368421
90,frc2471,1950.9437092534,36,0.7222222222
91,frc3683,1947.6203673269,23,0.7826086957
92,frc1114,1944.3669483275,23,0.7826086957
93,frc5895,1941.0093857167,36,0.7222222222
94,frc3618,1941.0017737099,48,0.7083333333
95,frc3309,1934.657254773,32,0.6875
96,frc107,1931.3958797577,36,0.6944444444
97,frc1124,1921.5875314542,48,0.6666666667
98,frc4103,1919.8324175315,36,0.6944444444
99,frc3021,1917.1307148459,21,0.8571428571
100,frc85,1914.7890005898,36,0.7222222222

Full ratings can be found here: http://wesj.org/documents/elos_2016.csv

Methodology: I initialized all teams at 1500 at the beginning of the season, and ratings persist between competitions. The only matches the model considers are qualification matches at official events. I decided to discount playoff matches because I wanted the ratings to reflect the best robots at an event, not necessarily the best alliances. Adding elimination matches massively inflated the ratings of second picks on very strong alliances, often making them the third-highest-rated robot at an event even when they clearly weren't the third-best robot there. (I can post ratings that include eliminations if people really want them, though.)
I also used a margin-of-victory multiplier similar to the one used for FiveThirtyEight's NBA Elo ratings, which rewards underdogs heavily for upsetting higher-rated alliances but gives favorites only a small reward for beating weaker ones. Most of the tuning values I used for these rankings were taken from the 538 NBA values, largely because of the rough similarity between NBA and Stronghold scores.
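To make the asymmetry concrete, here's a worked example (the numbers are mine, purely illustrative): with the multiplier below, a favorite rated 100 points above its opponent that wins by 20 gets a smaller multiplier than a 100-point underdog winning by the same margin, which keeps strong alliances from running up their ratings against weak fields:

def mov_mult(elodiff, scorediff):
    # elodiff: winner's alliance Elo minus loser's; scorediff: winning margin
    return ((scorediff + 3) ** 0.8) / (7.5 + 0.006 * elodiff)

print(round(mov_mult(100, 20), 2))   # favorite wins by 20  -> 1.52
print(round(mov_mult(-100, 20), 2))  # underdog wins by 20  -> 1.78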

Here's the script I used. I added parameters for the k-value, the margin-of-victory multiplier function, and the match level, plus optional shared ratings/played dicts so ratings can persist between events as described above. (The 'tba.event_get()' method is taken from the TBA wrapper script I use, and essentially just gets the event matches and teams from the TBA API and parses them into a dict using json.loads().)

import statistics

def elos(event_key, k=20,
         mov=lambda elodiff, scorediff: ((scorediff + 3) ** 0.8) / (7.5 + 0.006 * elodiff),
         level='qm', ratings=None, played=None):
    # Optional shared `ratings`/`played` dicts let ratings persist between
    # events, per the methodology above; fresh dicts are used otherwise.
    if ratings is None:
        ratings = {}
    if played is None:
        played = {}

    event = tba.event_get(event_key)
    for team in event.teams:
        # Seed only teams we haven't seen yet, so carried-over ratings survive.
        ratings.setdefault(team['key'], 1500)
        played.setdefault(team['key'], 0)

    for match in event.matches:
        if level is not None and match['comp_level'] != level:
            continue
        red = match['alliances']['red']
        blue = match['alliances']['blue']
        red_elo = statistics.mean(ratings[team] for team in red['teams'])
        blue_elo = statistics.mean(ratings[team] for team in blue['teams'])

        # Standard Elo expectation: an alliance's expected score falls as
        # its opponent's rating rises.
        expected_score_red = 1.0 / (1 + 10 ** ((blue_elo - red_elo) / 400.0))
        expected_score_blue = 1.0 - expected_score_red

        red_score = red['score']
        blue_score = blue['score']

        if red_score > blue_score:
            actual_score = 1.0
            margin_mult = mov(red_elo - blue_elo, red_score - blue_score)
        elif red_score == blue_score:
            actual_score = 0.5
            margin_mult = mov(0, 0)
        else:
            actual_score = 0.0
            margin_mult = mov(blue_elo - red_elo, blue_score - red_score)

        for team in red['teams']:
            ratings[team] += k * margin_mult * (actual_score - expected_score_red)
            played[team] += 1

        for team in blue['teams']:
            ratings[team] += k * margin_mult * ((1 - actual_score) - expected_score_blue)
            played[team] += 1

    return ratings
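
To get the season-long ratings, the events are then run in chronological order with shared dicts. A sketch of the driver loop (the event keys here are placeholders, not a real schedule):

season_ratings, season_played = {}, {}
for event_key in ['2016eventA', '2016eventB']:  # placeholder keys, in date order
    elos(event_key, ratings=season_ratings, played=season_played)

# Top 10 by rating, with matches played.
top = sorted(season_ratings.items(), key=lambda kv: kv[1], reverse=True)
for rank, (team, rating) in enumerate(top[:10], start=1):
    print(rank, team, round(rating, 1), season_played[team])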

Please let me know if you have any suggestions or questions about these ratings.

What exactly does ELO stand for? I’ve never heard of this statistic.

How do you account for the lack of scoring for breach and capture during quals? That’s critical to comparisons for elims.

Are you talking about the 25- and 20-point breach and capture bonuses that the model misses by not including the elims? Thanks.

It's a name, not an acronym:

Fivethirtyeight.com uses it for a lot of their analyses, and I'm growing to like it as a ranking method…

It’s actually Elo rating, named after its creator, Arpad Elo.

As stated in the OP, elims results are not accounted for in the model.

It doesn’t account for elimination matches, but I’ve thought about adding the point bonuses for quals matches. I need to write some sort of prediction accuracy code to see if it’s any more predictive.

One source of error in this system is that non-district teams play fewer qualification matches than district teams. More capable teams such as 16, 254, 330, 971, 2481, etc., would need a few more matches for their Elo ratings to converge from the initial seed of 1500 toward a value that better reflects their performance.

This is a major flaw of the Elo system (at least as applied to FRC), and it, along with normalizing between events in general, is something I've been trying to figure out for a while. I was thinking about a straightforward multiplier on the Elo added per match depending on the type of event (i.e., district qualifiers would add half as much Elo per match as a regional or DCMP), but that would still massively favor teams that go to three regionals or three-plus district events. I also thought about normalizing the Elo added per match by how many matches the team had already played, but that would then disproportionately favor the teams that go to a single 8-play regional.

What about running the Elo twice, using first-pass results to seed the second pass?
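
Something like this, say (a rough sketch, not something anyone here has run; it assumes the elos() function above and a hypothetical all_event_keys list sorted chronologically):

# Pass 1: every team starts at 1500.
first, first_played = {}, {}
for key in all_event_keys:
    elos(key, ratings=first, played=first_played)

# Pass 2: seed each team at its first-pass final rating, so early-season
# matches are scored against better-informed opponent ratings.
second, second_played = dict(first), {}
for key in all_event_keys:
    elos(key, ratings=second, played=second_played)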

I have been developing my own Elo rating system for FRC over the past few years. The way it works differs enough from the one in this post that I thought it might be interesting to compare.

My ratings are based on each team's history dating back to 2002. At the end of each season, ratings are pulled 20% of the way back toward the starting rating; since the starting rating is 0, a team with a rating of 100 at the end of one season will begin the next season at 80. As this system uses matches from different games, it does not use win margins. A K factor of 32 is used for each match, except in playoff matches, where the K factor is 16 (I too found that playoff matches seemed less predictive of future matches than qualification matches).
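
In code, that rollover rule amounts to something like this (an illustrative sketch):

def season_reset(ratings, retain=0.8, start=0):
    # Each team keeps `retain` of its distance from the starting rating,
    # so a rating of 100 with start=0 becomes 80 for the next season.
    return {team: start + retain * (r - start) for team, r in ratings.items()}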

At the 2014 FIRST Championship, this system had a Brier score of 0.190, so it at least performs better than flipping a coin (a constant 50% forecast would score 0.25)!
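
For anyone unfamiliar, the Brier score is the mean squared difference between forecast win probabilities and actual outcomes, so lower is better. A minimal version, as an illustrative sketch:

import statistics

def brier_score(forecasts, outcomes):
    # forecasts: predicted win probabilities in [0, 1]
    # outcomes: 1.0 for a win, 0.0 for a loss (0.5 could mark a tie)
    return statistics.mean((p - o) ** 2 for p, o in zip(forecasts, outcomes))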

Anyways, enough methodology. Here are the current rankings:


Rank  Team             Rating
   1. frc254            448
   2. frc1519           421
   3. frc225            388
   4. frc1678           381
   5. frc195            373
   6. frc2481           368
   7. frc118            364
   8. frc987            345
   9. frc2056           344
  10. frc359            340
  11. frc1023           336
  12. frc1241           333
  13. frc1114           333
  14. frc525            329
  15. frc148            327
  16. frc1986           325
  17. frc330            323
  18. frc67             321
  19. frc133            312
  20. frc2767           310
  21. frc33             306
  22. frc3310           300
  23. frc16             299
  24. frc1806           299
  25. frc4564           295
  26. frc1501           292
  27. frc125            290
  28. frc3238           284
  29. frc5172           278
  30. frc971            278
  31. frc368            276
  32. frc494            275
  33. frc5254           270
  34. frc2122           269
  35. frc2771           268
  36. frc3130           267
  37. frc1619           266
  38. frc1024           266
  39. frc4334           264
  40. frc4469           263
  41. frc3683           263
  42. frc2974           262
  43. frc179            259
  44. frc3314           259
  45. frc27             259
  46. frc3230           256
  47. frc1540           254
  48. frc1318           254
  49. frc1983           254
  50. frc3990           253
  51. frc3339           253
  52. frc126            249
  53. frc2451           247
  54. frc107            245
  55. frc1730           245
  56. frc3604           244
  57. frc2067           243
  58. frc3824           243
  59. frc2468           243
  60. frc233            242
  61. frc2590           242
  62. frc4103           239
  63. frc180            238
  64. frc1918           237
  65. frc973            236
  66. frc1418           236
  67. frc4488           236
  68. frc25             236
  69. frc3688           235
  70. frc5050           235
  71. frc70             234
  72. frc2883           232
  73. frc1261           232
  74. frc2848           232
  75. frc341            229
  76. frc1296           228
  77. frc1746           227
  78. frc177            227
  79. frc868            226
  80. frc4039           226
  81. frc3937           226
  82. frc744            225
  83. frc1717           222
  84. frc2614           222
  85. frc4188           221
  86. frc85             220
  87. frc2137           220
  88. frc3309           219
  89. frc217            219
  90. frc1065           219
  91. frc1425           218
  92. frc4967           218
  93. frc836            218
  94. frc1126           217
  95. frc1836           216
  96. frc2471           216
  97. frc3255           215
  98. frc2338           215
  99. frc231            214
 100. frc4003           214

I will try to post the source code and some of the research I used to develop this system at some point after finals.

EDIT: Here are the full rankings if anyone is interested.

Someone pointed out an error in the script I posted, which made the ratings slightly incorrect. I fixed the error and updated the rankings in the OP.

As for predictive power, this set of rankings had a Brier score of 0.155.

I can see at least one reason that “past performance does not guarantee future results”, as the investment prospectus always says.