Elo Comparison of Regions

A couple of months ago, someone asked me why I felt that Elo was a worthwhile metric to compare teams that existed in isolated regions. I did some quick calculations then to prove my point, but I wanted to look into this question more. Intuitively, we all know some regions are stronger than others (and we all think our region is under-rated), but does Elo actually suffer from regional bias?

The best method I could think of to answer this question was to compare the Elo ratings of every team attending championships before and after they competed on the international stage. Since championships are by a large margin the most diverse events we have, they seemed like a good place to find out whether teams from some regions consistently over- or under-perform their Elo expectations. Since Elo is a zero-sum algorithm (during the season at least), we would expect the Elos of teams from heavily under-rated regions to consistently increase at the championship event, and the Elos of teams from heavily over-rated regions to consistently decrease.
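To make the zero-sum point concrete, here is a minimal sketch of an alliance-based Elo update. It is a simplified, hypothetical version and not the model used for these results: it assumes a plain win/loss outcome, treats an alliance's rating as the mean of its teams' ratings, and uses the standard logistic curve with a 400-point scale and an illustrative K of 12. Every team on one alliance gains exactly what every team on the other alliance loses, so with equal-sized alliances the total Elo in the pool never changes; points can only move between teams, and therefore between regions.

```python
# Hypothetical, simplified alliance Elo update (win/loss only, K = 12, scale 400).
# This is an illustration of the zero-sum property, not the model behind the spreadsheet.


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Standard logistic Elo expectation that side A beats side B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_match(ratings: dict[str, float], red: list[str], blue: list[str],
                 red_won: bool, k: float = 12.0) -> dict[str, float]:
    """Return new ratings; each red team gains delta and each blue team loses delta."""
    red_avg = sum(ratings[t] for t in red) / len(red)
    blue_avg = sum(ratings[t] for t in blue) / len(blue)
    delta = k * ((1.0 if red_won else 0.0) - expected_score(red_avg, blue_avg))
    new = dict(ratings)
    for t in red:
        new[t] += delta
    for t in blue:
        new[t] -= delta
    return new


if __name__ == "__main__":
    before = {"a": 1500.0, "b": 1520.0, "c": 1480.0, "d": 1510.0, "e": 1490.0, "f": 1500.0}
    after = update_match(before, ["a", "b", "c"], ["d", "e", "f"], red_won=True)
    print(sum(before.values()), sum(after.values()))  # totals match: zero-sum
```

With two evenly matched alliances (both averaging 1500 here), each winner gains K/2 = 6 points and each loser drops 6, so the sum over all six teams is unchanged.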

I have attached the results of this endeavor to this post. It contains the raw data, a summary of Elo ratings of every region from 2008-2017, and a condensed summary that looks at the year 2017 in isolation as well as the period 2012-2017. I chose the period 2012-2017 because I wanted to give the Elo model a few years to account for regional differences on its own.

If you don’t want to look at the document, here is the summary:
The only regions that were significantly (p < 0.05) over-rated according to Elo in 2017 were RI (p = 0.024) and OK (p = 0.035). Since I was testing 61 different regions, we would expect about one region to fall below p = 1/61 by chance alone, and none of the regions did, so there is little good evidence that Elo was consistently over-rating any region in 2017 alone.
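A quick sanity check on that multiple-comparisons reasoning, assuming the 61 regional tests are independent and that no region is truly biased (both are simplifying assumptions): the expected number of regions below a threshold p is 61 × p, so at p = 1/61 we'd expect about one by chance and at p = 0.05 about three. The Bonferroni threshold, 0.05/61, is shown only for comparison as the more conventional (and stricter) correction.

```python
# Expected chance "hits" when running many significance tests at once.
# Assumes independent tests and that every null hypothesis (no regional bias) is true.

NUM_REGIONS = 61


def expected_false_positives(threshold: float, n_tests: int = NUM_REGIONS) -> float:
    """Expected number of tests with p below threshold under the null."""
    return threshold * n_tests


def prob_at_least_one(threshold: float, n_tests: int = NUM_REGIONS) -> float:
    """Probability that at least one test falls below threshold under the null."""
    return 1.0 - (1.0 - threshold) ** n_tests


print(f"expected below p = 0.05: {expected_false_positives(0.05):.2f}")
print(f"expected below p = 1/61: {expected_false_positives(1 / NUM_REGIONS):.2f}")
print(f"P(at least one below 0.05): {prob_at_least_one(0.05):.3f}")
print(f"Bonferroni threshold 0.05/61: {0.05 / NUM_REGIONS:.5f}")
```

Under these assumptions, seeing two regions below 0.05 is roughly what chance alone would produce, and the smallest p value observed (0.024) is far above the Bonferroni threshold of about 0.0008.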

No regions were found to be significantly under-rated according to Elo in 2017.

For the period 2012-2017, 8 regions were found to be significantly over-rated by Elo. Those regions were:
region (average Elo change at championships)
RI (-55)
OK (-24)
NY (-10)
Brazil (-41)
Mexico (-16)
TX (-9)
MO (-11)
LA (-23)
For reference, a 7 point Elo change corresponds to an increased likelihood of winning an otherwise even match by about 1%.
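That rule of thumb is consistent with the standard logistic Elo curve on a 400-point scale. Assuming that scale (the usual chess convention; the exact curve used in this model may differ in details), the conversion from an Elo difference to a shift in win probability looks like this:

```python
# Convert an Elo difference into a win probability, assuming the standard
# logistic Elo curve with a 400-point scale.


def win_probability(elo_diff: float, scale: float = 400.0) -> float:
    """Expected win probability for the side that is elo_diff points ahead."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / scale))


for diff in (7, 24, 55):
    shift = win_probability(diff) - 0.5
    print(f"{diff:>3} Elo points -> {shift:+.1%} vs. an otherwise even match")
```

On this curve, 7 points is about +1.0%, and OK's -24 works out to roughly 3.4 percentage points, in line with the ~3% figure quoted for OK in the post.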

Likewise for 2012-2017, 4 regions were found to be significantly under-rated by Elo. Those regions were:
MI (14)
CT (15)
SD (81)
ON (9)

Here is my take on the results:
The most extreme Elo changes for 2012-2017 came from regions that sent only a single-digit number of teams during that period: RI with 8, Brazil with 9, and SD with 2. Even though these regions did have significant changes, they were sending fewer than two teams to championships per year during this time, so their large Elo changes could very plausibly be explained by other factors.
Excepting those 3, the largest consistent Elo change we see in any region during this period comes from OK at -24. This means that, in an otherwise even match, we would expect Elo to overestimate an OK team's win likelihood by about 3%. As another reference point, a single climb in 2017 was worth about 11 Elo points to every team on your alliance and -11 Elo points to every team on the opposing alliance.
For me, the regional differences found here were not big surprises, and if anything, they were small enough that they have given me more confidence in Elo as a rating system. Any major biases in Elo will self-correct over time. The only region for which I would feel comfortable predicting future under-rating by Elo is MI, considering they have been under-rated since 2009, they provide a large sample size, their rookie growth over the past few years has been abnormal relative to the rest of FRC, and they will likely remain one of the most heavily isolated regions into the near future.

Also, MN just barely missed being significantly over-rated for the period 2012-2017 (p = 0.057), which sort of makes me happy. :confused:

Elo region comparison.xlsx (1.52 MB)

Is there any way you can do this by district? It seems a little weird to separate states in the same district. (I may have missed the line in Excel where they have NE and MAR, but if not, can you add them?)

Here is another version which groups some regions together. Here are the groupings:

My data set doesn't tell me which PA teams are in and out of MAR by year, so I combined NJ and DE in one row and NJ, DE, and PA in another. The only combined region with an Elo change for 2012-2017 significantly different from what chance alone would produce is Canada.

Elo region comparisonv2.xlsx (1.53 MB)

For us N00bs: what is Elo? What does it measure, and how is it calculated?


Attached are the 2017 regional Elo averages. Only regions with at least 10 competing teams in 2017 were used. The first column is each region's raw average. The second column is the raw average adjusted up or down only if the region was significantly under- or over-rated during 2012-2017, as defined above. The third column is the raw average adjusted up or down for every region. Each region's adjustment is 2X the average Elo change of its teams at championships from 2012-2017.
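A minimal sketch of that adjustment, using the 2012-2017 championship changes for the significantly biased regions listed earlier. The dictionary and helper name are hypothetical; the code only encodes "adjusted average = raw average + 2 × average champs change", so over-rated regions (negative change) move down and under-rated regions move up. Since the raw 2017 averages live in the spreadsheet, it prints only the size of each shift.

```python
# Average Elo change at championships, 2012-2017, for the regions listed as
# significantly over-rated (negative) or under-rated (positive).
CHAMPS_CHANGE = {
    "RI": -55, "OK": -24, "NY": -10, "Brazil": -41, "Mexico": -16,
    "TX": -9, "MO": -11, "LA": -23,
    "MI": 14, "CT": 15, "SD": 81, "ON": 9,
}


def adjusted_average(raw_average: float, champs_change: float, factor: float = 2.0) -> float:
    """Hypothetical helper: shift a region's raw Elo average by factor x its champs change."""
    return raw_average + factor * champs_change


# With a raw average of 0, the result is just the shift itself.
for region, change in CHAMPS_CHANGE.items():
    print(f"{region:>6}: {adjusted_average(0.0, change):+.0f} Elo")
```

Changing `factor` to anything from 1.0 to 4.0 covers the range of adjustments discussed below.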

2X is my rough attempt to correct for regional biases. It comes from the fact that when I intentionally bias teams, my Elo model seems to eliminate about 5% of the remaining bias per quals match. Over 13 quals matches, a consistent 5% drop per match leaves (1 - 0.05)^13 ≈ 51% of the bias, so about half of it is eliminated. Alternatively, you could say biases have a half-life of about 13 quals matches in my Elo system. Since champs have ~13 quals matches, I took the Elo change at champs for each region and multiplied it by 2 to get an estimate of the total Elo bias of each region.
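The arithmetic behind the 2X factor can be sketched as follows, assuming the bias decays geometrically (the same 5% of whatever bias remains is removed each quals match). If a fraction r of the bias remains after champs, the observed champs change equals (1 - r) of the total bias, so the total bias is the observed change divided by (1 - r).

```python
import math

# Assumes a region's bias shrinks geometrically: 5% of the remaining bias is
# removed each quals match, over ~13 quals matches at champs.
DECAY_PER_MATCH = 0.05
QUALS_MATCHES = 13

remaining = (1 - DECAY_PER_MATCH) ** QUALS_MATCHES          # fraction of bias left after champs
half_life = math.log(0.5) / math.log(1 - DECAY_PER_MATCH)   # matches until half the bias is gone
multiplier = 1 / (1 - remaining)                            # total bias / observed champs change

print(f"bias remaining after {QUALS_MATCHES} matches: {remaining:.3f}")
print(f"half-life: {half_life:.1f} quals matches")
print(f"implied multiplier: {multiplier:.2f}")
```

This gives about 0.51 remaining, a half-life of about 13.5 matches, and a multiplier of about 2.05, which rounds to the 2X used above.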

Obviously this isn’t perfect methodology since I did a fair bit of approximating. I think a legitimate case could be made for anywhere between 1X and 4X adjustment.

Elo averages.xlsx (23.5 KB)

Elo averages.xlsx (23.5 KB)

Interesting data collection. I'd like to see FIRST try using Elo for matches/rankings at the district level to see what it does to the top teams and how the district points change.

I'm a big fan of using Elo to see how teams change over time, similar to how FiveThirtyEight uses it in their sports predictions: The Complete History Of MLB | FiveThirtyEight

But with the small sample size of FRC seasons, I'm not sure I like using it as a metric within one season. I think it could be an excellent way to compare teams over time, similar to how Jim Zondag uses championship results to compare teams over time here: https://www.chiefdelphi.com/media/papers/3379

(Now if every team played as many matches as 125 does, Elo would be great :confused: )