I averaged the qualification and elimination scores of all events played up to this point (end of week 4) to get a relative idea of the strength of all the currently existing districts. Why? I had time between classes and wanted to confirm my biases, of course.
This method obviously has issues. For one, it doesn’t take into account that regions have different levels of defense (I’ve heard rumors about Indiana). It also only gives an overview of how strong the average team is in a given district, and doesn’t account for the influence of powerhouse teams beyond their contribution to the average. A district like FIT, for example, has quite a few powerhouse teams, but its average scores are fairly low. Also, since not all events have been played yet, some districts’ “big” events aren’t accounted for yet; Bensalem and Montgomery, for example, are slated to be FMA’s strongest events but haven’t been played. Overall this was just a quick look at the average scores and probably not the best metric.
I’m kind of surprised by how close some of the scores were; many districts were within 1 point of each other. I’m not sure whether elims or quals is the better metric. I’m leaning towards quals because it includes every team at the event, while elims cuts out the bottom teams. I’ll keep this updated as the season progresses and might add some possible future districts such as NY, Cali, and WOW if I get bored.
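For concreteness, the computation is literally just this (a minimal Python sketch with made-up event averages, not the real numbers from my spreadsheet):

```python
from collections import defaultdict

# Hypothetical per-event averages: (district, qual average, elim average).
events = [
    ("FMA", 38.2, 44.0), ("FMA", 41.5, 47.3),
    ("FIT", 36.4, 42.1), ("FIT", 39.0, 45.8), ("FIT", 37.7, 43.2),
]

quals, elims = defaultdict(list), defaultdict(list)
for district, qual_avg, elim_avg in events:
    quals[district].append(qual_avg)
    elims[district].append(elim_avg)

# One row per district: the average of its events' qual and elim averages.
for district in quals:
    q = sum(quals[district]) / len(quals[district])
    e = sum(elims[district]) / len(elims[district])
    print(f"{district}: quals {q:.1f}, elims {e:.1f}")
```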
Would it make sense to compare some additional metrics beyond simple average scores, such as rocket completion rates, HAB climb rates, or HAB RP rates?
Probably. I figured average scores would give the best overview; a district may have a lot of HAB 3 climbers but not a lot of good cyclers. I’ll take a look later.
I know that in CHS defense has not been fully utilized by many teams, which could be why they have such a high elims score. There have been several very powerful triple-scoring alliances in that district, and with such small event sizes it’s harder to justify using the second pick on defense rather than on a few extra cycles.
I’m curious what you mean. This isn’t my subjective opinion or an in-depth analysis of each region; I literally just ordered districts by their average match score.
Honestly, I agree that it’s flawed. I looked at it because I wanted to ignore the top-tier bots. We often hear how strong Texas is simply because it has a few powerhouse teams, but looking at their match scores it’s pretty clear that outside of a select few teams their events are weaker than many other districts’. What’s more important when judging relative strength, the average teams or the best teams? I just thought it would be interesting, and for the most part the scores were so close that you can’t really make a judgement off of them anyway.
Should you include some sort of normalization based on which weeks the events happened in? Overall match scores have risen almost 10 points over the season, and I suspect this skews the averages for districts whose events are spread into the later weeks of the season.
For the kind of quick-and-dirty analysis we’re doing here I think this is a pretty good idea; it has a lot of value relative to the annoyance of doing it.
Perhaps consider each event’s contribution to be (eventAverage / weekAverage), and then average these contributions per district. The result will be less immediately interpretable, but it will help with the weighting.
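Roughly like this (a minimal sketch with made-up numbers; eventAverage and weekAverage are as described above):

```python
from collections import defaultdict

# Hypothetical events: (district, week, event average score).
events = [
    ("FMA", 1, 38.2), ("FIT", 1, 35.6),
    ("FMA", 3, 45.1), ("FIT", 3, 49.0),
]

# Average score across all events played in each week.
by_week = defaultdict(list)
for _, week, avg in events:
    by_week[week].append(avg)
week_avg = {w: sum(v) / len(v) for w, v in by_week.items()}

# Each event contributes eventAverage / weekAverage; average those per district.
contrib = defaultdict(list)
for district, week, avg in events:
    contrib[district].append(avg / week_avg[week])
for district, ratios in contrib.items():
    print(f"{district}: {sum(ratios) / len(ratios):.3f}")
```

A value above 1 means the district’s events scored better than the weeks they were played in, which takes the week-to-week score inflation out of the comparison.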
I have a feeling this is a decent measure of which districts have fewer bad robots. Robots that can’t score drag these numbers down much more than powerhouse teams inflate them.
Might be worth looking at it by point differential, since that wouldn’t penalize districts that play defense or reward those that don’t. A strong district would have closer scores than a weak one.
Also might be interesting to track foul points and see which district breaks the rules the most.
Fouls would be a tough metric to measure, due to the variance in how different refs call pinning. E.g., I have seen pin timers stop as soon as the robots are no longer touching, and I have seen pinning penalties called when the robot had backed up 10 feet. These are obviously fringe cases, but every region does have some variance in reffing.
Something unrelated I noticed when actually looking at the spreadsheet:
The average of the average scores of each district’s events is not the average score of each match played in the district. It seems unlikely to make a huge difference in the results, but be aware of precisely what you’re looking at. You can recover the average across all matches in the district by weighting each event’s average by the number of matches played.
Edit: added a .py file with functions to do those calculations, for the interested.
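For a rough idea of what that weighting looks like (a minimal sketch with made-up numbers, not the attached file itself):

```python
# Hypothetical events in one district: (event average score, matches played).
events = [(42.0, 80), (38.5, 72), (45.3, 96)]

# Unweighted: the average of the event averages.
unweighted = sum(avg for avg, _ in events) / len(events)

# Weighted: every match counts equally, so weight each event by its match count.
weighted = sum(avg * n for avg, n in events) / sum(n for _, n in events)

print(f"average of event averages: {unweighted:.2f}")
print(f"average over all matches:  {weighted:.2f}")
```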
Update: to answer @Lil_Lavery’s question, I also wrote this set of Python functions. It lets you define your own objective function, and I’ve included some handy ones: average score, average score minus fouls, habRP docking rate, hab3 climb rate, and rocket completion rate.
You’ll have to include your own TBA auth key (easily obtained from your account page on TBA) and install tbapy. I’ll also attach an Excel file with the results for each of the objective functions I’ve defined if you don’t want to do that. If you’re curious about some other statistic, feel free to ask about it.
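If you just want the general shape without opening the file, here is a rough sketch. It is not the attached code: it hits the public TBA v3 REST API directly with requests instead of tbapy, shows only an average-score-minus-fouls objective, and assumes the 2019 score_breakdown exposes totalPoints and foulPoints per alliance, which is how TBA reports Deep Space scores:

```python
import requests

TBA = "https://www.thebluealliance.com/api/v3"
AUTH_KEY = "YOUR_TBA_AUTH_KEY"  # from your TBA account page


def event_matches(event_key):
    """Fetch every match for one event from the TBA v3 API."""
    resp = requests.get(
        f"{TBA}/event/{event_key}/matches",
        headers={"X-TBA-Auth-Key": AUTH_KEY},
    )
    resp.raise_for_status()
    return resp.json()


def average_score_minus_fouls(matches, comp_level="qm"):
    """Example objective: mean alliance score with foul points stripped out.

    Field names ('totalPoints', 'foulPoints') are assumed from TBA's 2019
    score_breakdown; foul points are credited to the alliance that benefited,
    so subtracting them removes points the alliance didn't earn itself.
    """
    scores = []
    for match in matches:
        if match["comp_level"] != comp_level or match["score_breakdown"] is None:
            continue
        for color in ("red", "blue"):
            breakdown = match["score_breakdown"][color]
            scores.append(breakdown["totalPoints"] - breakdown["foulPoints"])
    return sum(scores) / len(scores) if scores else None


# Usage (substitute a real event key from TBA; "2019xxxx" is a placeholder):
# print(average_score_minus_fouls(event_matches("2019xxxx")))
```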
Just skimming this thread in passing, but wouldn’t an easier metric for a more robust ranking be the average score of the losing alliance in elims? I don’t care how good your district is; if there are only ever 2 good teams at any given event and they pair up to make every match a blowout, then your district is weak.
Tbh, variance in scores is probably the best tell of district strength that is easily computed and agreed upon. Higher scores with low variance = awesome. I’ll leave it up to the hivemind to figure out the weights on scores vs. variance. (You can ignore my rage-inducing statement in the paragraph above.)
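Both of those are cheap to check once you have elim scores in hand; a sketch with made-up match scores (districts “A” and “B” are hypothetical):

```python
from statistics import mean, pstdev

# Hypothetical elim matches for two districts: (red score, blue score).
districts = {
    "A": [(62, 55), (68, 60), (71, 58), (65, 63)],  # competitive matches
    "B": [(90, 25), (85, 20), (88, 30), (92, 18)],  # two good teams, blowouts
}

for name, matches in districts.items():
    all_scores = [s for match in matches for s in match]
    losing_avg = mean(min(red, blue) for red, blue in matches)
    print(f"{name}: losing-alliance avg {losing_avg:.1f}, "
          f"mean {mean(all_scores):.1f}, stdev {pstdev(all_scores):.1f}")
```

District B wins big but loses ugly: its losing-alliance average is lower and its variance is much higher, which is exactly the “two good teams per event” pattern both of these metrics would flag.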
Would anyone be willing to run an ANOVA on average elim scores by district? I have a hunch that not a whole lot is significant. Nvm, I did it; not a whole lot in terms of significance here. But we do have some very small samples in here…
(alpha = 0.05)
```
> dstStrWk4 <- read.csv("DistStr_wk4_2019.csv")
> aov(formula = Elims ~ Dist, data = dstStrWk4)
Call:
   aov(formula = Elims ~ Dist, data = dstStrWk4)

Terms:
                     Dist Residuals
Sum of Squares   837.8824 1387.8381
Deg. of Freedom        10        60

Residual standard error: 4.809432
Estimated effects may be unbalanced

> pairwise.t.test(x = dstStrWk4$Elims, g = dstStrWk4$Dist, p.adjust.method = "bonferroni")

        Pairwise comparisons using t tests with pooled SD

data:  dstStrWk4$Elims and dstStrWk4$Dist

    CHS   FIM   FIT   FMA   FNC   IN    ISR   NE    ON    PCH
FIM 1.000 -     -     -     -     -     -     -     -     -
FIT 1.000 0.212 -     -     -     -     -     -     -     -
FMA 1.000 1.000 1.000 -     -     -     -     -     -     -
FNC 1.000 1.000 1.000 1.000 -     -     -     -     -     -
IN  1.000 1.000 0.392 1.000 0.744 -     -     -     -     -
ISR 0.495 0.130 1.000 0.751 1.000 0.165 -     -     -     -
NE  1.000 1.000 0.670 1.000 1.000 1.000 0.295 -     -     -
ON  1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 -     -
PCH 0.192 0.037 1.000 0.290 1.000 0.075 1.000 0.100 1.000 -
PNW 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000

P value adjustment method: bonferroni
```
… and for those curious about quals…
```
> aov(formula = Quals ~ Dist, data = dstStrWk4)
Call:
   aov(formula = Quals ~ Dist, data = dstStrWk4)

Terms:
                     Dist Residuals
Sum of Squares   723.1389 2225.0262
Deg. of Freedom        10        60

Residual standard error: 6.089645
Estimated effects may be unbalanced

> pairwise.t.test(x = dstStrWk4$Quals, g = dstStrWk4$Dist, p.adjust.method = "bonferroni")

        Pairwise comparisons using t tests with pooled SD

data:  dstStrWk4$Quals and dstStrWk4$Dist

    CHS  FIM  FIT  FMA  FNC  IN   ISR  NE   ON   PCH
FIM 1.00 -    -    -    -    -    -    -    -    -
FIT 1.00 1.00 -    -    -    -    -    -    -    -
FMA 1.00 1.00 0.40 -    -    -    -    -    -    -
FNC 1.00 1.00 1.00 1.00 -    -    -    -    -    -
IN  1.00 1.00 0.54 1.00 1.00 -    -    -    -    -
ISR 1.00 1.00 1.00 1.00 1.00 1.00 -    -    -    -
NE  1.00 1.00 0.74 1.00 1.00 1.00 1.00 -    -    -
ON  1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 -    -
PCH 1.00 1.00 1.00 1.00 1.00 0.97 1.00 1.00 1.00 -
PNW 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

P value adjustment method: bonferroni
```