View Full Version : How many matches are really needed to determine final rankings?
It's funny how it works out that the top teams tend to end up in the top 8 spots by the end of qualifications. Maybe magical. Early in my involvement with FIRST, someone said there is a mathematical theory behind it. So, I assumed teams change a lot in rank early in the competition, but by the end the amount of change should level off, indicating the teams are well sorted. But the theory was never explained, and I could never test it. Until now. Based on a semi-random sample of eight regional competitions from the 2015 season, this post shows the extent to which teams really are sorted and draws a few conclusions about what it means for scouting.
For the last 5+ years, teams were ranked by their qualification score and, quite often, by secondary and tertiary tie-breakers. Those extra terms were never available in the match results posted online, making it very difficult to determine how teams change in rank during the course of the competition. This year, however, since the qualification score is the team's average score, it's very rare to need tie-breakers after the 2nd round. Yay!
I started with the 10,000 Lakes and North Star Regionals since that is where I am located. But I expanded the analysis to include Colorado, Ventura (since they have 12 matches), Silicon Valley (since I heard the average qualification score was high), Wisconsin (close to MN), UNH District (12 matches), and New York City (many teams, only 8 matches).
The attached spreadsheet (18797) has all the analysis in it plus some extra plots. For your convenience, I'll post a few charts here.
18792
This chart shows how each team's rank changed over the course of qualifications at the North Star Regional. The number and extent of changes in rankings between match 9 and 10 are quite a bit more than I expected. However, what is interesting is that the changes in ranking for the top 10 teams or so have leveled off. The next chart removes some of the clutter and shows the paths of the top 10 teams.
18793
These teams follow the path I was expecting: lots of change early on and relatively little change by the end of their matches. So, from this graph (and the graph for the Ventura regional), it seems safe to say that the top 8 really are the top 8, sorted in that order.
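For anyone curious how these trajectories were built, here is a minimal Python sketch of the idea. It assumes a flat list of (round_number, team, alliance_score) records pulled from the posted match results; the record format is hypothetical, and the spreadsheet does the same thing with average() and sorting.

from collections import defaultdict

# matches: hypothetical list of (round_number, team, alliance_score) records
# pulled from one event's qualification results.
def rank_by_round(matches):
    """Return a list with one {team: rank} dict per qualification round,
    ranking by running qualification average (mean of alliance scores so far)."""
    scores = defaultdict(list)
    rankings = []
    for rnd in sorted({r for r, _, _ in matches}):
        for r, team, score in matches:
            if r == rnd:
                scores[team].append(score)
        averages = {t: sum(s) / len(s) for t, s in scores.items()}
        order = sorted(averages, key=averages.get, reverse=True)
        rankings.append({team: place for place, team in enumerate(order, start=1)})
    return rankings

Plotting each team's rank against the round number gives charts like the ones above.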
But, unfortunately, this is not the case with all of the regionals. Let's look at the Wisconsin Regional.
18794
Even though the top nine teams have been identified, it looks like there would still be potential for change if they played more matches. So what happens when there are more matches?
The University of New Hampshire District event and the Ventura Regional both had 12 matches. One would hope that after 9 or 10 matches the rankings would be stable, but that is not necessarily the case.
18795
At the UNH District event, teams in the top 8 were still changing by as many as three places between the 11th and 12th matches.
Another way to look at the data is to ask how early the top 3 or 5 or 8 could be identified. They might not be sorted, but they are at least in the top spots (see 'TopRankAnalysis' sheet).
The top one or two teams at nearly every regional were identified right away, by the second or third match. Taking all sampled regionals into consideration, one could very safely say the top team is identified by round 6, and it is probably safe to say the top three are identified by round 8. With some risk, you could say the top three teams are in place by match 5.
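To put a number on "identified by round N", one way is to find the earliest round after which the same k teams occupy the top k spots for the rest of qualifications (order within the group ignored). A sketch building on the hypothetical rank_by_round() output above:

def round_top_k_settles(rankings, k):
    """Earliest round after which the same k teams hold the top k spots
    (in any order) for the remainder of qualifications. `rankings` is the
    per-round output of the rank_by_round() sketch above."""
    top_sets = [{t for t, place in by_round.items() if place <= k}
                for by_round in rankings]
    final = top_sets[-1]
    # The loop always returns by the last round, since top_sets[-1] == final.
    for rnd in range(1, len(top_sets) + 1):
        if all(s == final for s in top_sets[rnd - 1:]):
            return rnd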
So, how many matches are really needed to determine final rankings? Apparently many many more than just 9 or 10 if we want every team to be sorted. However, since it is the top 8 who are picking their alliance partners, I think it is safe to say those teams, at least, have been identified by the end of qualifications. More matches could be helpful to further sort those top 8. But I also think most regionals run into time constraints. Those Fridays and Saturdays are long days.
I think another outcome, especially for the top 8 when they are choosing their alliance partners, is that they should not base their picks on how teams are sorted beyond the top 10. Scouts really do need to look at how many points each robot is able to contribute to the final score and not assume teams are completely sorted. A team ranked 11th isn't necessarily better than a team ranked 15th, for example.
And now that I've run out of time, one other musing I have is the effect of top teams. The Cheesy Poofs, for example, had an average score of 200, while most teams averaged about 65 points. Since the Cheesy Poofs' totes count towards their alliance partners' scores, how did that affect the rankings of their alliance partners? Perhaps another day. The spreadsheet also has histograms showing how many teams averaged between 50-60, 60-70, 70-80 points, and so on at each competition. Based on the peaks of the curves, one can tell fairly quickly which regionals are more competitive than others. Perhaps this, too, can be elaborated on another day.
Jon Stratis
08-04-2015, 15:50
The difference you see between events really has to do with the relative strength of each event. If you have several top teams who all score about the same, you'll see them swapping positions throughout the event. However, if there's a clear drop-off, then things will become more stable. The variability you see at North Star among everyone outside of the top 10 really has to do with how lopsided things were with a couple of teams. If you had a match with Wave or the ERRORS, you jumped up a bunch of spots, then started sliding back down towards where you were. I saw one team jump something like 40 spots due to one match (their 9th or 10th, I think). If you were unfortunate enough not to have any matches with them, you never got that boost. Graphing it out like this lets you actually see which teams benefited from the random pairing, which is pretty cool.
zinthorne
08-04-2015, 16:08
I think this thread brings up some very valid points. I do think this year's ranking format separates the true best 8 better than previous years. As said above, 254 rose to #1 because they were the best. The number of matches a competition has also plays a bigger role than most would think. For example, at our Shorewood district event (12 qual matches) we had some mechanical issues and did not field our robot for our first 2 matches. We were ranked 27th out of, I believe, 31 or so. By the end of quals we were ranked #2 and only 2 average points behind first place.
The other thing about this game is the fact that your qual average depends a lot on who your alliance partners are. You could have a middle-of-the-pack team that has 8 matches with top 8 teams, as opposed to a team with the same ability that only has 2 matches with top 8 teams. The team with fewer quality alliance partners will rank lower. This shows how valuable scouting is in finding the truly best teams.
Example: at the PNW championship this last weekend we were the #6 alliance captain. We picked the 21st-seeded and 45th-seeded teams out of 60. Our alliance was able to make it to the semifinals and barely missed the finals by 0.67 of a point, with an average of 201.33. This shows that the best teams are not always ranked near the top, and it is scouting that will find them.
The other hard part about this year's ranking is the coopertition points. These are a very easy way to raise your ranking or drop it significantly. They also allow coop specialists to rank very high when they may not be the best suited for playoffs, where there is no coop. In my opinion, when scouting, coop points should be taken out of the number of points a team scores in a match, because that will usually separate the best teams from the average teams. I know that FRC GAMESENSE had a few graphs set up where they took several competitions and factored out coop points from everybody's rankings, and the results were sometimes very drastic. Usually teams shifted a few spots, but sometimes it was the difference between top 5 and out of the top 8.
Great analysis!
To expand a little on Jon's comments, I see two main sources of qualification average (QA) variance throughout a competition:
(in)consistency of a team's own scoring
variation due to partner contribution, i.e. schedule effects
Great teams are both consistent and (since they score most of the points) less subject to alliance partners/schedule. So they should sort out fairly quickly and be mostly stable.
District-sized events also have less schedule variance since teams are allied with the majority of other teams throughout the event.
Early events would be expected to have higher inconsistency as teams are still learning the game.
It would be interesting to see if there is less variability in the later district events, where teams are on their 2nd or later event.
Citrus Dad
08-04-2015, 17:19
Interesting analysis.
I started with the 10,000 Lakes and North Star Regionals since that is where I am located. But I expanded the analysis to include Colorado, Ventura (since they have 12 matches), Silicon Valley (since I heard the average qualification score was high), Wisconsin (close to MN), UNH District (12 matches), and New York City (many teams, only 8 matches).
I wonder how different the analysis would look if you looked at partial OPRs instead of partial average scores.
I put together partial OPRs for each of the 8 events you listed above, in case someone is interested enough to plot them or otherwise analyze/summarize them.
If this looks interesting/promising, I will generate partial OPRs for all 106 events thus far (Weeks 1 through 6).
Column A is team number, Column B is final OPR (after all Qual matches), Column C is partial OPR after all Qual matches less one, etc
Some of these events have surrogates and some have DQs. I ignored this information and included those scores in the OPR computations.
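For anyone who wants to reproduce or extend these numbers: OPR is the least-squares solution of a system in which each alliance's score is modeled as the sum of its members' contributions. Below is a minimal NumPy sketch of that standard formulation, assuming hypothetical (red_teams, blue_teams, red_score, blue_score) match records; it is not necessarily the exact code used to build the attached reports.

import numpy as np

def compute_opr(matches, teams):
    """Least-squares OPR: each alliance score = sum of its members' contributions.
    matches: list of (red_teams, blue_teams, red_score, blue_score) tuples."""
    index = {t: i for i, t in enumerate(teams)}
    rows, scores = [], []
    for red, blue, red_score, blue_score in matches:
        for alliance, score in ((red, red_score), (blue, blue_score)):
            row = np.zeros(len(teams))
            for t in alliance:
                row[index[t]] = 1.0   # this team played in this alliance
            rows.append(row)
            scores.append(score)
    A = np.vstack(rows)               # one row per alliance score
    b = np.array(scores, dtype=float)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, x))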
dougwilliams
08-04-2015, 21:35
...
If this looks interesting/promising, I will generate partial OPRs for all 106 events thus far (Weeks 1 through 6).
...
It does look interesting, and I'd be interested in seeing that. I'd also be interested in seeing what code/spreadsheet magic both you and the original poster are using to process that match data.
I put together something the other day that calculated my team's average after every qual match through the VA regional, with the graphs updating after each match round. I only needed something quick, and that seemed to be quicker, but it probably doesn't follow the official scoring methodology.
Dominick Ferone
08-04-2015, 23:16
I believe that, especially with this year's game, rankings can jump a lot. I know at the Tech Valley regional we "lost" a lot of matches, with our score being lower than the opponent's that match. But in the end it isn't just about who has the best bot, but about who plays the alliance they have the smartest and most effectively. Three of the lower-ranked bots can succeed if played correctly, and this year it seemed that if teams figured out what worked best for them from the get-go and kept to their strategy, they would usually have a lot of success.
Partial final-score OPRs for all 106 events, Weeks 1-6, can be found here:
http://www.chiefdelphi.com/media/papers/3125
Conor Ryan
09-04-2015, 15:22
I love this type of research, it really helps to tune those algorithmic scouting applications.
This type of work is very similar to work in mathematical economics/econometrics.
Fundamentally you'll see the law of averages dictate direction, but the real question is the quality of the other alliance members each team drew relative to its final ranking. Strength of schedule should be a good proxy for the quality of the match scheduling, which should help you determine the minimum number of matches needed to rank teams well.
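One rough way to quantify that idea, as a sketch: for each team, average the final OPRs (or final qualification averages) of its partners across its schedule. This is only one possible strength-of-schedule definition; the match-record format and the `opr` mapping are hypothetical.

from collections import defaultdict

def partner_strength(matches, opr):
    """Mean final OPR of a team's qualification partners, as a crude
    strength-of-schedule proxy. matches: (red_teams, blue_teams, ...) tuples."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for red, blue, *_ in matches:
        for alliance in (red, blue):
            for team in alliance:
                partners = [p for p in alliance if p != team]
                totals[team] += sum(opr[p] for p in partners)
                counts[team] += len(partners)
    return {t: totals[t] / counts[t] for t in totals}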
But hey, we all have those days where we jump from 48th to 6th in the last 4 matches.
Fun comments. Thanks!
I decided to go ahead and look at the top seed alliance effect. That is, how does a team's rank change depending on whether it is with the top seed team during a match, against it, or never in a match with it at all? To do the analysis, I essentially took out every match that the top seed was in; marked which teams were with, against, or not in a match with the top seed; and then looked at how the new rankings compared to the official rankings.
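Roughly, the re-ranking step can be sketched like this in Python (the match records are hypothetical (red_teams, blue_teams, red_score, blue_score) tuples; teams missing from the returned `met` mapping are the ones that never shared a match with the top seed):

from collections import defaultdict

def rerank_without_top_seed(matches, top_seed):
    """Drop every match the top seed played, recompute qualification averages
    from the remaining matches, and record how each team met the top seed."""
    met = defaultdict(set)      # team -> subset of {'with', 'against'}
    scores = defaultdict(list)  # team -> alliance scores from the kept matches
    for red, blue, red_score, blue_score in matches:
        if top_seed in red or top_seed in blue:
            partners = red if top_seed in red else blue
            opponents = blue if top_seed in red else red
            for t in partners:
                if t != top_seed:
                    met[t].add('with')
            for t in opponents:
                met[t].add('against')
            continue  # exclude this match from the recomputed averages
        for alliance, score in ((red, red_score), (blue, blue_score)):
            for t in alliance:
                scores[t].append(score)
    averages = {t: sum(s) / len(s) for t, s in scores.items()}
    order = sorted(averages, key=averages.get, reverse=True)
    new_rank = {t: place for place, t in enumerate(order, start=1)}
    return new_rank, met  # compare new_rank against the official rankings

The `met` sets also support the mutually exclusive split used later in this thread (Only With, With And Against, Only Against, Neither).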
See below for the Silicon Valley (SV) and North Star (NS) Regionals. I took out team 254 from the SV Regional and 2826 from the NS Regional.
18806 18807
In the SV Regional, a team's overall rank increased by about 6.55+/-5.81 places if it was in an alliance with 254. But, a team's overall rank decreased by 5.87+/-3.49 places if the team did not have a match with 254.
In the NS Regional, a team's overall rank increased by about 8.05+/-8.68 places if it was in an alliance with 2826. But, a team's overall rank decreased by 5.79+/-2.58 places if the team did not have a match with 2826.
Ya, I would say this confirms the hypothesis that being in an alliance with the top team is going to boost a team's overall ranking. Likewise, not being in a match with the top team does not help.
Ether and Doug, I'll look into making some plots tomorrow with the OPR, including attaching the code for getting the data (just a heads up, it's not totally automated). But, ya, could be interesting.
It does look interesting and I'd be interested in seeing that. I'd also be interested in seeing what code/spreadsheet magic both you, and the original poster are using to process that match data.
See attached for the code (18813) and spreadsheet (18812) for computing match data. To run it yourself, you'd need to import the code file into an Excel macro and save the Excel file as macro-enabled. Then set up the Match Results and Team Scores spreadsheets as described in the comments (you can also see other sheets in the Excel file for examples of how the data looks and how they are named). Then run the macro, and it should import team scores for each match. It doesn't get matches if the team was disqualified, unfortunately; so those will need to be updated manually. Then, compute average scores and rankings with Excel's average() and sorting functions. It is a bit of work by hand; but once one gets the hang of it, I think it goes pretty quickly. I hope this helps, and let me know if you have any questions.
I wonder how different the analysis would look if you looked at partial OPRs instead of partial average scores.
I put together partial OPRs for each of the 8 events you listed above, in case someone is interested enough to plot them or otherwise analyze/summarize them.
...
Column A is team number, Column B is final OPR (after all Qual matches), Column C is partial OPR after all Qual matches less one, etc
I created a couple plots based on the OPR data you provided. It looks like each column represents a new match and not a round. For example, Silicon Valley has 95 matches but 10 rounds. If that is the case, the data has only the last 20 matches. It would be easier to compare if the OPR was computed at the end of each round. But, the OPR graphs are still a little interesting. First, let's look at North Star.
18814
The OPR for the final 20 matches seems to be fairly constant for most teams. It looks like whenever a team has a match there's a jump, and then in between matches it drifts back towards the mean. Beyond the top 10 teams, it gets very cluttered. That is, there aren't many OPR points separating teams. So, for the Silicon Valley Regional, I took out all but the top 10 teams.
18815
This makes it a little easier to see how the team's OPR changes between matches.
I didn't dive too much into correlating the OPR with the ranking because of the different domains. But, it appears that the OPR is better able to account for the top seed effect, as I like to call it.
Speaking of which, I wanted to revise my analysis from a couple posts ago, where I looked at how the rankings would change if the top team wasn't at the competition. The three categories are not mutually exclusive. The With status could include teams that also have matches against the top seed. Likewise, the Against status could include teams that also have matches with the top seed. So, I filtered the results a little differently to look at Only With, With And Against, Only Against, and Neither. See below for the North Star and Silicon Valley Regionals.
18816 18817
I also computed a few statistics using Student's t-test. For both the NS and SV Regionals, if a team was either against or never in a match with the top seed, that team ended up with a lower rank (p<0.001). On the flip side, if a team was with the top seed, or both with and against it, then it had a higher rank (p=0.007 for NS With And Against, p<0.001 for all other cases).
So, the conclusion is the same - a team does better if it's with the top seed and worse if it's against or not with the top seed - but I think this method proves the point better.
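For anyone who wants to reproduce the significance tests, here is a minimal SciPy sketch. I read the comparison as each group's mean rank change versus zero, so a one-sample t-test is shown; the rank-change lists below are made-up placeholders, not the actual regional data.

from scipy import stats

# Placeholder rank changes (positive = ranked higher when the top seed's
# matches are included) for two of the mutually exclusive groups above.
only_with = [5, 9, 2, 11, 7, 4, 8, 6]
neither = [-6, -4, -8, -3, -7, -5, -6, -9]

for name, changes in (("Only With", only_with), ("Neither", neither)):
    t_stat, p_value = stats.ttest_1samp(changes, 0.0)
    print(f"{name}: t = {t_stat:.2f}, p = {p_value:.4f}")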
Cheerio.
Silicon Valley has 95 matches but 10 rounds. If that is the case, the data has only the last 20 matches.
No, the data uses all 95 matches, then the first 94 matches, then the first 93 matches, .... etc:
Column A is team number, Column B is final OPR (after all Qual matches), Column C is partial OPR after all Qual matches less one, etc
Column B is the OPR using all 95 matches
Column C is the OPR using the first 94 matches
Column D is the OPR using the first 93 matches
.
.
.
Column U is the OPR using the first 76 matches
... so the progression from Column U to Column B shows how the OPR changed over the course of the last 20 matches.
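In other words, to put the partial OPRs back in chronological order you just reverse the columns. A pandas sketch (the file name and the exact file format are assumptions based on the column description above):

import pandas as pd
import matplotlib.pyplot as plt

# Assumed layout, per the description above: column 0 = team number,
# column 1 = OPR after all qual matches, column 2 = after all but one, ...
df = pd.read_csv("partial_oprs_silicon_valley.csv", header=None, index_col=0)

# Reverse the OPR columns so they run oldest -> newest (first 76 ... all 95).
chronological = df[df.columns[::-1]]

# One line per team, left to right in match order.
chronological.T.plot(legend=False, title="Partial OPR over the last 20 matches")
plt.xlabel("Qualification matches included")
plt.ylabel("OPR")
plt.show()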
Partial OPRs for all 106 events in Weeks 1 through 6 are posted here:
http://www.chiefdelphi.com/media/papers/3125
... so the progression from Column U to Column B shows how the OPR changed over the course of the last 20 matches.
Okay, I see. Thank you for clarifying. Is it at all possible to get the OPR for all matches, not just the last 20? I understand that the algorithm needs every team to compete at least once before the OPR can be computed, which means I don't expect the OPR for the first 10 or so matches. But, ya, it could be interesting to see a bigger picture of how the OPR changes over the course of the competition. Cheers.
Is it at all possible to get the OPR for all matches, not just the last 20?
Attached is a ZIP file containing partial OPRs for all 109 events in Weeks 1 through 7. Would you like fries with that?
Just kidding.
Let M be the number of qual matches at an event, and T be the number of teams.
The analysis proceeds as follows for each event:
for (k = M; k > T/2; k--) { computeOPR(); deleteMostRecentMatch(); }
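In Python, that procedure could be sketched as follows, reusing the hypothetical compute_opr() from earlier in the thread and assuming the match list is in chronological order:

def partial_oprs(matches, teams):
    """OPR after all M matches, then after M-1, M-2, ..., stopping once only
    about T/2 matches remain (below that the system is underdetermined)."""
    results = []  # newest-first, matching the report's column order
    k = len(matches)
    while k > len(teams) / 2:
        results.append(compute_opr(matches[:k], teams))  # first k matches only
        k -= 1  # "delete" the most recent match
    return results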
PS - forgot to mention: I can transpose the rows and columns if that would make it easier for you to do your plotting.
Oooooohhhh.... preettttyyyy.... :)
Had time for just a couple of plots, but these really show what I was expecting: lots of movement early on and then a leveling-out. However, the leveling-out really isn't that level. It seems like there are still jumps after teams have matches later in qualifications, perhaps because of good alliance partners, or perhaps because the team simply had a good match.
The first chart has all teams plotted from the Silicon Valley Regional, and the second plot has just the top ten OPR teams from the North Star Regional.
18837 18836
Thank you for all the number crunching to get all the OPR scores, Ether. Perhaps later there will be time for more analysis comparing OPR against the rankings instead of just graphing it. Or has this analysis gone on long enough? What else would be interesting?
Alex2614
14-04-2015, 03:22
What else would be interesting?
I'd like to see an analysis of how different things would be if the old W-L-T structure were still in place, because I know my team would not have been in the top 10 at our regional under the old structure.
Some say this structure is harder to win under because you have to outscore every team, not just your opponents. But I think it really does ensure that the best teams come out on top, albeit at the expense of less exciting rank-watching during the events.
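If someone wants to try that comparison, here is a rough sketch that rebuilds an old-style ranking from the same hypothetical match records used earlier in the thread. The 2-points-per-win, 1-per-tie scoring is my recollection of the pre-2015 system, so treat it as an assumption.

def wlt_ranking(matches):
    """Rank teams by old-style record: 2 points per win, 1 per tie
    (assumed scoring). matches: (red_teams, blue_teams, red_score, blue_score)."""
    points = {}
    for red, blue, red_score, blue_score in matches:
        if red_score > blue_score:
            red_pts, blue_pts = 2, 0
        elif red_score < blue_score:
            red_pts, blue_pts = 0, 2
        else:
            red_pts = blue_pts = 1
        for t in red:
            points[t] = points.get(t, 0) + red_pts
        for t in blue:
            points[t] = points.get(t, 0) + blue_pts
    return sorted(points, key=points.get, reverse=True)

Comparing this ordering against the average-score ranking would show how many teams move in or out of the top 10 under the old structure.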
Thank you for all the number crunching to get all the OPR scores, Ether.
You're quite welcome.
The actual elapsed time to get from raw data to those reports for all 109 events (6398 partial OPR reports) was only 17 seconds on a single core of a Pentium D in an 8-year-old machine running XP Pro SP3, using AWK to wrangle the data and Octave to crunch the linear algebra.
I'd like to see an analysis of how much different things would be if the old W-L-T structure was still there. Because I know that my team would not have been in the top 10 at our regionals if the old structure was still in place.
Which regional were you at? Also, what should be used as the 2nd order tie-breaker? Since average score is handy, I think I would prefer to use that. If not that, are coopertition and auto points available for each match? Do you know where one would go to find them?
The actual elapsed time to get from raw data to those reports for all 109 events (6398 partial OPR reports) was only 17 seconds on a single core of a Pentium D in an 8-year-old machine running XP Pro SP3, using AWK to wrangle the data and Octave to crunch the linear algebra.
Nice!