So I did a really quick analysis of strength of schedule for the Milford event. I have no idea how valid this is, but it gave me a good sense check. It assumes rank is a good indicator of likelihood of winning, which is kind of using the result in the analysis, but you can tell me why it’s flawed (I didn’t feel like busting out Python for this). I took the end-of-qualification ranking of each team in a match and summed each alliance; the alliance with the lower total won 64 of the 78 matches. Going off this, I then looked at every match for an individual team and subtracted their rank from the delta between alliances, which gives the rank they’d need to achieve to be able to win the match. Totaling these gave an ease-of-schedule indicator (lower numbers mean a harder schedule). By this method, 7220 had the 16th hardest schedule at the event; 67 had the 9th hardest and ranked 1st; 453 had the hardest and ranked 39th; 7769 had the easiest and ranked 13th.
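For anyone who does want to bust out Python, here’s a rough sketch of the procedure described above. The exact arithmetic (signed delta of opponent rank sum minus own rank sum, then subtracting the team’s own rank) is my reading of the description, so treat it as an assumption; the ranks and matches are placeholders.

```python
def alliance_rank_sum(ranks, alliance):
    """Sum of end-of-qualification ranks for the three teams on an alliance."""
    return sum(ranks[t] for t in alliance)

def ease_of_schedule(ranks, matches, team):
    """Total over a team's matches of (rank delta between alliances) minus the
    team's own rank. Lower totals mean a harder schedule."""
    total = 0
    for red, blue in matches:
        if team in red:
            own, opp = red, blue
        elif team in blue:
            own, opp = blue, red
        else:
            continue  # team not in this match
        delta = alliance_rank_sum(ranks, opp) - alliance_rank_sum(ranks, own)
        total += delta - ranks[team]
    return total
```

With toy data (`ranks = {t: t for t in range(1, 7)}` and one match `((1, 4, 5), (2, 3, 6))`), team 1’s value is 0 and team 6’s is -7, so team 6’s lone match reads as much harder.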
Long story short: the match schedule wasn’t your problem, you had a near-average schedule. Maybe someone with more time / ability to code quickly can do an Elo strength of schedule analysis (I’d be interested to see how that compares).
Based on your other posts and where you compete, I assume one of the powerhouse teams you are referring to is 67. Since I was heavily involved in scouting on 67 from '14-'19 (I was not involved with the team much in 2020) I’d be happy to address any misconceptions you have about how big teams scout and/or do alliance selection (although I don’t want this thread to get off topic).
I feel like scheduling fairness comes up a few times every year, I’m going to try to make this my summary post that I can link to whenever these come up.
In my mind there are three major steps I would like to explore that could potentially be taken to improve the current scheduling system. There are others, but these are the ones that intrigue me the most. Here they are, sorted from least to most radical departure from the current system:

Pre-generating anonymized qual schedules, then only slotting teams into them as a last step:
I am a big proponent of this method for a variety of reasons, and I think FRC should implement this immediately.
Timebomb had a good reason here:
Saves time and effort in schedule generation. With this system, schedule generation becomes nearly instantaneous, instead of needlessly creating thousands upon thousands of bad schedules at every event only to discard them. If a team drops at the last minute, it takes seconds instead of dozens of minutes to make a replacement schedule.
It is a known and tested system through Cheesy Arena. Other suggestions for scheduling changes are hypothetical and untested at this point, but Cheesy Arena has used pre-generated schedules since 2014 to great effect, and they have been tested at a dozen competitions. Changing to this format is not some pie-in-the-sky hypothetical; it is a real, genuine improvement that has had thorough testing.
Vastly increases transparency. Anyone could independently verify that these schedules are generated properly. Based on my trust in the organization, I personally do not believe there exists any bias in FIRST’s schedules, but can I actually prove this? Maybe for a single event, but on the whole? Not really, since I have no clue what the next schedule will look like. Compare that to Cheesy Arena: I have a comparable level of faith in the Cheesy Arena devs to make balanced schedules, but since they pre-publish all of their schedules, anyone can independently verify that there is nothing sketchy about them (and I have). As long as Cheesy Arena uses those schedules, I can confidently say that there are no major errors, whether at the 2014 Chezy Champs or the 2034 Chezy Champs. Every year there are people who think the schedule is biased against them. I would absolutely love to tell them definitively that they are wrong, but I can’t effectively do that right now since the scheduler is a black box. There’s no reason for it to be this way though; we have a better path with pre-generated schedules.
Provides a much better backbone for the scheduling system generally, potentially allowing for future additional improvements. The current scheduler already has enough criteria and requires thousands of iterations to make good, usable schedules. Trying to add additional scheduling criteria (e.g. team strength balancing) would overtax this system and force sacrifices to existing important schedule criteria like match turnaround times. With pre-generated schedules, though, you can feasibly superimpose additional criteria on existing schedules (like I did with strength balancing, see below), or add criteria and run the scheduler 100 million times to be absolutely sure nothing is being lost.
One unjustified argument I’ve seen in favor of pre-generated schedules is that they are “better” schedules in the sense of causing fewer surrogates, fewer duplicate partners, fewer duplicate opponents, higher match turnarounds, and more red-blue balance. I found no large distinction between the Cheesy Arena schedules and the IdleLoop schedules in these regards, which leads me to believe that with either system we are basically at the limit of how “good” our schedules can get with respect to those criteria. There are plenty of other reasons to support pre-generated schedules though.
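On the transparency point: anyone could run quality checks like the following over a pre-published schedule. The schedule format here (a list of (red, blue) team triples in match order) is my assumption for illustration, not any tool’s actual format.

```python
from collections import Counter
from itertools import combinations

def duplicate_partner_pairs(schedule):
    """Count partner pairings that occur more than once across the schedule."""
    pairs = Counter()
    for red, blue in schedule:
        for alliance in (red, blue):
            for a, b in combinations(sorted(alliance), 2):
                pairs[(a, b)] += 1
    return sum(1 for count in pairs.values() if count > 1)

def min_turnaround(schedule):
    """Smallest gap, in matches, between consecutive appearances of any team."""
    last_seen = {}
    best = None
    for i, (red, blue) in enumerate(schedule):
        for t in red + blue:
            if t in last_seen:
                gap = i - last_seen[t]
                best = gap if best is None else min(best, gap)
            last_seen[t] = i
    return best
```

Checks for surrogates, duplicate opponents, and red-blue balance would follow the same pattern: scan the published schedule and count.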
Schedule balancing for team strength:
This one is dependent on pre-generated schedules; I think trying to add team strength balancing without implementing those first can (and has) led to poor outcomes. I wrote a three-part article series on my approach to solving this problem using the Cheesy Arena pre-generated schedules. I would likely have tested these strength-balanced schedules at a 2019 off-season if I hadn’t been switching jobs at the time. Someday after corona we’ll likely get to see these in action.
Note that “schedule balancing” as I use the term is not at all the same as “match balancing”. By “schedule balancing” I mean that, over the course of an event, every team will encounter, on average, equally skilled partners and opponents. This criterion does not care at all whether any given match has roughly evenly skilled partners and opponents. That kind of “match balancing” was one of many problems associated with the 2007 algorithm of death, and I have no interest in repeating anything like that.
A notable downside is that it requires an ordered list of teams by some measure of “strength” going into the event. This is difficult early in the season, but very feasible at district champs/championships. I personally think it would be perfectly acceptable just to sort by team age at early (or even all) events, as this would essentially balance out the number of rookie/second-year teams everyone plays with and against, getting a pretty decent gain in schedule balance with minimal effort or backlash about the strength metric used.
Of course, some will philosophically object outright to pre-seeding teams, in which case schedule strength balancing is just a non-starter.
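To make the schedule-balancing vs. match-balancing distinction concrete, here’s a minimal sketch of the quantity schedule balancing cares about: each team’s average partner and opponent strength over the whole event. The schedule format and strength metric are assumptions (strength could be the team-age ordering mentioned above).

```python
def schedule_balance(schedule, strength):
    """For each team, return (average partner strength, average opponent
    strength) over the whole schedule. A strength-balanced schedule keeps
    these averages similar across teams; it makes no attempt to make any
    single match even."""
    partners, opponents = {}, {}
    for red, blue in schedule:
        for own, opp in ((red, blue), (blue, red)):
            for t in own:
                partners.setdefault(t, []).extend(s for s in own if s != t)
                opponents.setdefault(t, []).extend(opp)
    return {t: (sum(strength[p] for p in partners[t]) / len(partners[t]),
                sum(strength[o] for o in opponents[t]) / len(opponents[t]))
            for t in partners}
```

Superimposing a balancing criterion on pre-generated schedules then amounts to searching team-to-slot assignments that minimize the spread of these per-team averages.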
Dynamic Swiss matchmaking:

Here’s a summary for the curious. This one would certainly be the most radical, but it fascinates me; I’ve dreamed about someday helping to organize an off-season with this format. There would need to be some kind of buffer built in between round scheduling to make sure that teams have sufficient time to prepare for their matches and to reduce chaos in queuing. Maybe something like generating the first three rounds initially, then generating the fourth round schedule after every team has played one match, the fifth round schedule after the second round is completed, etc. Things get weird if the team count is not a multiple of 6, so this would require a non-trivial system to keep things managed efficiently.
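A single Swiss-style round under these assumptions (team count a multiple of 6, alliances drawn from adjacently ranked records) might look something like the sketch below. The pairing rule is my own simplification for illustration, not a worked-out system; it ignores repeat partners/opponents and queue timing entirely.

```python
import random

def next_swiss_round(records, seed=0):
    """records: dict of team -> wins so far. Returns a list of (red, blue)
    matches, each filled from six adjacently ranked teams so alliances face
    similarly performing opposition."""
    rng = random.Random(seed)
    teams = sorted(records, key=lambda t: (-records[t], t))
    matches = []
    for i in range(0, len(teams), 6):
        group = teams[i:i + 6]
        rng.shuffle(group)  # mix alliances within the strength band
        matches.append((tuple(group[:3]), tuple(group[3:])))
    return matches
```

With 12 teams, the top six records land in one match and the bottom six in the other, which is exactly the behavior (and the powerhouse-isolation downside) discussed below.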
The advantage here is that teams get to play more close, exciting matches; fewer blowouts generally improve the audience and participant experiences.
One notable potential disadvantage of this system would be that the powerhouse teams would not get as much interaction with the weaker teams as the quals matches continue on. This is a really cool part of the program that would be unfortunate to lose out on. The structure could potentially be modified to make alliances composed of a top-third record team, a mid-third record team, and a low-third record team, but that would likely be even harder to effectively schedule, although intriguing.
tldr: FRC should absolutely use pre-generated schedules, and serious discussion of other changes to the scheduling system is kind of silly without them (although speculation can still be fun).
The dynamic Swiss matchmaking certainly sounds interesting. And exciting matches would be good. I don’t think the disadvantage of not playing against powerhouse teams is that huge. It is fun to play against them, and it is exciting when you pull off a win, that’s true. But they’ll still be in the pits and we’ll still presumably get to see their robot in action.
I think the OP would find Swiss matchmaking very appealing. In the context of the thread, as much as there is any, would this be any fairer than current scheduling? About the same? The way you presented it, and from skimming the summary, it seems like good teams would still have a better W/L/RP record than bad teams, even though they’re playing roughly the same level of competition as rounds go on. But it’s hard to wrap my head around that. Good teams won’t be punished by having to face other good teams as rounds go on?
I disagree. In 2015 (“Each Team receives MATCH Points equal to their ALLIANCE’s final score”), the rankings were based primarily on your alliance’s total score. In the TMichals system it would be based on your individual contribution, not the alliance result. Only points/objectives/accomplishments (it’s still very vague exactly how performance would translate into rank points) that your robot completed (possibly assisted as well) would help you climb the leaderboard. It’s also worth noting that TMichals seems to lump all 3 points together as 1 idea; I believe that 1-2 could be independent from 3: 1-2 change how the rankings are determined and 3 changes how the schedule is determined.
A problem to me with the current system is that a match where a powerhouse team/alliance completely steamrolls their opponents is not particularly exciting or inspiring for anyone. The best matches I’ve watched and been in have been close matches between teams of equal capability. I believe there’s more inspiration and learning in a hard-fought loss than an easy win.
Personally I’m not particularly annoyed about my team’s losses; we earned those and hopefully we learn from them. I do wish that my team’s robot would have more of an influence in the matches we play. We plan to do that by making our robot better, but I’m not against alternative methods. I know FRC isn’t fair, and I want the best teams to be highest in the rankings even when they aren’t mine, but I think there’s room for improvement in the matchmaking system, and it isn’t some sacred cow.
I’m far from convinced that this implementation where FIRST tracks individual stats is practical, but I think the goal is not a bad one.
Thanks Caleb! I debated tagging you because I knew you could knock it out way quicker than I could. Here’s my full backward-looking list. As with yours, higher = easier; the numbers are pretty arbitrary. I added a column to compare the two lists. Negative differences mean Caleb’s method ranked the schedule as more difficult than mine did.
If you are not looking at match results, is this just historical data on teams from previous years?
In my head, a well-executed Swiss-style system would be fairer than the current system, though there’s a fair amount of development work required to get to that point. Here’s a short example demonstrating why it would be fairer:
Say team A is far and away the best team at the competition, so much so that they will win any match they play regardless of partners or opponents. Say team B is worse than team A, but still far and away the second best team at the event. At a current event, team A and team B may never face off in a match, leading them to tie with undefeated records and opening the door for team B to possibly seed first on tiebreakers. However, in a Swiss system, if both remain undefeated late into the tournament, they will almost certainly be matched up against each other at some point, causing team A to receive the 1 seed.
Obviously this is an exaggerated example, but the same principles work even for smaller skill gaps and even for teams in the middle or bottom of the pack.
Yeah, it would be a departure from our current system, which makes it difficult to wrap your head around. And since we don’t have a concrete example of this in FRC, we’re really only talking about a perfect hypothetical in my head at this point. But Swiss systems are generally considered very fair to all levels of competitors (see the bottom of the wiki page for plenty of examples of how they’re used). Good teams are not punished by having tougher matches; rather, good teams have better opportunities to prove themselves against comparable-strength opponents.
Say you have a set of two matches in our current system. In the first match, the 2nd, 3rd, and 5th ranked teams go against the 36th, 37th, and 40th ranked teams. In the other match, the 1st, 4th, and 6th ranked teams play the 35th, 38th, and 39th ranked teams. There isn’t much information we can glean from these matches from a ranking perspective, as we all pretty much know the outcome before the matches even start.
Compare that to a Swiss system, which might have the 1st, 4th, and 6th ranked teams up against the 2nd, 3rd, and 5th ranked teams in one match, and in the other match the 36th, 37th, and 40th take on the 35th, 38th, and 39th. These matches will be much closer and more exciting than the first pair, which is great! But more importantly for ranking, we will actually get much more useful data out of this pair of matches since the outcome is much more uncertain. Perhaps the 37th ranked team is actually middle-of-the-pack skill-wise; with these matches they could really show that and win their match, deservedly improving their rank. But in our current system, if they get a lot of matches with great opponents and poor partners OR great partners and poor opponents, they have few opportunities to actually show their value.
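The “more useful data” intuition can be made concrete with a toy model (none of this is from the post itself): convert the strength difference between alliances into a win probability with an Elo-style logistic curve, then measure how much information the match result carries as the entropy of its outcome. The 400-point scale is an arbitrary choice borrowed from chess Elo.

```python
import math

def win_prob(strength_diff, scale=400):
    """Elo-style logistic win probability for a given strength difference."""
    return 1 / (1 + 10 ** (-strength_diff / scale))

def outcome_entropy(p):
    """Bits of information the match result carries, given win probability p."""
    if p in (0, 1):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
```

An even match (p = 0.5) carries a full bit of information about relative team strength; a foregone conclusion (say an 800-point gap) carries almost none, which is why the Swiss pairings above tell us more per match.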
There has been a lot of discussion regarding whether or not FRC “should” or “should not” do this.
I would like to ask how you would propose to make this happen, considering the vast disparity in robot/drive team abilities and the inconsistency in their performance over time. How do you make this happen when the scoring data for many teams is at or near zero for many matches? For instance, what would you do with the following scenario?
In 2012, our team went to an event where our opponents were 148 and two other teams ranked well below 50%. 148 didn’t move the whole match, but the other two robots handily outscored our alliance. Our team scored a total of 3 times over the whole season (two Regionals and a waitlist spot at Champs). The two other teams on our alliance barely moved. It was discovered that there was a field fault, so it was announced that the match would be replayed during the lunch break. One of our team members joked that since 148 could easily outscore the other two teams combined, they might just tell them, “Hey guys, we got this one, you can relax and have a nice long lunch and we’ll handle this match on our own since you did the first one without us.” Of course, the difference between the scores was significantly larger when the match was replayed with 148 performing as they usually do.
I think that does take away some inspiration though. It is a great learning opportunity to be on an alliance with, or against, some top-tier robots. You learn how they strategize and prioritize aspects of the game. We’ve also had top-level teams come over and help us with our autonomous or fix parts of our robot because we’d be in an upcoming match with them. These are all great learning opportunities, especially for bottom-of-the-pack teams. But in the system you’ve described, the bottom teams would almost never interact on the field with the top teams, losing out on those inspirational opportunities.
Correct, it’s essentially a comparison of a team’s simulated ranks from pre-schedule release to post-schedule release. The percentage is the simulated likelihood that the team will seed better with the given schedule than they would with random alternative schedules. But it’s all based on prior season/prior event Elo/OPRs/ILSs, which can be finicky especially for teams’ first events.
I’ve never really delved into post-match schedule strengths just because it doesn’t interest me much. I prefer forward-looking metrics since I can test to see if they actually mean anything later when we get the results, whereas backward-looking feels more like we’re just adding arbitrary weights to things we personally value or don’t value.
Interestingly, @Caleb_Sykes’s analysis shows that 67 actually had the hardest schedule out of any team at the event. And yet, they didn’t let that stop them from still seeding first in the rankings. Almost like there are other, more important factors in play than strength of schedule in determining rankings.
If I’m reading it right, 67 actually had the easiest schedule. But that makes sense, when you think about it. “Schedule hardness” is computed basically as the strength of the opposing alliance compared to your alliance. The reason that 67 has an easy schedule is that they’re on strong alliances, and those alliances are strong because 67 is on them.
I’m still trying to figure out if I think this “schedule hardness” metric is invalid because of the circular reasoning.
Edit: Never mind, I just realized that “higher” probably means “higher percentage” rather than “higher on the list”.
Sorry I wasn’t clear. It’s always difficult to convey the direction with these “strength of schedule” metrics. Lower percentages indicate that the given team will have more trouble ranking better. Assuming my number has value, a team would tend to prefer higher percentages, as that would indicate their chances of getting a better rank have increased. So 67 did have the “worst”/“hardest”/“least favorable”/“best opponents/worst partners combo” schedule of any team in my list.
Also the percentages do their best to isolate a team’s schedule from that team’s performance/skill level. It’s kind of comparing that team’s current schedule to a bunch of random alternative schedules and finding out how good this one is relative to all the others. My TBA articles go more in depth explaining it.
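Stripped of all the simulation machinery, the comparison works something like the sketch below. `simulate_rank` here is a stand-in for a full Elo/OPR/ILS event simulation (which is obviously the hard part), so treat the whole thing as an illustration of the idea, not the actual method from the articles.

```python
def schedule_percentile(team, actual_schedule, alternative_schedules,
                        simulate_rank):
    """Fraction of random alternative schedules that would give the team a
    worse (numerically higher) simulated rank than its actual schedule.
    Higher percentages mean the actual schedule is more favorable."""
    actual = simulate_rank(team, actual_schedule)
    worse = sum(1 for s in alternative_schedules
                if simulate_rank(team, s) > actual)
    return worse / len(alternative_schedules)
```

Because every candidate schedule is run through the same simulation for the same team, the team’s own skill level mostly cancels out, which is how the percentage isolates schedule luck from performance.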
This metric can also vary in usefulness across events. For example, here are the predicted average ranks I had at Milford before the schedule was released compared to the teams’ actual final results:
After the schedule was released, here were my predictions again compared to the actual results:
The R^2 value ticks up just a hair, but not really enough to be very noteworthy. This would indicate to me that the pre-match schedule strengths I calculated for this event probably didn’t matter particularly much.
Let’s compare that to another Michigan event that same week in Kingsford. Here are the predicted ranks versus actual ranks at Kingsford before the schedule was released.
This R^2 value was much lower than for Milford, indicating the seed values I used for teams were not as relevant here as they were in Milford. Here’s what it looks like after the Kingsford schedule was released though:
The R^2 is still drastically lower than Milford’s, but look how much it improved on the pre-schedule predictions. To me, this indicates that the team seeds were still worse than Milford’s, but that the schedule strength predictions were much better here, as my ranking predictions improved noticeably once the schedule was known.
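For reference, the R^2 values in these comparisons can be computed as the usual coefficient of determination between predicted and actual ranks. This is a minimal version for illustration, not the exact code behind the graphs:

```python
def r_squared(predicted, actual):
    """Coefficient of determination of predictions against actual values."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((a - mean) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_res / ss_tot
```

A value of 1.0 means the predicted ranks match the final ranks exactly; 0.0 means the predictions explain no more variance than just guessing the average rank for everyone.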
Looking at these graphs, it is very strange to me that the predictions seem more confident generally before schedule release than after. I don’t know how to explain that; I feel like it should be the opposite. I wonder if that’s a bug or just some explanation I’m not thinking of right now.