Is Power Up Good at Ranking?


#71

The way to get to a “good enough” balanced schedule is to sort teams into 3 bins/tiers by district points. The schedule is then developed by pulling one team from each bin for each match. The current scheduling algorithm already does something along these lines if you look at the match patterns.

Understand that the current system is NOT fair because it does not meet the requirements to achieve a fully random distribution. So a balanced schedule of this type is a better, but not perfect, solution. Matching the sum of the DP for each alliance would be the “perfect” solution, but likely creates too much complexity. The perfect should not be the enemy of the better.
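Here is a minimal Python sketch of the tier idea. It assumes a 36-team event where every team plays once per round and DP ranks are already known. The names `tier_round` and `dp_gap` are hypothetical. The within-tier shuffle stands in for the other constraints a real scheduler would also enforce, such as time between matches and repeat partners.

```python
import random

def tier_round(teams_by_dp, rng):
    """Hypothetical helper: build one round of 6-team matches in which each
    alliance gets exactly one team from each of three DP tiers.

    teams_by_dp: team numbers sorted best-first by district points; the
    count must be a multiple of 6 so every team plays once per round.
    """
    n = len(teams_by_dp)
    if n % 6:
        raise ValueError("team count must be a multiple of 6")
    size = n // 3
    tiers = [list(teams_by_dp[i * size:(i + 1) * size]) for i in range(3)]
    for tier in tiers:
        rng.shuffle(tier)  # random order within each tier
    matches = []
    for m in range(n // 6):
        # Red and blue each take the next unused team from every tier.
        red = tuple(tier[2 * m] for tier in tiers)
        blue = tuple(tier[2 * m + 1] for tier in tiers)
        matches.append((red, blue))
    return matches

def dp_gap(match, dp):
    """Absolute difference between the two alliances' summed district points."""
    red, blue = match
    return abs(sum(dp[t] for t in red) - sum(dp[t] for t in blue))
```

`dp_gap` measures the alliance DP-sum difference. That is the quantity the "perfect" DP-matching approach would try to drive toward zero.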


#72

How so? Pre-generated schedules still need to consider the same criteria as a schedule determined at the event. The schedule templates still need to consider time between matches, red/blue balancing, station balancing, and avoiding repeat pairings/plays. The only criteria I can see you sliding away from is round uniformity, but even that depends on the game (2016 had round uniformity built into defense selection) and largely overlaps with considering time between matches. Just because you’re using pre-made schedule templates doesn’t remove the necessity of these criteria.

What’s your source for “surprise” playoff qualifiers, and how are you defining that? I’d also struggle to see how you’re comparing that across sports, considering the differing quantities and methods of playoff qualification between sports (12/32 teams make the playoffs in the NFL, as opposed to 16/30 in the NBA, 16/31 in the NHL, and either 8 or 10 out of 30 in MLB depending on how you view the single game play-in for the wild card). Finally, why is “surprise” playoff appearances the metric you’re using for this?

But getting to the central point, the division structure of the NFL is far more substantial than any other portion of the scheduling in the NFL (37.5% of games are against division opponents). It’s painfully obvious to both casual and hardcore NFL fans that division assignments play a crucial role in how a team performs in the season, and even more so in advancing to the playoffs.

The NFL also experiences extreme year-to-year volatility of success compared to other North American sports. Some of this is likely due to the nature of the sport itself, but it also stems heavily from the relatively small number of regular season games played in the NFL compared to other sports, which increases the variance of the results. The scheduling format amplifies this effect even further.

I don’t follow tennis, so I cannot speak about it in an informed manner.


#73

Absolutely! As I understand it, the NFL intentionally sets its schedule to be “interesting” rather than “balanced”. “Interesting” means that it can sell tickets and (more importantly) win TV audience share. In recent years the league has taken this to extremes by scheduling most division games near the end of the season, particularly when it believes the teams are evenly matched. As an example, the first of the Atlanta Falcons’ six division games in the 17-week 2017 NFL regular season was in week 9! Their six division games were in weeks 9, 12, 14, 15, 16, and 17, with the two “guaranteed ugly” games against the Saints in weeks 14 and 16. As little love as this Saints fan holds for the Dirty Birds, this is just wrong.


#74

That would create a huge non-linearity in schedule difficulty at the boundaries of the tiers. Whichever team is at the bottom of the high tier would face someone better than them in every match and would never be allied with someone better. The next-best team, at the top of the middle tier, would be allied with someone better than them in every match.
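The boundary effect can be made concrete with a short calculation. Assume alliance score is the sum of fixed team ratings, and partners and opponents are drawn uniformly from their required tiers. Under those assumptions, partners and opponents from the other tiers cancel in expectation. A team's expected margin is then its own rating minus the average of the rest of its own tier. `expected_margin` is a hypothetical helper, not an existing tool.

```python
def expected_margin(ratings, tiers=3):
    """Expected (own alliance - opposing alliance) score for each team under a
    tier schedule. Assumes additive ratings and uniformly random draws within
    each required tier.

    ratings: list sorted best-first; its length must be divisible by `tiers`.
    """
    size = len(ratings) // tiers
    margins = []
    for i, own in enumerate(ratings):
        t = i // size
        tier = ratings[t * size:(t + 1) * size]
        # Partners and opponents from the other tiers cancel in expectation;
        # only the opponent drawn from this team's own tier is left over.
        others = (sum(tier) - own) / (size - 1)
        margins.append(own - others)
    return margins
```

With 36 evenly spaced ratings (36 down to 1), the 12th-ranked team's expected margin is -6 and the 13th-ranked team's is +6. The sign flips across the tier boundary.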


#75

In my head, a balanced schedule is more similar to the former than the latter. I really have to think about it more though. The current setup I’m imagining just tries thousands of different assignments of teams to the pre-generated match indices, and chooses the one which optimizes some kind of balancing criteria like you described above. This would not create perfectly balanced schedules, but it should be able to eliminate the most unfair schedules without sacrificing any of the criteria of the current algorithm. I’d probably even just rank teams by DP or whatever instead of using absolute DP values so that we can have pre-generated “balanced” schedules. Then all you have to do is rank teams before the event by whatever criteria you want and assign teams to indices based on this rank.

The real trick then is just finding a metric to use to represent “balance”. There’s a bunch of options here, and all will have some tradeoffs, but I’m pretty sure we can find some metric that most people would agree is better than random. Better than random is a pretty low bar to clear.
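Here is one way to prototype the "try thousands of assignments" idea in Python. Every piece of it is an assumption for illustration. The random template stands in for FIRST's pre-generated match indices, and strength is a rank-based number. The balance metric (the spread across teams of partner strength minus opponent strength) is just one candidate among the many options mentioned above.

```python
import random
import statistics

def random_template(n_slots, rounds, seed):
    """Stand-in for a pre-generated template: each round, every slot index
    plays once, grouped into 6-slot matches (3 red, 3 blue)."""
    rng = random.Random(seed)
    template = []
    for _ in range(rounds):
        slots = list(range(n_slots))
        rng.shuffle(slots)
        for i in range(0, n_slots, 6):
            template.append((tuple(slots[i:i + 3]), tuple(slots[i + 3:i + 6])))
    return template

def imbalance(template, assign, strength):
    """Hypothetical balance metric: population std. dev. across teams of
    (sum of partner strength - sum of opponent strength). Lower is better."""
    luck = {}
    for red_slots, blue_slots in template:
        red = [assign[s] for s in red_slots]
        blue = [assign[s] for s in blue_slots]
        for own, other in ((red, blue), (blue, red)):
            opp = sum(strength[t] for t in other)
            for t in own:
                partners = sum(strength[p] for p in own) - strength[t]
                luck[t] = luck.get(t, 0) + partners - opp
    return statistics.pstdev(list(luck.values()))

def best_assignment(template, strength, trials, seed=0):
    """Try `trials` random team-to-slot assignments; keep the most balanced."""
    rng = random.Random(seed)
    teams = list(strength)
    best, best_score = None, float("inf")
    for _ in range(trials):
        rng.shuffle(teams)
        assign = dict(enumerate(teams))  # slot index -> team
        score = imbalance(template, assign, strength)
        if score < best_score:
            best, best_score = assign, score
    return best, best_score
```

Because strength here is rank-based, a winning assignment could be stored and reused at any event of that size by mapping each team's pre-event rank to a slot.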


#76

To follow up on this thought, I made a schedule generator in Excel.

For my baseline I used the MN State 2018 schedule. It’s 36 teams, 9 matches each. I used that one because it’s small and 36 divides nicely by 3, 6, 12, etc.
I assigned each team a random number between 0 and 100 and assumed that is how many points they score each match (so OPR, in a game where OPR is perfect). That way Excel would regenerate a rating for each team and translate it into wins/losses/ties, which I converted to rank points with no bonuses. I then plotted OPR rank versus rank points and looked at the R² of a linear trendline. I saw an average of around .62.

I then used the same rank assumptions but created a schedule where each round was filled randomly using the tier methodology, and looked at the same R² from my plot. I saw an average of around .055. That’s not a typo. Basically, the 13th and 25th teams were at the top of rank points so often, and the 12th and 24th teams were at the bottom so often, that the linear trendline treated it as a random scatter plot.
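Here is a rough Python re-creation of the spreadsheet experiment under the same assumptions: a fixed 0-100 score per team, 2 RP per win, 1 per tie, and no bonus RP. It does not use the actual MN State 2018 schedule. Random rounds stand in for it, so its output is only comparable in spirit to the .62 and .055 reported above. `simulate_r2` and `r_squared` are hypothetical names.

```python
import random

def simulate_r2(schedule_kind, n=36, rounds=9, seed=0):
    """Return R^2 of a linear fit of ranking points against OPR rank.
    schedule_kind is 'random' (random rounds) or 'tier' (each alliance gets
    one team from each third of the OPR order)."""
    rng = random.Random(seed)
    opr = sorted((rng.uniform(0, 100) for _ in range(n)), reverse=True)
    rp = [0] * n  # index = OPR rank - 1
    size = n // 3
    for _ in range(rounds):
        if schedule_kind == "tier":
            tiers = [rng.sample(range(t * size, (t + 1) * size), size)
                     for t in range(3)]
            matches = [([tr[2 * m] for tr in tiers], [tr[2 * m + 1] for tr in tiers])
                       for m in range(n // 6)]
        else:
            order = rng.sample(range(n), n)
            matches = [(order[i:i + 3], order[i + 3:i + 6]) for i in range(0, n, 6)]
        for red, blue in matches:
            r, b = sum(opr[t] for t in red), sum(opr[t] for t in blue)
            for alliance, won in ((red, r > b), (blue, b > r)):
                for t in alliance:
                    rp[t] += 2 if won else (1 if r == b else 0)
    return r_squared(list(range(1, n + 1)), rp)

def r_squared(xs, ys):
    """R^2 of a least-squares line, i.e. the squared Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy) if syy else 0.0
```

Averaging over many seeds (for example `simulate_r2("tier", seed=s)` for `s` in `range(30)`) mirrors taking the average of several spreadsheet recalculations.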


#77

Here is another proposed solution: eliminate extra RP, and then just use the same “strength of schedule” equation to determine the number of points a team really deserves.

For example, the 2018 NFL strength-of-schedule rankings, computed before the season, showed the Green Bay Packers with the hardest schedule in the NFL, while the Washington Redskins had the 14th hardest. The Packers’ opponents had a combined win percentage of .539 before the season, while the Redskins’ opponents had .504. Both of these teams went 7-9. Say that each win is worth 2 points, just like a win in FRC. Multiplying the combined win percentage by the Redskins’ 14 points (7 wins × 2) gives us 7.056 points, while doing the same for the Packers gives us 7.546 points. Therefore, the Packers are the better team.

Now, how do we determine the combined win percentage? Just keep it updated throughout the competition.
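Here is a sketch of the proposal in Python. Each team's win points are multiplied by its opponents' combined win percentage, recomputed from every result played so far. `sos_weighted_points` is hypothetical. Ties count as non-wins, and the percentage is pooled over opponent appearances, so an opponent faced twice counts twice.

```python
def sos_weighted_points(results, points_per_win=2):
    """results: list of (red_teams, blue_teams, winner), where winner is
    'red', 'blue' or 'tie'. Returns {team: win points * opponents' combined
    win percentage}, using all results so far."""
    wins, games, opponents = {}, {}, {}
    for red, blue, winner in results:
        for own, other, side in ((red, blue, "red"), (blue, red, "blue")):
            for t in own:
                games[t] = games.get(t, 0) + 1
                wins[t] = wins.get(t, 0) + (winner == side)
                opponents.setdefault(t, []).extend(other)
    scores = {}
    for t in games:
        # Combined win percentage = total opponent wins / total opponent games.
        opp_w = sum(wins[o] for o in opponents[t])
        opp_g = sum(games[o] for o in opponents[t])
        scores[t] = points_per_win * wins[t] * (opp_w / opp_g)
    return scores
```

Calling it again after every match keeps the combined win percentages current, which is what "keep it updated" would mean in practice.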

Maybe this would work, maybe this wouldn’t. I’m writing this at midnight, so if I am horribly wrong, you know why. I’ll try to analyze this more later, after I get some sleep.


#78

A few years ago the NFL started back-loading the divisional matchups to ensure teams didn’t clinch their divisions with multiple weeks left in the regular season, which resulted in meaningless games. I’m not sure it has anything to do with matching up evenly matched teams. In fact, according to Vegas, the Falcons were heavy favorites to finish first in the division, while the Saints were expected to finish last.

I think it’s important to note that in the NFL who each team plays is 100% determined by the team’s division and which place they finished the year before. So, we already know 14 of the teams the Vikings will face in 2025 with the remaining two determined by final 2024 records.

The NFL does determine when and where the games are played. However, I generally don’t think when and where games are played over a 16-game schedule is that significant. Here is a pretty interesting article about that process, including generating over 50,000 potential schedules before deciding on the final one: https://www.si.com/mmqb/2017/04/21/nfl-2017-schedule-howard-katz-roger-goodell.


#79

Yes, the perfect would be the enemy of the better. Instead, let’s have a match where three teams ranked in the top 8 are on one alliance, and the other alliance has one strong team and two teams at the bottom of the third tier, and let’s have that match determine the ranking, as it’s the last match for each team. That sounds so much more fair.

Teams can go an entire tournament in which they play only with strong teams, and then when they don’t have a strong team, they play weak alliances. I can point to several recent examples of that. And on the flip side, we already have scheduling snafus in which a stronger team ends up with a string of weak partners and continually faces stronger alliances.

But more seriously, how precisely can one measure the relative abilities of the teams? I feel safe slotting teams into three broad “bins” in any competition, but truly distinguishing between clusters of teams within those bins is much more difficult.

PS, don’t try to claim that the current scheduling is “random” and therefore “fair”–it is only “arbitrary.” Within a student’s entire career, they are unlikely to play enough matches to approach the conditions for the distribution of matches to be statistically valid as a measure of the relative abilities of the teams across competitions, much less within a single competition. The only valid solution is to create a balanced schedule.


#80

This assumes much more precision and much less real variation than what actually occurs in competition.


#81

That could be a good solution for the W/L/T portion of the ranking. FIRST adds the other ranking points to accomplish other organizational goals beyond just pure competition, but that doesn’t mean that those can’t be combined with a better metric of comparative competitive results.


#82

This could be a good solution. Set the tolerance on the balanced rankings for each alliance to some value, +/- a certain number. Then generate a preset schedule (which FIRST already does anyway) that has a minimum number of matches between each team’s appearances.
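Here is a minimal acceptance check for this idea. Both thresholds are hypothetical parameters, not FIRST settings. Each match's alliance rank-sum difference must stay within `tolerance`, and each team must sit out at least `min_gap` matches between appearances. A generator could keep drawing candidate schedules until one passes.

```python
def schedule_ok(schedule, rank, tolerance, min_gap):
    """schedule: list of (red_teams, blue_teams) tuples in play order.
    rank: {team: pre-event rank}. Returns True only if every match is
    balanced within `tolerance` and every team gets at least `min_gap`
    matches off between its own matches."""
    last_played = {}
    for i, (red, blue) in enumerate(schedule):
        red_sum = sum(rank[t] for t in red)
        blue_sum = sum(rank[t] for t in blue)
        if abs(red_sum - blue_sum) > tolerance:
            return False
        for t in red + blue:
            # Matches sat out since this team last played.
            if t in last_played and i - last_played[t] - 1 < min_gap:
                return False
            last_played[t] = i
    return True
```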


#83

Let me offer a different take on this: it’s clear that the current scheduling algorithm is NOT random. I understand that FIRST already has a few set schedules for certain combinations of event size and number of matches. At Central Valley, we played 2 matches against 5817, and 2 matches later played against them. One might chalk that up to a one-in-a-trillion chance, but the same patterns keep happening, particularly two teams playing together in one match and then against each other in the next match. It appears that slots are scheduled in a distinct pattern, but the assignment of teams to those slots is scrambled in some fashion. The easiest solution may be to assign teams to those slots in a ranked fashion instead.
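To test whether the "partners, then opponents right away" pattern is more common than chance, you could count it directly in a published schedule and compare the result against shuffled schedules. `partner_then_opponent` is a hypothetical counter. It counts ordered (team, other team) cases, so one symmetric occurrence is counted once from each team's side.

```python
def partner_then_opponent(schedule):
    """schedule: list of (red_teams, blue_teams) in play order. Counts cases
    where a team's ally in one match is an opponent in that team's very next
    match."""
    history = {}  # team -> list of (allies, opponents) in play order
    for red, blue in schedule:
        for own, other in ((red, blue), (blue, red)):
            for t in own:
                allies = set(own) - {t}
                history.setdefault(t, []).append((allies, set(other)))
    count = 0
    for matches in history.values():
        # Compare each match's allies with the opponents in the next match.
        for (allies, _), (_, next_opps) in zip(matches, matches[1:]):
            count += len(allies & next_opps)
    return count
```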


#84

The algorithm is described here: https://idleloop.com/matchmaker/

The method you describe is used in Cheesy Arena, but unless someone knows something I don’t, it is not used at official events. Perhaps an FTA would care to comment?


#85

I can’t believe it took me a week to remember to respond.

TL;DR Sometimes extra RPs prioritize the wrong things.

I dislike the addition of extra RP because sometimes they don’t translate well to elims. Look at 2017, with all the upsets that occurred. Offensive quals play could sometimes bring teams to the top that would then fall to defensive, lower-seeded alliances. My team usually plays a lot of defense, but we did so especially in 2017 (probably because our driver played football, hockey, and lacrosse, and our operator did too, with the exception of hockey). In our first two competitions, we won as the number 7 seed. We won Troy undefeated in elims as the 7th alliance captain because we played heavy defense on teams like 469, 217, 2337, and 3098. Sure, we may not have received as many RP in quals, but our defense-heavy elims helped us win the event.

This year was good early in the competition season, but even at DCMPs, we saw this RP system start to crumble. The #8 alliance on DTE at MSC nearly beat the #1 alliance. My team was on the #6 alliance on Dow and made it to semis, and even pushed that to a tiebreaker against the #2 before we lost. The #3 on Ford ended up winning the whole event, beating teams such as Michigan powerhouse 67 (who is tied with 469 for most state championship wins), Einstein finalist 3357, and others to take the blue banner home.

At CMP, Turing had the eighth alliance take the division, beating one of three 4-cube scale auton teams in quarters to do this. On Daly, 469’s eighth-seeded alliance beat #1 and took semis to a tiebreaker match. 71’s alliance, which was either 4 or 5 (can’t remember), went to finals before losing to 217’s alliance (arguably the best alliance in FRC this year besides 254+148; shame that they had electrical issues on Einstein, I was looking forward to some exciting matches). Finally, not one of the alliances on Einstein finals was a #1 seed (both were #2). If these alliances were good enough to make it to Einstein finals, then why couldn’t the captains rank #1?
That’s why I dislike RPs, although straight WLT wasn’t the best either, as a team could easily be carried to the #1 seed at a regional.


#86

All schedule balancing schemes will require ranking or sorting of teams prior to matches actually being played. All attempts will leave FIRST looking like the Bowl Championship Series. I do not think we want to go down that road.

  1. For discussions with alliance partners, it is good for teams to know their match schedule as early as possible. The morning of is good.

  2. Teams will add and drop events with as little as 24 hours notice. So you need to generate the match schedule the morning the matches start.

  3. It works as long as the average time between matches for all teams fits in a narrow band. Yes… some matches you are hot lapping and some matches you have time for repairs.

  4. At any given match I have an equal chance of being allied with a top team or opposing a top team. Things will and should balance out. I would love to have 15 matches, but 12 is workable.
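Point 3 can be checked mechanically. `gap_band` is a hypothetical helper that reports the smallest and largest per-team average gap, meaning the number of matches sat out between consecutive appearances. A narrow band means those two numbers are close. It assumes every team plays at least twice.

```python
def gap_band(schedule):
    """schedule: list of (red_teams, blue_teams) in play order.
    Returns (smallest, largest) per-team average gap between appearances."""
    seen = {}
    for i, (red, blue) in enumerate(schedule):
        for t in list(red) + list(blue):
            seen.setdefault(t, []).append(i)
    averages = []
    for idx in seen.values():
        if len(idx) > 1:
            gaps = [b - a - 1 for a, b in zip(idx, idx[1:])]
            averages.append(sum(gaps) / len(gaps))
    return min(averages), max(averages)
```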

Teams change things around and solve problems between events and during events so trying to balance the schedule before it is generated is just guessing at who will be top ranked at an event.

Take this match as an example. All balancing systems would have said blue for the win.
RED 4776 (14) 3668 (11) 5685 (35)
BLUE 7220 (26) 67 (1) 27 (5)

You have to play the matches.


#87

I had blue as 90% favorites to win this match. The fact that red won does not by itself invalidate my predictive model, nor does it provide any indication that this was a “balanced” match.
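The argument here is about calibration: a predictive model is judged over many predictions, not one. The sketch below is a generic Python calibration check, not the poster's model, which isn't described. It bins predicted win probabilities and compares them with observed win rates. Well-calibrated 90% favorites should still lose roughly one match in ten. `calibration` is a hypothetical name.

```python
def calibration(predictions, bins=10):
    """predictions: iterable of (predicted win probability, won: bool).
    Returns {bin_start: (mean prediction, observed win rate, count)}."""
    groups = {}
    for p, won in predictions:
        b = min(int(p * bins), bins - 1) / bins  # e.g. 0.95 -> bin 0.9
        groups.setdefault(b, []).append((p, won))
    return {
        b: (sum(p for p, _ in g) / len(g),   # average predicted probability
            sum(w for _, w in g) / len(g),   # fraction actually won
            len(g))
        for b, g in sorted(groups.items())
    }
```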

You have to play the matches.

Clearly. No one is advocating for not playing matches, nor is anyone claiming to know the outcome of any match with 100% certainty before it happens.

I generally agree with most of your other points.


#88

The 2017 rankings were not determined by the bonus ranking points. In fact, the auto 40 kPa and 4 rotor bonuses were highly correlated with elim performance. In Houston, both finalist alliances regularly achieved those. The problem was the win/loss, which was determined by the end game, in which alliances had to get 3 climbs to guarantee being competitive. The climbs were almost 50% of a typical 3-rotor/no-kPa match. In fact, without those two bonuses, the rankings would have been WORSE than they were!


#89

Understand the current scheduling algorithm does not meet any criteria for being random or fair, so we should choose a viable alternative. If you think the status quo works, present a structured argument that supports your case–it should not just continue simply because people are sniping at the alternatives. And don’t use these anecdotes as though they support your argument. Use data analysis to present your case.

And no, you don’t have an equal chance of being allied with a top or bottom team once the initial seed is put in, because of the cycle-time constraints. We can either arbitrarily generate the initial seeding in a manner that will favor certain teams throughout the competition, or we can set it in a balanced manner so that teams play roughly balanced schedules. Note that reaching a fair outcome through random chance requires many trials (dozens or hundreds). That will never be the case for FRC competitions, for obvious reasons. Things will not balance out, and they never have. (You can provide empirical evidence to support your supposition.)
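The "dozens or hundreds of trials" point follows from the standard error of a mean. As a simplification (real schedules are constrained), treat each match's partner-minus-opponent strength as an independent draw with standard deviation σ. The average over n matches then spreads as:

```latex
\mathrm{SE}(\bar{x}_n) = \frac{\sigma}{\sqrt{n}}, \qquad
\frac{1}{\sqrt{12}} \approx 0.29, \qquad
\frac{1}{\sqrt{100}} = 0.10
```

So a 12-match event still leaves about 29% of the per-match spread in each team's average schedule luck, compared with 10% after 100 matches.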

Generating new schedules on a short notice is no problem with the right algorithm. Don’t think that there is only one solution to this problem.


#90

Here’s the scheduling criteria for the 2017 game:
10.4.2 MATCH Assignment
FMS assigns each Team two (2) ALLIANCE partners for each Qualification MATCH using a predefined algorithm, and teams may not switch Qualification MATCH assignments. The algorithm employs the following criteria, listed in order of priority:

  1. Maximize time between each MATCH played for all Teams
  2. Minimize the number of times a Team plays opposite any Team
  3. Minimize the number of times a Team is allied with any Team
  4. Minimize the use of SURROGATES (Teams randomly assigned by the FMS to play an extra Qualification MATCH)
  5. Provide even distribution of MATCHES played on Blue and Red ALLIANCE
  6. Balance assigned PLAYER STATION proximity to a BOILER.

Why does item 5 even matter? (Although it was more important in 2017 than in other years.) The first 3 criteria are the only ones of key importance. (4 is dependent on the number of teams in the event.)
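Because the criteria are listed in priority order, candidate schedules can be compared lexicographically. `criteria_key` below is a hypothetical Python scoring function, not the FMS implementation, covering criteria 1, 2, 3 and 5. Surrogates (4) and boiler proximity (6) are left out because they depend on event size and field layout.

```python
from collections import Counter
from itertools import combinations, product

def criteria_key(schedule):
    """schedule: list of (red_teams, blue_teams) tuples in play order.
    Returns a tuple where smaller is better, so Python's tuple ordering
    compares the criteria in the manual's priority order."""
    last, min_gap = {}, float("inf")
    opposed, allied = Counter(), Counter()
    red_count, games = Counter(), Counter()
    for i, (red, blue) in enumerate(schedule):
        for t in red + blue:
            if t in last:  # matches sat out since this team's previous match
                min_gap = min(min_gap, i - last[t] - 1)
            last[t] = i
            games[t] += 1
        red_count.update(red)
        for a, b in product(red, blue):
            opposed[frozenset((a, b))] += 1
        for side in (red, blue):
            for pair in combinations(side, 2):
                allied[frozenset(pair)] += 1
    red_blue = max(abs(2 * red_count[t] - games[t]) for t in games)
    return (-min_gap,                          # 1. maximize time between matches
            max(opposed.values(), default=0),  # 2. most times facing one team
            max(allied.values(), default=0),   # 3. most times allied with one team
            red_blue)                          # 5. worst red/blue imbalance
```

Sorting candidates with `sorted(candidates, key=criteria_key)` puts the best schedule first. A lower-priority criterion only breaks ties left by the ones above it.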