[TBA Blog] Schedule Strengths (1 of 3): Finding the Best Strength of Schedule Metric

Schedule Strengths (1 of 3): Finding the Best Strength of Schedule Metric
By Caleb Sykes

Background

A few months ago, I kicked off a discussion about how to define Strength of Schedule for FRC, and introduced a new metric as my best shot at quantifying it. Well, after a long hiatus, I'm back at it, looking at strength of schedule metrics. To summarize, what I am looking for is a metric which is forward-looking, year-independent, and mirrors as much as possible what we colloquially mean when we say a schedule is “good” or “bad”. I want these three properties for the following reasons:

Check out the rest of the article here: http://blog.thebluealliance.com/2019/02/04/schedule-strengths-1-of-3-finding-the-best-strength-of-schedule-metric/


A paper on this topic from a few years ago.

The methods discussed in this paper essentially compare the expected number of wins that an average team would have with a specific schedule to the expected number of wins the same team would have with an average schedule. This can be forward-looking given match predictions and predicted match score variances, or backward-looking given full match results. Team-specific versions are also shown (i.e., the expected number of wins a team with a particular OPR would have with a specific or average schedule).
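To make the arithmetic concrete, here is a minimal Python sketch of the forward-looking version, assuming each match prediction comes as a predicted score margin plus a variance for that margin. The function names and numbers are illustrative, not taken from the paper:

```python
# Forward-looking expected wins from predicted margins and variances
# (illustrative sketch; names and numbers are not from the paper).
from math import erf, sqrt

def win_probability(predicted_margin, margin_variance):
    # P(win) if the actual score margin is normally distributed
    # around the prediction with the given variance.
    return 0.5 * (1 + erf(predicted_margin / sqrt(2 * margin_variance)))

def expected_wins(match_predictions):
    # Expected wins is just the sum of per-match win probabilities.
    return sum(win_probability(m, v) for m, v in match_predictions)

# e.g. two matches: predicted to outscore opponents by 5 in one,
# be outscored by 10 in the other, each with margin variance 400:
print(expected_wins([(5, 400), (-10, 400)]))  # ~0.91
```

Comparing that figure against the same team's expected wins under an average schedule gives the differential the paper describes.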

BTW, this stat is called “Schedule Win Differential” (SWD) in this app, available free for FTC events. The app computes it on the fly for each team.

It kinda-sorta compares with Expected Wins Added (EWA). I humbly suggest that the name for EWA is off, as I'd expect the units of EWA to be actual number of wins. For example, I'd expect an EWA of 1 to mean the schedule will give this team 1 more win on average compared to an average schedule. SWD works in this way, with teams with +SWD having “easy” schedules and teams with -SWD having “hard” schedules.
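To make those units concrete, here's a toy example (all probabilities made up): the differential is just expected wins under the team's actual schedule minus expected wins under an average one.

```python
# Toy SWD-style calculation, in units of wins (numbers are made up).
# An average team with an average schedule wins each match with p = 0.5.
average_schedule_probs = [0.5] * 10
# Predicted win probabilities for the same team with its actual schedule:
actual_schedule_probs = [0.7, 0.6, 0.4, 0.8, 0.5, 0.3, 0.6, 0.7, 0.5, 0.4]

swd = sum(actual_schedule_probs) - sum(average_schedule_probs)
print(round(swd, 2))  # 0.5 -> about half a win easier than average
```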

One other interesting thing the paper discusses: a particular schedule could be good for an average team but bad for a very good team, or vice versa. For example, imagine a schedule where a team's partners are expected to outscore their opponents by 20 points in every match except one, where they are expected to be outscored by 200 points. For an average team (one that might expect to win half of its matches with an average schedule), this might be a great schedule, as it gives them a high chance of winning every match but one. But for a well-above-average team that might score 100 points more than everybody else, this schedule is bad: they'd normally be expected to win all of their matches, but now that outscored-by-200-points match will result in a loss they would otherwise rarely have.
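Here's a quick numerical sketch of that scenario, assuming normally distributed score margins (the standard deviation and all other numbers are made up for illustration):

```python
from math import erf, sqrt

SIGMA = 30.0  # assumed std. dev. of a match's score margin (made up)

def win_prob(margin):
    # P(win) for a normally distributed margin with the given mean.
    return 0.5 * (1 + erf(margin / (SIGMA * sqrt(2))))

def expected_wins(partner_margins, team_strength):
    # partner_margins: predicted partners-minus-opponents margin per match,
    # excluding the team itself; team_strength: points the team adds on top.
    return sum(win_prob(m + team_strength) for m in partner_margins)

lopsided = [20] * 9 + [-200]  # the schedule described above
average = [0] * 10            # a perfectly average schedule

# Average team: the lopsided schedule is a clear improvement.
print(expected_wins(lopsided, 0), expected_wins(average, 0))      # ~6.7 vs 5.0
# +100-point team: the lopsided schedule costs them about a win.
print(expected_wins(lopsided, 100), expected_wins(average, 100))  # ~9.0 vs ~10.0
```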


Nice paper with a good explanation. Looking forward to the next step.

Yeah, I agree that the units don't match with the name. I originally had this metric and a few of the others in units of matches, but I normalized everything to the range -1 to +1 in order to more easily compare schedule strengths between events of different sizes, and to compare against other schedule strength metrics (like Expected Rank Change) that have units of teams and not matches. If I were using EWA standalone, I likely wouldn't bother with this normalization.
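For anyone curious what such a normalization could look like (the exact scaling isn't spelled out here, so treat this as illustrative): dividing a matches-unit figure by the number of qualification matches maps it into the -1 to +1 range regardless of event size.

```python
# Illustrative normalization: matches-unit EWA divided by match count
# lands in [-1, +1] for any event size (this scaling is my assumption,
# not necessarily the exact one used in the post).
def normalized_ewa(ewa_in_matches, num_matches):
    return ewa_in_matches / num_matches

print(normalized_ewa(0.5, 10))  # 0.05 at a 10-match event
print(normalized_ewa(0.5, 12))  # ~0.042 at a 12-match event
```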

This is indeed a really interesting result that arises out of the schedule strength conversation. I actually included a mix of team-specific metrics (Caleb's SS, Expected Rank Change, Expected Wins Added, and Winning Rank Matches) and team-independent metrics (Weighted Rank Diff, Average Rank Diff, and Average Elo Diff) in my analysis. I'm glad that the metric I settled on happened to be team-independent, but I didn't know that going into the analysis.

When looking for metrics to use for schedule generation, I feel there is a fundamental difference between trying to “balance” schedules (team-independent metrics) and trying to give each team a schedule that is optimal for them (team-specific metrics). The former is just trying to make sure experiences are roughly equal for all teams and no one gets unreasonably shafted by RNG. The latter is much more akin to “seeding” tournaments, as the team I “seed” first going into the tournament will become more likely to rank first, the second seed more likely to rank second, etc. What I really don't want is for a team that I have seeded 23rd going into the tournament to get basically locked out of ranking first just because I thought I was giving them a schedule better suited to their ability (so that they'd be more likely to seed 17th-22nd, for example).

You actually can extract team-specific schedule strengths from my event simulator pretty easily, so I'm not opposed to looking at this kind of metric. But for a general-purpose schedule strength metric, particularly one I'm going to use to build schedules, I feel team-independent measurements are preferable.

“I feel team-independent measurements are preferable.”

I fully agree with this. But it is interesting that a team-independent measure saying a particular team had an average strength of schedule may not accurately reflect whether they were actually helped or harmed by their schedule, unless they were an average team to begin with.