What does "strength of schedule" mean?

I hear the term “strength of schedule” (SoS) thrown around sometimes, and I’m not quite sure I understand what it means in an FRC context. For reference, here and here are the most recent threads discussing this topic. I’d like to start a discussion specifically on what (if any) metric makes the most sense to represent an individual team’s SoS, and then ideally to expand on that definition to come up with a metric that represents how “fair” or “balanced” an overall match schedule is.

I’ve made a program that I can use to calculate metrics like this for all of last season’s events; the first set of results from that program can be found in my miscellaneous statistics projects thread. I have created a metric that I think has potential as a SoS metric, and I’ll copy the details of that metric below. As the thread progresses, I’m happy to revisit the program and calculate other metrics if people want.

Here’s my first pass at a SoS metric. I calculate it by finding the probability that a given team will seed better with the actual schedule than they would have with a random schedule. So a “schedule strength” of 0% means that you will never seed higher with the existing schedule than you would have with a random schedule, and a “schedule strength” of 100% means that you are guaranteed to seed higher with the actual schedule than you would have with a random schedule.
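To make the simulation approach concrete, here is a minimal Monte Carlo sketch of the metric. Everything in it is a toy stand-in: the `strengths` mapping, the noisy win model in `simulate_seed`, and `random_schedule` are all simplifying assumptions, not the actual event simulator described above.

```python
import random

def simulate_seed(team, schedule, strengths):
    """Toy seeding model: alliance score is the sum of member strengths
    plus noise; a team's seed is its rank by match wins (1 = best)."""
    wins = {t: 0 for t in strengths}
    for red, blue in schedule:
        red_score = sum(strengths[t] for t in red) + random.gauss(0, 5)
        blue_score = sum(strengths[t] for t in blue) + random.gauss(0, 5)
        for t in (red if red_score > blue_score else blue):
            wins[t] += 1
    ranking = sorted(strengths, key=lambda t: -wins[t])
    return ranking.index(team) + 1

def random_schedule(teams, n_matches):
    """Draw a hypothetical schedule: six random teams per match, 3v3."""
    sched = []
    for _ in range(n_matches):
        six = random.sample(teams, 6)
        sched.append((six[:3], six[3:]))
    return sched

def schedule_strength(team, actual_schedule, teams, strengths, trials=500):
    """Estimate P(seed better with the actual schedule than with a
    random schedule) by paired simulation; lower seed number = better."""
    better = 0
    for _ in range(trials):
        actual = simulate_seed(team, actual_schedule, strengths)
        rand = simulate_seed(
            team, random_schedule(teams, len(actual_schedule)), strengths)
        if actual < rand:
            better += 1
    return better / trials
```

A real implementation would replace `simulate_seed` with a full event simulator that models RPs; the paired-comparison structure is the point of the sketch.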

What I like about this metric:

  • It compares the given schedule against other hypothetical schedules
  • It is customized for each team; that is, it compares your hypothetical results with a random schedule against your hypothetical results with the given schedule. I’m not the biggest fan of team-independent metrics since, for example, a schedule full of buddy-climb-capable partners is amazing for a team without a buddy climber, but just alright for a team that has a good buddy climber, and a team-independent metric would have to give the schedule a single score for both of these teams.
  • It’s on an interpretable scale (0% to 100%) and has meaningful significance
  • It’s able to be calculated before the event occurs (I don’t like metrics that require hindsight unless maybe we want to use SoS as a tiebreaker or something)
  • Incorporates RPs

What I don’t like about this metric:

  • Requires a full event simulator to calculate
  • Teams that are essentially guaranteed to seed first (like 1678 at their later regionals) will inevitably appear to have bad schedules, since no schedule gives them much chance of seeding higher than their expectation of almost certainly first. Switching to “greater than or equal” rank comparisons just flips the problem, producing high scores instead of low scores in these scenarios
  • Average value is 48.1% instead of 50%

I’ve done a simple strength of schedule calculation a few times. I take some metric, like OPR or ranking points, and for each of a given team’s matches add up their two partners’ scores and subtract the three opponents’ scores. The given team’s own score is not included. I do this for all teams and then sort by that total score. Most or even all teams will have a negative score; the least negative has the easiest schedule. Then I compare the actual rank to the SOS rank and look for significant differences, for example, teams that had an easy schedule but finished much lower, and teams that had a difficult schedule but finished much higher. I’m not sure how much it really means, other than to maybe look closer at these teams if they were or were not on your pick list.
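The partner-minus-opponent calculation described above fits in a few lines of Python. Here `opr` is assumed to be a precomputed mapping from team to whatever metric (OPR, ranking points) you choose, and `schedule` is a list of (red alliance, blue alliance) pairs.

```python
def simple_sos(schedule, opr):
    """For each team, sum its partners' metric and subtract its opponents'
    metric across all of its matches; the team's own score is excluded.
    Returns (team, total) pairs sorted easiest schedule first."""
    totals = {}
    for red, blue in schedule:
        for alliance, opponents in ((red, blue), (blue, red)):
            for team in alliance:
                partners = [t for t in alliance if t != team]
                delta = (sum(opr[t] for t in partners)
                         - sum(opr[t] for t in opponents))
                totals[team] = totals.get(team, 0.0) + delta
    # Largest (least negative) total = easiest schedule
    return sorted(totals.items(), key=lambda kv: -kv[1])
```

Because three opponents are subtracted but only two partners are added, most totals come out negative, as the post notes.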

I would consider averaging the given metric of the partners versus the average of the opponents. Or maybe substitute a “middle” score instead of the team’s own score.

When the college sports pundits talk about “strength of schedule”, especially for football teams in the top 25, the team being discussed really is a non-factor. The pundits don’t discount Alabama or Ohio State’s win against the #24 ranked team simply because they’re ranked #1 or #2.

For FRC, strength of schedule helps answer the question, why is team X ranked where they’re at? Did they get a favorable schedule or did they claw their way to their ranking despite their schedule?

It also can help detract or bolster the case for team Y as GOAT…

I’m on a computer without a spreadsheet program right now, so forgive my ignorance.

How does this SoS account for week 1 events? Especially those with several rookie or year-two teams? As we all know, some rookie teams hit the ground running and do amazing. And some year-two teams perform at a much different level than they did rookie year.

Here’s a google sheets version if that is easier.

> How does this SoS account for week 1 events? Especially those with several rookie or year-two teams? As we all know, some rookie teams hit the ground running and do amazing. And some year-two teams perform at a much different level than they did rookie year.

All rookie teams start out with an Elo rating of 1450 (roughly a 30th-percentile team) and calculated contributions equal to the average calculated contribution of all week 1 rookies.

If you were to treat same-placement results as half a point toward the better outcome and half toward the worse, would that eliminate the dominant-team issue?
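A sketch of the half-point tie handling suggested here, assuming we already have each team’s simulated seed under the actual and random schedules (lower rank number = better seed):

```python
def compare_seeds(actual_ranks, random_ranks):
    """Fraction of paired trials where the actual schedule seeds better,
    counting an equal seed as half a point toward each side."""
    score = 0.0
    for a, r in zip(actual_ranks, random_ranks):
        if a < r:            # seeded strictly better with actual schedule
            score += 1.0
        elif a == r:         # tie: split the point
            score += 0.5
    return score / len(actual_ranks)
```

For a team that seeds first in every simulation, every comparison is a tie, so this scores exactly 0.5 instead of pinning to 0% or 100%, which is the dominant-team behavior in question.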

There is something weird with the Southern Cross Regional. You have the regional listed with 30 teams and only 30 teams listed, when there really were 40. Not sure if other events have the same issue.

I looked closer and I think I see what is going on. It looks like 40 teams signed up but only 30 teams showed up, so the random simulations have 40 teams in them while the real-schedule simulations have 30.

When I brought up strength of schedule in the “Power Up good at Rankings” discussion, I never really considered it as a match-making decider.

However, it could also work as a method of match making. One thing I like about it is this:

I like this because of the similarity it bears to the algorithm used to develop District Championship divisions, as described by Jim Zondag and Dan Kimura, where the divisions are compared against other hypothetical divisions until four even divisions (or two in Ontario) are created. Look at the 2018 MSC: the divisions were very evenly matched. Every playoff round had a rubber match on FIM-stein, and aside from the Dow division (which was dominated by the eventual captain of the Einstein-winning alliance, who then picked the eventual captain of the Einstein finalist alliance), every division had its finals taken to three matches. This shows how equal all of the divisions were, IMO, and why a “strength of schedule” approach that compares a schedule against other hypothetical schedules to create the best possible one would be a great way to go.