Yeah, I agree that the units don’t match the name. I originally had this metric and a few of the others in units of matches, but I normalized everything to the range -1 to +1 so that I could more easily compare schedule strengths between events of different sizes, and compare against other schedule strength metrics (like Expected Rank Change) that have units of teams rather than matches. If I were using EWA standalone, I likely wouldn’t bother with this normalization.
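To make the normalization idea concrete, here is a rough sketch of one way it could be done. The function name `normalize_metric`, the choice of bound for each metric (matches per team for a match-unit metric, teams minus one for a rank-unit metric), and the sample numbers are all assumptions for illustration, not necessarily what I actually did in the analysis.

```python
def normalize_metric(raw_value, max_abs_value):
    """Scale a raw schedule-strength value into the range [-1, +1].

    max_abs_value is an assumed bound on the metric's magnitude, e.g.
    matches per team for a metric in units of matches, or (teams - 1)
    for a metric in units of rank positions.
    """
    if max_abs_value <= 0:
        raise ValueError("max_abs_value must be positive")
    scaled = raw_value / max_abs_value
    # Clamp in case the raw value exceeds the assumed bound.
    return max(-1.0, min(1.0, scaled))


# Hypothetical numbers: 1.5 wins added over a 10-match schedule,
# and a 4-position rank change at a 40-team event.
ewa_normalized = normalize_metric(1.5, 10)
rank_change_normalized = normalize_metric(4, 39)
```

Once both values are unitless and share the same range, they can be compared directly even though one started in matches and the other in teams.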
This is indeed a really interesting result that arises out of the schedule strength conversation. I actually included a mix of team-specific metrics (Caleb’s SS, Expected Rank Change, Expected Wins Added, and Winning Rank Matches) and team-independent metrics (Weighted Rank Diff, Average Rank Diff, and Average Elo Diff) in my analysis. I’m glad that the metric I settled on happened to be team-independent, but I didn’t know that it would be going into the analysis.
When looking for metrics to use for schedule generation, I feel there is a fundamental difference between trying to “balance” schedules (team-independent metrics) and trying to give each team a schedule that is optimal for them (team-specific metrics). The former is just trying to make sure experiences are roughly equal for all teams and no one gets unreasonably shafted by RNG. The latter is much more akin to “seeding” tournaments, as the team I “seed” first going into the tournament will become more likely to rank first, the second seed more likely to rank second, etc. What I really don’t want is for a team that I have seeded 23rd going into the tournament to get basically locked out of ranking first just because I thought I was giving them a schedule that was better suited to their ability (so that they’d be more likely to seed 17-22, for example).
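Here is a toy sketch of the distinction. Everything in it is a simplified assumption for illustration: the alliance strength model (mean Elo of the members), the 1500 baseline, and the exact formulas are hypothetical stand-ins, not the actual definitions of Average Elo Diff or Expected Wins Added from my analysis. The only standard piece is the usual Elo win-probability formula.

```python
def elo_win_prob(own_elo, opp_elo):
    """Standard Elo expected score for the 'own' side."""
    return 1.0 / (1.0 + 10 ** ((opp_elo - own_elo) / 400.0))


def mean(values):
    return sum(values) / len(values)


def avg_elo_diff(schedule):
    """Team-independent: uses only partner and opponent Elos.

    schedule is a list of (partner_elos, opponent_elos) per match.
    The team's own rating never appears, so any team dropped into
    this schedule slot gets the same value.
    """
    return mean([mean(p) - mean(o) for p, o in schedule])


def expected_wins_added(team_elo, schedule, baseline_elo=1500.0):
    """Team-specific: the answer depends on the team's own Elo.

    Assumed model: alliance strength is the mean Elo of its members,
    and the baseline schedule has every partner and opponent rated
    at baseline_elo.
    """
    actual = sum(
        elo_win_prob(mean([team_elo] + list(p)), mean(o))
        for p, o in schedule
    )
    baseline_alliance = mean([team_elo, baseline_elo, baseline_elo])
    baseline = len(schedule) * elo_win_prob(baseline_alliance, baseline_elo)
    return actual - baseline


# Hypothetical two-match schedule slot.
slot = [
    ([1600, 1500], [1450, 1500, 1480]),
    ([1400, 1550], [1600, 1650, 1500]),
]
balance_value = avg_elo_diff(slot)            # same for every team
weak_team = expected_wins_added(1400, slot)   # differs by team
strong_team = expected_wins_added(1700, slot)
```

With the first kind of metric I can judge a schedule slot before deciding which team goes in it. With the second kind, the same slot gets a different score for each team, which is where the "seeding" effect comes from.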
You actually can extract team-specific schedule strengths from my event simulator pretty easily, so I’m not opposed to looking at this kind of metric. But for a general-purpose schedule strength metric, particularly one I’m going to use to build schedules, I feel team-independent measurements are preferable.