That was one of my favorite insights as well.
I recently took an interest in FRC data analysis and was wondering if there were any adjustments you made when calculating contribution for game elements that have a commonly reached ceiling (e.g., defense crossings, breach achieved). The calculated contribution values for these statistics seem to bunch up near the average because the same result is reached in most matches. For example, with defense crossings at champs divisions, all of the calculated contribution values are between 2.2 and 3.2, even though some teams clearly focused more heavily on breaching than others.
I did not treat any categories differently based on the likelihood that they would occur. You are correct, though, that calculated contributions have less value when looking at categories that occur incredibly frequently or incredibly infrequently. Defense crossings last year were a great example.
Did you run into this problem? If so, what techniques would you recommend for getting more accurate/adjusted calculated contributions?
I would prefer not to change how I determine normal calculated contributions, because they mean something very specific mathematically, and they would lose that meaning if we performed adjustments. I might be willing to provide supplemental categories with adjustments, but based on the data the API provides, I really can’t think of a good way to, for example, determine which teams spent more time doing defense crossings if nearly every match has close to the same number of defense crossings. I’d be interested in ideas though if anyone has any.
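(To be concrete about “something very specific mathematically”: a calculated contribution is the OPR-style least-squares estimate. Here is a minimal sketch; the match-data layout is simplified for illustration and is not the API’s actual format.)

```python
import numpy as np

def calculated_contribution(matches, teams):
    """OPR-style calculated contribution: the least-squares solution to
    'sum of alliance members' contributions = alliance score'.

    matches: iterable of (red_teams, blue_teams, red_score, blue_score)
             (this layout is an assumption for the sketch)
    teams:   list of all team numbers appearing in the matches
    """
    idx = {t: i for i, t in enumerate(teams)}
    rows, scores = [], []
    for red, blue, red_score, blue_score in matches:
        for alliance, score in ((red, red_score), (blue, blue_score)):
            row = np.zeros(len(teams))
            for t in alliance:
                row[idx[t]] = 1.0  # team t was on this alliance
            rows.append(row)
            scores.append(score)
    A = np.array(rows)
    b = np.array(scores, dtype=float)
    # Least-squares solve; lstsq handles rank deficiency early in an event.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, x))
```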
However, if I were trying to predict the matches in which breaches or captures would occur, I would likely proceed as follows:
First, I would try to pull in as much relevant information as possible. For breaches, that would include looking at the “A crossings,” “low bar crossings,” etc. categories. For captures, that would include looking at the “subtracted tower strength” and “challenge or scale count” categories.
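The least-squares machinery from the sketch above can be reused per category by swapping the alliance score for the category count. A sketch (the category and breakdown key names below are hypothetical stand-ins, not actual API field names):

```python
# Hypothetical category names; the real API field names differ.
BREACH_CATEGORIES = ["a_crossings", "b_crossings", "c_crossings",
                     "d_crossings", "low_bar_crossings"]
CAPTURE_CATEGORIES = ["subtracted_tower_strength", "challenge_or_scale_count"]

def category_matches(raw_matches, category):
    """Reduce full match records to (red, blue, red_count, blue_count)
    tuples for one scoring category, ready for calculated_contribution()."""
    return [(m["red_teams"], m["blue_teams"],
             m["red_breakdown"][category], m["blue_breakdown"][category])
            for m in raw_matches]

# One calculated contribution per team per category:
# per_category = {c: calculated_contribution(category_matches(raw, c), teams)
#                 for c in BREACH_CATEGORIES + CAPTURE_CATEGORIES}
```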
Next, I would find the best way to combine each team’s calculated contributions in these categories into a predicted average score for each category. The easiest way would be a plain sum, p = a + b + c, but I can imagine situations where it would be beneficial to add contributions in log space (ln(1+p) = ln(1+a) + ln(1+b) + ln(1+c)), in quadrature (p^2 = a^2 + b^2 + c^2), or with a weighted sum (p = k_A*a + k_B*b + k_C*c).
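A sketch of those four combination rules (note that the log-space version reduces to a product, p = (1+a)(1+b)(1+c) − 1):

```python
import math

def combine_sum(parts):
    """p = a + b + c"""
    return sum(parts)

def combine_log(parts):
    """ln(1+p) = ln(1+a) + ln(1+b) + ...  =>  p = prod(1+x) - 1"""
    p = 1.0
    for x in parts:
        p *= 1.0 + x
    return p - 1.0

def combine_quadrature(parts):
    """p^2 = a^2 + b^2 + c^2"""
    return math.sqrt(sum(x * x for x in parts))

def combine_weighted(parts, weights):
    """p = k_A*a + k_B*b + k_C*c"""
    return sum(k * x for k, x in zip(weights, parts))
```

The weights k_A, k_B, k_C would presumably be fit by regressing observed alliance category totals on the individual contributions, rather than chosen by hand.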
Then, I would look at the correlations between categories. Is a breach more likely for an alliance with a high predicted C crossing score and a low predicted A crossing score, or for one with average scores in both?
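One simple way to probe that question is to correlate each predicted category score, and each pairwise product of scores (the product terms capture interactions like high C combined with low A), against whether the outcome actually occurred. A sketch with hypothetical inputs (a logistic regression with interaction terms would be the fancier version):

```python
import numpy as np

def category_correlations(pred_scores, occurred):
    """pred_scores: {category: array of predicted alliance scores},
    one entry per alliance-match; occurred: matching 0/1 outcome array
    (e.g., breach achieved). Returns each category's, and each pairwise
    product's, correlation with the outcome."""
    occurred = np.asarray(occurred, dtype=float)
    cats = list(pred_scores)
    out = {c: np.corrcoef(pred_scores[c], occurred)[0, 1] for c in cats}
    for i, c1 in enumerate(cats):
        for c2 in cats[i + 1:]:
            product = np.asarray(pred_scores[c1]) * np.asarray(pred_scores[c2])
            out[c1 + "*" + c2] = np.corrcoef(product, occurred)[0, 1]
    return out
```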
Finally, I would add in uncertainty to come up with a likelihood of a breach or a capture.
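As a simplified example of that last step: if an alliance’s total damaging crossings are treated as roughly normal around the predicted value, the breach likelihood is a single tail probability. The normality assumption and the flat threshold of 8 crossings (four defenses damaged twice each, ignoring the requirement that they be spread across four distinct defenses) are simplifications for the sketch:

```python
from math import erf, sqrt

def breach_probability(mu, sigma, threshold=8.0):
    """P(total damaging crossings >= threshold) under a normal
    approximation with mean mu and standard deviation sigma."""
    if sigma <= 0:
        return 1.0 if mu >= threshold else 0.0
    z = (threshold - mu) / sigma
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))  # 1 - normal CDF at threshold
```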
My long list of things to look at before week 1 events does include predicting breaches/captures from last year in preparation for predicting 4 active rotors and pressure threshold reached this year.