I have been wary of TBA Insights for a while, but I know some teams still like to use it in lieu of quality, detailed scouting data.
What I’m trying to figure out is why some of its calculations are sometimes so wildly out of touch with reality. While I’m not going to attempt to work out how overall OPR is calculated here, I can try to make sense of the data based solely on the match averages for some critical stats.
I wanted to compare two approaches TBA could take here: taking the sum of the relevant score for each match my team played in and dividing it by 3 (then averaging over total matches played), or counting the relevant total for our individual robot’s performance (again averaging over total matches played).
I focused on three stats for this exercise because they were easy to review: Total Game Pieces Scored, Total Onstage Points, and Foul Points, using Beach Blitz 2024 data for all 10 quals matches we played. For the alliance numbers I grabbed the final score breakdown from every match played, and for the individual numbers I watched my own team’s actual scoring in every match replay video.
Alliance Total Game Pieces: 191 (6.37 game pieces per robot per match)
Individual Total Game Pieces: 66 (6.6 game pieces per match)
TBA claim: 5.21 game pieces per match (19% less than the alliance average, 21% less than our individual performance, ranking us 8 places lower)

Alliance Total Onstage Points: 57 (1.9 points per robot per match)
Individual Total Onstage Points: 19 (1.9 points per match)
TBA claim: 1.64 (14% less than both the alliance average and our actual individual performance, ranking us 4 places lower)

Net Alliance Total Foul Points: -2 (-0.067 points per robot per match)
Individual Total Foul Points: 5 (0.5 points per match, i.e. we gained more points from fouls than we caused)
TBA claim: -1.11 (ranking us 13 places lower)
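In case anyone wants to sanity-check the arithmetic, here is a quick sketch of how I computed the two averages above; the 10-match and 3-robots-per-alliance figures are just our schedule, and the totals come straight from the list:

```python
# Quick sanity check of the per-match averages quoted above
# (10 quals matches, 3 robots per alliance, totals from the list).
matches = 10
robots_per_alliance = 3

alliance_total_pieces = 191   # our alliances' game pieces summed over all matches
individual_total_pieces = 66  # what our robot alone scored, from match video

# Approach 1: split the alliance total evenly across the 3 robots, per match
alliance_avg = alliance_total_pieces / (matches * robots_per_alliance)
# Approach 2: our robot's own scoring, per match
individual_avg = individual_total_pieces / matches

tba_claim = 5.21
print(f"alliance average:   {alliance_avg:.2f} pieces per robot per match")
print(f"individual average: {individual_avg:.2f} pieces per match")
print(f"TBA claim is {1 - tba_claim / alliance_avg:.0%} below the alliance average")
print(f"TBA claim is {1 - tba_claim / individual_avg:.0%} below the individual average")
```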
Does anyone know the real source of these calculations? How can they be so off?
A similar mathematical method is used for all stat insights. It basically uses some linear algebra.
For an oversimplified explanation, let’s say your robot typically scores 2 game pieces every match, and then one match you score 12. But in the match where you scored 12, you were with a partner that typically scores 12 game pieces but broke that match and only scored 2. The mathematical model would assume that you still only scored 2 and that the broken partner still scored 12. OPR also weights all matches equally, so if you improve over the course of the competition, it doesn’t necessarily reflect that.
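To make “some linear algebra” concrete, here is a minimal sketch of the textbook OPR least-squares setup with made-up match data (the standard formulation, not necessarily TBA’s exact code):

```python
import numpy as np

# One row per alliance per match: entry is 1 if the team was on that
# alliance, 0 otherwise. The right-hand side is that alliance's total score.
teams = ["A", "B", "C", "D", "E", "F"]

# (alliance members, alliance score) for a few fake matches
alliances = [
    (["A", "B", "C"], 60),
    (["D", "E", "F"], 30),
    (["A", "D", "E"], 45),
    (["B", "C", "F"], 40),
    (["A", "C", "E"], 55),
    (["B", "D", "F"], 35),
]

A = np.zeros((len(alliances), len(teams)))
b = np.zeros(len(alliances))
for row, (members, score) in enumerate(alliances):
    for t in members:
        A[row, teams.index(t)] = 1.0
    b[row] = score

# Solve for the fixed per-team contributions that best add up to the
# recorded alliance scores.
opr, *_ = np.linalg.lstsq(A, b, rcond=None)
for t, x in zip(teams, opr):
    print(f"{t}: {x:.1f}")

# Swapping the score vector for any other per-alliance stat (total game
# pieces, onstage points, foul points, ...) gives the component values
# that the stat insights report.
```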
But this is definitely why TBA or statbotics can only supplement, not replace, real scouting information.
I think what you’re saying here is that such a result reflects higher-performing teams, when on your alliance, underperforming in those matches, and the relatively lower-performing team (overall) absorbing that discrepancy?
Only trust statistics if you understand the statistics. This is what the OPR stat calculates:
“If every team scored a fixed number of points each match, what is that number for each team which best adds up to the recorded scores”
OPR doesn’t say anything “accurate” about your team’s performance, but it does say something useful. It can be used to predict match outcomes when you don’t have the real data for each team (which, of course, will be different for every match). Use with caution!
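Written out, that sentence is just an ordinary least-squares problem; a sketch in my own notation (not TBA’s):

```latex
% A_{m,t} = 1 if team t played on the alliance whose recorded score is s_m, else 0.
% The OPR vector x assigns each team a fixed per-match contribution, chosen so
% those contributions best add up to the recorded scores:
\[
  \hat{x} \;=\; \arg\min_{x} \sum_{m} \Bigl( \sum_{t} A_{m,t}\, x_t - s_m \Bigr)^{2}
          \;=\; \arg\min_{x} \lVert A x - s \rVert_2^{2}
\]
```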
If I am expected to score 6 game pieces and my other two partners are expected to score 1 each (8 total), and we only score 6, the model would not give me all 6; it would give me 5 and the other two 0.5 each, meaning they each eat half a game piece of the shortfall while I eat a whole one.
This is a gross oversimplification, but the point is that teams who do more of a task are likely to have a higher rating attributed to them in a match.
As Eugene mentioned, these are based on OPR, which can be traced very far back in time (in the blog!). However, instead of using alliance scores, it uses alliance XYZ values, where XYZ can be any component stat, such as “total game pieces” or “total climbs”.
These are not absolute numbers. They are much better served as relative numbers, and they are wildly inaccurate with a smaller number of matches played.
Frankly, if you ask me, a 14% error based on literally zero available individual robot statistics is pretty decent handwaving.
Let’s also consider the case of a pure passing robot that never scores a single game piece. Do you give them 0 game pieces scored, as your scouting would indicate? Do you give them 1/3 of the game pieces their partners scored? Do you give them the number of game pieces their partners score above their average?
Alternatively, let’s consider a robot that is very consistent, but is slow, blocks the best shooting lanes, and bumps and interferes with its partners, so that its partners score less in those matches. The scouting data would show the exact number of game pieces scored, but would miss the degradation of partner performance that was caused. Attributing 1/3 of the points scored by the partners would work if the partners were scoring close to their own averages. But what if those partners normally score a lot more?
Of course not, because the scouting system I use with my team counts every shuttle as a “cycle” with the context of shuttling, while for a team performing cleanup it counts every scored note as a “cycle” with the context of the cleanup role.
I brought this all up because I was being asked why, according to cOPRs, we were ranked 40th out of 42 teams for Foul Points issued (-1.11 per match) when, after reviewing video of every match, I noted that my driver never actually caused a foul in any match. As jtrv said:
I see the math in the blog, and that’s fine. But I can subjectively derive a couple meanings from this data:
our alliance partners that typically don’t cause fouls (or possibly actually draw a lot of fouls to their benefit) happened to cause some fouls in the matches where we were partnered with them, and as a result our stats “absorbed” a lot of that fault
many teams may be actively attempting to draw fouls, effectively raising this number, and in the relative sense not doing that lowers your overall score
Perhaps your team isn’t doing the prep other teams are doing to avoid fouls as an alliance.
EDIT: I admittedly got foul points reversed; lower means your alliances were less likely to have fouls called in their favor and gain those points. The point still stands: if you want to read into foul-point cOPR, the takeaway is that higher-ranked teams are either better at arguing their case with the refs about fouls committed against them, better at showing the refs when fouls are committed against them, or baiting fouls. Using foul cOPR is a bit silly in practice, and I think it’s been said enough that the OPR algorithm draws meaning from aggregate data, which will never be perfect but is potentially better than nothing, depending on your use case.
The foul point metric (while undoubtedly messy without manual scouting) is better served as a tool to find penalty-avoiding/drawing strategies. After seeing how important penalties were at our first comp (we upset 148 twice), we watched 9128’s matches (9128 had the highest foul points) to see where most penalties were being drawn. Using this analysis, we improved for Amarillo (example) and dcmp (where we had the highest foul points by a large margin).
IMO, TBA insights are a lot like Wikipedia articles. They provide good baseline information, and provide an easier way to further research topics.
Foul cOPR is generally going to have a lot of noise when fouls are rare.
If a team causes 1 foul in 5 matches, I don’t expect them to have a 20% chance to foul in the rest of their matches. They could easily get no more fouls or several more.
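A toy simulation (my own made-up 20% per-match foul rate, not real data) shows how wide that spread is over only 5 matches:

```python
import numpy as np

# Toy illustration: how noisy a foul rate looks when estimated from only 5
# matches. Pretend every team has exactly a 20% chance of committing a foul
# in any given match, then look at the observed 5-match foul counts.
rng = np.random.default_rng(0)
samples = rng.binomial(n=5, p=0.2, size=100_000)

for fouls in range(6):
    share = np.mean(samples == fouls)
    print(f"{fouls} fouls in 5 matches: {share:.1%} of teams")

# Roughly a third of these identical teams show 0 fouls and a few percent
# show 3 or more, even though the underlying rate is the same for all of them.
```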
Let me just quickly say, I don’t think anyone is using it “in lieu of”. I think everyone would prefer quality, detailed scouting data. They just can’t get it for one reason or another (not enough people, collectors not taking it seriously (bad data), or many other reasons). TBA is just one of several (very appreciated) resources you can use. I think teams use it more in lieu of NO data.
Sorry to distract from the technical part of this discussion. Back to your regularly scheduled program.