I *think* the bell curve is the predicted score range, with the vertical line being the actual score. In the match you pictured, TBA expected the blue alliance to score 39 points, but they actually scored 32, while it expected red to score 56 points, but they actually scored 74.

Yeah I switched that up, that makes more sense.

This is correct. The score distribution doesn’t actually follow a bell curve, but the math makes some basic approximations and it’s shown like that for visualization purposes.

I like to look at match predictions for a couple reasons

- Provides a second opinion beyond our intuition and our data on our risks in the match, which can inform whether we play a risky or safe match strategy.
- Rankings simulations (which are usually aggregated from match simulations) can be incredibly helpful for focusing on preparing for the most likely scenarios

I could see how match prediction could be helpful later in the event for potential seeding implications. This past year we used a very simple sum of each robot's points to get a rough idea of how easy or hard the match would be. It was nice to share with the non-scouts who ask "how does our next match look", but we generally dig deeper into the data for match strategy (i.e., a team may have climbed in 2/5 matches, but if those were the last two matches, they are more likely than a 40% chance to climb). Having an idea of how hard or easy a match will be is helpful for determining how risky of a strategy to pursue and/or whether you can spend time on bonus RPs, but I don't think you need match predictions to determine this.

How often are those predictions updated?

I think others have covered the main reasons, so I’d like to explore some additional reasons why I believe match predictions can be helpful.

Plenty have already mentioned risky/conservative win strategies, and that's important for sure, but remember that there are two other RPs in the match as well from bonus objectives. As a rule of thumb, I would always highly prioritize the bonus RPs in matches where the outcome is >70% certain. That's because when you get into matches with that large of a skill gap between the alliances, there's not a lot to be gained by either alliance from focusing on winning, as the outcome has essentially already been decided. If that happens, just take the win/loss and go guarantee you get as many bonus RPs as you can.

The other thing I’d like to point out regarding match predictions is that they are the best validation tool you have for your scouting system. When you get around to alliance selection, you are trying to pick the teams that give you the best chance of winning. I really don’t care how high they score on all of your fancy scouting metrics, what matters is how well your scouting can actually predict who will win and lose.

Honestly, if you're not validating your scouting with match predictions or something similar, you are becoming dissociated from reality. It's easy to look back on matches that have already happened and explain why the outcome ended up the way it did; it's much harder, but much more enlightening imo, to look forward and predict what will happen. This can be a bit more painful of a process, as you'll be forced to face your own incorrect predictions head on, but that's really what you gotta do if you want to learn and improve. For reference, I can predict the winner of FRC matches about 70% of the time (year dependent) using Elo/OPR, so you should be hitting at least that high using your scouting data or you're probably doing something wrong.

Sure! Here’s a good jumping off point. You can use a simple logistic function to make win probabilities. For each team, give them a rating (I’ll be assuming the rating has units of 2019 points, but any units can work), which we’ll denote q. Sum the ratings for the red robots to get q_red, and likewise for the blue robots to get q_blue. Plug those ratings into the following formula:

WP_blue = 1/(1+10^((q_red - q_blue)/s))

Where WP_blue is your predicted blue win probability and s is a scale factor that you have to determine. 30 is a good baseline for s in 2019, go higher if you have weaker scouting data or are early in the event, go lower if you are more confident in your scouting data or have verified predictions at earlier events.

Play around with prior scouting data and match results to determine exactly how to make your team ratings. A good starting point is just to sum up a team's average points scored across all categories based on your scouting data, but you can add or subtract from teams' ratings using whatever criteria you want. Understand this is just a baseline; you can start to play around with alternative formulas or variants of this one once you feel you understand it well enough and it feels too restrictive.

Finally, I know Eugene already mentioned it, but jump on in to some of the prediction contests. It’s a fun way to compare your results to others and you can ask questions and learn right along with everyone else. I’m happy to answer any questions about my event simulator or Elo if you have any.

Maybe we have different definitions of “scouting system”, but wouldn’t validating against FMS scoring be more accurate?

A lot of what you say makes sense in theory, but I think we are a long way from the point where match prediction plays a role in alliance selection. It would be cool if each robot had a 'winning percentage if picked', similar to how analytics is used in the NFL for 4th down decisions (i.e., 17% chance if you punt, 19% if you go for it), but I'm not aware of any team currently doing that. I also question whether that is even possible given the small sample sizes, due to both the number of matches at an event and the yearly changing game.

Not quite sure what you mean here, let me try to rephrase my statement a little better:

The best way to validate your scouting data is to make predictions for upcoming matches and compare those predictions to the actual results of those matches. Those predictions can be any of simple W/L predictions, win probabilities, final score predictions, or full score breakdown predictions.

The two key points for me are that:

1. you are comparing something forward looking (predictions)
2. with a measurable objective result

And the difference between 1 and 2 should be minimized.
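To make that concrete, even the simplest version of this (plain W/L predictions scored against actual results) is just a few lines. The match records below are invented for illustration:

```python
# Each record: (predicted blue win probability, did blue actually win?)
# These numbers are made up for illustration.
predictions = [
    (0.72, True),
    (0.35, False),
    (0.60, False),
    (0.85, True),
    (0.40, True),
]

# Simple W/L accuracy: how often did the favored alliance win?
correct = sum((p > 0.5) == won for p, won in predictions)
accuracy = correct / len(predictions)
print(f"accuracy: {accuracy:.0%}")  # → 60%
```

Tracking this one number across an event is the quickest way to see whether your scouting data is actually predictive or just descriptive.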

I think we are talking about two different things.

I'm referring to validating scouting data that says that 254 climbed in match 12. In order to validate that 254 actually climbed that match, I think checking the FMS data would be perfect.

I think you are getting at validating the value of the scouting data. The idea being that even if we had perfectly accurate match data, if it doesn't predict future matches, then the scouting data isn't valuable. I get what you mean by this, but when nearly all teams' "scouting data/systems" are just recording basic data (# of cargo scored, # of panels, climb…), I think validating the raw numbers is far more important/applicable.

Sure, but you can’t necessarily validate all raw numbers from scouting against FRC API data.

Take 2019 for example. Validating climb state is easy. Validating individual robot contribution to the number of scored elements is harder (FRC API only provides alliance totals). Validating cycle times/preferred loading zones/etc. is near impossible.

I think what Caleb is getting at is that there are a lot of things you can be scouting that cannot be derived from the FRC API at all. Therefore, you would expect predictions that come from your scouting data to outperform the best possible prediction algorithm that relies on API data alone.

Ah, I see, yes I believe we were talking about different things. I think you and Eugene summarized the difference pretty well.

When I say this, I'm trying to get at the fundamental idea that all scouting is a way of abstracting robot performance. Every robot has a million different features that make it unique from every other robot. When scouting, you necessarily have to distill all of those differences down to a handful so that you can directly compare teams. Average game pieces scored, for example, is an excellent metric because it correlates so well with future wins. If you can't turn the abstraction of your scouting back into successful match predictions, your scouting system is useless, no matter how well you recorded the data. That's what I mean by becoming dissociated from reality, because you are assigning meaning to numbers that don't actually have meaning in the real world.

Having reliable raw data is certainly important though, and there are plenty of ways to validate that (API, re-watching videos, comparing your data to other teams’ data, etc…). I can’t think of any other reasonable way to evaluate how **meaningful** your data are though other than with predictions.

Good discussion though for sure

I agree 100% with what you said.

My point is for the average team, I’d be more concerned with validating raw data, for two reasons. First, most teams collect basic match data (number of game pieces scored, climb or not). Secondly, as you stated, average game pieces scored, calculated from that basic match data, is an excellent metric and one that probably doesn’t need to be proven meaningful. If elite teams want to go above and beyond and have advanced stats or formulas, then I agree with validating that specific data to see if it is valuable or meaningful.

I don't want newer teams thinking their scouting systems must predict X% of matches correctly or the system is useless. If you're collecting basic match data, I think it's fair to assume that data is meaningful, and I'd focus on making sure it's accurate.

(Kind of related, but I was listening to a podcast that was talking about all the different metrics to quantify QB performance (QBR, QB Rating, PFF Grade, yards per attempt, EPA…) and how poorly QB Rating does at predicting team win-loss compared to the other metrics, yet how commonly it is used during TV broadcasts. I think this speaks to @Caleb_Sykes's point about how important it is to actually validate that metrics are meaningful and predict future events, rather than just being numbers that don't really help.)

We used to do some very rudimentary match predictions, but it was pretty useful.

We used our scouting data to generate an average points per robot, per match. Sometimes we would credit points for actions that didn't lead directly to points. Ex) In 2014 we credited points for missed shots, because an inaccurate shooting robot could be reassigned to a truss robot. We also weighted more recent matches slightly higher. Nothing special.

Then we’d take the average robot scores and apply it to future matches to come up with a predicted score. We would divide all win/loss spreads by the highest match’s win/loss spread to determine a pseudo win %. Again, super simple, but it worked for us.
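For anyone wanting to try that, here's one possible reading of the approach sketched in Python. The team averages and schedule are placeholders, and the pseudo win % mapping (50% plus half the spread scaled by the largest spread) is my interpretation of the normalization described:

```python
# Average points per robot from scouting data (placeholder values)
avg_points = {100: 20.0, 200: 15.0, 300: 10.0,
              400: 18.0, 500: 12.0, 600: 8.0}

# Upcoming matches as (red alliance, blue alliance) team lists
schedule = [
    ((100, 500, 600), (200, 300, 400)),
    ((100, 200, 300), (400, 500, 600)),
]

def predicted_score(alliance):
    """Predicted alliance score: sum of scouted robot averages."""
    return sum(avg_points[t] for t in alliance)

# Win/loss spread for each match, then normalize by the biggest spread
spreads = [predicted_score(r) - predicted_score(b) for r, b in schedule]
max_spread = max(abs(s) for s in spreads)

for (red, blue), spread in zip(schedule, spreads):
    pseudo_wp = 0.5 + 0.5 * spread / max_spread
    print(red, "vs", blue, f"red pseudo win %: {pseudo_wp:.0%}")
```

Nothing here is beyond what a spreadsheet can do, which matches the spirit of the original approach.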

We could use this to “predict” the final rankings as the competition went on. This was before the extra RP era, so things were a little simpler. I’m not sure how much benefit we really got from the predicted rankings, but it was always fun to follow.

The biggest benefit this gave us was updating our picklist on Saturdays. We didn’t collect match data on Saturday, we relied on ad-hoc scouting, and our “dynamic scouts” following overall match flow. If we predicted blue to win a match “90%” but red won the match, it flagged us that something that really mattered happened. Either some robot on red over performed, or a blue robot under performed. This would cue those scouts that their observations for that match were important, and we might adjust the pick list accordingly.

None of this took more than basic math and Excel skills. As an aside, I’d be really interested in seeing a match prediction accuracy comparison between a simple setup like this vs something more advanced. If anyone has something like this please link it. My gut tells me the simple approach can get 80-90% of a more advanced approach, but that is based on absolutely nothing. I’m happy to be proven wrong!

These have all been really informative answers, thank you all! @Caleb_Sykes I really appreciate your in depth answer to all of it. I think this

And this

Is where we see the most value for us. We really want to make sure that the data we *think* is valuable is actually a contributing factor to the outcome of a match, which will in turn better inform our match strategy and pick list decisions. Finding incorrect predictions seems like it would be of huge value to the match strategy group, where they can review these matches and see if there was something done exceptionally well by a team that impacted the outcome, and should be noted.

Quick question about this formula

Are you suggesting the rating for the teams be the sum of all the points they have scored? How did you determine that 30 is a good baseline, and how do you know if you need to adjust the scaling factor?

I think if you are just starting out on match prediction, a good starting point is a team’s average points scored from their prior matches. This is something I see many teams already calculating anyway, it is easily interpretable, and your predictions can be compared to future match results (scores) very easily. Like I said, anything can work for a rating in theory (team age, bumper quality, Blue Banner Quotient, etc…) but some ratings will have much stronger predictive power than others.

How did you determine that 30 is a good baseline

When I built a year-to-year OPR match prediction model, I found 2.0 × (the standard deviation of week 1 scores) to be the best scale factor (that is, the number that maximized the predictive power of my model). In 2019, the standard deviation of week 1 scores was 17.1, so a good scale factor s for using OPR would be 34.2. Since I would expect your scouting data to provide a fair bit more predictive power than OPR, a scale factor of 30 is my standard recommendation this season. My recommendation will change next year though, as 2020 points mean something different than 2019 points. If you want to calculate a baseline yourself next season, just grab a bunch of matches from week 1, determine the score standard deviation, and multiply that by 1.8ish.

Note that 30 is only a good baseline if your team ratings have units of 2019 points, if you use any other rating you’re on your own for finding a baseline.

and how do you know if you need to adjust the scaling factor?

The scale factor tells you how confident your predictions will be based on the ratings. A low scale factor will make more aggressive predictions, and a high scale factor will make more conservative predictions. You’ll have to play around with it to find what works best for your data. If all/most of your predictions are hanging around 40-60%, that probably means your scale factor is too high, so you can lower it to get some more aggressive predictions. If on the other hand, you are getting way too many predictions at <10% or >90%, particularly if the underdog is frequently winning, that probably means your scale factor is too low so you should raise it to get more conservative predictions. The best thing to do is to score all of your predictions using Brier scores or log loss systems and choose a scale factor and rating system that minimizes these scores.
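As a concrete sketch of that tuning loop, here's one way to sweep scale factors and score each with the Brier score (mean squared error between predicted probabilities and outcomes; lower is better). The match data is invented for illustration:

```python
def win_prob_blue(q_red, q_blue, s):
    """Logistic blue win probability from alliance ratings."""
    return 1.0 / (1.0 + 10 ** ((q_red - q_blue) / s))

def brier(predictions):
    """Mean squared error between win probabilities and outcomes.

    predictions: list of (blue win probability, 1 if blue won else 0).
    0.0 is perfect; always predicting 50% scores 0.25.
    """
    return sum((p - won) ** 2 for p, won in predictions) / len(predictions)

# (q_red, q_blue, did blue win?) — invented example matches
matches = [(98, 94, 0), (80, 110, 1), (105, 100, 0), (90, 95, 1)]

for s in (15, 30, 45, 60):
    preds = [(win_prob_blue(qr, qb, s), won) for qr, qb, won in matches]
    print(f"s = {s}: Brier = {brier(preds):.3f}")
# Pick the s (and rating system) that minimizes the Brier score.
```

With real data you'd run this over a whole event's worth of matches; four matches is far too few to tune on, it's just enough to show the shape of the loop.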

Hope that helps, let me know if you have any other questions!

Personally, I use Match Predictions to validate my scouting system. My team uses it to help prepare more for matches but my team doesn’t put much stock in it past that. I do it simply because I enjoy predicting and coding sheets.

Match predictions are pretty new to me. Is it a set template that you just plug and go? I understand it being measured from OPR, but Elo sometimes? Maybe another thread for this?

From earlier in the thread
