Machine Learning for Match Prediction

As a summer project, I wanted to get more into machine learning, so I’ve put together something in Python that predicts the winner of a match.

The way it works right now is that it downloads data from The Blue Alliance and trains and tests on historical data, with the goal of accurately predicting matches that have not yet happened. For any given match, it calculates each team's average match score prior to that match for every team on each alliance, and does the same for other attributes like average auto and teleop fuel and rotor points.

This data is fed into a machine learning classifier (I seem to get the best results with logistic regression), which learns to predict which alliance will win. As an example, the model currently predicts that the 973 alliance will win the Festival of Champions. The problem is that when testing on historical data, it tops out at around 65% accuracy.
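For anyone curious, here's roughly what that pipeline looks like in scikit-learn. The CSV file and column names are just placeholders for however you structure the TBA data, not my actual code:

```python
# Minimal sketch of the approach described above, assuming a CSV where each row is one
# match with pre-match per-alliance averages. The file and column names are placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

matches = pd.read_csv("matches_2017.csv")  # placeholder file built from TBA data

feature_cols = [
    "red_avg_score", "red_avg_auto_fuel", "red_avg_teleop_fuel", "red_avg_rotors",
    "blue_avg_score", "blue_avg_auto_fuel", "blue_avg_teleop_fuel", "blue_avg_rotors",
]
X = matches[feature_cols]
y = matches["red_won"]  # 1 if the red alliance won, 0 otherwise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```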

While this is okay, it isn't amazing. I was wondering if anyone has tried to do this before or has ideas on how to improve the accuracy. Just to be clear, I don't see much benefit in predicting match results on its own, but I think it would be a pretty interesting alliance selection tool.

Thanks!

Check out The Blue Alliance’s match and rankings predictions. For example, the predictions from this year’s Tech Valley Regional are available here. And if you poke around for a bit, you’ll note that for this event, only 63% of matches were predicted correctly, which isn’t far from where your model is. Eugene, feel free to correct me if I’m wrong, but my understanding is that the model that TBA uses involves calculating Gaussian distributions to predict the points each alliance member will contribute – so there’s not a neural network going on or anything like that.
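As a rough illustration of that idea (just the shape of the calculation, not TBA's actual code), if you model each alliance's score as a Gaussian, the win probability falls out of the distribution of the score difference. The means and variances here are made-up placeholders:

```python
# Rough illustration of predicting a winner from per-alliance score distributions.
# The means/variances are placeholders; this approximates the idea described above,
# not TBA's actual implementation.
from math import sqrt
from statistics import NormalDist

def win_probability(red_mean, red_var, blue_mean, blue_var):
    """P(red score > blue score) if both scores are independent Gaussians."""
    diff_mean = red_mean - blue_mean
    diff_std = sqrt(red_var + blue_var)
    return 1 - NormalDist(diff_mean, diff_std).cdf(0)

print(win_probability(red_mean=260, red_var=40**2, blue_mean=240, blue_var=45**2))
```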

Steamworks is a really, really hard game to predict. Games like Rebound Rumble, Ultimate Ascent, and Stronghold were much easier (see Karthik’s 2014 seminar, my paper on OPR in 2013, etc.). My personal belief is that these games were easier to predict because scoring was more linear and more separable (meaning that robots independently contributed to scores; the number of boulders you scored in 2016 didn’t often affect your partner’s ability to score).

I don’t see much benefit in predicting match results on its own, but I think it would be a pretty interesting alliance selection tool.

I disagree with you on this point. Predicting matches most years is vital to predicting rankings - and, statistically speaking, event winners tend to come from the lower-numbered seeds. Additionally, if you know that you have an impossible match headed your way, your alliance might agree to focus on secondary objectives that help with the ranking tiebreakers that come after wins and losses.

Qualification matches are definitely nontrivial.

In my experience, the main problem with predicting match scores is that the available data is very limited - unless you are using scouting data, you just have match scores and team numbers. You can get better accuracy if you have per-robot scouting data, but then you probably have a much smaller dataset. Also keep in mind that 2017 is a really difficult game to predict scores for, due to the nonlinear scoring and the insane climb bonus. If you want an easy game to play around with, 2015 had very, very predictable scores, so it might be a good way to test different approaches to this.

With regard to your prediction of the FoC result, I’d be concerned about the dataset you have - none of the teams on opposing alliances played against each other during the season, so I would not expect any approach based on match scores alone to work well.

I’m also curious what your approach was for determining that you got ~~63%~~ 65% accuracy - I assume you used separate testing and training data. What percentage was which, and how did you determine the split?

Anyway, sorry I don’t have any more useful suggestions - I do think that what you’re doing is cool, and I’m always interested in seeing ML applied to FRC :slight_smile:

I think there’s a bit of a misunderstanding here. The TBA match predictions from TVR were 63%; the OP is claiming 65%.

That is a good point; I didn’t think about it like that. It’s nice to know there is also a strategic use for the match predictions. Also, I didn’t realize how hard Steamworks was to predict compared to some of the past games. I’m curious to see how next year’s game will fare with match predictions.

I actually didn’t even realize that TBA had match predictions. It’s also nice to know that they are achieving about the same accuracy despite the different method of predicting. It’ll be nice to look into it to see what they are doing, though. Thanks for pointing it out!

Definitely agree with the scouting data, but I wanted to see how well match prediction could work using just TBA data. Good point about the climbs. Again, I didn’t realize how hard 2017 was to predict. Also, thanks for pointing out 2015; it’ll be nice to play around with that data to see how well match prediction can work with it. Just to clarify, though: the model I’m working on now doesn’t predict match scores, only whether an alliance will win or not.

The nice thing about this model is that it should not matter whether a team has played another team before. The teams themselves are completely abstracted away before being passed into the classifier; it only trains on match attributes such as rotor and fuel performance. That being said, the model is only 65% accurate, so the 2767 alliance still has a pretty high 35% chance of winning according to it. There could also be problems with the model that I have not seen yet.

The way I determined the 65% accuracy was through k-fold cross-validation. It splits all the data into 10 separate folds, trains on folds 1-9, and tests on fold 10. Then it trains on folds 2-10 and tests on fold 1, then trains on folds 1 and 3-10 and tests on fold 2, and so on until every fold has been used for testing. The accuracies of the individual tests are averaged to give the overall accuracy.
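In scikit-learn that procedure is just a few lines (same placeholder CSV as in the earlier sketch):

```python
# 10-fold cross-validation over the same feature table used in the earlier sketch.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

matches = pd.read_csv("matches_2017.csv")   # same placeholder file as before
X = matches.drop(columns=["red_won"])       # pre-match average features
y = matches["red_won"]                      # 1 if red won, 0 otherwise

clf = LogisticRegression(max_iter=1000)
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(clf, X, y, cv=cv)  # one accuracy per held-out fold
print("per-fold accuracy:", scores.round(3))
print("mean accuracy:", scores.mean())
```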

Thanks for the insights! I’ll look into 2015 and try it out to see how well it works. Maybe 2018 will be easier to predict. :slight_smile:

The problem with looking at a prediction accuracy like “63%” in isolation is that we have no idea how accurate a predictor can be, and so the number doesn’t really tell us how good the algorithm is at identifying the “better” alliance.

I think the closest you’ll get to gaining insight on this is to run Monte Carlo simulations of tournaments with robots of known “underlying ability,” and seeing how often the “better team” wins. You could also see how often analysis with your machine learning algorithm correctly identifies the “better” alliance. Of course, there are still a lot of open questions even with this approach - for example, how much variance is there in the by-match performance of a typical robot? - which probably need to be answered by looking at detailed scouting data (i.e., actually tabulated values of gears/climbs/fuel per robot per match - I’m sure teams would be willing to share their data, if one were to ask).
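A toy version of that simulation might look something like this - the Gaussian noise model and its spread are made up purely to show the structure:

```python
# Toy Monte Carlo: robots with a known "underlying ability" play simulated matches,
# and we count how often the alliance with the higher true ability actually wins.
# The noise model and its spread are assumptions for illustration only.
import random

random.seed(0)
N_ROBOTS, N_MATCHES, NOISE = 60, 20000, 25.0
ability = {t: random.uniform(20, 80) for t in range(N_ROBOTS)}  # true mean contribution

def alliance_score(teams):
    """One simulated match performance for a 3-robot alliance."""
    return sum(random.gauss(ability[t], NOISE) for t in teams)

better_won = 0
for _ in range(N_MATCHES):
    teams = random.sample(range(N_ROBOTS), 6)
    red, blue = teams[:3], teams[3:]
    red_true = sum(ability[t] for t in red)
    blue_true = sum(ability[t] for t in blue)
    if (alliance_score(red) > alliance_score(blue)) == (red_true > blue_true):
        better_won += 1

print("better alliance won:", better_won / N_MATCHES)
```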

Do you mind clarifying what you mean by “in isolation”? To clarify, the model trains and tests on actual matches (it trains on different data than it tests on).

Correct me if I’m wrong, but the problem I see with this type of method is that the model will only learn to detect extremes, when the majority of matches aren’t extreme. I do agree that there are a lot of things not taken into account, like the variance in per-match performance; that is part of the reason for the model’s low accuracy. It would be nice to see how a similar model would work with scouting data, though. We’re a pretty small team, so we weren’t able to collect a lot of accurate data. Thanks for the input!

In addition to raw % accuracy, it’s also good to consider Brier scores, which also factor in prediction confidence. 0 is perfect, 0.25 is random guessing, and 1 is perfectly incorrect. In 2016 I was able to get Brier scores of ~0.17, but 2017 was worse at ~0.23. Caleb Sykes was able to get ~0.20 in 2017 and ~0.17 in 2016.
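For reference, the Brier score is just the mean squared difference between the predicted win probability and the 0/1 outcome, so it's a few lines to compute:

```python
# Brier score: mean squared difference between predicted probability and outcome.
def brier_score(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A 50/50 guess on every match gives the 0.25 baseline mentioned above.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 1]))  # 0.25
print(brier_score([0.8, 0.3, 0.9, 0.6], [1, 0, 1, 1]))  # 0.075, better than guessing
```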

Edit: In addition to predicting win/loss, it’s also valuable in games like 2016/2017 to predict objectives, such as breach/capture and 4 rotor/40 kPa.

Remember that FIRST publishes more data than just match scores: in 2017 they published how many balls went into the high/low goals, how many rotors spun, and how many climbs there were. However, this data is per alliance (you could maybe guess that robot 1’s auto was correct and that it climbed the left touchpad, but robots started in weird auto positions and I wouldn’t trust it), so trying to extract which robot did which activity is difficult. (I know, because I wrote a program to do that this year. The basic idea was to run the OPR calculation - a linear least-squares approximation - but instead of the final score, you plug in climb points or balls scored in each alliance’s matches. You can end up with negative values and everything else, and the data was not very consistent.)
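For anyone who wants to try the same thing, the component fit is just an OPR-style least-squares problem. Here's a rough sketch with made-up data - not my actual program, just the shape of the calculation:

```python
# Sketch of a component "OPR" fit: a least-squares estimate of each team's contribution
# to an alliance-level stat (here, climb points). The tiny dataset is made up.
import numpy as np

teams = [254, 1678, 973, 2767]
team_index = {t: i for i, t in enumerate(teams)}

# Each row is one alliance in one match: which teams played, and the alliance's climb points.
alliance_results = [
    ([254, 1678, 973], 100),
    ([2767, 973, 254], 150),
    ([1678, 2767, 254], 50),
    ([973, 2767, 1678], 100),
]

A = np.zeros((len(alliance_results), len(teams)))
b = np.zeros(len(alliance_results))
for row, (members, climb_points) in enumerate(alliance_results):
    for t in members:
        A[row, team_index[t]] = 1.0
    b[row] = climb_points

contribution, *_ = np.linalg.lstsq(A, b, rcond=None)
# As noted above, these can easily come out negative or otherwise inconsistent.
print(dict(zip(teams, contribution.round(1))))
```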

Predictions based on robot capabilities (like predicting how many shots they make, gears they run, etc.) would probably give you more accurate results - until you realize that teams play to win, not to their averages, and will build per-match strategies around winning rather than scoring a bunch of points. This is why I think OPR is worthless: teams play to win, not to maximize their score.

Also, predicting the FoC is hard, as both alliances could pretty consistently score 4 rotors and 40 kPa. It is going to come down to something harder to predict, like the amount of defense played, strategy, or who simply scores more fuel. (I give the St. Louis teams the advantage, as they reached 40 kPa more often, plus set the high score in qualifying with 101 kPa!?! in one Daly qualification match.)

marysville.csv (10.2 KB)

2013 is a pretty good game to predict. Scores were linear, and there were not really any dual-task bonuses like capping, breaching, capturing, or assists. For 90% of robots, the points a robot contributed did not depend significantly on who their partners or opponents were. The other 10% were full-court shooters and floor-pickup robots, which could complicate things. We might not have another really straightforward game like 2013, but I bet you would get more consistent results predicting that year.

So I have been kind of toying around with a slightly different problem that may be relevant to this discussion. I was trying to figure out if I could come up with a single metric that could rank robots most accurately.

My methodology was:

  1. Create a set of robots with a random set of attributes (runs X gears, completes auto X percent of the time, climbs Y percent of the time, etc.).
  2. Feed them into a real tournament schedule.
  3. Take the results and order the teams by the metric under test.
  4. Create alliances by taking sets of 3 robots in order (so 1-3 is alliance 1, 4-6 is alliance 2, 7-9 is alliance 3, etc.).
  5. Play 20,000 matches between consecutive alliances (1 plays 2, 2 plays 3, 3 plays 4, etc.).
  6. The metric is rated on whether it stays monotonic (1 beats 2 more often, 2 beats 3 more often, etc.).

Key thing to note - step 2 uses qualification scoring and step 5 uses elimination scoring.
I am also applying a randomization to gears run: a 25% chance of +1 and a 25% chance of -1.
I was ignoring fuel and mobility for the first round.
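
For anyone who wants to play with this, here's a stripped-down sketch of steps 4-6. The point values are roughly 2017's, but the robot model, the gear-to-rotor mapping, and the ranking metric are crude stand-ins, not my actual simulation:

```python
# Stripped-down sketch of steps 4-6: rank random robots by a candidate metric,
# form alliances in rank order, then simulate many elimination matches between
# consecutive alliances. The gear-to-rotor mapping and metric are crude stand-ins.
import random

random.seed(1)

def make_robot():
    return {"gears": random.randint(1, 6), "climb_pct": random.random()}

def alliance_points(robots):
    gears = sum(r["gears"] for r in robots)
    rotors = min(4, 1 + gears // 3)                        # crude gear -> rotor mapping
    climbs = sum(random.random() < r["climb_pct"] for r in robots)
    bonus = 100 if rotors == 4 else 0                      # elimination 4-rotor bonus
    return rotors * 40 + climbs * 50 + bonus

robots = [make_robot() for _ in range(24)]
ranked = sorted(robots, key=lambda r: r["gears"] + r["climb_pct"], reverse=True)
alliances = [ranked[i:i + 3] for i in range(0, 24, 3)]     # 1-3, 4-6, 7-9, ...

for i, (a, b) in enumerate(zip(alliances, alliances[1:]), start=1):
    wins = sum(alliance_points(a) > alliance_points(b) for _ in range(20000))
    print(f"alliance {i} beats alliance {i + 1}: {wins / 20000:.1%}")
```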

Some things specific to Steamworks I have noticed:

  1. A lot of the possible alliances end up with a less than 65% chance of winning a given match
  2. The trade off between more gears and a higher climbing percentage is partner dependent (a 4 gear robot with a 20% climb is sometimes better and sometimes worse than a 1 gear robot with an 80% climb depending on the other two robots)
  3. Past results aren’t necessarily predictive - it is not uncommon for a 40% climb robot to have more climbs than a 60% climb robot over 8 to 12 matches.
  4. Touchpad points are a poor indicator of actual climbs - when doing the transformation of real climb percentage -> touchpad points -> subtract average alliance partner climbs -> estimated climb percentage, the error can be as high as 40% due to luck in how often partners climb.
  5. Rotor points aren’t great at predicting gear cycles - the 4-rotor bonus tends to be a better indicator.

Item 2 really hurts the idea of coming up with a single ranking metric.
Item 4 really hurts any attempt to look at data on The Blue Alliance and build a robot profile - it is always going to be a struggle to determine whether a 450-touchpad-point team can’t climb or simply had partners that didn’t climb in their matches.
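
To illustrate item 4, here's roughly what that climb-rate back-calculation looks like. The 50-point touchpad value is 2017's qualification score, but the assumed partner climb rate is a made-up placeholder, and the estimate is only as good as that assumption:

```python
# Rough illustration of the item-4 back-calculation: estimate one team's climb rate
# from its alliances' total touchpad points. The assumed partner climb rate is a
# placeholder; the whole estimate is only as good as that assumption.
QUAL_TOUCHPAD_POINTS = 50
ASSUMED_PARTNER_CLIMB_RATE = 0.55   # placeholder league-wide average

def estimated_climb_rate(total_touchpad_points, matches_played):
    climbs_per_match = total_touchpad_points / (QUAL_TOUCHPAD_POINTS * matches_played)
    own_rate = climbs_per_match - 2 * ASSUMED_PARTNER_CLIMB_RATE   # subtract two partners
    return min(max(own_rate, 0.0), 1.0)                            # raw value can leave [0, 1]

# A team credited with 450 touchpad points over 10 matches:
print(estimated_climb_rate(450, 10))
```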

So just for fun, I tried to create a predictor with machine learning this year. The approach I used was similar to one I had used for a data competition last year. My model had three layers.

The first layer consists of 30 models whose predictions were fed into the second level as meta-features. I also engineered 10 additional features to use alongside them.

The second layer consists of three models (XGBoost, a neural network, and AdaBoost).

The third layer is just a weighted mean of the predictions from the second level.

The models in the first layer were an assortment of regression learners from scikit-learn and XGBoost, along with Lasagne, KNN, and Sofia.

My raw data came from TBA, with some additional features coming from my team’s scouting data (defense rating, fuel shot, gears put up, etc.). Overall I was able to reach 78% accuracy in predicting wins; however, the main issue was penalties. If I factored out all the matches that were won on penalties, I could reach about 87% accuracy. The nonlinear scoring this year made it a real challenge to predict.
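For anyone who wants to try something similar, here's a compressed two-level version of that kind of stack in scikit-learn. It swaps XGBoost for scikit-learn's gradient boosting and uses random stand-in data, so it shows the structure only, not my actual model:

```python
# Compressed two-level stack, loosely in the spirit of the setup described above.
# Base learners' out-of-fold predictions become meta-features for a final blender.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Random stand-in for the match feature table.
X, y = make_classification(n_samples=2000, n_features=18, random_state=0)

level1 = [
    ("gbm", GradientBoostingClassifier(random_state=0)),
    ("knn", KNeighborsClassifier(n_neighbors=15)),
    ("ada", AdaBoostClassifier(random_state=0)),
]
stack = StackingClassifier(
    estimators=level1,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,  # out-of-fold predictions feed the final estimator
)
stack.fit(X, y)
print("training-set accuracy:", stack.score(X, y))
```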

One datapoint people may find interesting…

Across the 12 championship division elimination brackets, 34 out of 84 alliance matchups went to 3 matches. That means that at that level, the winner of the first match predicted the winner of the second match only (84 - 34) / 84 ≈ 60% of the time.

(Note - I did not filter situations where an alliance swapped a robot).

Did you keep your training and testing data separate?
How many events/matches did you use?
Did your predictions exclusively use TBA information/scouting data from matches that occurred earlier in time?
What is your method for determining your system’s overall accuracy (For example, how are ties handled)?

For some reason, my replies haven’t been posting… Anyway, just to sum up: the way I tested accuracy was through k-fold cross-validation with 10 folds. It trains on 9 folds and tests on the remaining one - for example, it trains on folds 1-9 and tests on fold 10, then trains on folds 2-10 and tests on fold 1, and so on until every fold has been tested on. Also, training should work across all matches even if teams have never played against each other, since the teams themselves are abstracted out of the model.

Regarding The Blue Alliance, I didn’t even know they had match predictions. Is there a reason they are a bit hidden? I guess I should have come across them when looking at the API, but I kind of just skimmed through it. It’s interesting that they get very similar results even though they use a pretty different method. I’ll definitely look into it, along with the 2015 and 2013 data. I’ll also look into Brier scores and the other methods you all mentioned - they seem interesting. I had no idea 2017 was this hard to predict. Thanks for the help!

Predictions are exposed via the

/event/{event_key}/predictions

endpoint in APIv3, per the docs. If I understand correctly, they are hidden on the website because of their level of accuracy - the 60% range just wasn’t worth advertising. Eugene or Phil would be able to provide an authoritative answer.
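
If you want to pull them programmatically, something like this should work (you'll need your own read API key from your TBA account, and the event key here is just an example - check the event page for the exact key):

```python
# Minimal example of pulling event predictions from TBA APIv3.
import requests

TBA_AUTH_KEY = "YOUR_READ_API_KEY"  # placeholder: generate one on your TBA account page
event_key = "2017nytv"              # assumed key for the 2017 Tech Valley Regional

resp = requests.get(
    f"https://www.thebluealliance.com/api/v3/event/{event_key}/predictions",
    headers={"X-TBA-Auth-Key": TBA_AUTH_KEY},
)
resp.raise_for_status()
predictions = resp.json()
print(list(predictions.keys()))  # e.g. match predictions, ranking predictions, stat means/vars
```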