Match-result and ranking prediction contests are a fun way to learn about the interesting methods the FRC community has come up with for predicting match results, are helpful for spotting close or blowout matches, and are an opportunity for playful banter. I’d love to see prediction contests for more than just a small handful of events each year, so I’m thinking about building an automated FRC prediction competition website.
The general idea is that people would submit predictions to the website via an API on a per-event basis. Predictions for a given event could be submitted and updated as many times as desired, but updated predictions for matches that have already started would be ignored. All predictions and some aggregate of them would be displayed publicly for each match. A stats page could show how accurate everyone’s predictions are.
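The submission rule above (updates are accepted freely until a match starts, then ignored) could be sketched like this. All names here (`accept_prediction`, the timestamps) are illustrative, not a real API:

```python
from datetime import datetime, timezone

def accept_prediction(match_start: datetime, submitted_at: datetime) -> bool:
    """Return True if a (re)submitted prediction should be recorded.

    Hypothetical check: updates for matches that have already started
    are silently ignored, as described above.
    """
    return submitted_at < match_start

# Example: a match scheduled for 9:00 UTC.
start = datetime(2018, 3, 3, 9, 0, tzinfo=timezone.utc)

# Submitted the evening before: accepted.
print(accept_prediction(start, datetime(2018, 3, 2, 20, 0, tzinfo=timezone.utc)))  # True
# Submitted after the match began: ignored.
print(accept_prediction(start, datetime(2018, 3, 3, 9, 5, tzinfo=timezone.utc)))   # False
```

In practice the backend would probably compare against the actual match start time from TBA rather than the scheduled time, but the idea is the same.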
Some nice things we get from this setup:
- People no longer have to manually deal with spreadsheets (thanks, everyone who has organized a prediction contest!)
- Algorithms that adapt their predictions throughout the course of an event can be better showcased
- The FRC community can browse predictions more easily than in a spreadsheet
My question for you: Would you either be interested in participating in making predictions or viewing predictions made by the community?
If so, there are still a few things to be decided. Here are some big ones:
- Prediction API granularity: Do we care just about win/loss? Probability of win/loss? Probability of getting bonus RP? Final match score? Probability distribution of final match score? Final match subscores (e.g. auto and teleop scores separately)? Final game state (e.g. everything that shows up in the “Detailed Results” table on [TBA](https://www.thebluealliance.com/match/2018casj_qm1))? The list can go on and on…
- Do we want to include ranking predictions? If so, we again need to decide on the granularity of data (e.g. average rank vs. rank probability distribution).
- What stats should we show to rank how accurate predictions are? Since it’s not required to submit predictions for every event, we may need separate stats that rank by the whole season vs. a single event.
It would make everything a bit more complicated, but in theory you could have all of the prediction granularity options at once. If someone only submits win/loss predictions, they are only “competing” in the win/loss rankings.
If they submit win/loss probabilities, they can be entered for that ranking’s “competition” and the system can calculate the implicit win/loss prediction based on the probabilities and enter them in that “competition” also. And so on for all of the categories, where your prediction counts for the “competition” you entered in, as well as all of the less-granular ones.
This way, people with really complex prediction schemes can show off their granularity*, while beginners just trying to predict who will win also get a shot.
*not sure this is the right word here…you get what I mean though
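The cascade described above could look something like this sketch, where a win-probability submission automatically produces the implicit win/loss pick and enters both “competitions.” The names (`implied_winner`, the submission dict) are made up for illustration:

```python
def implied_winner(p_red_win: float) -> str:
    """Derive the implicit win/loss prediction from a win probability."""
    return "red" if p_red_win >= 0.5 else "blue"

# Hypothetical submission: only win probability is provided...
submission = {"match": "2018casj_qm1", "p_red_win": 0.72}

# ...but it counts toward the probability competition AND the derived,
# less-granular win/loss competition.
entries = {
    "prob_win_loss": submission["p_red_win"],
    "win_loss": implied_winner(submission["p_red_win"]),
}
print(entries)  # {'prob_win_loss': 0.72, 'win_loss': 'red'}
```

The same pattern would extend to other categories, e.g. a score-distribution submission implying a win probability, which in turn implies a win/loss pick.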
ALSO: a low-friction way to submit human-generated predictions would be appreciated. Bonus points for mobile-friendly, so as to work in the stands or at the hotel during an event. Maybe a web interface attracts more guessers than data scientists, but I think there’s value in that once you get enough people weighing in.
That’s a good idea. If we go down this route, we still need a reasonable cutoff for granularity. I highly doubt that anyone will actually do final game state predictions, so implementing that would be a waste of time. I’m thinking win/loss, prob win/loss, and prob bonus RP is sufficient, at least for the first year we try this. We can get fancier in the future.
The interface would be straightforward and would call the same backend API. Have to think about how this scales. I’m sure we can handle a few dozen “data scientists,” but accepting predictions from hundreds (thousands?) of students might get very $$$ very quick.
I will post more thoughts eventually, but I’ve found that an easy-to-use web interface helps a lot with engaging participants. The invite prediction contest had a Google form where participants could just vote yes or no on each team. 75 people entered, many of whom didn’t seem to be especially involved in the FRC data scene.
I think an important question to ask is, what is the goal here: engage new/casual players exploring predictions and data, or optimize for power users (@SLFF crew and Caleb)?
This is a neat idea, would be cool to have a site used for pickem or fantasy leagues beyond ChiefDelphi or custom spreadsheets. Not sure if this is quite what you are looking to do but I could see it evolving into that.
I didn’t really consider having a bunch of people casually vote on who they think will win until the idea was brought up in this thread. My original intended goal was to optimize for power users in terms of prediction submission and make viewing predictions accessible to everyone. That way, we can have high quality predictions (unlike the TBA predictions page I never had time to properly update for 2018) that are more accessible than Caleb’s Event Simulator (which is awesome), while automating the spreadsheet-based prediction contests.
I definitely do not intend for this to become a fantasy league platform for FRC.
I would say WL and probability of WL for sure, and maybe the other RPs as well. Anything more than that seems like overkill at this point, although I could see adding more things in the future.
> Do we want to include ranking predictions? If so, we again need to decide on the granularity of data (e.g. average rank vs. rank probability distribution).
I’d say no, keep it simple in year 0 and see where people want to see things go from there. I didn’t get around to ranking projections in my simulator until after I had spent a ton of time on other things. We’ll likely find other uses for this along the way, so no need to force anything this big so early.
> What stats should we show to rank how accurate predictions are? Since it’s not required to submit predictions for every event, we may need separate stats that rank by the whole season vs. a single event.
For each event and for the season, I personally would like Brier scores, number of matches predicted, and maybe some measure of over-confidence (either a single number or a calibration curve) if you have a big enough sample size. Probably also good to split quals and playoff matches. I’m not a big fan of “matches correctly predicted”, but I get that this is more intuitive than Brier scores, so I guess putting it in is fine so long as there’s a big enough sample size.
You could also show Brier scores only for matches that were predicted, but it might be better to default any non-entries to a prediction of 50%. That way no one can just predict the obvious matches and get an artificially good Brier score. If you want to get a single score to evaluate who had the “best” predictions at an event or set of events, I would recommend a Brier score with a 50% prediction inserted for any matches not predicted.
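A minimal sketch of that scoring rule: the mean Brier score over every match at the event, substituting a 50% prediction for any match the entrant skipped so that cherry-picking easy matches doesn’t help. The function name and data shapes are illustrative:

```python
def brier_score(predictions: dict, outcomes: dict) -> float:
    """Mean Brier score over all matches in `outcomes`.

    predictions: match key -> P(red wins), possibly missing some matches
    outcomes:    match key -> 1 if red won, 0 if blue won
    Unpredicted matches default to a 50% prediction, as proposed above.
    """
    total = 0.0
    for match, result in outcomes.items():
        p = predictions.get(match, 0.5)
        total += (p - result) ** 2
    return total / len(outcomes)

outcomes = {"qm1": 1, "qm2": 0, "qm3": 1}
preds = {"qm1": 0.9, "qm2": 0.2}  # qm3 left unpredicted -> treated as 0.5
print(round(brier_score(preds, outcomes), 4))  # 0.1
```

Lower is better; a perfect predictor scores 0 and always guessing 50% scores 0.25.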
I’m pretty excited about this. I know my excel workbooks can sometimes be cumbersome to deal with, so this would be an awesome way to better share my work.
Is there a less-resource-intensive way to collect casual votes and present them as one “The Internet” prediction model? I don’t know if that gives people all of their jollies, but it would be a good chunk of them. (There may not be such a way, I’m spitballing.)
I think for the win probability predictions you should use log loss. It penalizes overconfidence a logical amount compared to something like squared error (otherwise known as a Brier score).
Using 50% default predictions seems like a good idea, at least in the scope of a competition.
I also think “matches correctly predicted” makes a lot of sense for W/L predictions and season long scores because it encourages people to predict more events.
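To illustrate the difference between the two scoring rules mentioned above, here is a sketch of both evaluated on a single overconfident miss (red predicted at 99%, but blue wins). Log loss punishes near-certain misses far harder than squared error:

```python
import math

def log_loss(p: float, outcome: int) -> float:
    """Negative log-likelihood of the outcome under prediction p = P(red wins)."""
    return -(outcome * math.log(p) + (1 - outcome) * math.log(1 - p))

def brier(p: float, outcome: int) -> float:
    """Squared error of the prediction (the per-match Brier score)."""
    return (p - outcome) ** 2

p, outcome = 0.99, 0  # very confident in red; blue wins
print(round(brier(p, outcome), 4))     # 0.9801 (bounded by 1)
print(round(log_loss(p, outcome), 4))  # 4.6052 (unbounded as p -> 1)
```

Note that Brier scores are bounded at 1 per match, while log loss grows without bound as the prediction approaches certainty, which is exactly the stronger overconfidence penalty being argued for.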
Yeah, I agree with keeping things simple the first year and seeing where it goes.
Great to hear you’re on board! It’s not a prediction competition if Caleb isn’t participating and beating everyone.
That’s a nifty idea. By aggregating casual votes into one single model, we can present the data concisely without casual guesses overwhelming the “data scientists.” We can also extract a win/loss probability by looking at what fraction of users vote for red vs. blue.
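The aggregation idea is simple enough to sketch: fold individual picks into a single “The Internet” prediction whose win probability is just the fraction of voters picking each alliance. Names and data here are hypothetical:

```python
from collections import Counter

# Hypothetical casual votes for one match.
votes = ["red", "red", "blue", "red", "blue"]

counts = Counter(votes)
p_red = counts["red"] / len(votes)

# One aggregate "The Internet" entry: a win probability plus the
# implied win/loss pick.
internet_prediction = {
    "p_red_win": p_red,
    "pick": "red" if p_red >= 0.5 else "blue",
}
print(internet_prediction)  # {'p_red_win': 0.6, 'pick': 'red'}
```

A real version might want a minimum vote count before publishing the aggregate, so a match with two votes doesn’t show up as a confident 100%/0% split.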
I’m not sure what you mean by “integration.” If you’re asking if people might use this data to write articles for the Blog, then sure.