Just gonna put this here… I love how passionate you are about all things statistical and FRC.
I’ve also requested that the play count field be exposed in the API, so we’d at least be able to tell retroactively when a match has been replayed.
Having a forward-looking flag would be awesome as well (although this would be a larger-scoped FMS feature); I can submit an official ticket for this too.
I’ve uploaded the preliminary schedules for Detroit to my “server”, so you can now use the Event Simulator with the Detroit prelim schedules. You’ll need to switch the “data source” in the “settings” tab to “arimb…” for it to work.
Alright, last required update of the season:
You should be able to use this for offseason events without issue. That said, no offseason events currently have team lists, so something might break when we actually get an offseason event with teams. In the meantime, feel free to use the custom event to see what things might look like at your upcoming offseason.
Added 5th and 95th percentiles to ranking projections
Added RP data to “predicted rankings” sheet
Added RP predictions to the “team lookup” sheet
Updated some of the fields in “instructions and FAQ”
Now caches all settings changes
Added a progress bar to continuous ranking projections, which also allows the continuous rankings operation to be cancelled
Added 4 new graphs to continuous ranking projections
Added a strength of schedule sheet
This was a big feature update that I was pushing to get out before IRI. Here is a more detailed explanation of the changes:
RPs: I had always internally tracked RPs in order to do the ranking projections, but since I had @R.C ask for them directly I realized that I may as well expose them. So I added them into the ranking projections sheet as well as team lookup.
Continuous Ranking Projections: I really liked the ranking probability graphs over time I previously had here, so I decided to make a few more cool graphs. One thing I wanted was the ability to see how the average, 5th/95th percentiles, and current ranks all looked for a team over time, so I added those. I also made similar graphs that look at RPs directly instead of ranks. Finally, like @lbl1731 suggested above, I added a graph that shows how a team’s strength metric can change over time. There are a lot of cool things you can see in these graphs, and they should also give people more opportunities to verify the accuracy of my predictions. For example, here are 2052’s RP projections at MN State champs:
Notice anything off here? Well, if you look at matches 38 through 42, it gives 2052 a 0% chance of getting 17+ RPs, but they do eventually get 17 RPs. Why is that? It’s a poor assumption in my simulator. 2052 goes into match 43 with 13 RPs, and my simulator gives them a 0% chance of getting the rocket RP. When they did end up getting the rocket RP, my simulator was caught off guard. 0% probabilities for theoretically possible outcomes are really dumb. I’m working on alternatives to my bonus RP predictions that don’t produce 0% or 100% predictions. Feel free to use these graphs to spot check my work: the predictions should never go to 0% if the result ends up being true, and likewise for 100% predictions ending up false.
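If you want to automate that spot check, here is a rough Python sketch. It assumes you have copied one team's probability history out of a graph into a dict keyed by match number; the function name and all sample values are hypothetical, except that the 0% readings for matches 38 through 42 mirror the 2052 example.

```python
def contradicted_predictions(history: dict[int, float], happened: bool) -> list[int]:
    """Return the match numbers whose prediction was ruled "impossible" by the outcome.

    history maps a match number to the predicted probability (0.0 to 1.0), made
    after that match, that some event occurs (e.g. "finishes with 17+ RPs").
    happened is whether the event actually occurred by the end of the event.
    """
    flagged = []
    for match, p in sorted(history.items()):
        # A Monte Carlo estimate with zero (or all) successful simulations comes out
        # as exactly 0.0 (or 1.0), so exact comparisons are fine here.
        if happened and p == 0.0:
            flagged.append(match)
        elif not happened and p == 1.0:
            flagged.append(match)
    return flagged


# Hypothetical history for "2052 finishes with 17+ RPs"; they did reach 17.
history = {36: 0.02, 37: 0.01, 38: 0.0, 39: 0.0, 40: 0.0, 41: 0.0, 42: 0.0, 43: 0.3}
print(contradicted_predictions(history, happened=True))  # [38, 39, 40, 41, 42]
```

Any match number it returns is a spot where the simulator called a real outcome impossible.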
The other much-needed feature for the continuous ranking projections is a loading bar that allows process cancellation. Since this operation takes on the order of 15 minutes for me, I would often trap myself running this with no way to get out. Hopefully that won’t happen now, and it’s just nice to know roughly how far along the process is.
Strength of Schedule: Thanks to @bobcroucher for helping to get this one in. I’ve had a couple of people ask about schedule strengths for this year, and although you can certainly calculate them without this tab, it’s pretty cumbersome. So I’ve added a new tab that can be used to calculate schedule strengths for any event that has its schedule released. My metric of choice is column E, but there are others as well. It’s also interesting to see how the projections change for the top 15 seeds before and after schedule releases.
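For anyone who wants to compute something similar outside the workbook, here is one simple schedule-difficulty measure as a Python sketch. It is an illustrative metric, not the column E calculation: it assumes an additive per-team strength (OPR-style) and averages "opponents' total strength minus partners' total strength" over a team's matches, so higher means harder. The team numbers and strengths in the test are made up.

```python
from statistics import mean

# A match is (red alliance, blue alliance), each a tuple of team numbers.
Match = tuple[tuple[int, ...], tuple[int, ...]]


def schedule_difficulty(team: int, schedule: list[Match], strength: dict[int, float]) -> float:
    """Average of (opponents' summed strength - partners' summed strength) over a team's matches.

    Higher means a harder schedule. Assumes an additive strength metric such as
    an OPR-style contribution; illustrative only.
    """
    diffs = []
    for red, blue in schedule:
        if team in red:
            own, opponents = red, blue
        elif team in blue:
            own, opponents = blue, red
        else:
            continue  # team isn't in this match
        partners = [t for t in own if t != team]
        diffs.append(sum(strength[t] for t in opponents) - sum(strength[t] for t in partners))
    if not diffs:
        raise ValueError(f"team {team} has no matches in this schedule")
    return mean(diffs)
```

Subtracting partner strength reflects that a strong partner makes a match easier, while a strong opponent makes it harder.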
As always, let me know if you see any bugs. Have fun at IRI! Once the schedule is out, we can see who got lucky/unlucky.
Continuous Ranking Projections weren’t working and now they should.
Another small bug fix. This should make it so the incomplete match data published before the detailed breakdowns doesn’t cause the ranking projections to go whack:
I have just updated to v10.1.
Here are the changes:
Added Iterative Logistic Strength metrics for the RPs
Changed the Ranking projections to use ILSs instead of Predicted contributions for the RPs
Changed the RP Probabilities in team lookup and match lookup to use ILSs instead of PCs
Because of these additions, this book no longer works with in-season events, only off-seasons. Use v9.4 for in-season events.
Read more about ILSs in my TBA Blog article here. I’ve been talking about a change like this for the bonus RP Projections for about a year, so I’m glad I finally got them implemented. I believe them to be both theoretically and experimentally superior to using predicted contributions.
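To show why a logistic approach avoids the 0%/100% problem, here is a minimal Elo-style logistic update in Python. This is an assumption-laden sketch in the spirit of ILS, not the formula from the blog article: the learning rate `k`, the 25% starting base rate, the team numbers, and the function names are all hypothetical. Because the logistic function approaches but never reaches 0 or 1, predictions stay strictly inside (0, 1) for any reasonable strength values.

```python
import math


def logistic(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(alliance, strength) -> float:
    """Probability the alliance earns the bonus RP: logistic of the summed team strengths."""
    return logistic(sum(strength[t] for t in alliance))


def update(alliance, earned: bool, strength, k: float = 0.1) -> float:
    """Nudge each alliance member's strength toward the observed result; return the pre-match prediction."""
    p = predict(alliance, strength)
    error = (1.0 if earned else 0.0) - p
    for t in alliance:
        strength[t] += k * error
    return p


# Start every team so that a three-team alliance predicts a hypothetical 25% base rate.
base_rate = 0.25
start = math.log(base_rate / (1 - base_rate)) / 3
strength = {t: start for t in (111, 222, 333)}
```

After each match, every alliance member moves by the same amount in the direction of the surprise, so teams that keep beating their predictions drift upward without ever reaching certainty.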
I have just updated to v10.2.
Changes since v10.1:
Added overrides sheet
Added a status box that appears during data update which includes current process and warnings
Removed “Warn me when I am using an outdated version of the event simulator” setting from the settings tab
Deleted “team match array” sheet
Here’s my update for Chezy Champs. Biggest change is the addition of the ability to override my predictions to see hypothetical scenarios! I think it’s pretty cool and powerful. It’s still new though, so do your best to break it and give me bug reports :). I’m also open to UI changes on it or elsewhere in my book if anyone has suggestions.
I’m going to use 4607 at IRI as a case study to show some different ways you can use the override ability:
Okay, so, let’s pretend that it’s Friday night at IRI. 4607 is currently ranked ninth with 19 total RPs. Here are the default projections for their 3 remaining matches according to my simulator:
| Match | Complete Rocket RP Probability | HAB Docking RP Probability | Win Probability |
|---|---|---|---|
Their ranking probability distribution using these values looks like this:
So roughly we expect them to seed between 7 and 30. This is good info, but now with overrides, we can explore some hypotheticals to dive into the possibilities even more.
We’re going to work from the worst case scenario up to the best case scenario for 4607. So let’s start with the absolute worst case, that they get 0 RPs on Saturday. To set this, I just set their win probabilities and RP probabilities to 0% for each of their matches. Here’s their resulting distribution in that case:
Ouch, that’s pretty bad: they are expected to rank between 39th and 45th. Note that even though we have complete certainty about 4607’s RPs in this hypothetical, there are still unknowns for the remaining teams, which is why there is still a spread of possible ranks for 4607.
Okay, let’s get a little more optimistic. Let’s say that instead of getting a guaranteed 0 RPs, 4607 suffered some serious damage in their last match on Friday that they don’t think can be fixed on Saturday. To simulate this, we’re going to drop all of their win probabilities and RP probabilities by 10%. Here is the resulting distribution:
Drastically better than the 0 RP case, but not good compared to the default case. Their expected rank is now between roughly 9 and 34, and the average rank has shifted from around 18 to around 22.
Let’s flip that scenario, though: what if 4607 just fixed an issue with their intake that will make them much better all around the next day? To simulate this, we’ll add 10% to all of the default probabilities. Here is that distribution:
Wow! Now their expected rank is between 6 and 23, with a hefty chunk in the top 15.
Here’s another good option for 4607: what if they win all 3 of their Saturday matches, but the RP probabilities stay the same? Here’s what that would look like:
Nice, that would give them a pretty thin spread between ranks 5 and 14, and an all but guaranteed top 15 slot.
Finally, here’s the absolute best case for 4607: what if they get all 12 RPs on Saturday? Here’s the result:
If they got all 12 RPs, they lock in a top 8 spot, with a highly probable top 4 slot, and even a sliver of hope for the 1 seed. Had they actually gotten 12 RPs, that would have set them at 31 RPs, which would have given them the third seed at IRI if all other results had stayed the same.
One last hypothetical: what if, on Friday night, 4607 knew exactly which RPs they would win and lose the next day? Setting the actual Saturday results in the override section and simulating from qm 70 gives the following:
So 4607 could have landed anywhere from 17th to 23rd just based on the results from other teams (they actually ended up 20th).
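Each of these scenarios is just an edit to 4607's per-match probabilities before re-simulating. As a rough Python sketch of only the 4607 side (the full simulator also re-simulates every other team, which is why the rank spreads never collapse to a single value), this computes the exact distribution of 4607's total RPs under each override. Several assumptions are baked in and labeled in the code: the default probabilities are hypothetical placeholders, "drop by 10%" is taken to mean 10 percentage points, ties are ignored, and win, rocket, and HAB outcomes are treated as independent.

```python
# Per-match probabilities for one team: (win, complete rocket RP, HAB docking RP).
MatchProbs = tuple[float, float, float]


def clamp(p: float) -> float:
    return min(1.0, max(0.0, p))


def shift(matches: list[MatchProbs], delta: float) -> list[MatchProbs]:
    """Add delta (in probability units, e.g. -0.10) to every probability, clamped to [0, 1]."""
    return [tuple(clamp(p + delta) for p in m) for m in matches]


def rp_distribution(matches: list[MatchProbs], start: int = 0) -> dict[int, float]:
    """Exact distribution of total RPs over the remaining matches.

    Uses 2019 values (2 RPs for a win, 1 each for rocket and HAB docking),
    ignores ties, and treats win/rocket/HAB as independent (a simplification).
    """
    dist = {start: 1.0}
    for win, rocket, hab in matches:
        per_match: dict[int, float] = {}
        for w, pw in ((2, win), (0, 1 - win)):
            for r, pr in ((1, rocket), (0, 1 - rocket)):
                for h, ph in ((1, hab), (0, 1 - hab)):
                    per_match[w + r + h] = per_match.get(w + r + h, 0.0) + pw * pr * ph
        new: dict[int, float] = {}
        for total, p in dist.items():
            for rp, q in per_match.items():
                if p * q > 0:  # drop impossible outcomes
                    new[total + rp] = new.get(total + rp, 0.0) + p * q
        dist = new
    return dict(sorted(dist.items()))


# Hypothetical default probabilities for three Saturday matches (not the real values).
default = [(0.60, 0.20, 0.70), (0.45, 0.15, 0.65), (0.70, 0.25, 0.80)]
scenarios = {
    "zero RPs": [(0.0, 0.0, 0.0)] * 3,
    "minus 10": shift(default, -0.10),
    "default": default,
    "plus 10": shift(default, +0.10),
    "win all": [(1.0, r, h) for _, r, h in default],
    "all 12 RPs": [(1.0, 1.0, 1.0)] * 3,
}
for name, matches in scenarios.items():
    dist = rp_distribution(matches, start=19)  # 4607 had 19 RPs on Friday night
    print(name, round(sum(k * v for k, v in dist.items()), 2))
```

The two extreme scenarios collapse to a single total (19 and 31 RPs), matching the 0 RP and 12 RP cases above; the spread in 4607's rank in those cases comes entirely from the other teams.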
Well, that’s about it. I think overrides are super cool. If you want to enter my Chezy Champs ranking projection contest using the overrides (after the schedule comes out), feel free! I’m hoping this tool removes some of the overhead for entering contests like this.
In the spreadsheet it lists “simulations to run”. Does that mean, for at least the pre-schedule release ranking distributions, you are considering x number of possible tournaments? If that is the case, how do you produce them?
Adding onto the point of simulating matches, are you evaluating the result of simulated matches with a certainty or a probability of a win/loss? In either case, when compared to actual matches, how accurate does it tend to be? I produced something like this working with certainty, not probability, based on actual scouting data rather than Blue Alliance data, and got around 80% accuracy when compared to matches that actually occurred.
Also, this is a much harder question, but if you have the number I’d appreciate it: how many possible schedules exist for a typical regional? I considered it earlier, but there are a lot of algorithmic aspects that I had no idea how to account for.
Hopefully you can understand my questions; I’d really appreciate a response. I’m trying to do some similar analysis, in terms of evaluating all possible schedules with scouting data and analyzing, for example, the chance the best team had of coming first, or the worst team getting into the top 16. I’m not really focusing on individual teams, but on how effective the game format is at ranking teams in proportion to how good the robots actually are.
Good questions! Let’s dive in.
Yes, I consider x unique schedules before the schedule is released. After the schedule is released I only consider that one schedule.
I copy the schedules from Cheesy Arena here. These schedules were created by 254 to mimic FMS-generated schedules and have very similar properties to the FMS schedules. I take these base schedules and, in each simulation, randomly assign teams to the indices in the base schedule.
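The "randomly assign teams to indices" step can be sketched in a few lines of Python. This assumes a base schedule stored as lists of six team indices (first three red, last three blue); that layout and the function name are assumptions for illustration, not Cheesy Arena's actual file format, and surrogates are ignored.

```python
import random

# A base schedule as lists of 6 team indices (0..n-1): first three red, last three blue.
# This layout is an assumption for illustration, not Cheesy Arena's file format.
BaseSchedule = list[list[int]]


def instantiate(base: BaseSchedule, teams: list[int], rng: random.Random) -> list[list[int]]:
    """Map a base schedule's indices onto a random permutation of the event's teams."""
    order = teams[:]    # copy so the caller's list isn't shuffled
    rng.shuffle(order)  # every permutation of teams is equally likely
    return [[order[i] for i in match] for match in base]


base = [[0, 1, 2, 3, 4, 5], [3, 0, 4, 1, 5, 2]]
print(instantiate(base, [254, 1678, 2056, 118, 971, 3310], random.Random(7)))
```

Because only the team-to-index mapping changes, every generated schedule keeps the base schedule's structural properties (match counts, spacing, partner/opponent balance).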
So, for each match, I first calculate win probabilities. Then, in each simulation, I randomly assign each match as a win or a loss at the win probability rate I already determined. So, each simulation has certainty for the result, but when running hundreds of individual simulations and aggregating the results, I create a Monte Carlo distribution, which is what appears in the ranking probabilities.
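Here is a stripped-down Python sketch of that Monte Carlo loop: each simulation decides every match with certainty at the given win probability, and counting ranks across many simulations produces the probability distribution. It is a simplification under stated assumptions: only win RPs (2 per win, as in 2019) are simulated, ties within a simulation are broken randomly instead of with real FRC tiebreakers, and the function name and sample matches are hypothetical.

```python
import random
from collections import Counter

# (red teams, blue teams, probability that red wins)
Match = tuple[tuple[int, ...], tuple[int, ...], float]


def rank_distribution(teams: list[int], matches: list[Match], n_sims: int = 1000,
                      seed: int = 0) -> dict[int, dict[int, float]]:
    """Monte Carlo estimate of each team's probability of finishing at each rank."""
    rng = random.Random(seed)
    rank_counts = {t: Counter() for t in teams}
    for _ in range(n_sims):
        rps = dict.fromkeys(teams, 0)
        for red, blue, p_red in matches:
            # Each simulation decides every match with certainty...
            winners = red if rng.random() < p_red else blue
            for t in winners:
                rps[t] += 2  # 2 RPs per win (2019); bonus RPs omitted for brevity
        # Rank by RPs; ties broken randomly here, unlike real FRC tiebreakers.
        order = sorted(teams, key=lambda t: (-rps[t], rng.random()))
        for rank, t in enumerate(order, start=1):
            rank_counts[t][rank] += 1
    # ...and aggregating many simulations yields the probability distribution.
    return {t: {r: c / n_sims for r, c in sorted(counts.items())}
            for t, counts in rank_counts.items()}


teams = [1, 2, 3, 4, 5, 6]
matches = [((1, 2, 3), (4, 5, 6), 0.7), ((1, 4, 5), (2, 3, 6), 0.4)]
print(rank_distribution(teams, matches, n_sims=2000)[1])
```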
So, my probabilities are all pretty well-calibrated, in the sense that, if you looked at 100 matches where I had said red would win 80% of the time, red would win about 80 of them. If you just took the team I said had a higher probability of winning, that team would win about 72% of the time (depending on the year). Alternatively, I get Brier scores around 0.18 (again, year-dependent). I believe robust scouting systems can hit around 80%, which it sounds like you may have done (although you should predict at least a few events’ worth of data before you can be confident of this).
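If you want to compare your own predictions on the same footing, here is a Python sketch of the three measures mentioned: pick accuracy, Brier score, and a simple binned calibration check. The function names, the 10-bin layout, and the toy data in the test are assumptions. For reference, predicting 50% for every match gives a Brier score of exactly 0.25, and lower is better.

```python
def brier_score(preds: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted P(red wins) and the result (1 = red won, 0 = blue won)."""
    return sum((p - o) ** 2 for p, o in zip(preds, outcomes)) / len(preds)


def pick_accuracy(preds: list[float], outcomes: list[int]) -> float:
    """Fraction of matches where the favored alliance won (p > 0.5 counts as picking red)."""
    return sum((p > 0.5) == (o == 1) for p, o in zip(preds, outcomes)) / len(preds)


def calibration(preds: list[float], outcomes: list[int], bins: int = 10) -> dict[float, float]:
    """Observed red win rate within each predicted-probability bin."""
    buckets: dict[int, list[int]] = {}
    for p, o in zip(preds, outcomes):
        b = min(int(p * bins), bins - 1)  # p == 1.0 goes into the top bin
        buckets.setdefault(b, []).append(o)
    return {b / bins: sum(results) / len(results) for b, results in sorted(buckets.items())}
```

A well-calibrated predictor shows an observed rate close to each bin's lower edge plus half the bin width; accuracy alone can't reveal that, which is why the Brier score is useful alongside it.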
Oh geez. Well, we can set a lower bound of possible schedules at n!, where n is the number of teams at the event. This is because any schedule generated could have the n teams shuffled in any order and still be a valid schedule. For a 50-team regional event, this is about 3 × 10^64 possible schedules. As an upper bound, if each team has m matches, there will be m blocks of n! permutations of teams in addition to the possibility of surrogates, which I’ll just make their own block for simplicity. So at a maximum there are (n!)^(m+1) possible schedules. For a 50-team, 10-match-per-team event, that means there are no more than about 10^709 possible schedules. So the actual number of possible schedules for a 50-team, 10-match/team event is somewhere between 10^64 and 10^709. My shot-in-the-dark estimate would be that there are around 10^300 schedules in this case, with error bars of 100 orders of magnitude either way. Whatever the exact number is, it’s easily high enough that you will never be able to test every possible schedule, so if I were you, I would look into Monte Carlo simulations.
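The bound arithmetic is easy to check in Python; working in log10 keeps the numbers readable. With n = 50 and m = 10, this gives about 10^64.5 for n! and about 10^709 for (n!)^11. The function name is just for illustration.

```python
import math


def schedule_count_bounds(n_teams: int, matches_per_team: int) -> tuple[float, float]:
    """Return (log10 lower bound, log10 upper bound) = (log10 n!, log10 (n!)^(m+1))."""
    lf = math.log10(math.factorial(n_teams))  # exact integer factorial, then log10
    return lf, (matches_per_team + 1) * lf


low, high = schedule_count_bounds(50, 10)
print(f"between 10^{low:.1f} and 10^{high:.1f}")  # between 10^64.5 and 10^709.3
```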
It may also be helpful for you to review the IdleLoop match schedule generation documentation, which walks through how schedules are created and what is prioritized in this process. Something to keep in mind is that not all schedules are equally likely: “better” schedules by the criteria in the IdleLoop documentation will occur more frequently than “worse” but still viable schedules.
I think I did, but if anything is unclear or you have follow-up questions, feel free to reach out. Sounds like a fun project; I’d just caution that you should probably analyze multiple events if you want results that are generally applicable.
Just updated to v10.3:
Changed the name of the “predicted contributions” sheet to “metrics”
Added wins, losses, RPs, and total matches to “rankings” sheet
Miscellaneous small formatting changes
Fixed the “more completed rocket completion percentage” metric
I realized that my simulator/database was ill-equipped to handle questions like this, so I’ve added some more data to the rankings sheet, and I’m planning to copy the whole rankings sheet into my scouting database next season.
Probably not pivotal, but I also realized the “more completed rocket completion percentage” has been calculated incorrectly all season. Essentially, I thought I was looking at points when I was actually looking at counts. As a result, this value was ~2.5x lower than it should have been, and it weighted hatch panels too heavily relative to cargo. Sorry to anyone who used that metric. I’m not super distraught, though, since it was one of the wonkier metrics and no one called me out on it all season.
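A quick way to see how a points/counts mix-up produces both symptoms is the Python sketch below. It is a hypothetical reconstruction consistent with the description, not the workbook's actual formula: it assumes the 2019 rocket values (6 hatch panel slots at 2 points, 6 cargo slots at 3 points, so 30 points or 12 pieces for a full rocket) and a bug where piece counts were divided by the points-based maximum.

```python
# 2019 rocket: 6 hatch panel slots (2 points each) and 6 cargo slots (3 points each),
# so a full rocket is 30 points or 12 game pieces.
HATCH_POINTS, CARGO_POINTS, SLOTS = 2, 3, 6
FULL_ROCKET_POINTS = SLOTS * (HATCH_POINTS + CARGO_POINTS)  # 30


def completion_by_points(hatches: int, cargo: int) -> float:
    """Share of a full rocket's points that have been scored."""
    return (hatches * HATCH_POINTS + cargo * CARGO_POINTS) / FULL_ROCKET_POINTS


def completion_mixed_units(hatches: int, cargo: int) -> float:
    """Hypothetical bug: game-piece counts divided by the points-based maximum."""
    return (hatches + cargo) / FULL_ROCKET_POINTS


print(completion_by_points(6, 6), completion_mixed_units(6, 6))  # 1.0 0.4
```

A full rocket scores 1.0 by points but 0.4 in the mixed version (30/12 = 2.5x lower), and the mixed version counts a hatch panel the same as a cargo even though cargo is worth more points.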
Having trouble downloading 10.1 or 10.3; I get a 404 error. I was able to get 9.4.
Thanks very much for this, it is super cool and helpful!
Hi Bob, if you click the GitHub link at the top of the thread, you can get any version you want. I only provide direct links to the most recent versions in this thread, and the links go dead when I release a new version. The 10.1 and 10.3 links are probably dead because I released 10.4 and forgot to provide a direct link, so here’s a link to 10.4.
Would it be a useful feature to anyone if I added an option to dump data into CSV files? I don’t need that for anything right now, but if it’s something people would like, I can do it. I imagine that might make it easier to interface with other applications, but I’m not sure if anyone does that.
I think that would be useful to some of us. I like to fiddle with data using Excel.
Alright, well, I’ll throw it on my todo list.
Could you make them data tables? We could just link to the data instead of using a csv export step.
CSV dumps would make it much easier to manipulate data from Python and other programming languages
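For anyone planning to consume dumps like that from Python, the standard-library `csv` module covers both writing and reading. This is only a sketch: no export exists yet, so the column names (borrowed from the fields added to the “rankings” sheet in v10.3) and the sample row are hypothetical.

```python
import csv
import io

# Hypothetical column layout mirroring the "rankings" sheet fields added in v10.3.
FIELDS = ["team", "wins", "losses", "rps", "matches"]


def dump_rankings(rows: list[dict[str, int]], f) -> None:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def load_rankings(f) -> list[dict[str, int]]:
    # csv gives back strings, so convert each field to int.
    return [{k: int(v) for k, v in row.items()} for row in csv.DictReader(f)]


buf = io.StringIO()  # stands in for open("rankings.csv", "w", newline="")
dump_rankings([{"team": 4607, "wins": 7, "losses": 3, "rps": 25, "matches": 10}], buf)
buf.seek(0)
print(load_rankings(buf))
```

When writing to a real file, opening it with `newline=""` (as the comment shows) lets the `csv` module control line endings itself.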