paper: Caleb's Event Simulator 2018

Working on it.

Past Caleb didn’t do a very good job commenting his code or even bothering to structure it in an understandable way, which makes it difficult for me to adapt it for 2018.

I’m glad someone cares though, that’s definitely a motivational boost.

Yeah, it’s not happening tonight. Sorry to anyone who was looking forward to it. I’ll take another crack at it tomorrow.

I am also interested in it - it has seemed to be a surprisingly effective means of predicting rankings / match outcomes! Definitely better than making educated guesses match-by-match! That said, I really only use the event simulator for events at which my own team is competing… I have a hard enough time just keeping up with watching / checking rankings for other events.

Well, I uploaded a 5.2 version that includes ranking projections. It has almost no model tuning and is probably filled with bugs. The extent of my testing was that the ranking projections looked approximately correct for mnmi.

If you find bugs, feel free to let me know. Use it at your own risk.

I just fixed a major bug that caused red to not get credit for their projected wins. Up to 5.3 now.

About 4 weeks overdue, I finally went back and evaluated how well various methods predicted matches. Here are the results for week 4 competitions excluding Israel champs. Values are Brier scores; lower Brier scores mean higher predictive power.
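
For anyone who wants to compute these themselves: the Brier score is just the mean squared difference between the predicted win probability and the actual result. A minimal sketch of that calculation (not my spreadsheet formulas; 1 = red win, 0 = blue win, and ties could be scored as 0.5):

```python
def brier_score(predictions, outcomes):
    """Mean squared error between predicted red-win probabilities and
    actual results (1 = red win, 0 = blue win; ties could be scored as 0.5)."""
    pairs = list(zip(predictions, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)

# Example: three matches; lower scores mean better predictions
print(brier_score([0.8, 0.5, 0.9], [1, 0, 1]))  # ~0.10
```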

As expected, my Elo model takes a slight but appreciable edge over a simple calculated contribution to total points (OPR) model. I still believe it would be possible to improve upon this calculated contributions model to beat my Elo model, but I have yet to spend sufficient time developing anything that could prove or disprove this theory.
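
For anyone unfamiliar with calculated contributions: the standard setup is a least-squares fit that models each alliance's score as the sum of its three members' contributions. A minimal sketch of that setup (the general idea, not my exact implementation):

```python
import numpy as np

def calculated_contributions(alliances, scores, teams):
    """Least-squares estimate of each team's average contribution to its
    alliance's score (the standard calculated contribution / OPR setup).

    alliances: list of team-number triples, one per alliance per match
    scores:    that alliance's score, in the same order
    teams:     list of all team numbers at the event
    """
    index = {team: i for i, team in enumerate(teams)}
    A = np.zeros((len(alliances), len(teams)))
    for row, alliance in enumerate(alliances):
        for team in alliance:
            A[row, index[team]] = 1.0
    # Solve A x ≈ scores in the least-squares sense
    x, *_ = np.linalg.lstsq(A, np.asarray(scores, dtype=float), rcond=None)
    return dict(zip(teams, x))
```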

The most powerful predictor we have remains the simple average between calculated contributions and Elo, so this is what I will be using for match predictions moving forward (I’ve been using Elo only so far just because I hadn’t done a proper analysis until now).
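
In case the averaging is unclear: I just take the mean of the win probabilities the two methods give for each match, roughly like this trivial helper:

```python
def blended_win_probability(p_elo, p_contrib):
    """Simple average of the Elo and calculated-contribution red-win probabilities."""
    return 0.5 * (p_elo + p_contrib)

print(blended_win_probability(0.70, 0.62))  # ~0.66
```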

To the surprise of no one who follows my work, CCWM predictions are awful as usual relative to the other methods, but I thought I’d throw them in anyway since someone was bound to ask about them.

In comparison to other years, using these methods we have slightly less predictive power than 2010, and slightly more predictive power than 2016.

I’ll make a calibration curve for Elo sometime soon, but I’m confident it’s well calibrated.

Here is the calibration curve for Elo for all matches through week 5. It’s a pleasing little graph, maybe even avatar-worthy :stuck_out_tongue:. There is just a touch of under-confidence in it, but nothing concerning; I’ll probably look more into the under-confidence once the season is over.
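
For reference, a curve like this just bins matches by predicted win probability and compares each bin’s average prediction to how often red actually won in that bin. A rough sketch of the idea (not the exact binning I used):

```python
def calibration_curve(predictions, outcomes, bins=10):
    """Bin matches by predicted red-win probability and compare each bin's
    average prediction to the fraction of those matches red actually won."""
    buckets = [[] for _ in range(bins)]
    for p, o in zip(predictions, outcomes):
        i = min(int(p * bins), bins - 1)  # clamp p == 1.0 into the top bin
        buckets[i].append((p, o))
    curve = []
    for bucket in buckets:
        if bucket:
            mean_pred = sum(p for p, _ in bucket) / len(bucket)
            win_rate = sum(o for _, o in bucket) / len(bucket)
            curve.append((mean_pred, win_rate))
    return curve  # perfectly calibrated predictions land on the line y = x
```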

The predictive power of Elo this year so far is greater than that of Elo in any year on record except 2011, and the playoff predictions this year are the best on record (since 2005). To be fair though, the overall predictive power will probably drop after championships, which are generally the most difficult events to predict due to regional variations in team strength and in how the game is played.

Total Brier score is 0.1750, quals Brier score is 0.1774, playoffs Brier score is 0.1634, and 73.34% of matches were correctly predicted.

Just uploaded v6.1. Key changes are:
Various bug fixes
Added calculated contribution predictions
Updated seed values
Updated Instructions and FAQ

I still haven’t done as thorough a round of testing as I would like, so I bet there are still bugs. If anything looks off, let me know. Match win/loss predictions should be improved since I’m no longer using only Elo. The other ranking point predictions are probably still poor since I haven’t had time to calibrate their parameters.

I have posted an update which can be used to simulate the Houston divisions based on the preliminary schedules. Huge thanks to Wes Jordan for getting all of the schedules into the same format as the TBA data. Basically the only change to this workbook is that I reference his site instead of TBA.

This workbook will only work with the preliminary schedules on Wes Jordan’s site. You will have to switch back to one of my normal simulators when the actual schedules and match results start to be posted.

You must spread some Reputation around before giving it to Caleb Sykes again.

I’m excited to take a look! Thanks for putting this together.

I had a bug that caused ranking predictions for teams in surrogate matches to be calculated incorrectly. This should be fixed in the new v7.3.

I have posted an update that uses the preliminary schedules for Detroit. Let me know if you notice any issues. Again, very big thanks to Wes Jordan for getting the schedules into a useful format for me.

Same disclaimer as before:

This workbook will only work with the preliminary schedules on Wes Jordan’s site. You will have to switch back to one of my normal simulators when the actual schedules and match results start to be posted.

I have posted a version 10.1 which can be used for off season events. I got rid of the expiration warning on it even though there is a good chance I will be posting at least one more update in a month or two. No promises though. Subscribe to the thread if you don’t want to miss an update.

I have a bunch of changes for this version, some of which may cause bugs, so if you see any please let me know:

  • “Update” now checks if the number of simulations to run is the same before running, and allows re-importing if a different number of simulations is selected
  • Added a “settings” sheet from which you can select the number of simulations to run and whether or not to stop the “Update” macro if no new data is available.
  • Added an “Import Event Keys” macro to the “event keys” tab to look up off season events which haven’t been posted as of the publishing of this workbook
  • Changed seed values to use the seed values each team had going into their event instead of just their most recent seed value. This has no effect on off season competitions, but will correctly show teams’ seed values for historical events.
  • Added “average rank” to predicted rankings sheet and this is now the default team sort
  • Added ability to “simulate” from any past point in the event. This can be used for example to find what win probabilities would have been predicted by the simulator before the matches actually happened.

Here’s another update. I wanted to get it out before IRI. This is probably the most changes I’ve made in an update this year, although many of the changes are behind the scenes. Here are the key changes:

  • Added conditional formatting to predicted ranks
  • Uses park/climb points as the second ranking sort criterion instead of total points; added auto points as the third ranking sort criterion
  • Rankings are now generated according to match data instead of being directly pulled from TBA. This allows you to see what the rankings were at different points throughout the event.
  • various code optimizations, re-structuring and cleanup
  • Various bug fixes
  • Shows predicted contributions before the event starts by copying seed values
  • Now allows simulations of events which do not have their schedules released yet

When the IRI teams list shows up on TBA, you’ll be able to make predicted rankings even without a schedule. Likewise, you can see what each team’s predicted rankings would have been before a schedule was created for completed events. With this, I’m finally comfortable jumping more into the “strength of schedule” conversations. All we have to do to tell if a team has a good or a bad schedule is to compare their pre-schedule ranking probabilities with their ranking probabilities after the schedule is released. We would still have to agree on a way to combine the full ranking data change into a single metric (or just accept that no great single summary statistic exists), but at least now each team’s strength of schedule can be seen in a way personalized to them.

I never cared much for strength of schedule metrics that are basically some variation of subtracting opponent strength from partner strength, because how “good” or “bad” you feel your schedule is depends heavily on your expectations going into the event, which in turn depend on how good your team is. A schedule with 5 guaranteed wins and 5 guaranteed losses is clearly awful for the team looking to seed first, and clearly awesome for the team worried about seeding last.
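
In code terms, the comparison is just a subtraction between two simulator runs, one made before the schedule is known and one after. A hypothetical sketch, assuming each run produces per-team probabilities of hitting each rank threshold:

```python
def schedule_strength(pre, post, thresholds=(1, 4, 8, 12, 15)):
    """Change in each team's chance of finishing at or above each rank
    threshold once the schedule is known (post-schedule minus pre-schedule).

    pre, post: dicts mapping team -> {threshold: probability}, e.g.
    pre[2052][8] = 0.62 would mean a 62% pre-schedule chance of a top 8 seed.
    """
    return {
        team: {t: post[team][t] - pre[team][t] for t in thresholds}
        for team in pre
    }
```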

Here is each team’s change in ranking expectations from before the schedule is released to after the schedule is released for Medtronic:

team	avg rank change	Δ 1 seed	Δ Top 4	Δ Top 8	Δ Top 12	Δ Top 15
233	-7.8		+0.0%	+9.2%	+21.7%	+24.9%	+25.7%
1816	+12.6		-0.1%	-3.1%	-11.3%	-20.4%	-26.6%
2052	+0.3		-3.1%	-5.1%	-0.4%	-0.9%	-0.4%
2181	-4.1		+0.1%	+8.3%	+12.3%	+13.8%	+14.6%
2232	+9.9		+0.0%	-1.3%	-4.7%	-9.8%	-13.5%
2239	+11.8		-0.1%	-1.1%	-5.3%	-9.9%	-13.7%
2450	+4.7		+0.0%	-0.6%	-6.9%	-9.9%	-12.4%
2470	+0.9		+0.0%	+0.1%	-0.2%	-1.6%	-2.2%
2480	-9.0		+0.0%	+0.4%	+1.4%	+2.8%	+4.0%
2498	-0.4		+0.0%	-1.2%	-2.6%	-5.0%	-4.6%
2500	+1.2		+0.0%	-0.6%	-1.2%	-3.2%	-4.3%
2501	-4.7		-0.1%	+10.3%	+17.8%	+17.4%	+17.2%
2502	+0.6		-2.7%	-4.9%	-7.7%	-3.0%	-0.7%
2508	+8.0		+0.0%	-0.6%	-4.4%	-10.1%	-12.5%
2509	-11.1		-0.1%	+13.8%	+23.5%	+32.4%	+35.5%
2513	-6.1		+0.0%	+0.1%	+0.9%	+3.2%	+5.2%
2515	-0.1		+0.0%	-0.1%	-0.4%	-1.2%	-1.1%
2518	-4.8		+0.0%	-0.4%	-0.1%	+0.8%	+1.6%
2529	-7.9		+0.0%	+0.0%	+0.9%	+3.6%	+6.2%
2532	-1.0		+0.0%	-0.3%	-0.7%	-1.2%	-1.8%
2545	-1.9		+0.0%	-0.2%	-0.7%	-0.8%	-0.9%
2823	-11.6		+0.3%	+9.8%	+22.2%	+32.1%	+35.7%
2825	+3.8		+0.0%	-0.1%	-1.0%	-1.9%	-3.1%
2846	+10.9		+0.0%	-13.6%	-27.2%	-34.4%	-36.6%
2847	+3.2		+0.0%	+0.0%	-0.5%	-1.9%	-2.4%
2879	-7.6		+0.0%	+1.2%	+5.5%	+10.4%	+15.7%
3018	-8.6		+0.0%	+1.9%	+8.0%	+11.9%	+15.0%
3023	-10.4		+0.0%	+1.7%	+5.8%	+14.2%	+19.1%
3026	+0.8		-0.1%	-1.5%	-5.4%	-7.5%	-8.4%
3038	+5.6		+0.0%	-0.2%	-2.4%	-5.6%	-7.9%
3058	+7.1		-0.2%	-2.6%	-7.7%	-13.9%	-15.3%
3081	-7.7		+0.0%	+0.1%	+1.4%	+5.9%	+9.2%
3102	-2.8		+0.1%	-1.1%	-1.8%	-1.8%	-0.2%
3184	+0.6		-0.3%	-6.9%	-5.0%	-4.3%	-3.9%
3244	-13.3		+0.0%	+2.5%	+11.4%	+20.6%	+26.2%
3278	+11.6		+0.0%	-0.5%	-3.1%	-7.3%	-10.7%
3298	+6.6		+0.0%	-0.3%	-2.8%	-7.3%	-7.7%
3299	-11.8		+0.1%	+10.1%	+24.6%	+33.5%	+35.3%
3407	+13.8		+0.0%	-0.4%	-2.3%	-5.5%	-9.3%
3454	+0.1		+0.0%	-0.6%	-1.4%	-3.1%	-4.5%
3630	+6.4		-0.3%	-5.7%	-14.5%	-19.9%	-21.9%
3745	+0.1		+0.0%	-1.3%	-4.4%	-2.5%	-4.3%
3751	+11.3		+0.0%	-3.9%	-13.1%	-21.4%	-27.3%
3839	+6.8		+0.0%	-4.6%	-9.7%	-14.8%	-18.7%
3840	+9.2		+0.0%	-1.3%	-8.1%	-13.9%	-18.1%
4207	-2.0		+0.0%	+0.2%	-0.7%	-0.9%	+1.1%
4229	-2.9		+0.0%	-0.1%	-0.4%	-0.5%	-0.2%
4536	+3.4		+0.0%	-0.6%	-2.7%	-5.3%	-6.7%
4549	-13.5		+0.0%	+8.0%	+21.8%	+32.8%	+36.9%
4607	-0.5		+0.0%	-0.8%	-3.4%	-3.7%	-1.0%
4664	-4.9		+0.0%	-0.1%	+0.3%	+0.7%	+0.9%
5172	-0.2		+7.9%	+2.1%	+0.4%	+0.3%	+0.2%
5434	+2.8		-1.3%	-15.5%	-17.2%	-12.4%	-9.8%
5637	-17.2		+0.0%	+1.7%	+6.4%	+14.5%	+21.2%
5913	-2.0		+0.0%	+0.1%	-0.1%	-0.7%	-1.5%
5996	+0.3		+0.0%	+0.1%	-0.7%	-1.2%	-1.6%
6709	-7.0		+0.0%	+0.6%	+2.4%	+6.9%	+11.3%
7038	+2.8		+0.0%	-0.2%	-1.5%	-3.2%	-5.2%
7068	+4.0		-0.1%	-0.5%	-2.7%	-5.1%	-7.2%
7180	+7.6		+0.0%	-0.4%	-1.9%	-4.5%	-7.6%

I’ll probably make a full workbook that contains something like this for all 2018 events soon.

Huge thanks to Patrick Fairbank and the other developers of Cheesy Arena for creating these awesome pre-generated schedules. Adding pre-event rankings was already a large effort; I’m so grateful I could use these schedules instead of building my own scheduler from scratch.

You’ve been busy! As you know, I’ve been interested and experimenting with this Strength of Schedule stuff for some time. Thanks so much for all your hard work on it. This approach is a really nice solution to this difficult problem. It’s a great addition to your “I Can’t Live Without It” simulator.

So your “strength of schedule” metric just compares with one pre-generated schedule? That doesn’t seem like a good metric, as there is no guarantee that these pre-generated schedules are fair. It would make more sense to generate hundreds of potential schedules and compare the actual schedule against those.

EDIT:
Perhaps another solution is to keep the same pre-generated schedule, but randomize the order of the teams in it hundreds of times. That way you won’t need to create a whole new scheduling algorithm.

This is indeed what it currently does: every simulation randomizes the assignment of teams to the schedule indices.
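
Roughly, each simulation does something like the following sketch (not the exact macro code that lives in the workbook):

```python
import random

def randomize_assignment(schedule, teams, rng=random):
    """Assign the slot numbers in a pre-generated schedule to a random
    permutation of the actual team list (done once per simulation).

    schedule: list of matches, each a (red_slots, blue_slots) pair of
              slot-number triples from the pre-generated schedule
    teams:    list of actual team numbers at the event
    """
    slots = range(1, len(teams) + 1)
    assignment = dict(zip(slots, rng.sample(teams, len(teams))))
    return [
        ([assignment[s] for s in red], [assignment[s] for s in blue])
        for red, blue in schedule
    ]
```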

I just uploaded a v4.

Changes:
Fixed a bug where surrogate assignments in the schedule were not reset when a new event was imported. Essentially, this meant that if you ran events with surrogates in different positions, teams would randomly have matches removed
Changed graph color in team lookup to green

This was a crazy bug to fix; it took me a couple of full days to track down. Running any single event was fine, but when I ran a different event after certain events, the ranking projections would be a little bit off for a few random teams. The affected teams would also change on each new simulation.

Should be all good now though.

Another update, primary purpose was to fix handling of DQs.

Updates:
Added bolding and strikethroughs to the data import for surrogate and DQed teams respectively
Covers some special cases for surrogate teams that were not covered previously
Now handles DQed teams properly
Updated instructions and FAQ

I really hate dealing with surrogates and DQs; they’re such a pain. Hopefully I did a good enough job this time that I don’t have to think about them again.

Hey Caleb,

I’m wondering if there is some sort of substantial difference between these schedules and the schedules actually used in FRC. (Maybe you play more of the same teams over and over again in a real schedule?). Would you be able to use the schedule for some 40-team district event as a template and run your simulator on that? (Still with the random team assignments).

For example, compare the simulator probabilities for MAR Hatboro-Horsham with your current random schedule maker to a simulator using the actual schedule as a template.

I don’t expect there to be much of a difference, but it would be interesting to see that empirically.