Miscellaneous Statistics Projects

By: Caleb Sykes
New: 07-16-2017 12:00 AM
Updated: 01-07-2018 10:11 PM
Total downloads: 1025


A collection of small projects that will be explained in the associated thread.

I frequently work on small projects that I don't believe merit entire threads on their own, so I have decided to upload them here and make a post about them in an existing thread. I also generally want my whitepapers to have instruction sheets so that anyone can pick them up and understand them. However, I don't want to bother with this for my smaller projects.

Attached Files

  • IRI seeding projections.xlsx (uploaded 07-16-2017 12:00 AM, 5.13MB, 102 downloads)
  • Elo and OPR comparison.xlsx (uploaded 07-24-2017 08:57 PM, 2.84MB, 67 downloads)
  • 2017 Chairman's predictions.xlsm (uploaded 07-28-2017 07:28 PM, 738.37kb, 91 downloads)
  • 2018_Chairman's_predictions.xlsm (uploaded 07-29-2017 08:32 PM, 266.15kb, 148 downloads)
  • Historical mCA.xlsx (uploaded 08-04-2017 02:17 PM, 428.25kb, 65 downloads)
  • Greatest upsets.xlsx (uploaded 08-20-2017 05:36 PM, 6.12MB, 90 downloads)
  • surrogate_results.xlsx (uploaded 09-26-2017 04:47 PM, 53kb, 27 downloads)
  • 2017 rest penalties.xlsx (uploaded 10-02-2017 02:58 PM, 5.79MB, 19 downloads)
  • 2018_Chairman's_predictions v2.xlsm (uploaded 10-31-2017 11:02 AM, 379.14kb, 78 downloads)
  • auto_mobility_data.xlsx (uploaded 10-31-2017 11:10 PM, 6.1MB, 52 downloads)
  • 2018 start of season Elos.xlsx (uploaded 11-03-2017 12:43 PM, 136.58kb, 50 downloads)
  • 2018 start of season Elos v2.xlsx (uploaded 11-25-2017 10:51 AM, 140.36kb, 69 downloads)
  • most competitive events since 2005.xlsx (uploaded 11-29-2017 12:16 AM, 9.64MB, 78 downloads)
  • most competitive divisions since 2005.xlsx (uploaded 11-29-2017 08:47 PM, 6.69MB, 63 downloads)
  • OPR seed investigator.xlsx (uploaded 01-06-2018 05:11 PM, 367.39kb, 15 downloads)
  • serpentine_valley.xlsx (uploaded 01-07-2018 10:11 PM, 15.34MB, 9 downloads)



Discussion


07-16-2017 12:10 AM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I frequently work on small projects that I don't believe merit entire threads on their own, so I have decided to upload them here and make a post about them in an existing thread. I also generally want my whitepapers to have instruction sheets so that anyone can pick them up and understand them. However, I don't want to bother with this for my smaller projects.



07-24-2017 09:12 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

In this post, Citrus Dad asked for a comparison of my Elo and OPR match predictions for the 2017 season. I have attached a file named "Elo and OPR comparison" that does this. Every qual match from 2017 is listed. Elo projections, OPR projections, and the average of the two, are also shown for each match. The square errors for all projections are shown, and these square errors are averaged together to get Brier scores for the three models.

Here are the Brier score summaries of the results.

Code:
Total Brier scores		
OPR	Elo	Average
0.212	0.217	0.209
		
Champs only Brier scores		
OPR	Elo	Average
0.208	0.210	0.204
The OPR and Elo models have similar Brier scores, with OPR taking a slight edge. This is directly in line with results from other years. However, predictions this year were much less accurate than in any year since at least 2009. This is likely due to a combination of the non-linear and step-function-esque aspects of scoring for the 2017 game. My primary prediction method last season actually used a raw average of the Elo predictions and the OPR predictions, which provided more predictive power than either method alone.
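
For anyone who wants to reproduce the Brier scores outside of the spreadsheet, the calculation is just the mean squared error of the win probabilities. A rough sketch in Python (the file and column names here are made up for illustration; the attached workbook's layout is different):

Code:
# Sketch of the Brier score calculation (hypothetical CSV layout).
# Columns assumed: opr_win_prob, elo_win_prob, red_won (1 if red won, 0 otherwise).
import pandas as pd

matches = pd.read_csv("2017_qual_matches.csv")  # hypothetical file name

for name, col in [("OPR", "opr_win_prob"), ("Elo", "elo_win_prob")]:
    brier = ((matches[col] - matches["red_won"]) ** 2).mean()
    print(name, "Brier score:", round(brier, 3))

# The "Average" model is just the mean of the two probabilities.
avg = (matches["opr_win_prob"] + matches["elo_win_prob"]) / 2
print("Average Brier score:", round(((avg - matches["red_won"]) ** 2).mean(), 3))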



07-25-2017 06:02 PM

Citrus Dad


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
In this post, Citrus Dad asked for a comparison of my Elo and OPR match predictions for the 2017 season. I have attached a file named "Elo and OPR comparison" that does this. Every qual match from 2017 is listed. Elo projections, OPR projections, and the average of the two, are also shown for each match. The square errors for all projections are shown, and these square errors are averaged together to get Brier scores for the three models.

Here are the Brier score summaries of the results.
Code:
Total Brier scores		
OPR	Elo	Average
0.212	0.217	0.209
		
Champs only Brier scores		
OPR	Elo	Average
0.208	0.210	0.204
The OPR and Elo models have similar Brier scores, with OPR taking a slight edge. This is directly in line with results from other years. However, predictions this year were much less accurate than in any year since at least 2009. This is likely due to a combination of the non-linear and step-function-esque aspects of scoring for the 2017 game. My primary prediction method last season actually used a raw average of the Elo predictions and the OPR predictions, which provided more predictive power than either method alone.
Thanks



07-28-2017 08:01 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I am currently working on a model which can be used to predict who will win the Chairman's Award at a regional or district event. I am not covering district championship Chairman's or Championship Chairman's because of their small sample sizes. The primary inputs to this model are the awards data of each team at all of their previous events, although previous season Elo is also taken into account.

The model essentially works by assigning value to every regional/district award a team wins. I call these points milli-Chairman's Awards, or mCA points. I set the value of a Chairman's win in the current season at a base event of 50 teams to 1000 mCA, so all award values can be interpreted as the fraction of a Chairman's Award they are worth. Award values and model parameters are the values found to provide the best predictions of 2015-2016 Chairman's wins. At each event, a logistic distribution is used to map a team's total points to their likelihood of winning the Chairman's Award at that event. Rookies, HOF teams, and teams that won Chairman's earlier in the season are assigned a probability of 0%.
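
As a rough illustration of that last step, here is roughly what the mapping from mCA totals to win probabilities looks like in code. The scale constant and the normalization across teams shown here are placeholders/guesses on my part; the tuned values and exact mechanics live in the parameter sheet of the workbook.

Code:
# Sketch only: the scale and the normalization are placeholders, not the tuned parameters.
import math

def chairmans_probs(event_mcas, scale=1000.0):
    """Map each eligible team's mCA total to a Chairman's win probability at one event.

    event_mcas: dict of team -> mCA total, with rookies, HOF teams, and
    prior in-season Chairman's winners already removed (they get 0%).
    """
    weights = {t: 1.0 / (1.0 + math.exp(-m / scale)) for t, m in event_mcas.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}  # probabilities sum to 1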

I have attached a file named 2017_Chairman's_predictions.xlsm which shows my model's predictions for all 2017 regional and district events, as well as a sheet which shows the key model parameters and a description of each. The model used for these predictions was created by running on data from 2008-2016, with tuning specifically for 2015-2016, so the model did not know any of the 2017 results before "predicting" them.

Key takeaways:

  • The mean reversion value of 19% is right in line with the 20% mean reversion value I found when building my Elo model. It intrigues me that two very different endeavors led to essentially equivalent values.
  • It was no surprise to me that EI was worth 80% of a Chairman's Award. I was a bit surprised though to find that Dean's List was worth 60% of a Chairman's Award, especially because two are given out at each event. That means that the crazy teams that manage to win 2 Dean's List Awards at a single event are better off than a team that won Chairman's in terms of future Chairman's performance.
  • I have gained more appreciation for certain awards after seeing how strongly they predict future Chairman's Awards. In particular, the Team Spirit and Imagery awards.


More work to come on this topic in the next few hours/days.



07-29-2017 08:35 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I have added another workbook named "2018 Chairman's Predictions." This workbook can be used to predict Chairman's results for any set of teams you enter. The model used here has the same base system as the "2017 Chairman's Predictions" model, but some of the parameter values have changed. These parameters were found by minimizing the prediction error for the period 2016-2017.

Also in this book is a complete listing of teams and their current mCA values. The top 100 teams are listed below.

Code:
team	mCA
1718	9496
503	9334
1540	9334
2834	9191
1676	8961
1241	8941
68	8814
548	8531
2468	8112
2974	8092
27	8047
1885	7881
1511	7786
1023	7641
1305	7635
2614	7568
245	7530
1629	7381
2486	7100
66	7027
3132	6748
1816	6742
1086	6551
1311	6482
1710	6263
2648	6241
125	6223
558	6155
141	6083
1519	6082
1983	6060
4039	5985
33	5851
2771	5780
1902	5582
624	5578
1011	5496
118	5470
2137	5461
1218	5424
2169	5390
910	5382
3284	5353
3478	5344
771	5321
75	5306
2557	5291
233	5287
987	5224
1868	5215
3309	5175
1714	5158
932	5147
1986	5144
537	5138
597	5077
604	5068
2056	5059
2996	5054
4613	5042
399	5029
1477	5010
2220	4994
2337	4955
3618	4896
4125	4823
217	4816
1730	4803
359	4784
2655	4714
2500	4706
694	4695
1923	4667
708	4662
1622	4661
1987	4655
2642	4655
1671	4630
4013	4627
772	4626
2415	4622
4063	4604
540	4501
433	4440
4525	4426
384	4412
3476	4384
2485	4333
3008	4325
303	4307
1711	4288
2590	4266
3142	4264
3256	4260
836	4251
3880	4250
1678	4244
2471	4237
230	4230
78	4224
If I make an event simulator again next year, I will likely include Chairman's predictions there.



08-04-2017 02:24 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I got a question about historical mCA values for a team, so I decided to post the start of season mCA values for all teams since 2009. This can be found in the attached "Historical_mCA" document.



09-26-2017 04:45 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I was wondering if alliances with surrogates were more or less likely to win than comparable alliances without surrogates. To investigate this, I found all 138 matches since 2008 in which opposing alliances had an unequal number of surrogates. I threw out the 5 matches in which one alliance had 2 surrogates more than the other alliance.

I started by finding the optimal Elo rating to add to the alliance that had more surrogates in order to minimize the Brier score of all 133 matches. This value was 25 Elo points. The Brier score improved by 0.0018 with this change. This means that, in a match between two otherwise even alliances, the alliance with the surrogate team would be expected to win about 53.5% of the time. This potentially implies that it is advantageous to have alliances which contain surrogates.

To see if this was just due to chance, I ran 10 trials in which I randomly either added or subtracted 25 Elo points from each alliance. The mean Brier score improvement with this method was -0.00005, and the standard deviation of the Brier score improvement was 0.0028. Assuming the Brier score improvements to be normally distributed, we get a z-score of -0.62, which gives a p-value of 0.54. This is nowhere near significant, so we lack any good evidence that it is either beneficial or detrimental to have a surrogate team on your alliance.
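
For reference, here is a rough sketch of that check in Python. The Elo-to-win-probability conversion below is the standard logistic with a 400-point scale, which reproduces the 53.5% figure for a 25-point edge; the data layout is made up for illustration.

Code:
import random

def win_prob(elo_diff):
    # Standard Elo logistic; a +25 edge gives ~53.5%
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

def brier(matches, bonus):
    """matches: list of (red_minus_blue_elo, surrogate_side, red_won),
    where surrogate_side is +1 if red has the extra surrogate, -1 if blue does."""
    errs = [(win_prob(d + bonus * s) - w) ** 2 for d, s, w in matches]
    return sum(errs) / len(errs)

def random_trial(matches, bonus=25):
    # Same bonus, but assigned to a random alliance instead of the surrogate one
    shuffled = [(d, random.choice([-1, 1]), w) for d, _, w in matches]
    return brier(shuffled, bonus)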

Full data can be found in the "surrogate results" spreadsheet. Bolded teams are surrogates.



09-26-2017 11:26 PM

Bryce2471


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
I frequently work on small projects that I don't believe merit entire threads on their own, so I have decided to upload them here and make a post about them in an existing thread. I also generally want my whitepapers to have instruction sheets so that anyone can pick them up and understand them. However, I don't want to bother with this for my smaller projects.
If you have not read The Signal and the Noise by Nate Silver (the guy who made FiveThirtyEight), I highly recommend it. I have no affiliation with the book, other than that I read it and liked it. I would recommend it to anyone who is interested in these statistics and prediction related projects.



09-26-2017 11:50 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Bryce2471 View Post
If you have not read The Signal and the Noise by Nate Silver (the guy who made FiveThirtyEight), I highly recommend it. I have no affiliation with the book, other than that I read it and liked it. I would recommend it to anyone who is interested in these statistics and prediction related projects.
Definitely this.

I actually read that book quite a while back. At the time, I thought it was interesting, but quickly forgot much of it. It was only relatively recently that I realized that the world is full of overconfident predictions, and that humans are laughably prone to confirmation bias. I now have a much stronger appreciation for predictive models, and care very little for explanatory models that have essentially zero predictive power.



10-02-2017 11:29 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I decided to investigate how important breaks between matches were for team performance. If the effect of rest is large enough, I thought I might add it into my Elo model. I was originally going to use the match start times as the basis, but after finding serious problems with this data set, I switched to using scheduled start times.

Essentially, what I did was to give each team on each alliance an Elo penalty determined by how much "rest" they had had since their last match. I tried both linear and exponential fits, and found that exponential fits were far better suited to this effort. I also used the scheduled time data to build two different models. In the first, I looked at the difference in scheduled start times for each team between their last scheduled match and the current match. In the second, I sorted matches within each event by start time and gave each match an index corresponding to its placement on this list (e.g. Quals 1 has index 1, Quals 95 has index 95, quarterfinals 1-1 has index 96, quarterfinals 2-2 has index 101, etc...).

The best fits for each of these cases were the following:
Time difference: Elo penalty per team = -250*exp(-(t_current_match_scheduled_time - t_previous_match_scheduled_time)/(5 minutes))
Match index difference: Elo penalty per team = -120*exp(-(current_match_index - previous_match_index)/0.9)
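
In code, the match index fit works out to something like this (a sketch, not the exact spreadsheet implementation):

Code:
import math

def rest_penalty(index_gap, magnitude=-120.0, scale=0.9):
    """Elo penalty for one team whose previous match was index_gap matches ago
    (match index fit above)."""
    return magnitude * math.exp(-index_gap / scale)

# A full 3-team alliance takes the penalty three times over, which is where the
# ~80 Elo swing between back-to-back play and one match of rest comes from:
print(round(3 * (rest_penalty(1) - rest_penalty(2))))  # about -79, i.e. the ~80 Elo swing quoted below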

Both of these models provide statistically significant improvements to my general Elo model. However, the match index method provides about 7X the improvement of the time difference method (Brier score improvement of 0.000173 vs 0.000024). This was surprising to me, since I would have expected the finer resolution of the times to provide better results. My guess as to why the indexing method is superior comes down to the time differences between quals and playoff matches. I used the same model for both of these cases, and perhaps the differences in start times are not nearly as important as the pressure of playing back-to-back matches in playoffs.

I have attached a table summarizing how large of an effect rest has on matches (using the match index model).


Playing back-to-back matches clearly has a strong negative impact on teams. This generally only occurs in playoff matches between levels. However, its effect is multiplied by 3 since all three alliance members experience the penalty. A 3-team alliance that just played receives an 80 Elo penalty relative to a 3-team alliance that played 2 matches ago, and a 108 Elo penalty relative to a 3-team alliance that played 3 matches ago. 108 Elo points corresponds to 30 match points in 2017, and the alliance that receives this penalty would only be expected to win 35% of matches against an otherwise evenly matched opposing alliance.

The match index method ended up providing enough improvement that I am seriously considering adding it into future iterations of my Elo model. One thing holding me back from using it is that it relies on the relatively new scheduled time data. At 4 years old, these data aren't nearly as dubious as the actual time data (1.5 years old), but they still have noticeable issues (like scheduling multiple playoff replays at the same time).

You can see the rest penalties for every 2017 match in the "2017 rest penalties" document. The shown penalties are from the exponential fit of the match index model.



10-02-2017 11:37 PM

Basel A


Re: paper: Miscellaneous Statistics Projects

I'm a bit skeptical, because there are some effects of alliance number on amount of rest during playoffs (e.g. #1 alliances that move on in two matches will always have maximal rest, and are typically dominant). Not sure if you can think of a good way to parse that out, though.



10-03-2017 09:52 AM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Basel A View Post
I'm a bit skeptical, because there are some effects of alliance number on amount of rest during playoffs (e.g. #1 alliances that move on in two matches will always have maximal rest, and are typically dominant). Not sure if you can think of a good way to parse that out, though.
I don't quite follow. My rest penalties are an addition onto my standard Elo model, which already accounts for general strength of alliances. 1 seeds were already heavily favored before I added rest penalties because the 1 seed almost always consists of highly Elo-rated teams. In my standard Elo model, the red alliance (often 1 seed, but not always) was expected to win the first finals match 57% of the time on average. With my rest penalties added in, the red alliance is expected to win the first finals match 62% of the time on average.



10-03-2017 10:16 AM

Basel A


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
I don't quite follow. My rest penalties are an addition onto my standard Elo model, which already accounts for general strength of alliances. 1 seeds were already heavily favored before I added rest penalties because the 1 seed almost always consists of highly Elo-rated teams. In my standard Elo model, the red alliance (often 1 seed, but not always) was expected to win the first finals match 57% of the time on average. With my rest penalties added in, the red alliance is expected to win the first finals match 62% of the time on average.
Because the first seed is so often in the SF/finals with maximum rest, you could be quantifying any advantage the first seed has (other than how good they are based on quals) as opposed to just rest. To use a dumb example, if the top alliance is favored by referees, that would show up here.



10-03-2017 10:36 AM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Basel A View Post
Because the first seed is so often in the SF/finals with maximum rest, you could be quantifying any advantage the first seed has (other than how good they are based on quals) as opposed to just rest. To use a dumb example, if the top alliance is favored by referees, that would show up here.
Got it, that is an interesting take. Let me think for a little bit on how/if it is possible to separate alliance seeds from rest.



10-03-2017 10:53 AM

GeeTwo


Re: paper: Miscellaneous Statistics Projects

Another factor beyond what will be recognized from Elo is nonlinear improvement due to good scouting, alliance selection, and strategy. I would expect these to affect playoffs far more than quals.



10-03-2017 01:19 PM

Basel A


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
Got it, that is an interesting take. Let me think for a little bit on how/if it is possible to separate alliance seeds from rest.
A first pass could be to compare cases where Alliance #X would be advantaged by rest versus disadvantaged. That would give you an idea of the relative strength of the rest effect compared to the various other things. Gus's examples are definitely important effects.



10-04-2017 01:48 PM

microbuns


Re: paper: Miscellaneous Statistics Projects

I love the upsets paper - it's fun to look at these games and see the obviously massive disadvantage the winning side had. I'm looking back at games I had seen/participated in, and remembering the pandemonium those games created on the sidelines and behind the glass. Super cool!



10-26-2017 11:36 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
I decided to investigate how important breaks between matches were for team performance...
I've spent a fair bit of time off and on over the past month looking into this more, and since I have other things I would prefer to work on, I'm going to stop working on this for the foreseeable future. I would like to retract all the information in the quoted post. I'm undecided on whether I should delete the spreadsheet.

Essentially, my rest penalty model actually decreased my Elo prediction performance for the year 2016 when I applied the same methodology to that year. This probably means one of the following:
  • 2016 and 2017 rest penalties were drastically different
  • My 2017 rest penalties were the result of overfitting, and do not actually represent any real phenomenon
  • Scheduled time data are unreliable for 2016 and/or 2017
  • There is a bug somewhere in my code that I am completely unable to find

If any of the first three are true, I'm not that interested in pursuing rest penalties more, and I have given up looking for bugs for the time being. This also means that I will not be looking at alliance seed affecting playoff performance for now.

When I originally created the rest penalties, I never really applied them to years other than 2017 (for which I was optimizing). This meant that I made the mistake I often criticize others for: not keeping training and testing data separate. I incorrectly believed that my statistical significance test would be a sufficient substitute for testing against other data, and I am still baffled as to how my model could so easily pass a significance test without having predictive power in other years.

So anyway, sorry if I misled anyone; I won't make this same mistake again.



10-31-2017 11:11 AM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Now that we actually have team lists for events, I thought I would revisit my 2018 Chairman's Predictions workbook since it is the most popular download of mine. It turns out that I did not have support for 2018 rookies, resurrected teams, or new veterans in these predictions.

I have attached a new workbook titled "2018_Chairman's_predictions_v2" which provides support for these groups. I also have added an easy way to import team lists for events simply by entering in the event key. If you have additional knowledge of events (or if you want to make a hypothetical event), you can still add teams to the list manually. I have also switched to using the TBA API v3, so this should hopefully still work after Jan 1.
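
For anyone curious, the team list import boils down to a single TBA API v3 call (the workbook does this with macros; below is a rough Python equivalent of the same request, with a made-up event key and a placeholder API key):

Code:
import requests

API_KEY = "YOUR_TBA_READ_KEY"  # get a read key from your TBA account page

def event_team_keys(event_key):
    """Return the team keys (e.g. 'frc254') registered for an event via the TBA API v3."""
    url = f"https://www.thebluealliance.com/api/v3/event/{event_key}/teams/keys"
    resp = requests.get(url, headers={"X-TBA-Auth-Key": API_KEY})
    resp.raise_for_status()
    return resp.json()

print(event_team_keys("2018mnmi"))  # example event key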

Let me know if you notice any bugs with this book.



10-31-2017 10:25 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

For nearly all statistics that can be obtained from official data, one of our biggest issues is separating out individual team data from data points which actually represent something about the entire alliance. However, there was one statistic last season that was actually granular to the team level, and that data point was auto mobility. Referees were responsible this year for marking mobility points for each team individually, so these data points should have little to no dependence on other teams. Unfortunately, auto mobility was a nearly negligible point source for this game, and, combined with the extremely high average mobility rates, that made it a generally unimportant characteristic for describing teams. However, I thought it would be interesting to take a deeper look into these data to see if we can learn anything interesting from them.

I have uploaded a workbook titled "auto_mobility_data" which provides a few different ways of understanding mobility points. The first tab of this book contains raw data on mobility for every team in every match of 2017. The second tab contains a breakdown by team, listing each team's season-long auto mobility rate as well as each team's first match where they missed mobility (for you to check if you don't believe your team ever missed auto mobility). Overall, about 25% of teams never missed their mobility points in auto, and another 18% had mobility rates of >95%. The top 10 teams (ties included) with the most successful mobilities without a single miss are:

Code:
Team	Successful Mobilities
2337	86
195	85
4039	85
27	84
3663	82
2771	73
3683	73
1391	72
1519	71
2084	71
4391	71
As another point of investigation, I wanted to see if these “mobility rates” would provide more predictive power over future performance than the comparable metric I used in my workbooks last year, calculated contribution to auto Mobility Points. I compared each team’s qual mobility rate, total mobility rate (including playoffs), and calculated contribution to auto Mobility Points at their first event to the same metrics at their second event. Strong correlations imply that the metric at the first event could have been used as a good predictor of second event performance. Here are the correlation coefficients:
https://imgur.com/a/XttUk

The total mobility rate at event 1 had the strongest correlation with all three of qual rate, total rate, and calculated contribution at event 2, meaning it would likely be the strongest predictor. However, this is a little bit unfair since the total rate metric is incorporating information unavailable to qual rate or calculated contributions. Qual rate and cc at event 1 have roughly even correlation with qual rate at event 2. Qual rate at event 1 has a much stronger correlation with cc at event 2 than does cc at event 1. Overall, this tells me that, if there is a comparable scoring category to auto Mobility in 2018, I can probably get better results by using the robot specific data rather than using cc on the entire alliance’s score. There might also be potential to combine these metrics somehow, but I have yet to look into this.
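
For clarity, the correlations above are just Pearson coefficients between each team's event 1 metrics and their event 2 metrics. A sketch of the calculation (the file and column names are made up for illustration):

Code:
import pandas as pd

# Hypothetical layout: one row per team that attended 2+ events, with each
# metric at event 1 and event 2.
teams = pd.read_csv("two_event_teams.csv")

event1 = ["qual_rate_1", "total_rate_1", "cc_1"]
event2 = ["qual_rate_2", "total_rate_2", "cc_2"]

# Rows: event 1 seed metric, columns: event 2 outcome metric.
print(teams[event1 + event2].corr().loc[event1, event2])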

My last way to slice the data is by event. I found every event’s total auto mobility rate, as well as a correlation coefficient between each team’s qual auto Mobility Rate and calculated contribution for that event. I was specifically looking to see if I could identify any events which had an unexpectedly low correlation between auto mobility rates and ccs. This might indicate that one or more referees were not associating the correct robots with mobility points (although points for the alliance would be unaffected). Below you can see each event’s mobility rate versus the correlation at the event between mobility rate and cc for each team. I threw out events at which the mobility rate was higher than 90% since events with extremely high auto mobility rates do not provide a reasonable sample size of individual teams doing unique things.
https://imgur.com/a/HNdPb

Four events in this graph stood out to me for having unexpectedly low correlation coefficients. Those events were the Southern Cross Regional, ISR District Event #1, ISR District Event #2, and the IN District - Tippecanoe Event. Of these events, only Tippecanoe has a reasonable number of match videos, so I decided to watch the first 10 quals matches at this event. I discovered numerous inconsistencies between the published data and what I could see in the videos. Here are the ones I saw:
Quals 1: 2909
Quals 2: 234
Quals 7 (good music this match): 3147
Quals 10: 3940

My best explanation for these data is that one or more of the referees at this event (and potentially at the other low-correlation events) did not realize that their inputs corresponded to specific teams. Overall, the mobility rate data seem to be better than the calculated contribution data, so I'm not complaining, and I have no desire to call out specific referees; it is just interesting to me that I could track down discrepancies with this methodology.

That’s about it for now. I might adapt some of these efforts soon to looking at touchpad activation rates.



11-03-2017 12:44 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

After trying a couple of different changes to my Elo model, I have found one that has good predictive power, is general enough to apply to all years, and is straightforward to calculate. What I have done is to adjust each team's start of season Elo to be a weighted average of their previous two years' end-of-season Elos. The previous year's Elo has a weight of 0.7, and the Elo of two years prior has a weight of 0.3. This weighted Elo is then reverted to the mean by 20%, just as in the previous system (which took only the last season's Elo into consideration). Second-year teams have their rookie rating of 1320 (1350 before mean reversion) used as their end-of-season Elo from two years prior.
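
In code, the new start of season calculation looks roughly like this (the mean value below is a placeholder; use whatever value the model actually reverts toward):

Code:
def start_of_season_elo(prev_elo, two_prior_elo, mean=1450.0, reversion=0.20):
    """Weighted average of the last two end-of-season Elos, then 20% mean reversion.

    mean is a placeholder here, not the model's actual constant.
    For second-year teams, pass the rookie rating in place of two_prior_elo.
    """
    weighted = 0.7 * prev_elo + 0.3 * two_prior_elo
    return (1 - reversion) * weighted + reversion * mean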

This adjustment provides a substantial predictive power improvement, particularly at the start of the season. Although it causes larger Elo jumps for some teams between seasons, Elos during the start of the season are generally more stable. As an indirect consequence of this adjustment, I also found the optimal k value for playoff matches to be 4, instead of the 5 it was under the previous system. This means that playoff matches have slightly less of an impact on a team's Elo rating under the new system.

I have attached a file called "2018 start of season Elos" that shows what every team's Elo would have been under my previous system, as well as their Elo under this new system. Sometime before kickoff, I will publish an update to my "FRC Elo" workbook that contains this change as well as any other changes I make before then.



11-03-2017 01:11 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

With this change, Elo actually takes a razor-thin edge over standard OPR in terms of predictive power for the 2017 season (season-long total Brier score = 0.211 vs 0.212 for OPR). However, it should be noted that this isn't really a fair comparison, since OPR's predictive power could probably be improved with many of these same adjustments I have been making to Elo. Even so, I think it's pretty cool that we now have a metric that provides more predictive power than conventional OPR, which has been the gold standard for at least as long as I have been around in FRC.



11-25-2017 10:58 AM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Not a huge change, but I have uploaded a sheet called "2018 start of season Elos v2" which incorporates all of the changes I have put into my Elo model. Since the original "2018 start of season Elos" sheet already had the 2-season weighted average built into it, Elo ratings are, for the most part, just 70ish points higher in this sheet.



01-06-2018 05:19 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I am investing some of my effort now into improving calculated contribution (OPR) predictions. The first thing I really want to figure out is what the best “seed” OPR is for a team going into an event. We have many choices for how to calculate this seed value based on past results, so I’d like to narrow my options down before building a formal model. To accomplish this, I investigated the years 2011-2014 to find which choice of seed correlates best with teams’ calculated contributions at the championship. I used this point in time because it is the spot in the season where we have the most data on teams before the season is over. The best seeds should have the strongest correlation with the team’s championship OPR, and by using correlations instead of building a model, I can ignore linear offsets in seed values.

When a team has only a single event in a season, my choices of metrics to use to generate seed values are basically restricted to either their OPR at their only event, or their pre-champs world OPR. There is potential for using normalized OPRs from previous seasons as seeds, but I chose not to investigate this since year-to-year changes in team performance are quite drastic.

When a team has attended 2+ events, I have many more options for metrics that can be used to determine their seed value:
  • The team’s OPR at their first event of the season
  • The team’s OPR at their second event of the season
  • The team’s OPR at their second to last pre-champs event of the season
  • The team’s OPR at their last pre-champs event of the season
  • The team’s highest pre-champs OPR of the season
  • The team’s second highest pre-champs OPR of the season
  • The team’s lowest pre-champs OPR of the season
  • The team’s pre-champs world OPR

Many of these metrics will overlap for teams, but they are all distinct metrics.

Using each of these seed options, I found correlation coefficients between each seed metric and championship OPR across every championship-attending team. I did this for each year 2011-2014, as well as an average correlation for all four years (I didn’t weight by number of teams since there were ~400 champ teams in each of these years). The results are summarized in this table, and can also be found in the “summary” tab of the “OPR seed investigator.xlsx” spreadsheet. Raw data can be found in the year sheets of the workbook as well.

As can be seen in the table, we roughly have from most correlation to least correlation:
highest > world > last > second >>> second highest > second to last >>> lowest > first
Going into this analysis, I had anticipated that the top three seed metrics would be highest, world, and last, but my expected ordering probably would have been something like last > highest >> world.

I was actually hoping that there would be a clearer difference between these top three metrics so that I could throw out one or two of these options going into my model creation. I had always been pretty skeptical of world OPR; it seemed to me that, although it has a better sample size than conventional single-event OPR, it would perform worse since it incorporates early-season matches that may not reflect teams accurately by the time champs rolls around. However, world OPR was better correlated with champs performance than my previous metric of choice, last event OPR, so my fears about world OPR are probably not very justified.

I also tried combining metrics with a weighted average. The optimal weightings I found, along with their correlation coefficients, can also be found in the “summary” tab. For example, when combining first OPR and second OPR, the optimal weighted average would be 0.3*(first OPR) + 0.7*(second OPR). I did not find much that was interesting in this effort. Highest OPR is consistently the best predictor of champs OPR no matter which other metric it is paired with. Some of the optimal weightings are mildly interesting, particularly the negative weightings given to poor metrics paired with world OPR.
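
The weighted-average search is just a one-dimensional sweep over the mixing weight, maximizing correlation with championship OPR. A sketch (arrays are indexed by team; this is not the exact spreadsheet mechanics):

Code:
import numpy as np

def best_weight(metric_a, metric_b, champs_opr, weights=np.linspace(-0.5, 1.5, 201)):
    """Find the weight w maximizing corr(w*metric_a + (1-w)*metric_b, champs_opr).

    All arguments are numpy arrays indexed by team. The sweep is deliberately
    wider than [0, 1] because negative weights did show up for weak metrics."""
    best_w, best_r = None, -np.inf
    for w in weights:
        r = np.corrcoef(w * metric_a + (1 - w) * metric_b, champs_opr)[0, 1]
        if r > best_r:
            best_w, best_r = w, r
    return best_w, best_r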

Moving forward, I will probably have to try to use all three of highest OPR, world OPR, and last OPR when building a predictive model. I will also have to determine the best linear offsets to use for these metrics, and determine if the best seed metrics remain the same throughout the season, since this effort only looked at a single point in the season.



01-07-2018 10:23 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I am interested in predicting teams’ win probabilities for events before events even start. To do this, I need a metric for each team’s ability before the event starts. I decided to use Elo just because it is easily accessible to me and because OPR is not technically defined for a new event before the event starts, although many choices of OPR seeds could be used (see above post) to achieve the same effect.

One of the first questions I would like to answer before predicting event winners is whether each team’s probability of winning the event strictly increases with their pre-event Elo rating. At first pass, this seems like it should be clearly true. However, when you think deeper about the structure of how FRC events choose winners, there is one huge exception to this rule, and that is the second pick of a high-seeded alliance. These teams are generally agreed to be “worse” teams than lower-seeded alliance captains and first picks, yet they generally have an easier path to winning the event. Put another way, clearly the highest ranked Elo team going into the event will have the highest probability of winning, and the second highest Elo team will have the second highest probability of winning, but the question is whether there exists some “valley” of Elo ranks for which the teams at these ranks are less likely to win the event than some teams at ranks below theirs. A hypothetical distribution of this kind is shown in this image. Here, there is a “serpentine valley” stretching from about rank 10 to rank 16. Note that these ranks are the teams’ start of event Elo rank, not their qualification seeding rank.

To investigate this, I compiled every team’s pre-event Elo rank for all 2008-2017 regional/district events and the winners of these events. Full data can be found in the “serpentine_valley.xlsx” workbook, although I did unfortunately lose much of the data. I apologize for that; if anyone is actually interested, I wouldn’t mind re-creating it, but I got what I needed from it. The summary graph is shown here. A 2-rank moving average is also shown here; this graph just smooths out the preceding graph a bit for easier interpretation. It is difficult to say definitively that a "serpentine valley" either exists or does not exist based on these data. If it does exist, it is likely centered at about Elo rank 10, and has a width of not more than 3 ranking positions. The top of the hill, if it exists, is probably at rank 11 or 12. For reference on the magnitude of the serpentine valley, the 10th ranked Elo team has won a total of 47 events in my data set of 788 events, and the 11th ranked Elo team has won a total of 69 events.
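
The underlying tally is simple: for each event, record the pre-event Elo rank of each team on the winning alliance, then count wins by rank and smooth with a 2-rank moving average. A sketch (data layout made up for illustration):

Code:
from collections import Counter

def wins_by_rank(event_winner_ranks):
    """event_winner_ranks: one list per event containing the pre-event Elo ranks
    held by the teams on that event's winning alliance."""
    counts = Counter()
    for ranks in event_winner_ranks:
        counts.update(ranks)
    return counts

def moving_average(values, window=2):
    """Trailing moving average (the 2-rank smoothing used for the second graph)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out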

I also performed a similar analysis using each team’s end of season Elo and each team’s end of quals Elo at the current event, but these were more peripheral and did not yield anything noteworthy.

I will have to see moving forward if this effect is large enough to merit inclusion in a pre-event winner prediction model, but my guess at this point is that no such adjustment will be needed. Note that this does not in any way absolve the serpentine model of its known weaknesses. It does, however, provide reasonably good evidence that teams are only rarely (if ever) incentivized by the current system to start out the event pretending to be worse than they are in order to drop a few ranks in apparent ability.



01-07-2018 10:40 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I'm a little bit confused as to why almost all of my papers have 40+ downloads, but only 5 unique people besides me have commented on this thread.

My theories are:
1. People want more complete whitepapers before commenting or don't like this format
2. My analysis is so rigorous and easily digestible that hardly anyone has questions
3. People are downloading my data and just glancing at it without actually understanding what I am describing
4. My analyses are going over most people's heads and they are afraid to ask questions

I find 1 and 2 unlikely. 3 is perhaps the most likely, and I don't necessarily think that it is a problem. However, if 4 is the case, I want to strongly encourage anyone to ask me questions or speculate on things I post. Indeed, the one serious challenge I have gotten directly led to me retracting my original analysis, so I really do value feedback.



01-07-2018 11:30 PM

Boltman


Re: paper: Miscellaneous Statistics Projects

I think it's just interesting to many of us, and if we truly got what all these meant then maybe we would respond more.

I find all the statistics fascinating, though I have yet to find a statistic better than my own eyes.
OPR is fairly straightforward; some of the others, like mCA, I have no clue what they are.

Maybe a small snippet of what's being compared would be good, so a 5-year-old could understand?
I appreciate what you do and have used your analysis here and there when I can understand it.



01-26-2018 12:34 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Boltman View Post
I think it's just interesting to many of us, and if we truly got what all these meant then maybe we would respond more.

I find all the statistics fascinating, though I have yet to find a statistic better than my own eyes.
OPR is fairly straightforward; some of the others, like mCA, I have no clue what they are.

Maybe a small snippet of what's being compared would be good, so a 5-year-old could understand?
I appreciate what you do and have used your analysis here and there when I can understand it.
mCA stands for milli-Chairman's Awards. A team with a rating of 500 mCA has an awards history strength equivalent to winning half of a Chairman's Award in the current season. Higher ratings indicate teams have a stronger award-winning history, and are thus more likely to win the Chairman's Award.

I appreciate the feedback, let me know if I can help make anything clearer.



01-26-2018 01:12 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

With the interesting new dynamic this year of random plate assignments, I decided to look back at previous years to see if team color assignment had any meaningful impact on scores. To do this, I found the optimal Elo "bonus" to give to either the red or the blue alliance in quals matches in order to maximize my Elo model's predictive power. I call this addition the "red Elo advantage." Since Elos can be difficult to interpret, I also included the equivalent point value impact for each year.
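
The search itself is the same kind of one-parameter Brier score minimization as in the surrogate post above; a sketch (data layout made up for illustration):

Code:
def brier_with_red_bonus(matches, red_bonus):
    """matches: list of (red_minus_blue_elo, red_won) for one season's quals matches."""
    errs = []
    for elo_diff, red_won in matches:
        p = 1.0 / (1.0 + 10 ** (-(elo_diff + red_bonus) / 400.0))
        errs.append((p - red_won) ** 2)
    return sum(errs) / len(errs)

def red_elo_advantage(matches, candidates=range(-30, 31)):
    # Negative values mean the advantage actually belongs to blue.
    return min(candidates, key=lambda b: brier_with_red_bonus(matches, b))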

Going into this, I expected to see minimal impact of color assignment in any given year since FIRST tries to make the games as symmetric as possible. After reviewing the games, the largest asymmetries (relative to the drivers) that I found occurred in 2005 (human loading stations all on one side) and 2017 (gear loading all on one side). There were minor asymmetries in 2012 (Kinect stations) and 2015 (unloaded yellow totes). I also expected that, in an average year, the blue alliance would receive a slight bonus due to red being penalized more frequently because red would be perceived as a more aggressive color by the referees.

Here is a table summarizing the results by year. The largest advantage by far comes from the blue alliance in 2005, which received an Elo advantage of 14 points. This is followed by 9 Elo point advantages for blue in both 2007 and 2008. I'm unsure why 2007 and 2008 have such large advantages, but 2005 was one of the years in which I had anticipated seeing the biggest differences due to asymmetry. Also note that the smaller number of matches in 2005-2008 relative to later years means these results might arise purely from chance. I might run a significance test for each year later, but I don't really care because all effects are so minimal.

From 2009-2017, the year with the largest impact was 2017, with a blue Elo advantage of 6, corresponding to 1.7 match points. This was also expected due to the nature of the arena last year. Other years in this time period look to have nearly negligible Elo impact.

Aggregate Elo advantages can be seen in this table. During 2009-2017 (years with more matches and the modern era of bumper colors), blue on average receives a 1.2 Elo point advantage. This is probably nowhere near statistically significantly different from 0, but it is in line with my prediction about red being penalized more. Unfortunately, we only have good penalty data since 2015, so it won't be feasible for a few more years to see if red is actually penalized more than blue in the average game.

Another thing I realized in this process is that, even if the field is symmetric for the drivers, it will not be symmetric for the referees. The head ref's side of the field will almost certainly receive either more or fewer penalties depending on the game dynamics (as should happen; otherwise the head ref wouldn't be doing her job). Thus, it shouldn't be surprising to anyone to see red or blue receive at least a slight edge every year.

As I predicted, all of these effects are minimal. However, this will give us a good reference point to see whether plate color assignment this year has an impact as large as, or larger than, team color assignment from prior years.



01-26-2018 03:05 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

I ran significance tests for the years that were most likely to have significant advantages for one color over the other. The results are in this table. Of the years that I tested, 2017 was the only one that was significant; none of the others were even very close. Even 2017 should be viewed with caution since I essentially ran 13 significance tests, so there was a 30% chance that at least one of those tests would provide a p-value at least as low as that of 2017 purely by chance.

Basically, the only year for which we have reasonable evidence against the null hypothesis is 2017, and I would still be wary of rejecting the null hypothesis for 2017.



01-26-2018 04:54 PM

Ginger Power


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
Now that we actually have team lists for events, I thought I would revisit my 2018 Chairman's Predictions workbook since it is the most popular download of mine. It turns out that I did not have support for 2018 rookies, resurrected teams, or new veterans in these predictions.

I have attached a new workbook titled "2018_Chairman's_predictions_v2" which provides support for these groups. I also have added an easy way to import team lists for events simply by entering in the event key. If you have additional knowledge of events (or if you want to make a hypothetical event), you can still add teams to the list manually. I have also switched to using the TBA API v3, so this should hopefully still work after Jan 1.

Let me know if you notice any bugs with this book.
I find this spreadsheet to be extremely interesting. I'm wondering what the logic is in not capping the number of years that contribute to mCA?

From the FIRST Inspires Website:

Quote:
The criterion for the Chairman’s Award has special emphasis on recent accomplishments in both the current season, and the preceding two to five years. The judges focus on teams’ activities over a sustained period, as distinguished from just the robot design and build period.
Given that judges are instructed to emphasize the most recent 2-5 years, I would think it would make sense to ignore accomplishments made prior to 2013 when calculating mCA. Obviously you have the 19% regression to 0, but there is still a residual effect from an award that was won in 2009 when realistically that probably doesn't mean much.

I'm of the opinion that keeping the entire body of work for a team is a better representation for their standing as a Hall of Fame contender, while keeping just the most recent 5 years would be a better representation of a team's standing at a local event.

Additionally, I'm curious as to why Rookie All Star isn't factored in for mCA? My understanding is that the Rookie All-Star is essentially the rookie team that best fits the mold of a future Chairman's Award team. I would think that a team that has won RAS is more likely to win CA in the future than a team that didn't win RAS.

Edit:

In terms of event predictions, I'm wondering if it would make sense to have some sort of cutoff after the top X number of teams. Realistically, you won't have 60/60 teams at an event present for Chairman's Award, so it doesn't make sense for the 60th ranked team in terms of mCA to have a .5% chance of winning CA. I don't know what percent of teams at an event typically submit for Chairman's Award... my guess would be 1/3 of teams submit, but that's probably high.



01-26-2018 08:55 PM

Caleb Sykes


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Ginger Power View Post
snip
All good points. I'll go back and test out some of these thoughts with my model. Capping for only the past 5 years especially intrigues me.

Quote:
Additionally, I'm curious as to why Rookie All Star isn't factored in for mCA? My understanding is that the Rookie All-Star is essentially the rookie team that best fits the mold of a future Chairman's Award team. I would think that a team that has won RAS is more likely to win CA in the future than a team that didn't win RAS.
I did build a RAS value into the model, which is why it shows up in the "Model Parameters" tab. However, I found that the optimal value for this award in terms of predictive power was 0 (±50 or so). I was originally surprised by this, and a little bit disappointed to be honest. I'll try optimizing my model again to make sure this was not done in error, but I doubt it was. All of the weightings I use are those that maximize the predictive power of my model; they have nothing to do with personal preference.

Quote:
In terms of event predictions, I'm wondering if it would make sense to have some sort of cutoff after the top X number of teams. Realistically, you won't have 60/60 teams at an event present for Chairman's Award, so it doesn't make sense for the 60th ranked team in terms of mCA to have a .5% chance of winning CA. I don't know what percent of teams at an event typically submit for Chairman's Award... my guess would be 1/3 of teams submit, but that's probably high.
My concern with this line of thought is that, although only some proportion of teams at an event submit for Chairman's, we don't know which teams those are. Obviously, teams with stronger awards histories are more likely to submit for Chairman's than teams without such histories, but we can never definitively say which teams are and are not presenting. As an example, I ran through the weakest mCA teams to win Chairman's last year, and team 4730 won at PCH Albany despite: having negative mCA, having never won a judged award prior to this, and having the lowest mCA of any team at their event. You can check this using my "2017 Chairman's predictions.xlsm" workbook. Going from 0.5% to 0.1% for example is a deceptively huge jump. We would expect about one 0.5% team to win Chairman's Award each season (since there are around 200 events), but we would only expect to see a 0.1% team win Chairman's in about a 5-year period.

I'll try adding a "weak team" penalty into the model that subtracts some mCA amount from the lowest X% of teams at the event to see if that improves the predictive power at all, but I'm pretty skeptical since the model seemed to be well-calibrated when I built it.



01-27-2018 10:49 AM

Ginger Power


Re: paper: Miscellaneous Statistics Projects

Quote:
Originally Posted by Caleb Sykes View Post
All good points. I'll go back and test out some of these thoughts with my model. Capping for only the past 5 years especially intrigues me.


I did build a RAS value into the model, which is why it shows up in the "Model Parameters" tab. However, I found that the optimal value for this award in terms of predictive power was 0 (±50 or so). I was originally surprised by this, and a little bit disappointed to be honest. I'll try optimizing my model again to make sure this was not done in error, but I doubt it was. All of the weightings I use are those that maximize the predictive power of my model; they have nothing to do with personal preference.



My concern with this line of thought is that, although only some proportion of teams at an event submit for Chairman's, we don't know which teams those are. Obviously, teams with stronger awards histories are more likely to submit for Chairman's than teams without such histories, but we can never definitively say which teams are and are not presenting. As an example, I ran through the weakest mCA teams to win Chairman's last year, and team 4730 won at PCH Albany despite: having negative mCA, having never won a judged award prior to this, and having the lowest mCA of any team at their event. You can check this using my "2017 Chairman's predictions.xlsm" workbook. Going from 0.5% to 0.1% for example is a deceptively huge jump. We would expect about one 0.5% team to win Chairman's Award each season (since there are around 200 events), but we would only expect to see a 0.1% team win Chairman's in about a 5-year period.

I'll try adding a "weak team" penalty into the model that subtracts some mCA amount from the lowest X% of teams at the event to see if that improves the predictive power at all, but I'm pretty skeptical since the model seemed to be well-calibrated when I built it.
I completely understand that all of your decisions were based on predictive power. All of my suggestions were based on my impressions of the Chairman's Award and what I know about teams that have won it. Obviously not too scientific on my end.

I'm looking forward to future postings on the subject!



02-19-2018 09:02 PM

Jacob Plicque


Scouting Statistics

Caleb
I found your scouting system to be a great source of strategy and scoring trends in 2017. I hope you are producing a new one for 2018.



02-19-2018 09:33 PM

Caleb Sykes


Re: Scouting Statistics

Quote:
Originally Posted by Jacob Plicque View Post
Caleb
I found your scouting system to be a great source of strategy and scoring trends in 2017. I hope you are producing a new one for 2018.
Glad to hear it. I'm always happy to hear that people find my work useful.

I'm working on a 2018 scouting database and event simulator right now. They'll definitely be out before week 1 competitions start, but I can't promise a specific date; hopefully no later than next Monday.


