#1 | Caleb Sykes (inkling16) | 05-11-2018, 07:11 PM
paper: Miscellaneous Statistics Projects 2018

Thread created automatically to discuss a document in CD-Media.

Miscellaneous Statistics Projects 2018 by Caleb Sykes

This whitepaper is a continuation of my miscellaneous statistics projects whitepapers from last year. For those not familiar, here is a summary of why I do this:
I frequently work on small projects that I don't believe merit entire threads on their own, so I have decided to upload them here and make a post about them in an existing thread. I also generally want my whitepapers to have instruction sheets so that anyone can pick them up and understand them. However, I don't want to bother with this for my smaller projects.

I have decided to make a new thread this year in order to not overload my other thread with too many whitepapers, and because I will be analyzing 2018-specific things here. As always, feel free to provide feedback of any kind, including pointing out flaws in my data or my analysis.
#2 | Caleb Sykes (inkling16) | 05-11-2018, 07:24 PM

My first book for this year is an investigation of what my Elo model might look like if I tried to incorporate non-WLT RPs. This idea was spawned by posts 41-44 in this thread. This workbook contains the same data as my normal FRC Elo book for 2005-2015, but for 2016-2018 I make adjustments to incorporate the other ranking points. I was unable to find a nice data set to use for 2012 coop RPs; if someone knows of one, let me know and I might try to do this same analysis for that year. For each year in 2016-2018, there were two additional non-WLT ranking points available in each quals match. In 2016 and 2017, the tasks required to achieve these ranking points were also worth bonus points in playoff matches.

The concern that spawned this effort is that, in quals matches, many if not most teams are not strictly trying to win, but rather are trying to maximize the number of ranking points they earn. Without some kind of RP correction, this means that teams who are good at earning these RPs might be underrated by Elo, since they might be more likely to win matches if they weren't expending effort on the RPs. Additionally, since playoffs had different scoring structures than quals in 2016-2017, the teams that do well earning these RPs in quals will presumably be even more competitive in playoffs due to the bonuses.

My approach for this effort was to find the optimal value to assign to the qualification RPs, and to add this value to teams' winning margins for the quals matches in which they achieve this RP. I wanted to find the optimal value for each of the six types of RPs from 2016-2018. Although there are other approaches to incorporating RP strength into an all-encompassing team rating, I always prefer methods that can be used to maximize predictive power, since then I can justify why I chose the values I did instead of just guessing how much different things are worth. There are a few different metrics I could have chosen to optimize, but I settled on overall playoff predictive power over the full 2016-2018 period. I chose to optimize for playoff performance since, in playoff matches, teams are almost strictly trying to maximize their winning margin (or win). This contrasts with quals matches, where teams may have other considerations, potentially including going for RPs or showing off so that they are more likely to be selected. I also chose to maximize predictive power over the full 2016-2018 period instead of over each year separately, since Elo ratings carry over somewhat between years: the optimal value for 2016 RPs when maximizing predictive power for 2016 alone will be a bit different from the optimal value when maximizing over all three years, since the latter accounts for how well the rating carries over between years.
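To make the shape of that search concrete, here is a minimal sketch. The hook run_elo_with_rp_bonus(bonus) is hypothetical: it stands in for re-running the 2016-2018 Elo ratings with the candidate bonus added to quals winning margins and returning playoff win probabilities alongside actual outcomes. This is not my actual pipeline, just an illustration of the fitting loop.
Code:
import numpy as np

def brier_score(win_probs, red_won):
    """Mean squared error of the win probability predictions; lower is better."""
    p = np.asarray(win_probs, dtype=float)
    o = np.asarray(red_won, dtype=float)
    return float(np.mean((p - o) ** 2))

def find_optimal_rp_value(run_elo_with_rp_bonus, candidates=range(0, 101)):
    """Sweep candidate RP values and keep the one that minimizes the
    playoff Brier score over 2016-2018 (hypothetical model hook)."""
    def playoff_brier(bonus):
        win_probs, outcomes = run_elo_with_rp_bonus(bonus)
        return brier_score(win_probs, outcomes)
    return min(candidates, key=playoff_brier)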

Here were the optimal (±20% or 1 point, whichever is greater) values I found for each of the 6 RPs, measured in units of their respective year's points:
2016 Teleop Defenses Breached: 2
2016 Teleop Tower Captured: 8
2017 kPa Ranking Point Achieved: 80
2017 Rotor Ranking Point Achieved: 40
2018 Auto Quest Ranking Point: 7
2018 Face The Boss Ranking Point: 45
All of these values are positive, which indicates that on average teams that get these RPs in quals are more likely to do better in playoffs than similar teams who do not. You can see the effects of these adjustments by looking at the "Adjusted Red winning margin" column in the attached book. This value should be equal to the red score minus the blue score, with additional additions/subtractions depending on the RPs both alliances received. For example, in 2018 Great Northern qm 31, blue wins 305 to 288, so red's unadjusted winning margin is -17. Red got the auto RP and blue got the climb RP in this match though, so after accounting for these, red's adjusted winning margin is -17+7-45=-55.
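For concreteness, here is that arithmetic as a tiny (hypothetical) helper, using the fitted 2018 values from the list above:
Code:
# Reproducing the worked example: 7 points for the 2018 auto RP,
# 45 points for the 2018 climb RP.
AUTO_RP_VALUE = 7
CLIMB_RP_VALUE = 45

def adjusted_red_margin(red_score, blue_score,
                        red_auto_rp, blue_auto_rp,
                        red_climb_rp, blue_climb_rp):
    margin = red_score - blue_score
    margin += AUTO_RP_VALUE * (red_auto_rp - blue_auto_rp)
    margin += CLIMB_RP_VALUE * (red_climb_rp - blue_climb_rp)
    return margin

# 2018 Great Northern qm 31: blue wins 305-288; red got the auto RP,
# blue got the climb RP.
print(adjusted_red_margin(288, 305, 1, 0, 0, 1))  # -17 + 7 - 45 = -55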

Here are my probably BS rationalizations of why these RPs have the values above:
2016 Teleop Defenses Breached: It really doesn't surprise me that this value is so low. Teams tended to deal with the defenses in quals in much the same way they dealt with them during playoffs. Although there was a 20 point bonus in playoffs for the breach, any alliance worth their salt was going to get this anyway, so a team that got this RP consistently in quals wasn't set up to do that much better in playoffs than a similar team who got this RP less consistently.
2016 Teleop Tower Captured: I don't want to analyze this RP too much since its definition changed for championships, an event where teams were getting this RP much more frequently than at a standard regional/district. I wouldn't have expected this value to exceed 10, since it generally took at least a pair of competent scorers to get 8 or 10 balls, and the 20 point playoff bonus divided by 2 is 10. I don't think teams would have played much differently in quals if this RP had not existed, except maybe being more conservative in the last 30 seconds to make sure everyone surrounded the tower.
2017 kPa Ranking Point Achieved: This is by far the RP that had the most value. There are a couple of reasons I think it is so high. To start, there was a 20 point playoff bonus for this task that was unavailable in quals, and unlike the teleop tower captured in 2016, getting this RP was generally an individual effort, so a team that gets this RP consistently in quals should be worth at least 20 points more in playoffs than a similar team that does not. On top of this, because there were so few ways to score additional points in playoffs, the 40-70 fuel points scored in playoffs are in a sense more valuable than the points scored with other methods. There were diminishing returns on gear scoring after getting the third rotor, and no value at all in scoring gears after the fourth rotor, and there's not much teams could do to get more climbing points except potentially lining up a bit earlier to avoid mistakes. Fuel points though were unbounded, so a team that consistently got the kPa RP in quals was going to be so much better off in playoffs just because they could get 60-90 points that were unachievable for a non-fuel opposing alliance.
2017 Rotor Ranking Point Achieved: Similar to 2016 teleop tower captured, I think most of the value of this RP comes from the playoff bonus of 100 points. This task required at least two competent robots to perform, which means I would have expected the value of this RP to be bounded above by 50. I don't think strategy changed much in playoffs due to this RP, since 40 points + an RP in quals is comparably lucrative to 140 points in playoffs.
2018 Auto Quest Ranking Point: I expected this RP to be worth around 5 points and I was correct. Teams likely opted for higher risk and higher average reward autonomous modes in playoffs than they did in quals because they could afford to have one robot miss out on the crossing, or be okay with not getting the switch if they could get one more cube on the scale. This wasn't a huge effect but it does exist.
2018 Face The Boss Ranking Point: I expected the value of this RP to be around 20 points because there is no playoff bonus for this task and I didn't think the opportunity cost was particularly high, although certainly higher than for the auto RP. This was the value that most surprised me at 45 points. In my original analysis, I was thinking of the opportunity cost of going for the climb RP, not the extra value of a team implied by said team achieving the climb RP. I think the distinction is important because relatively few teams were able to consistently achieve the climb RP, and the teams that did so were generally very competitive teams. This means that in the playoffs they can afford to spend a few more seconds scoring elsewhere on the field before going for the climb, and can climb much faster on average than teams that were not consistently getting the climb RP in quals. If I had thought about it more from this perspective, I might have predicted this RP to be worth around 30 points instead of 20. The remaining 15 still surprises me though. One possible explanation is that this value is overrated since we haven't had the 2019 season yet, so the model doesn't properly account for teams' future success.


Overall, this was an interesting analysis, but I will almost certainly not be incorporating a change like this into my Elo ratings moving forward for the following reasons:
The adjustments made here do not provide enough predictive power for me to consider them worthwhile. These adjustments improved the Brier score for playoff matches in 2016-2018 by about 0.001. I would have needed it to be at least 0.003 to consider it worthwhile, since I am reasonably sure there exist other improvements to my model which can provide this much or more improvement.
We have no guarantee that future games will have similar RP incentives. I try hard to keep the number of assumptions in my model to a minimum. I do this because I want my model to be valuable even when we get thrown a curveball for some aspect of the game like we did this year for time-dependent scoring. Assuming we will continue getting games with this RP structure is just not a very good assumption in my opinion.
There isn't a clear way to find good values to use for the RPs during the season in some years. I am back-fitting data right now, so I have a good sample size of quals matches where teams get the RPs. However, if we get a game like 2017 again, where we didn't get above a 2% success rate for either RP until week 4, there just wouldn't be a good sample size of matches to use to find good values until late in the season.
#3 | Caleb Sykes (inkling16) | 05-14-2018, 07:31 PM

Continuing the investigation of autonomous mobility that I did last year, I thought it would be interesting to look at auto mobility rates for every year since 2016. I have attached a book titled "2016-2018_successful_auto_movement" which provides a summary of this investigation. For each team that competed in 2018, it shows their matches played, successful mobilities, and success rates for each year 2016-2018. It also contains these metrics in aggregate over all of these years, as well as a reference to the first match in which the team missed auto mobility. I counted both "Crossed" and "Reached" in 2016 as successful mobilities.

Note that this is using data provided by the TBA API, which pulls directly from FIRST, so there are certainly some matches where teams are incorrectly credited with or denied auto mobility. There are many possible reasons for this, but one that I identified last year was that referees at some events were entering mobilities based on team positions rather than team numbers.
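As a rough sketch of how tallies like these can be pulled, the snippet below uses the real TBA API v3 /event/{event_key}/matches endpoint. The 2018 score_breakdown field names ("autoRobot1".."autoRobot3" with value "AutoRun") are from memory and worth double-checking against the API docs, and the auth key is a placeholder.
Code:
import requests
from collections import Counter

TBA = "https://www.thebluealliance.com/api/v3"
HEADERS = {"X-TBA-Auth-Key": "YOUR_TBA_KEY"}  # placeholder key

def mobility_tallies(event_key):
    """Per-team (successful mobilities, matches played) for one 2018 event."""
    matches = requests.get(f"{TBA}/event/{event_key}/matches", headers=HEADERS).json()
    played, moved = Counter(), Counter()
    for match in matches:
        if not match.get("score_breakdown"):
            continue  # unplayed or unreported match
        for color in ("red", "blue"):
            breakdown = match["score_breakdown"][color]
            for i, team in enumerate(match["alliances"][color]["team_keys"], start=1):
                played[team] += 1
                if breakdown.get(f"autoRobot{i}") == "AutoRun":  # assumed 2018 field
                    moved[team] += 1
    return {t: (moved[t], played[t]) for t in played}

# usage: rates = {t: m / p for t, (m, p) in mobility_tallies(event_key).items()}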

Here's a fun graph of 2017 auto mobility rates versus 2018 auto mobility rates:


Here are the teams that have competed each year 2016-2018, and have never missed auto mobility points according to this dataset:
Code:
team	matches
1506	149
5554	112
4550	86
4050	77
5031	69
3061	61
6175	59
6026	55
1178	48
3293	45
4462	39
6054	36
6167	35
5119	35
6155	35
5508	34
884	32
3511	32
4630	30
4728	29
6164	28
1230	28
2264	28
4054	27
4648	26
5171	26
The only other per-team metric available is climb rate, so I might do an investigation of that in the future. Climbing was wildly different each year though, whereas just moving forward a few feet is essentially the same each year, so it wouldn't be as easy a comparison.
#4 | Caleb Sykes (inkling16) | 06-08-2018, 07:39 PM

I'm looking to make predicted schedules soon to use for a couple of projects. I would like the capability to do this even before the total number of matches at the event is known. With this in mind, I have attached a simple book which looks at, for every 2018 event, the number of teams at the event versus the number of qual matches/team.

Here is a plot for all events:


And here is a plot for regional events:


I'll likely just set district events (including district champs) to 12 matches/team, champs events to 10 matches/team (although I may change this depending on the structure of champs and the game in future years), and regional events according to the formula matches/team = 17.0 - (0.13 * teams).
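That rule as a throwaway helper (the event-type strings are my own labels, and rounding the regional fit to a whole number of matches is my assumption):
Code:
def predicted_matches_per_team(num_teams, event_type):
    if event_type == "district":   # includes district champs
        return 12
    if event_type == "champs":
        return 10
    return round(17.0 - 0.13 * num_teams)  # regional fit from the plot above

# e.g., a 40-team regional predicts 12 matches/team, a 60-team regional 9.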
#5 | Caleb Sykes (inkling16) | 07-15-2018, 04:01 PM

I just uploaded a workbook called "2018_schedule_strengths_v1".

I'm planning to make a new thread soon to discuss "strength of schedule" in FRC, so I made this book to hopefully inform that discussion. I labeled it v1 because I imagine I'll need to go back and calculate other metrics as the discussion in my upcoming thread progresses.

Essentially, all I did was run my event simulator at each event twice, once before the schedule was released and once after. By looking at each team's ranking distribution change between these two runs, we can pinpoint exactly what effect the schedule had. At least, that's the idea. I have included summary statistics of each team's ranking predictions for both of these periods, as well as the changes between them.

Additionally, I have what is my first pass at a "strength of schedule" metric. I calculate this by finding the probability that the given team will seed better with the actual schedule than they would have with a random schedule. So a "schedule strength" of 0% means that you will never seed higher with the existing schedule than you would have with a random schedule, and a "schedule strength" of 100% means that you are guaranteed to seed higher with the actual schedule than you would have with a random schedule.

What I like about this metric:
It compares the given schedule against other hypothetical schedules
It is customized for each team; that is, it compares your hypothetical results with a random schedule against your hypothetical results with the given schedule. I'm not the biggest fan of team-independent metrics since, for example, a schedule full of buddy-climb-capable partners is amazing for a team without a buddy climber, but just alright for a team that has a good buddy climber, and a team-independent metric would have to give the schedule a single score for both of these teams.
It's on an interpretable scale (0% to 100%) and each value has a concrete meaning
It's able to be calculated before the event occurs (I don't like metrics that require hindsight unless maybe we want to use SoS as a tiebreaker for something)

What I don't like about this metric:
Requires a full event simulator to calculate
Teams that are basically guaranteed to seed first (like 1678 at their later regionals) will inevitably be shown to have bad schedules, since there is no schedule that would give them much of a better chance of seeding higher than their expectation (1st). Switching to greater than or equal ranks just flips the problem to high scores instead of low scores for these scenarios
Average value is 48.1% instead of 50%


Anyway, feel free to use this as proof of how bad your schedule was. The worst schedules this year according to my metric were (excluding the expected 1 seeds):
2096 on Hopper
4065 at Orlando
6459 on Roebling

And the best schedules were:
2220 on Archimedes
5104 on Newton
1806 on Turing

#6 | AriMB (Ari Meles-Braverman), FRC #5987 (Galaxia) mentor | 07-15-2018, 06:38 PM

Quote:
Originally Posted by Caleb Sykes
I just uploaded a workbook called "2018_schedule_strengths_v1". [...]
How do you calculate the schedule strength metric? I'm assuming it's at least partially based on the changes due to schedule numbers, but I don't see any direct correlation (team A having a larger positive change in average rank than team B does not seem to imply that team A's schedule strength will be lower than team B's, or vice versa).
#7 | Caleb Sykes (inkling16) | 07-15-2018, 10:24 PM

Quote:
Originally Posted by AriMB
How do you calculate the schedule strength metric? I'm assuming it's at least partially based on the changes due to schedule numbers, but I don't see any direct correlation (team A having a larger positive change in average rank than team B does not seem to imply that team A's schedule strength will be lower than team B's, or vice versa).
The schedule strength is the probability that the given team will seed higher with the actual schedule than they would have with a random schedule, according to the simulator. This is found by the following formula:

SS = \sum_{r} P_{\mathrm{actual}}(r) \cdot \sum_{q > r} P_{\mathrm{random}}(q)

where r and q are ranks, summed over all ranks and all ranks greater than r (i.e., worse seeds) respectively, and P_actual and P_random are the simulated rank distributions with the actual schedule and with a random schedule. Changing the second summation to be over all q >= r would provide a very similar metric, just that it would err on the high side instead of the low side.

For example, say that before the schedule is released a team is predicted to have a 20% chance of seeding first, 30% second, and 50% third. After the schedule is released, they have a 5% chance of seeding first, 25% second, and 70% third. Their schedule strength would then be:
0.05*(0.3+0.5)+0.25*(0.5)=0.04+0.125=0.165 or 16.5%.
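A two-line check of that arithmetic (rank distributions are listed best rank first):
Code:
# Schedule strength: probability the actual-schedule rank is strictly better
# than the random-schedule rank.
def schedule_strength(p_random, p_actual):
    return sum(pa * sum(p_random[r + 1:]) for r, pa in enumerate(p_actual))

print(schedule_strength([0.20, 0.30, 0.50], [0.05, 0.25, 0.70]))  # 0.165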



[Attached plot: schedule strength vs. change in average rank]
Looks like a pretty strong correlation to me. Excepting the teams which are heavy favorites to seed first, it seems to be doing its job.