#1
08-10-2014, 20:02
Ether
systems engineer (retired)
no team
Join Date: Nov 2009
Rookie Year: 1969
Location: US
Posts: 8,010
Re: paper: Weeks 1-2 Elo Analysis

Quote:
Originally Posted by Michael Hill View Post
The file should be a tab-delimited file of red1 red2 red3 blue1 blue2 blue3 redscore bluescore

Would you please post a ZIP of your input data file?


#2
08-10-2014, 20:16
Michael Hill
Registered User
FRC #3138 (Innovators Robotics)
Team Role: Mentor
Join Date: Jul 2004
Rookie Year: 2003
Location: Dayton, OH
Posts: 1,567

Quote:
Originally Posted by Ether View Post
Would you please post a ZIP of your input data file?


It's the same one I PMed you a couple weeks ago, but sure.

https://dl.dropboxusercontent.com/u/5193107/scores.zip

You'll need to install the trueskill package ('pip install trueskill') to use it.

#3
11-10-2014, 09:31
Michael Hill

So I decided to take this data a little bit further. What I did was to take all of these calculations and run them through the data this year and try to predict matches. For this, I also included a modified Elo system that has diminishing returns for large margins of victory (Calling this Elo Mod). I got some rather surprising results.
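For readers who have not seen an Elo update written out, here is a minimal sketch. The expected-score formula is the standard Elo one; everything else is an assumption made for illustration, since the thread does not give the constants or the exact "Elo Mod" formula: the step size `K`, the choice to sum team ratings into an alliance rating, and the logarithmic `margin_multiplier` are all hypothetical.

```python
import math

K = 32.0  # assumed step size; the thread does not state the constant used


def expected_score(rating_a, rating_b):
    """Standard Elo expected score for side A against side B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def margin_multiplier(margin):
    """Hypothetical damping: grows with the margin, but only logarithmically."""
    return 1.0 + math.log1p(abs(margin))


def update(red, blue, red_score, blue_score, ratings, use_margin=False):
    """Update team ratings in place after one match and return the step.

    Alliance strength is taken as the sum of its teams' ratings (an assumption).
    """
    r_red = sum(ratings[t] for t in red)
    r_blue = sum(ratings[t] for t in blue)
    e_red = expected_score(r_red, r_blue)
    if red_score > blue_score:
        s_red = 1.0
    elif red_score < blue_score:
        s_red = 0.0
    else:
        s_red = 0.5
    k = K * margin_multiplier(red_score - blue_score) if use_margin else K
    delta = k * (s_red - e_red)
    for t in red:
        ratings[t] += delta
    for t in blue:
        ratings[t] -= delta
    return delta


# Made-up team numbers, all starting from the same rating.
ratings = {team: 1500.0 for team in range(1, 7)}
delta = update((1, 2, 3), (4, 5, 6), 100, 50, ratings)
```

With `use_margin=True` a blowout moves the ratings more than a narrow win, but each extra point of margin counts for less than the one before it, which is the "diminishing returns" idea.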

My baseline was just using OPR for predicting match outcomes; it was able to predict about 77.1% of the matches this year. This was calculated by adding up the OPRs of each alliance and comparing with the result of the match. TrueSkill was able to predict 79.0% of the matches, a pretty good improvement. I need to develop the prediction model a bit better because it currently doesn't take into account the standard deviation as a measure of certainty. The modified Elo system was able to predict 79.5% of matches, an improvement over TrueSkill. The baseline, unadulterated Elo system as used in this thread was able to predict a whopping 81.4% of matches, by far the best of any of these models. There is still room for improvement with TrueSkill and the modified Elo. With the modified Elo, there are some constants that can be tuned for better results. But overall, the results are somewhat interesting. It seems that no matter the ranking model used, about 1 in 5 qualification matches will result in an upset.
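The "add up the ratings of each alliance and compare with the result" check described above can be sketched as follows. The team numbers, ratings, and scores are made up, and the handling of ties (a tied match only counts as a hit if the summed ratings are also tied) is an assumption, since the post does not say how ties were scored.

```python
def predicted_winner(red, blue, rating):
    """Return 'red', 'blue', or 'tie' by comparing summed alliance ratings."""
    red_total = sum(rating[t] for t in red)
    blue_total = sum(rating[t] for t in blue)
    if red_total > blue_total:
        return "red"
    if blue_total > red_total:
        return "blue"
    return "tie"


def actual_winner(red_score, blue_score):
    """Return 'red', 'blue', or 'tie' from the final scores."""
    if red_score > blue_score:
        return "red"
    if blue_score > red_score:
        return "blue"
    return "tie"


def accuracy(matches, rating):
    """Fraction of matches whose predicted winner equals the actual winner."""
    hits = sum(
        predicted_winner(red, blue, rating) == actual_winner(rs, bs)
        for red, blue, rs, bs in matches
    )
    return hits / len(matches)


# Hypothetical ratings (OPR, Elo, ...) and match rows in the
# red1 red2 red3 blue1 blue2 blue3 redscore bluescore layout.
rating = {1: 30.0, 2: 20.0, 3: 10.0, 4: 25.0, 5: 15.0, 6: 5.0}
matches = [
    ((1, 2, 3), (4, 5, 6), 70, 40),
    ((1, 5, 6), (2, 3, 4), 50, 60),
    ((3, 5, 6), (1, 2, 4), 80, 60),  # an upset: the lower-rated alliance wins
]
acc = accuracy(matches, rating)
```

The same `accuracy` function works for any per-team rating, which is what makes the side-by-side comparison of OPR, TrueSkill, and Elo possible.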

Here is the spreadsheet I used: https://dl.dropboxusercontent.com/u/...Trueskill.xlsx

#4
11-10-2014, 10:55
Ether

Quote:
Originally Posted by Michael Hill View Post
What I did was to take all of these calculations and run them through the data this year and try to predict matches.

Since you used the word "predict", is it safe to say that what you did was to use the data from weeks 1 through n-1 to predict the matches in week n?


#5
11-10-2014, 12:25
Michael Hill

Quote:
Originally Posted by Ether View Post
Since you used the word "predict", is it safe to say that what you did was to use the data from weeks 1 through n-1 to predict the matches in week n?


No, I guess I should have used a better word than "predict". More like "postdict". I went into it "knowing" an Elo/TrueSkill rating and tested against the data that was used to calculate it. Of course we won't have all the data to calculate the final Elo ratings during the season. My next step is to calculate all the ratings as if it is just before championships and then see how each does with "predicting" championship matches. My SWAG is that the success rate in predictions using the "postdicted" matches is a ceiling for how good we can hope for the predictions to be.
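The difference between "postdicting" and predicting comes down to which matches the ratings are allowed to see. A minimal sketch of the out-of-sample split, assuming each match record carries a `week` field (hypothetical: the posted input file has no such column):

```python
def split_for_week(matches, n):
    """Return (train, test): matches before week n, and matches in week n."""
    train = [m for m in matches if m["week"] < n]
    test = [m for m in matches if m["week"] == n]
    return train, test


def walk_forward(matches):
    """Yield (n, train, test) for each week n that has earlier data."""
    weeks = sorted({m["week"] for m in matches})
    for n in weeks[1:]:
        train, test = split_for_week(matches, n)
        yield n, train, test


# Made-up records; only the week and an id are shown.
matches = [
    {"week": 1, "id": "a"},
    {"week": 1, "id": "b"},
    {"week": 2, "id": "c"},
    {"week": 3, "id": "d"},
]
folds = list(walk_forward(matches))
```

Ratings would be fitted on `train` only and scored on `test`. Treating everything before championships as `train` and the championship matches as `test` is the same idea with a single fold.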

#6
11-10-2014, 13:20
Ether

Quote:
Originally Posted by Michael Hill View Post
My baseline was just using OPR for predicting match outcomes; it was able to predict about 77.1% of the matches this year. This was calculated by adding up the OPRs of each alliance and comparing with the result of the match. TrueSkill was able to predict 79.0% of the matches, a pretty good improvement. I need to develop the prediction model a bit better because it currently doesn't take into account the standard deviation as a measure of certainty. The modified Elo system was able to predict 79.5% of matches, an improvement over TrueSkill. The baseline, unadulterated Elo system as used in this thread was able to predict a whopping 81.4% of matches, by far the best of any of these models.

Try calculating "OPR" using min L1 norm of residuals (LAD) instead of min L2 norm (least squares), and see how that compares.


#7
12-10-2014, 09:52
Michael Hill

I've tried the L1 optimization problem, but l1-magic is giving me fits. For some reason, it blows up after 10 iterations.

However, I have done the analysis that I wanted. I calculated all the rating systems using data from events prior to CMP, then I used that data to predict CMP matches.

In summary:
OPR: 72.46% Correct
TrueSkill: 69.31% Correct
Elo: 72.90% Correct
Elo Mod: 71.71% Correct

TL;DR: We're okay, but not great at predicting matches. OPR is okay at it, but Elo is better.

I'm still somewhat surprised that Elo is slightly better.

Updated Spreadsheet: https://dl.dropboxusercontent.com/u/...skill%202.xlsx

#8
13-10-2014, 00:36
Ether

Quote:
Originally Posted by Michael Hill View Post
I've tried the L1 optimization problem, but l1-magic is giving me fits. For some reason, it blows up after 10 iterations.

What solver are you using?


#9
13-10-2014, 01:06
Michael Hill

Quote:
Originally Posted by Ether View Post
What solver are you using?


I'm using the l1eq_pd.m function in the L1-Magic library (http://users.ece.gatech.edu/~justin/l1magic/)

Quote:
Originally Posted by Ether View Post
Try CCWM.


CCWM: 71.41%

#10
13-10-2014, 08:58
Ether

Quote:
Originally Posted by Michael Hill View Post
I'm using the l1eq_pd.m function

That's the wrong solver.

Code:

% l1eq_pd.m
%
% Solve
% min_x ||x||_1 s.t.  Ax = b

Firstly, you cannot find a min L1 norm vector x such that Ax=b because there is no vector x such that Ax=b, since the system is overdetermined.

Secondly, what you want to find is the min L1 norm of the residuals, not of the solution vector itself.

For the set of overdetermined linear equations Ax ≈ b, x is the solution vector. The residuals are b-Ax. So you want to find a solution vector x which minimizes the L1 norm of b-Ax.
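To make the distinction concrete, here is a small sketch that computes the residuals b-Ax and their L1 and L2 norms for a toy overdetermined system. The matrix, scores, and candidate x are made up; the point is only that the quantity being minimized is a norm of the residual vector, not a norm of x.

```python
import math


def residuals(A, x, b):
    """Return b - A x for a matrix given as a list of rows."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]


def l1_norm(v):
    """Sum of absolute values."""
    return sum(abs(e) for e in v)


def l2_norm(v):
    """Square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in v))


# Three equations, two unknowns: no x satisfies all three exactly.
A = [[1, 0],
     [0, 1],
     [1, 1]]
b = [1, 2, 4]
x = [1, 2]  # a candidate solution, not necessarily optimal

r = residuals(A, x, b)
residual_l1 = l1_norm(r)  # what LAD minimizes over x
solution_l1 = l1_norm(x)  # what l1eq_pd.m minimizes, subject to Ax = b
```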


#11
13-10-2014, 15:35
Ether


Attached is a comparison of b-Ax residuals for L2 and L1 OPR.

Alliance scores computed from L1 OPR are within +/-10 points of the actual scores 33.5% of the time.

Alliance scores computed from L2 OPR are within +/-10 points of the actual scores only 22.4% of the time.

It is on that basis that I postulate that L1 OPR might be a better predictor of match outcome.
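The "within +/-10 points" figure is the fraction of residuals whose absolute value is at most 10. A minimal sketch, with made-up scores and with the assumption that the boundary is inclusive (the post does not say):

```python
def fraction_within(actual, computed, tol=10.0):
    """Fraction of alliance scores whose computed value is within tol of actual."""
    hits = sum(abs(a - c) <= tol for a, c in zip(actual, computed))
    return hits / len(actual)


# Hypothetical actual alliance scores (b) and computed scores (Ax).
actual = [50, 60, 70, 80]
computed = [55, 75, 70, 91]
share = fraction_within(actual, computed)
```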

[EDIT]Cannot add attachments to threads associated with papers. Brandon: can you please change this setting to allow attachments? Thank you.[/EDIT]


#12
14-10-2014, 06:46
Michael Hill

Quote:
Originally Posted by Ether View Post

Attached is a comparison of b-Ax residuals for L2 and L1 OPR.

Alliance scores computed from L1 OPR are within +/-10 points of the actual scores 33.5% of the time.

Alliance scores computed from L2 OPR are within +/-10 points of the actual scores only 22.4% of the time.

It is on that basis that I postulate that L1 OPR might be a better predictor of match outcome.

[EDIT]Cannot add attachments to threads associated with papers. Brandon: can you please change this setting to allow attachments? Thank you.[/EDIT]


Ya, I'll admit I don't know what I'm doing with the L1 stuff. It's a new concept to me, so I just did some quick googling and thought I found a quick solution.

#13
14-10-2014, 18:34
Ether

Quote:
Originally Posted by Michael Hill View Post
...the L1 stuff [is] a new concept to me...

You have an overdetermined linear system

Ax ≈ b,

where A is the (binary) design matrix of alliances, b is a column vector of alliance scores, and x is what you are trying to find: a column vector of team "OPR" scores.

There is no exact solution for x, since the system is overdetermined. So the idea is to find the "best" solution (in some sense of the word "best").

Notice that the left-hand side (Ax) is a column vector of alliance scores computed from whatever solution x you come up with.

The residuals are b-Ax: a column vector of the differences between the actual alliance scores (b) and the computed alliance scores (Ax).

Looking at it that way, it becomes clear that what you are trying to do is find a solution x which minimizes the residuals (in some sense of the word "minimize").

The most common way to do this is to find x which minimizes the L2 norm of the residuals. The L2 norm of a vector is the square root of the sum of the squares of the vector's elements. The L2 norm solution is also known as the "least squares" solution (for obvious reasons).

It turns out that finding the x which minimizes the L2 norm of b-Ax is computationally straightforward.

In Octave, it's one line of code: x = A\b. The backslash in this context is known as "left division". The syntax is simple, but under the hood there's a lot going on.

For the Ax ≈ b overdetermined linear systems we are dealing with in FRC to compute OPR scores, it turns out that there is a computationally faster way to compute the least squares solution for x. Here's how:

Multiply both sides of Ax ≈ b by the transpose of A to get A'Ax = A'b, or Nx = d, where N = A'A and d = A'b.

Nx = d is known as the system of "Normal Equations", and its solution x = N\d gives the same answer as A\b (within rounding error) and is faster to compute.
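As an illustration of the normal equations, here is a dependency-free Python sketch rather than Octave. Because A is binary, N and d can be accumulated directly from the list of alliances without ever forming A. The team numbers and scores are made up, and the small Gaussian-elimination routine merely stands in for Octave's `N\d`.

```python
def normal_equations(alliances, scores, teams):
    """Build N = A'A and d = A'b without forming A explicitly.

    N[i][j] counts the alliances containing both team i and team j;
    d[i] sums the scores of every alliance team i played on.
    """
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    N = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for alliance, score in zip(alliances, scores):
        for ti in alliance:
            i = index[ti]
            d[i] += score
            for tj in alliance:
                N[i][index[tj]] += 1.0
    return N, d


def solve(N, d):
    """Solve N x = d by Gaussian elimination with partial pivoting."""
    n = len(d)
    M = [row[:] + [rhs] for row, rhs in zip(N, d)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("N is singular for this schedule")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


# Five alliance scores, four teams. The scores were generated from team
# contributions of 10, 20, 30 and 40, so the fit should recover those values.
teams = [1, 2, 3, 4]
alliances = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), (1, 2, 3)]
scores = [60.0, 70.0, 80.0, 90.0, 60.0]
N, d = normal_equations(alliances, scores, teams)
x = solve(N, d)
```

N is only as large as the number of teams, no matter how many matches were played, which is one reason this route can be cheaper than working with the tall matrix A directly.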
But "least squares" (min L2 norm of residuals) is not the only possible "best fit" solution to the overdetermined system Ax ≈ b.

For example, there's the "Least Absolute Deviations (LAD)" solution (min L1 norm of residuals). The L1 norm of a vector is the sum of the absolute values of the vector's elements.
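A quick way to get a feel for the difference is the one-unknown case, where A is a single column of ones. There the least squares solution is the mean of b and an LAD solution is the median of b. A minimal sketch with made-up scores:

```python
from statistics import mean, median

scores = [40, 42, 45, 47, 150]  # made-up scores; the last one is an outlier


def l1_cost(x):
    """L1 norm of the residuals when every score is modeled by the same x."""
    return sum(abs(s - x) for s in scores)


def l2_cost(x):
    """Squared L2 norm of the residuals for the same model."""
    return sum((s - x) ** 2 for s in scores)


x_l2 = mean(scores)    # minimizes the sum of squared residuals
x_l1 = median(scores)  # minimizes the sum of absolute residuals
```

The outlier drags the least squares answer well above the typical score, while the LAD answer stays with the bulk of the data.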

Finding an LAD solution for Ax ≈ b is more computationally intensive than least squares.

Perhaps the best way to proceed is to convert the problem to a "Linear Program" (LP) and then use one of the many LP solvers.

For example, here's the AMPL code I used to compute the LAD OPR for your data:

Code:
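# LAD (min L1 norm of residuals) fit of an overdetermined linear system
param m;          # number of equations (alliance scores)
param n;          # number of unknowns (teams)

set I := {1..m};
set J := {1..n};

param A{I,J};     # design matrix
param b{I};       # alliance scores

var x{J};         # team "OPR" values, unrestricted in sign
var t{I} >= 0;    # t[i] bounds the absolute value of residual i

minimize sum_dev:
	sum {i in I} t[i];

# Together these two constraints force -t[i] <= residual i <= t[i]
subject to lower_bound {i in I}:
	-t[i] <= b[i] - sum {j in J} A[i,j]*x[j];

subject to upper_bound {i in I}:
	b[i] - sum {j in J} A[i,j]*x[j] <= t[i];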
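
The AMPL model above can be read as the usual linear-programming reformulation of the LAD problem. Written out as a sketch in the same notation (m equations, n teams, one auxiliary variable t_i per equation):

```latex
\begin{aligned}
\min_{x,\,t}\quad & \sum_{i=1}^{m} t_i \\
\text{subject to}\quad & -t_i \;\le\; b_i - \sum_{j=1}^{n} A_{ij}\,x_j \;\le\; t_i, \qquad i = 1,\dots,m, \\
& t_i \ge 0, \qquad i = 1,\dots,m .
\end{aligned}
```

For any fixed x, the smallest feasible t_i is |b_i - \sum_j A_{ij} x_j|, so minimizing the sum of the t_i over both x and t is the same as minimizing the L1 norm of b-Ax.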


Last edited by Ether : 14-10-2014 at 20:14. Reason: corrected a few typos
#14
13-10-2014, 00:38
Ether

Quote:
Originally Posted by Michael Hill View Post
In summary:
OPR: 72.46% Correct
TrueSkill: 69.31% Correct
Elo: 72.90% Correct
Elo Mod: 71.71% Correct

Try CCWM.


#15
13-10-2014, 20:21
Ether

Quote:
Originally Posted by Michael Hill View Post
My baseline was just using OPR for predicting match outcomes; it was able to predict about 77.1% of the matches this year.

I just ran the OPR numbers using the data you linked in your earlier post. I came up with 6919 of 8921 matches correctly "postdicted", or 77.56%.

Our numbers are very close, but I had expected them to be identical.

Here's a link to an XLS spreadsheet.



All times are GMT -5.