09-04-2008, 19:18
Jay Lundy
Programmer/Driver 2001-2004
FRC #0254 (The Cheesy Poofs)
Team Role: Alumni
 
Join Date: Jun 2001
Rookie Year: 2001
Location: Berkeley, CA
Posts: 320
Re: Offensive Power Rankings for 2008

I noticed some people were requesting DPR scores. While the meaning of a DPR number isn't as straightforward as OPR, I think we may be able to improve the OPR calculation by taking it into account. If a team tends to play heavy defense, its opponents shouldn't have their OPR reduced just because they scored below their average against it. Plus I love linear algebra so this gave me an excuse to use it.

<complex math warning>

So here's the equation:

Code:
( M -N ) ( p ) = ( s_t )
( N -M ) ( d ) = ( s_o )

Where (n = total # of teams):
M = n x n matrix with M(ij) = # of times i played with j. M(ii) = # of times i played. (same as M from before)
N = n x n matrix with N(ij) = # of times i played against j. N(ii) = 0.
p = n x 1 column vector of OPRs. p(i) = OPR for team i. (same as p from before)
d = n x 1 column vector of DPRs. d(i) = DPR for team i.
s_t = n x 1 column vector of total scores. s_t(i) = Sum of all of team i's match scores. (same as s from before)
s_o = n x 1 column vector of total opponent scores. s_o(i) = Sum of all of team i's opponents' match scores.
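To make the definitions above concrete, here is a minimal sketch that tallies M, N, s_t and s_o from a match list. The input format (tuples of team indices plus the two alliance scores) and the function name `build_counts` are assumptions made for illustration, and the two matches are made-up data.

```python
def build_counts(matches, n):
    """Build M, N, s_t, s_o from a list of matches.

    Each match is (red, blue, red_score, blue_score), where red and blue
    are tuples of team indices in 0..n-1 (a hypothetical input format).
    """
    M = [[0] * n for _ in range(n)]
    N = [[0] * n for _ in range(n)]
    s_t = [0] * n
    s_o = [0] * n
    for red, blue, red_score, blue_score in matches:
        # Visit the match once from each alliance's point of view.
        for own, opp, own_score, opp_score in (
            (red, blue, red_score, blue_score),
            (blue, red, blue_score, red_score),
        ):
            for i in own:
                s_t[i] += own_score
                s_o[i] += opp_score
                for j in own:
                    M[i][j] += 1  # j == i counts the match itself, giving M(ii)
                for j in opp:
                    N[i][j] += 1
    return M, N, s_t, s_o


# Two made-up matches between six teams.
matches = [
    ((0, 1, 2), (3, 4, 5), 40, 30),
    ((0, 3, 4), (1, 2, 5), 20, 50),
]
M, N, s_t, s_o = build_counts(matches, 6)
print(M[0])  # [2, 1, 1, 1, 1, 0]
print(N[0])  # [0, 1, 1, 1, 1, 2]
print(s_t)   # [60, 90, 90, 50, 50, 80]
print(s_o)   # [80, 50, 50, 90, 90, 60]
```

Every row of M and every row of N sums to 3 * M(ii), because each match gives a team two partners plus itself on one side and three opponents on the other.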

In other words, the first n equations add all the offense played by team i's alliances (team i included), subtract all the defense played by team i's opponents, and equate the result with team i's total score. The second n equations sum all the offense played by team i's opponents, subtract all the defense played by team i's alliances, and equate the result with team i's opponents' total score.
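Written out row by row, the two blocks of the system read as follows for each team i = 1..n. This is only a restatement of the matrix equation in index notation, using the same symbols:

```latex
\[
\sum_{j=1}^{n} M_{ij}\,p_j \;-\; \sum_{j=1}^{n} N_{ij}\,d_j \;=\; (s_t)_i ,
\qquad
\sum_{j=1}^{n} N_{ij}\,p_j \;-\; \sum_{j=1}^{n} M_{ij}\,d_j \;=\; (s_o)_i .
\]
```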

We can rewrite the equation as Ax = y where A = (M -N; N -M), x = (p; d), and y = (s_t; s_o).

In the data set I used, there are 2 isolated sets of teams that played no matches with teams outside their set: the Israeli and non-Israeli teams. We can separate these sets and write an equation for each one, and I think it's easier if we do:

Code:
A_1 * x_1 = y_1
A_2 * x_2 = y_2

We can solve each equation completely independently, so I'm just going to focus on one equation and call it Ax = y. A has a null space of dimension 1, so it's not invertible. We can increase all the OPRs and DPRs by the same amount without having any effect on the scores, so the null space is the span of x = (1 1 1 ... 1). We can get a unique solution by adding one more equation. I (somewhat arbitrarily) chose the equation by saying: if there were no defense, scores would be 25% higher. Each alliance score is counted three times in sum(s_t), once per alliance member, hence the division by 3 below. In equation form that is:

Code:
M(11)*p(1) + M(22)*p(2) + ... + M(nn)*p(n) = 1.25 * (sum(s_t) / 3)
or
Code:
( E 0 ) ( p ) = 1.25 * (sum(s_t) / 3)
        ( d )

E = ( M(11) M(22) ... M(nn) )

You can tack the last equation onto the end of A like so:
Code:
A = ( M -N )
    ( N -M )
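    ( E  0 )

Append 1.25 * (sum(s_t) / 3) to the end of y to match, and just ask Matlab to solve Ax = y for you. Or replace any one row of A with ( E 0 ), and the matching entry of y with 1.25 * (sum(s_t) / 3), so A becomes invertible and solve x = A_inv * y.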
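For anyone without Matlab, here is a rough sketch of the "replace a row" variant in plain Python. Everything in it is an assumption made for illustration: the helper names, the 7-team schedule, and the "true" ratings used to generate exact scores are all invented, and a small Gaussian elimination stands in for Matlab's solver. It is not the code used to produce the results in this post.

```python
def build_counts(matches, n):
    """M, N, s_t, s_o as defined in the post, from (red, blue, red_score, blue_score) tuples."""
    M = [[0.0] * n for _ in range(n)]
    N = [[0.0] * n for _ in range(n)]
    s_t, s_o = [0.0] * n, [0.0] * n
    for red, blue, red_score, blue_score in matches:
        for own, opp, own_score, opp_score in (
            (red, blue, red_score, blue_score),
            (blue, red, blue_score, red_score),
        ):
            for i in own:
                s_t[i] += own_score
                s_o[i] += opp_score
                for j in own:
                    M[i][j] += 1.0
                for j in opp:
                    N[i][j] += 1.0
    return M, N, s_t, s_o


def assemble(M, N, s_t, s_o, boost=1.25):
    """Square system with the last row of A (and of y) replaced by the ( E 0 ) equation."""
    n = len(M)
    A = [M[i] + [-v for v in N[i]] for i in range(n)]    # rows of ( M -N )
    A += [N[i] + [-v for v in M[i]] for i in range(n)]   # rows of ( N -M )
    y = s_t + s_o
    A[-1] = [M[i][i] for i in range(n)] + [0.0] * n      # ( E 0 )
    y[-1] = boost * sum(s_t) / 3.0
    return A, y


def solve(A, y):
    """Gaussian elimination with partial pivoting (stand-in for a library solver)."""
    size = len(A)
    aug = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


# Made-up schedule and ratings; scores are generated exactly from them.
n = 7
schedule = [
    ((0, 1, 2), (3, 4, 5)), ((0, 1, 2), (3, 4, 6)), ((0, 1, 2), (3, 5, 6)),
    ((0, 1, 2), (4, 5, 6)), ((0, 1, 3), (4, 5, 6)), ((0, 2, 3), (4, 5, 6)),
    ((1, 2, 3), (4, 5, 6)),
]
p_true = [30.0, 25.0, 20.0, 18.0, 15.0, 12.0, 10.0]
d_true = [2.0, 0.0, 5.0, 1.0, 3.0, 4.0, 6.0]


def score(own, opp):
    return sum(p_true[i] for i in own) - sum(d_true[j] for j in opp)


matches = [(r, b, score(r, b), score(b, r)) for r, b in schedule]
M, N, s_t, s_o = build_counts(matches, n)
A, y = assemble(M, N, s_t, s_o)
x = solve(A, y)
p, d = x[:n], x[n:]
```

Because the scores here are generated without noise, the recovered p and d differ from p_true and d_true only by one common constant, which is exactly the null-space direction (1 1 ... 1) that the extra equation pins down. On real data you would want a proper least-squares routine instead of the hand-rolled elimination.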

</complex math warning>

I ran this against the first csv Greg posted and here are the results (top 50, ordered by OPR):

Code:
Team   OPR      DPR      OPR + DPR
1114   71.6377  0.9474   72.5852
1124   53.2773  15.2694  68.5467
2056   51.7991  6.8767   58.6759
217    51.6703  13.4995  65.1698
233    51.5346  11.4470  62.9816
39     51.0832  4.8484   55.9316
330    50.1059  0.1292   50.2351
525    50.0129  1.2725   51.2855
175    47.6712  11.6710  59.3422
40     46.3240  10.1761  56.5001
1731   46.1172  -0.0044  46.1128
987    45.9656  7.0006   52.9662
103    45.0985  10.9291  56.0276
191    44.6783  12.1649  56.8432
79     44.1938  6.1087   50.3025
1024   43.9389  5.5522   49.4911
16     43.2490  6.0931   49.3421
67     43.2308  11.1761  54.4070
20     42.3096  6.3370   48.6466
469    41.9469  -5.4304  36.5165
494    41.2950  -1.9009  39.3941
1806   40.8038  5.2194   46.0232
365    40.6742  2.5704   43.2446
47     40.3067  -1.6335  38.6732
148    39.3002  9.0662   48.3663
1493   39.0307  -0.9984  38.0323
383    38.9932  5.5749   44.5681
1625   38.8912  5.1900   44.0813
1519   38.8147  6.3316   45.1463
1126   38.6616  0.8032   39.4648
141    38.6570  7.6568   46.3137
1718   38.4372  3.7425   42.1797
663    38.2419  14.4349  52.6767
126    37.8160  6.3778   44.1938
121    37.7131  12.7949  50.5080
195    37.7043  -1.5850  36.1192
1477   37.4595  10.8694  48.3289
368    37.1072  -2.7458  34.3614
25     37.0417  -3.1295  33.9121
1717   36.5859  6.3136   42.8995
71     36.1253  8.4642   44.5895
836    35.9330  6.4528   42.3859
93     35.5093  1.6895   37.1989
69     35.3987  2.3330   37.7317
61     35.3566  5.7965   41.1532
968    34.7835  4.2148   38.9984
2345   34.6020  1.6196   36.2216
1086   34.5104  10.6218  45.1321
58     34.4129  6.7595   41.1725
935    34.3941  7.7033   42.0974

Compare this with Guy's results from the same dataset. The results are fairly similar, but there's definitely some movement in the rankings. Make of it what you will.

Personally, I don't think it tells you a whole lot to know a team's DPR. The two OPRs tell you slightly different things about a team. The old OPR tries to tell you how much a team actually scored each match. The new OPR tries to tell you how much a team could have scored each match if there was no defense. They are both potentially useful numbers.

Finally, knowing both OPR and DPR does allow you to better predict the score of a match. If you define error as:
Code:
 error = actual_red_score - ( p(red1) + p(red2) + p(red3) - d(blue1) - d(blue2) - d(blue3) )
then both methods are the least squares solution for their respective vector spaces, but method #2 has a lower MSE (mean squared error) and ME (mean absolute error) because it has a bigger vector space to fit in. Method #1 is the special case d = 0, so letting d vary can't increase the squared error on the data it was fit to.
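One way to see the least-squares claim is to write the prediction for every alliance score as a single matrix product and take the normal equations. The matrices P and Q below are notation introduced just for this sketch, not something from earlier in the thread:

```latex
% K = number of alliance scores (two per match); s in R^K holds those scores.
% P_{ki} = 1 if team i is on the alliance that earned score k, else 0.
% Q_{ki} = 1 if team i is an opponent of that alliance, else 0.
\[
\min_{p,\,d}\ \lVert P p - Q d - s \rVert^{2}
\;\Longrightarrow\;
\begin{pmatrix} P^{T}P & -P^{T}Q \\ -Q^{T}P & Q^{T}Q \end{pmatrix}
\begin{pmatrix} p \\ d \end{pmatrix}
=
\begin{pmatrix} P^{T}s \\ -Q^{T}s \end{pmatrix}
\]
\[
P^{T}P = Q^{T}Q = M, \qquad
P^{T}Q = Q^{T}P = N, \qquad
P^{T}s = s_t, \qquad
Q^{T}s = s_o
\]
```

Multiplying the second block row by -1 gives ( M -N; N -M ) (p; d) = (s_t; s_o), the system from the start of the post. Setting d = 0 leaves M p = s_t, which is method #1. The extra 25% equation only picks one point along the (1 1 ... 1) direction, which doesn't change any predicted score.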

Method #1 MSE = 245.0446, ME = 12.306
Method #2 MSE = 180.8867, ME = 10.514
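For reference, here is a minimal sketch of how the two error figures can be computed from a match list and a set of ratings. It assumes the error is averaged over both alliances in every match (the formula above only spells out the red side), and the function name and numbers are made up for illustration. Passing all-zero DPRs only shows the arithmetic for an OPR-only prediction; real method #1 ratings would come from their own fit.

```python
def prediction_errors(matches, p, d):
    """Return (MSE, ME) over every alliance score in `matches`.

    Each match is (red, blue, red_score, blue_score) with team indices
    into the rating lists p (OPR) and d (DPR).
    """
    errors = []
    for red, blue, red_score, blue_score in matches:
        for own, opp, actual in ((red, blue, red_score), (blue, red, blue_score)):
            predicted = sum(p[i] for i in own) - sum(d[j] for j in opp)
            errors.append(actual - predicted)
    mse = sum(e * e for e in errors) / len(errors)
    me = sum(abs(e) for e in errors) / len(errors)
    return mse, me


# Made-up ratings and a single made-up match.
p = [10, 20, 30, 5, 15, 25]
d = [1, 2, 3, 4, 5, 6]
matches = [((0, 1, 2), (3, 4, 5), 50, 40)]
print(prediction_errors(matches, p, d))        # (13.0, 3.0)
print(prediction_errors(matches, p, [0] * 6))  # (62.5, 7.5)
```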

So it's better at predicting past scores. Is it better at predicting future scores? I guess we'll see.