Caleb explained it very clearly in the previous post.
If you need more detail, this post shows a simple AWK script for creating the necessary matrix and column vectors, and a simple Octave script for doing the linear algebra.
Caleb explained it very clearly in the previous post.
If you need more detail, this post shows a simple AWK script for creating the necessary matrix and column vectors, and a simple Octave script for doing the linear algebra.
Also note the following: in Caleb’s discussion, [A] is a non-square binary matrix (each element is either 1 or 0) with 2M rows and T columns, where M is the number of matches and T is the number of teams… and ** is a 2M by 1 column vector of alliance scores. So [A]≈** is indeed an overdetermined system which has no exact solution.
But if you left-multiply each side of [A]≈** by [AT], you get
[N]=[d], where
[N] is [AT][A] and
[d] is [AT]**
[N] will be an invertible square symmetric positive definite matrix, and [N]=[d] will have an exact solution because it’s already in the Normalized Equations form. The exact solution to [N]=[d] will be the least-squares approximate solution to [A]≈** (to within computer floating-point rounding error).
You can generate [N] (and [d]) directly from the raw score data, but it’s more straightforward to generate the binary matrix [A] in sparse form.
Sorry if I was unclear, the matrix I am using is the square matrix from performing the lease squares estimate of the solution. To use your variables, I am solving A’Ax=A’b, and having trouble with A’A being diagonally dominant.
Post your raw data and I will post an OPR solution with all the intermediate matrices and column vectors so you can locate where the error is.
Are you pulling the list of teams at the event for the event from a separate request (i.e. not deriving it from from parsed match data)? If so, you may have to go through your list of teams and eliminate no-shows, because they will appear on the TBA (assuming you are using TBA data) teams list, but not in any matches, causing the row/column for that no-show team in the original match matrix to be all zeroes, making Ax = b underdetermined and thereby unsolvable. This would also lead to your problem of A’A being a singular matrix, and therefore not strictly diagonally dominant.
Here’s a link to the raw data (each row is an alliance) and matrix. I’m not worried about the score vector, that is very straightforward. Team data comes from a master spreadsheet used for scouting at the event, checked manually against TBA. I did check the sums of the rows and the values matched what I expected.
Thanks for the help!
OK:
Your “Teams” tab has data for only 44 alliances, which would generate 44 linear equations.
But you have 64 teams.
So you have more unknowns (Team OPRs) than equations.
So the linear system is underdetermined, and has no unique solution. There are an infinite number of solutions.
There are at least three ways you could deal with this.
Wait until you have enough data so you have an overdetermined system and there is a unique least-squares solution
Solve the system using the Moore-Penrose pseudo-inverse: x=pinv(A)*b. This will select the solution x which has minimum ||x||
Just compute each team’s average (instead of OPR) until you have enough data: if a team played on L alliances, sum those alliance scores and divide the sum by 3L.
Another (very impractical) way to deal with this could be to selectively remove matches from the data set until an overdetermined system is created. This could potentially find OPRs for a subset of teams. This is wholly impractical for official FRC events since the scheduler highly prioritizes unique partners, so I doubt a combination of n matches could be found which would also remove >n teams.
*@ngreen: I think you might be misunderstanding the problem at hand. Dan’s computation is failing because he is trying to compute OPR with insufficient data, not because he’s using the Normal Equations approach (square matrix).
I suspect what he is trying to do is get “real time” OPR estimates during the early stages of an event, before enough matches have been played to create an overdetermined system suitable for least-squares estimation.
I think you are right.
Real time OPR calcs, like the ones available on FRC Spyder, are not very helpful before ~5 matches are played. Even then I don’t rely on them before about 8 matches. This is one of the many nice things about playing in a district system – we get useful OPR data for our Friday evening scouting discussions.
I don’t see how what worked OK for you is related to the problem Dan posted.
Take the following alliance data that Dan posted:
1700 5677 2489
1868 256 4171
254 852 4990
2035 6036 2643
841 846 5027
114 5026 6039
1678 5940 2473
100 5171 581
3303 5089 1351
5700 1967 1280
3482 6059 5655
670 253 199
8 5924 4643
192 2854 5905
668 4186 4765
5728 5104 2813
368 4904 2367
1662 2135 972
604 5737 766
649 4255 4159
971 1868 115
751 5700 3256
852 5924 6039
581 5655 5089
2473 6036 3303
4186 5905 841
100 2854 846
1967 972 6059
649 2643 5728
1678 8 2813
1700 1662 5104
254 971 4255
256 4643 766
668 5026 253
192 199 2367
5940 604 3256
115 4171 5027
4765 4904 3482
114 1280 5171
751 2489 4990
368 670 2035
1351 5737 5677
6039 4159 2643
2135 4186 5924
… plus this example alliance score vector:
54
59
115
45
89
49
111
47
78
45
67
79
121
57
119
52
93
87
63
40
80
87
115
34
41
94
31
102
40
78
55
98
120
106
102
121
123
68
120
108
62
49
130
46
… and try to run your OPR computation on that data.
Ether is not referring to lack of accuracy of results when there are not enough data. He is saying there has to be a minimum number of matches before you can even get an answer.
Thanks for all the replies. Adding more data did the trick. To clarify, my mistake was assuming that the converse of “a diagonally dominant matrix is invertible” was true.
Eugene Fang and I have been PM’ing about this. He came up with a clever and very effective method. See his discussion here](Match Result Predictions by fangeugene · Pull Request #1478 · the-blue-alliance/the-blue-alliance · GitHub).
The method looks good but I am not sure if it is necessary. Here is my position on how to use OPR. Current event OPR should not be used until you have at least 4 or 5 matches depending on number of teams. Before that, I use max OPR from prior events. So in a way, it is similar to what Eugene Fang did. The match “prediction” using OPR is just a check. Since you are at the event, you should always use scouted data instead of OPR.
The work Eugene Fang did was great in showing how long it takes to converge.
Are the xOPR and ixOPR calculations only possible if every attending team has competed at least once? If so, that severely limits the usefulness of this effort, because this criterion is not met for a large majority of events.
Obviously xOPR and ixOPR are possible without this constraint, since Eugene posted convergence graphs for Palmetto. What is the “other source for OPR initialization” that is used for cases like this?
I don’t know what Eugene used.
I haven’t done any numerical experiments yet to confirm this, but if there is no prior OPR value for a given team at an event, I think you can just pick a reasonable number to seed the calculation, and let the 2x iteration take care of things from there.
*Here is the basis/motivation for the ixOPR algorithm. Note how many matches must be played before the OPR stabilizes.
What Ether said shares my view on the initialization.
In the graphs, I used 0 as the initialization, which I knew was a bad idea at the time. I’ve been trying out using the average match score for the event (up to the currently played match) as the initialization for teams without past events with better* results.