Quote:
Originally Posted by Ether
The dust seems to have settled in this thread so I thought I'd toss this out for discussion.
Neat idea!
I think this is essentially minimizing another "mixture of errors," like the original EPR proposal. In this case, you're minimizing the sum of two terms: the usual squared error that yields the standard parameters, plus the squared error between the parameters and your a priori expectations of them, with a weighting factor that sets the relative importance of the two. If you form the error measure this way, take its derivative with respect to the parameters, set the result equal to zero, and solve for the parameters, you end up with the equation you're solving.
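Here is a minimal sketch of that estimator, assuming the usual OPR setup: A is the 0/1 alliance-by-team matrix (one row per alliance score), b is the vector of alliance scores, x0 holds the a priori expected contribution of each team, and w is the weighting factor. Minimizing ||Ax - b||^2 + w·||x - x0||^2 and setting the gradient to zero gives (AᵀA + wI)x = Aᵀb + w·x0, which is what the function solves. The function name and arguments are hypothetical, not taken from Ether's post.

```python
import numpy as np

def opr_with_prior(A, b, x0, w):
    """Minimize ||A x - b||^2 + w * ||x - x0||^2 over x (hypothetical helper).

    A  : (n_scores, n_teams) 0/1 matrix, 1 where a team was on that alliance
    b  : (n_scores,) alliance scores
    x0 : (n_teams,) a priori expected contribution of each team
    w  : weight on the prior term (w = 0 gives ordinary OPR when A^T A is invertible)
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    n_teams = A.shape[1]
    # Setting the gradient of the combined error to zero gives these normal equations:
    # (A^T A + w I) x = A^T b + w x0
    lhs = A.T @ A + w * np.eye(n_teams)
    rhs = A.T @ b + w * x0
    return np.linalg.solve(lhs, rhs)

# Toy example: 3 teams, 2-team alliances, 4 alliance scores (made-up numbers)
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
b = [30, 25, 35, 32]
print(opr_with_prior(A, b, x0=[12, 12, 12], w=2.0))
```

As w grows the solution is pulled toward x0; as w shrinks it approaches the plain least-squares OPR.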
That's another way to incorporate a priori expectations of the means, as the MMSE estimates do. And if you keep the weight constant, the estimates would shift progressively from relying on a priori information at the start of a tournament to relying mainly on match information later on, because the match-error term grows with every match played while the prior term stays fixed; again, this is what the MMSE estimates do. About the only thing this method would make hard is incorporating a priori variance estimates, but I suspect that "feature" of the MMSE estimates has limited practical utility anyway.
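To illustrate that progression, the sketch below keeps w fixed and refits as matches are added. It tracks the average diagonal of w(AᵀA + wI)^-1, the matrix that multiplies x0 in the solution of (AᵀA + wI)x = Aᵀb + w·x0: it equals 1 with no matches and falls toward 0 as match data accumulates. The schedule, the "true" team values, the noise level, and the weight are all synthetic and hypothetical, chosen only for illustration.

```python
import numpy as np

def opr_with_prior(A, b, x0, w):
    """Solve (A^T A + w I) x = A^T b + w x0 (squared error plus w times prior error)."""
    lhs = A.T @ A + w * np.eye(A.shape[1])
    return np.linalg.solve(lhs, A.T @ b + w * x0)

def prior_weight(A, w):
    """Average diagonal of w (A^T A + w I)^-1: 1 with no matches, falls toward 0."""
    n = A.shape[1]
    return float(np.trace(w * np.linalg.inv(A.T @ A + w * np.eye(n))) / n)

rng = np.random.default_rng(0)
n_teams, w, noise = 12, 5.0, 5.0        # hypothetical event size, weight, score noise
true_x = rng.uniform(5, 30, n_teams)    # made-up "true" team contributions
x0 = np.full(n_teams, true_x.mean())    # prior: every team expected to be average

rows, scores, history = [], [], []
for n_matches in (1, 4, 16, 64):
    while len(rows) < 2 * n_matches:    # each match adds a red and a blue alliance score
        teams = rng.choice(n_teams, size=6, replace=False)
        for alliance in (teams[:3], teams[3:]):
            row = np.zeros(n_teams)
            row[alliance] = 1.0
            rows.append(row)
            scores.append(row @ true_x + rng.normal(0.0, noise))
    A, b = np.array(rows), np.array(scores)
    x = opr_with_prior(A, b, x0, w)
    history.append((n_matches, prior_weight(A, w), float(np.linalg.norm(x - true_x))))
    print(f"{n_matches:3d} matches: prior weight {history[-1][1]:.3f}, "
          f"error vs. true values {history[-1][2]:.2f}")
```

The prior weight is guaranteed to fall every time a match is added, since each new row only increases AᵀA; how fast it falls depends on w.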
Have you tested this method to see how well it performs and how it compares with the others?
If you were so inclined, I'd love to see a database with the following data for the Championship (CMP) divisions over the past 3 years:
- Ar, Ab, Mr, Mb for the CMP divisions (which we already have, since you generated them: thanks again!)
- OPR, OPRm1, and OPRm3 for every regional tournament that any CMP team played in that year, and perhaps the mean and variance of each of these statistics at each tournament. I'm calling OPRm1 and OPRm3 the MMSE estimates with Var(N)/Var(O) estimated at 1 and 3, respectively.
Ideally, there would be one CSV file per statistic, with a row for each team in each CMP division (in the same order as the columns of the corresponding CMP Ar and Ab matrices) and a column for each week, containing 0 if the team didn't play that week or a number if it did (e.g., the team's OPR, or the mean OPR at that week's tournament). So, for example, for the 2014 Archimedes division (ARCHI) there might be the following files, each containing 100 rows (one per team) by 7 columns (one per week):
ARCHI_OPR
ARCHI_OPRm1
ARCHI_OPRm3
ARCHI_OPR_mean
ARCHI_OPRm1_mean
ARCHI_OPRm3_mean
ARCHI_OPR_var
ARCHI_OPRm1_var
ARCHI_OPRm3_var
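If the files followed that layout (one row per team in Ar/Ab column order, one column per week, 0 meaning "did not play"), collapsing one of them into a per-team prior vector x0 might look like the sketch below. Taking each team's most recent nonzero week is just one hypothetical choice (averaging the nonzero weeks would be another), and the comma delimiter and `.csv` extension are assumptions; the helper names are made up for illustration.

```python
import io
import numpy as np

def load_week_matrix(f):
    """Read a teams x weeks CSV (0 = did not play that week) into a 2-D float array."""
    return np.loadtxt(f, delimiter=",", ndmin=2)

def latest_nonzero(weeks, fallback):
    """Per-team value from the most recent week with a nonzero entry.

    Teams with no nonzero week get `fallback` (e.g. an event-wide mean).
    """
    prior = np.full(weeks.shape[0], float(fallback))
    for i, row in enumerate(weeks):
        played = np.flatnonzero(row)      # column indices of weeks the team played
        if played.size:
            prior[i] = row[played[-1]]    # keep the latest week's value
    return prior

# In practice: weeks = load_week_matrix("ARCHI_OPRm1.csv")  (hypothetical file name)
demo = io.StringIO("0,12.5,0,18.0,0,0,0\n9.0,0,0,0,0,0,0\n0,0,0,0,0,0,0\n")
weeks = load_week_matrix(demo)
print(latest_nonzero(weeks, fallback=10.0))  # team 0 -> 18.0, team 1 -> 9.0, team 2 -> 10.0
```

One caveat with the 0-means-absent encoding: a team whose OPR really was exactly 0 would look like it didn't play that week.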
Any chance you'd be up for generating this data?
Given this data, we could test different methods for predicting match outcomes in the CMP divisions. I think it would be interesting to try to predict:
A. The results of the CMP matches based entirely on results from previous tournaments.
B. The results of the last 3/4 of the CMP matches based on results from previous tournaments AND the first 1/4 of the CMP matches.
C. The results of the last 1/2 of the CMP matches based on results from previous tournaments AND the first 1/2 of the CMP matches.
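Scenarios A to C differ only in how many of the CMP matches are used for fitting before predicting the rest. The sketch below assumes Ar/Ab are the red/blue alliance matrices (one row per match, one column per team) and Mr/Mb the corresponding scores, fits the prior-weighted estimate (AᵀA + wI)x = Aᵀb + w·x0 on the first fraction of matches, and scores the remaining ones by whether the predicted and actual margins have the same sign. The function names and the winner-accuracy metric are illustrative assumptions; squared score error would be an equally reasonable metric.

```python
import numpy as np

def opr_with_prior(A, b, x0, w):
    """Solve (A^T A + w I) x = A^T b + w x0; with no rows this returns x0."""
    lhs = A.T @ A + w * np.eye(A.shape[1])
    return np.linalg.solve(lhs, A.T @ b + w * x0)

def predict_rest(Ar, Ab, Mr, Mb, x0, w, frac_known):
    """Fit on the first `frac_known` of matches, then predict the remaining ones.

    frac_known = 0 -> scenario A, 0.25 -> B, 0.5 -> C (must be < 1).
    Returns the fraction of remaining matches where the predicted and actual
    score margins (red minus blue) have the same sign.
    """
    Ar, Ab = np.asarray(Ar, dtype=float), np.asarray(Ab, dtype=float)
    Mr, Mb = np.asarray(Mr, dtype=float), np.asarray(Mb, dtype=float)
    k = int(round(frac_known * Ar.shape[0]))
    # Stack red and blue alliance rows of the known matches into one system
    A = np.vstack([Ar[:k], Ab[:k]])
    b = np.concatenate([Mr[:k], Mb[:k]])
    x = opr_with_prior(A, b, np.asarray(x0, dtype=float), w)
    pred_margin = (Ar[k:] - Ab[k:]) @ x
    true_margin = Mr[k:] - Mb[k:]
    return float(np.mean(np.sign(pred_margin) == np.sign(true_margin)))
```

Running this for each division and year with OPR, OPRm1, or OPRm3 as x0 (and a few values of w) would give one way to compare the methods across scenarios A, B, and C.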