Okay, I spent a bunch of time looking at the mean-reversion parameter and the results are extremely interesting. First, I ran each 2-year period individually and found the best mean reversion for just that period. Here were the results:
Code:
2008-2009 35%
2009-2010 40%
2010-2011 40%
2011-2012 30%
2012-2013 30%
2013-2014 35%
2014-2015 35%
2015-2016 35%
The mean reversion was pretty high and relatively constant for all years.
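If anyone wants to reproduce that first experiment, the per-window fit is just a grid search over candidate values. This is only a rough sketch, not my actual code: best_mean_reversion and the score_fn it takes (your backtest's error for a given parameter value on a set of seasons) are placeholder names.
Code:
import numpy as np

def best_mean_reversion(score_fn, seasons, candidates=np.arange(0.0, 0.55, 0.05)):
    """Return the mean-reversion value from `candidates` with the lowest
    error on `seasons`, where score_fn(seasons, mr) is whatever backtest
    you use to score a single parameter value (placeholder, not shown)."""
    best_mr, best_err = None, float("inf")
    for mr in candidates:
        err = score_fn(seasons, mr)
        if err < best_err:
            best_mr, best_err = mr, err
    return best_mr

# Example call (backtest_error is hypothetical):
#   best_mean_reversion(backtest_error, [2008, 2009])  # -> 0.35 per the table above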
Next, I found the best mean reversion for 2009 given 2008, then the best mean reversion for 2010 given 2008 and 2009, and so on. This way, each year gets its own mean reversion that builds on everything the model has seen so far. Here were the results:
Code:
2008-2009 35%
2009-2010 35%
2010-2011 30%
2011-2012 20%
2012-2013 20%
2013-2014 25%
2014-2015 30%
2015-2016 25%
These values start high, as in the previous case, but they seem to drop after a while as the model learns more about the teams.
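The expanding-window run is the same search, just re-fit on everything seen so far instead of a single 2-year window. Again only a sketch, reusing the hypothetical best_mean_reversion and score_fn from above:
Code:
def expanding_window_mr(score_fn, seasons):
    """For each season after the first, re-fit the mean reversion on all
    seasons up to and including it, so each year's value builds on the
    full history rather than on one 2-year window."""
    fitted = {}
    for i in range(1, len(seasons)):
        history = seasons[: i + 1]      # e.g. 2008 .. current season
        fitted[seasons[i]] = best_mean_reversion(score_fn, history)
    return fitted

# e.g. expanding_window_mr(backtest_error, list(range(2008, 2017)))
# would produce one fitted value per season, as in the table above.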
Finally, I compared how predictive the previous model was against my original flat 20% for all years; the results are attached.
Interestingly, adjusting the mean reversion every year actually fares worse overall than just using 20% every year, even if you throw out 2015 and 2016 (2015 was an outlier year in many respects). I think the reason is that team performance 2 years in the future can still be reasonably well predicted by the current season's performance. The constantly updating model seems to set the mean reversion too high to fully account for this 2-year explanatory effect.
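For anyone following along, the mechanics behind "too high" are easy to see if you write out the usual between-season regression toward the league average. This is only an illustration of the standard formula, with a made-up 1500-average rating scale, not my actual model:
Code:
def preseason_rating(prev_rating, league_mean, mean_reversion=0.20):
    """Regress a team's end-of-season rating toward the league average.
    mean_reversion=0.20 keeps 80% of last season's rating; 0.35 keeps 65%."""
    return (1 - mean_reversion) * prev_rating + mean_reversion * league_mean

# Illustrative numbers only: a team that finished at 1600 on a 1500-average
# scale would open the next season at 1580 with 20% reversion, but 1565 with 35%.
print(preseason_rating(1600, 1500, 0.20))  # 1580.0
print(preseason_rating(1600, 1500, 0.35))  # 1565.0
The higher values the updating model picks throw away more of last season's information, which is exactly what hurts when a season two years back still carries real signal.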