I re-ran my experiments. I found a few problems:
- I was incorrectly weighting precision and recall by class instead of reporting them for the positive class. I believe this was a holdover from earlier experiments where I was exploring non-binary classification.
- Based on these new numbers, it’s pretty clear that my previous experiments computed COPRs using matches from the future, i.e., data that would not have been available at prediction time. My new Brier scores are far higher (worse), and my accuracy, precision, and recall are far lower (also worse).
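To illustrate the first problem, here is a minimal pure-Python sketch contrasting positive-class ("binary") precision and recall with support-weighted averages across both classes. It is an illustration only, not the code behind these results: the helper names are hypothetical, and I'm assuming the earlier bug behaved like a support-weighted average (what scikit-learn's `precision_score`/`recall_score` compute with `average='weighted'`, as opposed to the default `average='binary'`). It uses the 2019 [rkt test] confusion matrix from the results below, laid out as `[[TN, FP], [FN, TP]]`.

```python
def per_class_scores(cm):
    """Per-class (precision, recall, support) from a 2x2 matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    return {
        # Class 0 scored as if it were the "positive" class.
        0: (tn / (tn + fn), tn / (tn + fp), tn + fp),
        # Class 1 is the actual positive class.
        1: (tp / (tp + fp), tp / (tp + fn), fn + tp),
    }


def weighted_average(scores):
    """Support-weighted mean of per-class precision and recall."""
    total = sum(support for _, _, support in scores.values())
    precision = sum(p * s for p, _, s in scores.values()) / total
    recall = sum(r * s for _, r, s in scores.values()) / total
    return precision, recall


cm = [[30117, 303], [1559, 61]]  # 2019 [rkt test]
scores = per_class_scores(cm)
binary_precision, binary_recall, _ = scores[1]
weighted_precision, weighted_recall = weighted_average(scores)

print(f"binary:   precision={binary_precision:.3f} recall={binary_recall:.3f}")
print(f"weighted: precision={weighted_precision:.3f} recall={weighted_recall:.3f}")
```

The binary values reproduce the logged precision (about 0.168) and recall (about 0.038), while the weighted precision comes out around 0.91 because the 30,420 negative rows dominate the average. Support-weighted recall always equals accuracy, since each class's recall is multiplied by that class's own support, so on imbalanced targets like rkt, climb, kpa, and rotor the weighted numbers can make a model that rarely predicts the positive class look far better than it is.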
To tell you the truth, I’m actually pretty embarrassed I didn’t catch these errors sooner, and I want to apologize for not validating my results more rigorously. I have a few trusted peers in the community review my CD papers, but I don’t have the same policy for personal blog posts.
I currently believe the model is learning but overfitting: it scores perfectly on every training set (Brier 0.0, accuracy 1.0) but noticeably worse on the test sets (results below). I still think there’s potential here, but I don’t have time to debug things further. In the meantime, I’ve unpublished the blog post and edited my original post in this thread.
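For the second problem (COPRs computed from future matches), one way to rule out this kind of leakage is to compute each match's COPR features using only matches played strictly before it. Below is a minimal sketch, assuming the usual OPR-style least-squares model in which an alliance's component score is the sum of its teams' contributions. `copr_as_of`, the record layout, and the toy data are all hypothetical and are not the code behind the results below; the sketch requires NumPy.

```python
import numpy as np


def copr_as_of(records, teams, cutoff):
    """Estimate one COPR component per team from matches strictly before `cutoff`.

    records: list of (match_index, alliance_teams, component_score) tuples.
    Model: an alliance's component score is the sum of its teams' contributions.
    """
    past = [r for r in records if r[0] < cutoff]
    if not past:
        return {t: 0.0 for t in teams}  # no history yet
    col = {t: i for i, t in enumerate(teams)}
    A = np.zeros((len(past), len(teams)))
    b = np.zeros(len(past))
    for row, (_, alliance, score) in enumerate(past):
        for t in alliance:
            A[row, col[t]] = 1.0
        b[row] = score
    # lstsq returns the minimum-norm solution when the system is underdetermined.
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return dict(zip(teams, x.tolist()))


# Toy two-team alliances with made-up scores, not real FRC data.
teams = [1, 2, 3, 4]
records = [
    (0, (1, 2), 30.0), (0, (3, 4), 20.0),
    (1, (1, 3), 25.0), (1, (2, 4), 25.0),
    (2, (1, 4), 22.0), (2, (2, 3), 28.0),
]

# Features used to predict match 2 only see matches 0 and 1.
features_for_match_2 = copr_as_of(records, teams, cutoff=2)
print(features_for_match_2)
```

In a walk-forward evaluation, the feature row for match k would come from `copr_as_of(records, teams, cutoff=k)`, so the scores being predicted can never feed into their own features. Refitting from scratch for every match is slow over a full season, but it is easy to reason about, which matters most when hunting for leakage.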
2019
[win train] brier: 0.0
[win train] accuracy: 1.0
[win train] precision: 1.0
[win train] recall: 1.0
[win train] confusion:
array([[226908,      0],
       [     0, 217044]])
[win test] brier: 0.28642322097378276
[win test] accuracy: 0.7135767790262172
[win test] precision: 0.7177978983543718
[win test] recall: 0.6888001014713343
[win test] confusion:
array([[12002,  4270],
       [ 4907, 10861]])
[hab train] brier: 0.0
[hab train] accuracy: 1.0
[hab train] precision: 1.0
[hab train] recall: 1.0
[hab train] confusion:
array([[313992,      0],
       [     0, 129960]])
[hab test] brier: 0.24079275905118602
[hab test] accuracy: 0.759207240948814
[hab test] precision: 0.7346512235478534
[hab test] recall: 0.6437728937728938
[hab test] confusion:
array([[15889,  3047],
       [ 4668,  8436]])
[rkt train] brier: 0.0
[rkt train] accuracy: 1.0
[rkt train] precision: 1.0
[rkt train] recall: 1.0
[rkt train] confusion:
array([[433368,      0],
       [     0,  10584]])
[rkt test] brier: 0.05811485642946317
[rkt test] accuracy: 0.9418851435705369
[rkt test] precision: 0.16758241758241757
[rkt test] recall: 0.037654320987654324
[rkt test] confusion:
array([[30117,   303],
       [ 1559,    61]])
2018
[win train] brier: 0.0
[win train] accuracy: 1.0
[win train] precision: 1.0
[win train] recall: 1.0
[win train] confusion:
array([[212328,      0],
       [     0, 211824]])
[win test] brier: 0.2878498727735369
[win test] accuracy: 0.7121501272264631
[win test] precision: 0.7145939725239157
[win test] recall: 0.7040107709750567
[win test] confusion:
array([[10216,  3968],
       [ 4177,  9935]])
[auto train] brier: 0.0
[auto train] accuracy: 1.0
[auto train] precision: 1.0
[auto train] recall: 1.0
[auto train] confusion:
array([[276228,      0],
       [     0, 147924]])
[auto test] brier: 0.39351851851851855
[auto test] accuracy: 0.6064814814814815
[auto test] precision: 0.7225664092336417
[auto test] recall: 0.5197802197802198
[auto test] confusion:
array([[8647, 3269],
       [7866, 8514]])
[climb train] brier: 0.0
[climb train] accuracy: 1.0
[climb train] precision: 1.0
[climb train] recall: 1.0
[climb train] confusion:
array([[407520,      0],
       [     0,  16632]])
[climb test] brier: 0.046296296296296294
[climb test] accuracy: 0.9537037037037037
[climb test] precision: 0.5316091954022989
[climb test] recall: 0.1388888888888889
[climb test] confusion:
array([[26801,   163],
       [ 1147,   185]])
2017
[win train] brier: 0.0
[win train] accuracy: 1.0
[win train] precision: 1.0
[win train] recall: 1.0
[win train] confusion:
array([[195444,      0],
       [     0, 188028]])
[win test] brier: 0.371354543263437
[win test] accuracy: 0.628645456736563
[win test] precision: 0.6293059453942332
[win test] recall: 0.6022588522588522
[win test] confusion:
array([[11001,  5811],
       [ 6515,  9865]])
[kpa train] brier: 0.0
[kpa train] accuracy: 1.0
[kpa train] precision: 1.0
[kpa train] recall: 1.0
[kpa train] confusion:
array([[379908,      0],
       [     0,   3564]])
[kpa test] brier: 0.01654013015184382
[kpa test] accuracy: 0.9834598698481561
[kpa test] precision: 0.2289156626506024
[kpa test] recall: 0.037698412698412696
[kpa test] confusion:
array([[32624,    64],
       [  485,    19]])
[rotor train] brier: 0.0
[rotor train] accuracy: 1.0
[rotor train] precision: 1.0
[rotor train] recall: 1.0
[rotor train] confusion:
array([[381096,      0],
       [     0,   2376]])
[rotor test] brier: 0.04775247047481321
[rotor test] accuracy: 0.9522475295251868
[rotor test] precision: 0.3323076923076923
[rotor test] recall: 0.07317073170731707
[rotor test] confusion:
array([[31499,   217],
       [ 1368,   108]])
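A related sanity check on the numbers above: in every row, the logged Brier score equals 1 - accuracy (for example, 0.2864 + 0.7136 = 1 for the 2019 win test set, and 0.0 alongside 1.0 on every training set). That identity is exactly what the Brier score reduces to when it is computed from hard 0/1 class predictions rather than predicted probabilities, because each miss costs 1 and each hit costs 0. The logs are consistent with that, though they don't show which one the pipeline actually passed in. If scikit-learn is in use, `brier_score_loss` expects the positive-class probability (e.g. `predict_proba(X)[:, 1]`), not the output of `predict(X)`. The minimal pure-Python sketch below (helper names are hypothetical) rebuilds label/prediction pairs from one logged confusion matrix and checks the identity.

```python
def expand(cm):
    """Rebuild (label, prediction) pairs from a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = cm
    return [(0, 0)] * tn + [(0, 1)] * fp + [(1, 0)] * fn + [(1, 1)] * tp


def brier(pairs):
    """Mean squared difference between the prediction and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in pairs) / len(pairs)


def accuracy(pairs):
    """Fraction of pairs where the prediction matches the label."""
    return sum(y == p for y, p in pairs) / len(pairs)


# 2019 [win test] confusion matrix from the results above.
pairs = expand([[12002, 4270], [4907, 10861]])

# With hard 0/1 predictions, the Brier score collapses to the error rate.
print(brier(pairs), 1 - accuracy(pairs))
```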