Here's a summary of how well my predictions worked out this year. There's a section by week and then overall. Within each section there is a table that gives an idea of how well-calibrated the results are.

The entry under "MAD" is the average absolute difference between the prediction and the final result. If one were to just flip a coin for each team, this would be .5, and if you were to just guess that every team had a 50% chance to make it in it would also be around .5. The entry under "rms" is the root mean squared error; its square is the Brier score.
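Concretely, with each prediction a probability in [0, 1] and each outcome 1 (made it) or 0 (didn't), the two metrics can be computed like this. This is a sketch of the definitions, not the script that produced the tables below:

```python
def mad(predictions, outcomes):
    """Mean absolute difference between predictions and actual results."""
    return sum(abs(p - o) for p, o in zip(predictions, outcomes)) / len(predictions)

def rms(predictions, outcomes):
    """Root mean squared error; squaring this gives the Brier score."""
    return (sum((p - o) ** 2 for p, o in zip(predictions, outcomes))
            / len(predictions)) ** 0.5

# Coin-flip baseline: predicting 0.5 for everything scores 0.5 on both.
print(mad([0.5, 0.5], [1, 0]))  # 0.5
print(rms([0.5, 0.5], [1, 0]))  # 0.5
```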

```
mar10
Predicted   Observed  Samples
0<x<10      0.0          5
10<=x<20    0.09        34
20<=x<30    0.28        18
30<=x<40    0.34        53
40<=x<50    0.71         7
50<=x<60    0.83         6
60<=x<70    0.64        11
70<=x<80    1.0          4
80<=x<90    1.0          2
90<=x<100   1.0          6
x=100       1.0          9
mad: 0.336321935484
rms: 0.400023810824

mar17
Predicted   Observed  Samples
x=0         0.01       234
0<x<10      0.01       205
10<=x<20    0.06       262
20<=x<30    0.24       291
30<=x<40    0.34       147
40<=x<50    0.53       160
50<=x<60    0.65        98
60<=x<70    0.84        55
70<=x<80    0.92        50
80<=x<90    0.97        32
90<=x<100   0.96        47
x=100       1.0        248
mad: 0.22786440678
rms: 0.323887001132

mar24
Predicted   Observed  Samples
x=0         0.0        484
0<x<10      0.03       186
10<=x<20    0.08       222
20<=x<30    0.25       170
30<=x<40    0.34        93
40<=x<50    0.51        73
50<=x<60    0.8         54
60<=x<70    0.79        43
70<=x<80    0.98        43
80<=x<90    0.91        22
90<=x<100   0.95        43
x=100       0.98       396
mad: 0.15088354292
rms: 0.267576994126

mar31
Predicted   Observed  Samples
x=0         0.01       848
0<x<10      0.08       111
10<=x<20    0.06        88
20<=x<30    0.4         58
30<=x<40    0.38        29
40<=x<50    0.4         30
50<=x<60    0.76        21
60<=x<70    0.73        11
70<=x<80    0.85        13
80<=x<90    0.8          5
90<=x<100   0.92        24
x=100       0.97       591
mad: 0.0734882449426
rms: 0.210646096619

Overall:
Predicted   Observed  Samples
x=0         0.01      1566
0<x<10      0.04       507
10<=x<20    0.07       606
20<=x<30    0.26       537
30<=x<40    0.34       322
40<=x<50    0.51       270
50<=x<60    0.72       179
60<=x<70    0.79       120
70<=x<80    0.94       110
80<=x<90    0.93        61
90<=x<100   0.95       120
x=100       0.98      1244
mad: 0.155843654732
rms: 0.275676432301
```
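For what it's worth, a table like the ones above can be built by bucketing each prediction into a 10%-wide bin (with exact 0% and 100% predictions kept separate) and reporting the fraction of outcomes in each bin that came true. A hypothetical sketch, not the actual code behind these numbers:

```python
def calibration_table(predictions, outcomes):
    """Map each bin label to (observed frequency, sample count)."""
    bins = {}  # label -> (hits, total)
    for p, o in zip(predictions, outcomes):
        if p == 0.0:
            label = "x=0"
        elif p == 1.0:
            label = "x=100"
        else:
            lo = int(p * 10) * 10
            label = "0<x<10" if lo == 0 else "%d<=x<%d" % (lo, lo + 10)
        hits, total = bins.get(label, (0, 0))
        bins[label] = (hits + o, total + 1)
    return {label: (hits / total, total)
            for label, (hits, total) in bins.items()}
```

A well-calibrated predictor is one where each bin's observed frequency lands near the middle of the bin's range.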

Overall, the results are way ahead of random, but I'd like to have a better baseline to compare against.

There's definitely room for improvement here. It should be possible to get the 0% and 100% bins to be right all of the time, and the calibration shows an underestimation of teams that are doing well and an overestimation of teams that are doing poorly. Part of this could be fixed by a model of team skill, or by just adding some calibration tables.
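The calibration-table fix could be as simple as remapping each raw prediction to the observed frequency of its bin. A hypothetical sketch using the "Overall" numbers above as the lookup table:

```python
# Observed frequency per bin, keyed by the bin's lower edge,
# copied from the Overall table above.
OBSERVED = {
    0: 0.04, 10: 0.07, 20: 0.26, 30: 0.34, 40: 0.51,
    50: 0.72, 60: 0.79, 70: 0.94, 80: 0.93, 90: 0.95,
}

def recalibrate(p):
    """Replace a raw prediction with its bin's historical frequency."""
    if p == 0.0 or p == 1.0:
        return p  # the endpoint bins were already nearly right
    return OBSERVED[int(p * 10) * 10]

print(recalibrate(0.55))  # 0.72: a raw 55% was historically closer to 72%
```

The obvious caveat is that this corrects next year's predictions using this year's misses, so it only helps if the biases are stable from year to year.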