Shannon defined the mutual information between two variables. We illustrate
why the true mutual information between a variable and the predictions made by
a prediction algorithm is not a suitable measure of prediction quality, but the
apparent Shannon mutual information (ASI) is. Indeed, the ASI is the unique
prediction quality measure satisfying either of two very different lists of
desirable properties, as previously shown by de Finetti and other authors. However,
estimating the uncertainty of the ASI is difficult, because the distribution of
the individual values of
$$j(x,y) = \log \frac{Q_y(x)}{P(x)}$$
has long, asymmetric heavy tails. We propose a Bayesian method for modelling the
distribution of $j(x,y)$, from whose posterior distribution the uncertainty in
the ASI can be inferred. This method is based on Dirichlet-based
mixtures of skew-Student distributions. We illustrate its use on data from a
Bayesian model for predicting the recurrence time of prostate cancer. We
believe that this approach is appropriate for most problems in which it is
infeasible to derive the explicit distribution of the samples of $j(x,y)$,
though the precise modelling parameters may need adjustment to suit particular
cases.
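As a rough illustration of the quantity whose uncertainty is at issue, the sketch below computes the per-case values $j(x,y) = \log(Q_y(x)/P(x))$ and their sample mean as a point estimate of the ASI, in nats. It assumes a discrete outcome, that `prior_probs[i]` holds $P(x_i)$ for the outcome actually observed in case $i$, and that `predicted_probs[i]` holds the probability $Q_{y_i}(x_i)$ the predictor assigned to that outcome. The function names and the toy numbers are hypothetical. This sketch does not implement the Dirichlet-based skew-Student mixture described above; it only shows how the individual $j$ values arise.

```python
import math
from typing import List, Sequence


def j_values(prior_probs: Sequence[float], predicted_probs: Sequence[float]) -> List[float]:
    """Per-case j(x, y) = log(Q_y(x) / P(x)), in nats.

    Assumption: prior_probs[i] is P(x_i), the prior probability of the outcome
    observed in case i; predicted_probs[i] is Q_{y_i}(x_i), the probability the
    prediction algorithm assigned to that same outcome.
    """
    if len(prior_probs) != len(predicted_probs):
        raise ValueError("prior_probs and predicted_probs must have equal length")
    if any(p <= 0 for p in prior_probs) or any(q <= 0 for q in predicted_probs):
        raise ValueError("all probabilities must be strictly positive")
    return [math.log(q / p) for p, q in zip(prior_probs, predicted_probs)]


def apparent_shannon_information(
    prior_probs: Sequence[float], predicted_probs: Sequence[float]
) -> float:
    """Point estimate of the ASI as the sample mean of the j values (assumed estimator)."""
    js = j_values(prior_probs, predicted_probs)
    return sum(js) / len(js)


if __name__ == "__main__":
    # Hypothetical toy data: four test cases.
    prior = [0.5, 0.5, 0.25, 0.25]
    predicted = [0.8, 0.5, 0.5, 0.05]
    print(j_values(prior, predicted))
    print(apparent_shannon_information(prior, predicted))  # about -0.112 nats
```

In this toy data, three cases have $j \ge 0$, yet the single case where the predictor assigned only 0.05 to the observed outcome contributes about $-1.61$ nats and drags the mean below zero. One badly mispredicted case dominating the average in this way is the kind of heavy-tail behaviour that makes a simple standard error on the mean an unreliable guide to the ASI's uncertainty.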