Variance
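Variance measures the average squared deviation of the predicted class probabilities from their mean. The instance with the lowest Variance is considered the most informative, because a flat prediction means the model is unsure which class is correct. Labeling that instance is therefore likely to be the most useful to our model.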
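- If the Variance is 0, every class gets the same probability, so the model doesn’t have a clue about the correct label.
- The higher the Variance, the clearer the model’s “belief” about the correct label. For K classes the maximum is (K − 1)/K², reached when one class gets probability 1, so the raw value never actually reaches 1 (for 3 classes it tops out at about 0.22).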
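For reference, the definition above can be written out as a formula. This is a sketch in our own notation, assuming the population form of the variance (dividing by the number of classes K), where p_i is the predicted probability of class i:

```latex
\mathrm{Var}(p) = \frac{1}{K}\sum_{i=1}^{K}\left(p_i - \bar{p}\right)^2,
\qquad
\bar{p} = \frac{1}{K}\sum_{i=1}^{K} p_i = \frac{1}{K}
```

The mean is always 1/K because the class probabilities sum to 1.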
Imagine we have only two instances, A and B, and the model has to decide which of them to suggest for annotation. It has made the following class predictions:
- Instance A: “cat” – 0.5, “milkshake” – 0.45, “cloud” – 0.05.
- Instance B: “cat” – 0.4, “milkshake” – 0.3, “cloud” – 0.3.
In this case, our model will choose instance B over A, because B's variance (about 0.0022) is lower than A's (about 0.0406).
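The comparison can be reproduced with a few lines of Python. This is a minimal sketch, not the tool's actual implementation: it assumes the population variance (the average of the squared deviations, as defined above), and the names `predictions`, `variances`, and `chosen` are ours. Note that B's squared deviations sum to about 0.0067 before they are averaged over the three classes, which gives about 0.0022.

```python
from statistics import pvariance

# Predicted class probabilities ("cat", "milkshake", "cloud") per instance.
predictions = {
    "A": [0.5, 0.45, 0.05],
    "B": [0.4, 0.3, 0.3],
}

# Population variance: mean of squared deviations from the mean probability.
variances = {name: pvariance(probs) for name, probs in predictions.items()}

# The instance with the lowest variance is suggested for annotation.
chosen = min(variances, key=variances.get)

for name, var in variances.items():
    print(f"Instance {name}: variance = {var:.4f}")
print(f"Suggested for annotation: {chosen}")
# Instance A: variance = 0.0406
# Instance B: variance = 0.0022
# Suggested for annotation: B
```

Using the sample variance instead (dividing by K − 1) would change the numbers but not the ranking, since every instance has the same number of classes.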
Learn more about the other heuristics: