Theoretical Description
There are mainly three different ways to handle uncertainty quantification in binary classification: calibration (see Theoretical Description), a confidence interval (CI) for the conditional probability $\mathbb{E}[Y \mid f(X)]$, and prediction sets (see Theoretical Description). These three notions are tightly related for score-based classifiers, as shown in [1].
Prediction sets can be computed in the same way for multiclass and binary classification with MapieClassifier, and the same theoretical guarantees hold.
Nevertheless, prediction sets are often much less informative in the binary case than in the multiclass case.
From Gupta et al. [1]:
PSs and CIs are only ‘informative’ if the sets or intervals produced by them are small. To quantify this, we measure CIs using their width (denoted as $|C(\cdot)|$), and PSs using their diameter (defined as the width of the convex hull of the PS). For example, in the case of binary classification, the diameter of a PS is $1$ if the prediction set is $\{0, 1\}$, and $0$ otherwise (since $Y \in \{0, 1\}$ always holds, the set $\{0, 1\}$ is ‘uninformative’). A short CI such as $[0.4, 0.5]$ is more informative than a wider one such as $[0.2, 0.8]$.
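To make the notions of diameter and width concrete, here is a tiny illustrative snippet; the helper functions are hypothetical and not part of MAPIE:

```python
# Hypothetical helpers: quantify how informative a binary prediction set (PS)
# or a confidence interval (CI) is, following the quote above.

def ps_diameter(prediction_set):
    """Diameter of a binary PS: 1 if it contains both labels, 0 otherwise."""
    return 1 if prediction_set == {0, 1} else 0

def ci_width(lower, upper):
    """Width of a confidence interval [lower, upper]."""
    return upper - lower

print(ps_diameter({1}))       # 0 -> informative
print(ps_diameter({0, 1}))    # 1 -> uninformative
print(ci_width(0.4, 0.5))     # ~0.1 -> short, informative CI
print(ci_width(0.2, 0.8))     # ~0.6 -> wide, less informative CI
```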
In a few words, what you need to remember about these concepts:
- Calibration is useful for transforming a score (typically given by an ML model) into the probability of making a good prediction.
- Set Prediction gives the set of likely predictions, with a probabilistic guarantee that the true label is in this set.
- Probabilistic Prediction gives a confidence interval for the predictive distribution.
1. Set Prediction
- Definition 1 (Prediction Set (PS) w.r.t. $f$) [1].
Fix a predictor $f : \mathcal{X} \rightarrow [0, 1]$ and let $\alpha \in (0, 1)$. Define the set of all subsets of $\{0, 1\}$, $2^{\{0,1\}} = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$. A function $S : [0, 1] \rightarrow 2^{\{0,1\}}$ is said to be a $(1-\alpha)$-PS with respect to $f$ if:

$$\mathbb{P}\big(Y \in S(f(X))\big) \geq 1 - \alpha$$
PSs are typically studied for larger output sets, such as $\mathcal{Y} = \mathbb{R}$ (regression) or $\mathcal{Y} = \{1, \ldots, K\}$ with $K > 2$ (multiclass classification).
See MapieClassifier to use a set predictor.
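As an illustration, here is a minimal sketch of a binary set predictor with MapieClassifier; the dataset, the base model, the `lac` method and the `cv="prefit"` split are arbitrary choices, and exact argument names or output shapes may differ across MAPIE versions:

```python
# Minimal sketch (assumed MAPIE split-conformal API): binary prediction sets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from mapie.classification import MapieClassifier

X, y = make_classification(n_samples=2000, n_classes=2, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)

# Calibrate the set predictor on held-out data, then predict sets at level alpha.
mapie = MapieClassifier(estimator=clf, method="lac", cv="prefit")
mapie.fit(X_cal, y_cal)
alpha = 0.1
y_pred, y_ps = mapie.predict(X_test, alpha=alpha)  # y_ps: (n_test, 2, 1) boolean mask

# Empirical coverage: fraction of test points whose set contains the true label.
coverage = y_ps[np.arange(len(y_test)), y_test, 0].mean()
# Informativeness: fraction of "uninformative" sets {0, 1} (diameter 1).
frac_full_sets = (y_ps[:, :, 0].sum(axis=1) == 2).mean()
print(f"coverage ~ {coverage:.2f}, fraction of {{0,1}} sets ~ {frac_full_sets:.2f}")
```

In the binary case, the fraction of $\{0, 1\}$ sets is precisely what makes the prediction sets more or less informative, as discussed in the quote above.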
2. Probabilistic Prediction
- Definition 2 (Confidence Interval (CI) w.r.t. $f$) [1].
Fix a predictor $f : \mathcal{X} \rightarrow [0, 1]$ and let $\alpha \in (0, 1)$. Let $\mathcal{I}$ denote the set of all subintervals of $[0, 1]$. A function $C : [0, 1] \rightarrow \mathcal{I}$ is said to be a $(1-\alpha)$-CI with respect to $f$ if:

$$\mathbb{P}\big(\mathbb{E}[Y \mid f(X)] \in C(f(X))\big) \geq 1 - \alpha$$
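As a rough illustration of this definition (not a MAPIE feature), one simple way to build such intervals, in the spirit of the binning approach analysed in [1], is to bin the calibration scores $f(X)$ and attach a Hoeffding-style interval to each bin. The helpers below are a hypothetical sketch under these assumptions:

```python
# Hypothetical sketch: binned confidence intervals for E[Y | f(X)] built from a
# calibration set, using Hoeffding's inequality within each score bin.
import numpy as np

def binned_confidence_intervals(scores_cal, y_cal, n_bins=10, alpha=0.1):
    """Return bin edges and one (lower, upper) interval per score bin."""
    edges = np.quantile(scores_cal, np.linspace(0, 1, n_bins + 1))  # uniform-mass bins
    edges[0], edges[-1] = 0.0, 1.0
    intervals = []
    for b in range(n_bins):
        in_bin = (scores_cal >= edges[b]) & (scores_cal <= edges[b + 1])
        n_b = max(in_bin.sum(), 1)
        mean_b = y_cal[in_bin].mean() if in_bin.any() else 0.5
        # Hoeffding half-width: holds with probability >= 1 - alpha for this bin.
        half_width = np.sqrt(np.log(2 / alpha) / (2 * n_b))
        intervals.append((max(mean_b - half_width, 0.0), min(mean_b + half_width, 1.0)))
    return edges, intervals

def predict_ci(score, edges, intervals):
    """Map a new score f(x) to the confidence interval of its bin."""
    b = np.clip(np.searchsorted(edges, score, side="right") - 1, 0, len(intervals) - 1)
    return intervals[b]
```

Each interval covers the bin-conditional mean $\mathbb{E}[Y \mid f(X) \in B_b]$ with probability at least $1 - \alpha$; this coincides with Definition 2 only when the predictor is (or is made) constant within each bin, which is the discretized setting analysed in [1].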
3. Calibration
Usually, calibration is understood as perfect calibration, meaning $\mathbb{E}[Y \mid f(X)] = f(X)$ almost surely (see Theoretical Description). In practice, it is more reasonable to consider approximate calibration.
- Definition 3 (Approximate calibration) [1].
Fix a predictor $f : \mathcal{X} \rightarrow [0, 1]$ and let $\alpha \in (0, 1)$. The predictor $f$ is $(\epsilon, \alpha)$-calibrated for some $\epsilon \geq 0$ if, with probability at least $1 - \alpha$:

$$\big|\mathbb{E}[Y \mid f(X)] - f(X)\big| \leq \epsilon$$
See CalibratedClassifierCV or MapieCalibrator to use a calibrator.
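For instance, here is a minimal sketch with scikit-learn's CalibratedClassifierCV; the base model, split sizes and `cv="prefit"` strategy are arbitrary choices, exact parameter names vary across scikit-learn versions, and MapieCalibrator follows a similar fit/predict_proba flow with version-dependent arguments:

```python
# Minimal sketch: post-hoc calibration of a binary classifier with scikit-learn.
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=3000, n_classes=2, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# A typically over-confident base model, calibrated on held-out data.
base = GaussianNB().fit(X_train, y_train)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv="prefit")
calibrated.fit(X_cal, y_cal)

# Reliability check on a separate test set: for an (approximately) calibrated
# model, the observed frequency prob_true stays close to the score prob_pred.
proba_test = calibrated.predict_proba(X_test)[:, 1]
prob_true, prob_pred = calibration_curve(y_test, proba_test, n_bins=10)
```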
In the conformal prediction (CP) framework, it is worth noting that Venn predictors produce probability-type predictions for the labels of test objects that are guaranteed to be well-calibrated under the standard assumption that the observations are generated independently from the same distribution [2].
References
[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas. “Distribution-free binary classification: prediction sets, confidence intervals, and calibration.” Advances in Neural Information Processing Systems 33 (2020): 3711-3723.
[2] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic Learning in a Random World.” Springer Nature, 2022.