Theoretical Description
There are three main ways to handle uncertainty quantification in binary classification: calibration (see the calibration Theoretical Description), confidence intervals (CIs) for the probability $P(Y = 1 \mid f(X))$, where $f$ is the score predictor, and prediction sets (see the classification Theoretical Description). These three notions are tightly related for score-based classifiers, as shown in [1].
Prediction sets can be computed in the same way for multiclass and binary classification with MapieClassifier, and the same theoretical guarantees hold.
Nevertheless, prediction sets are often much less informative in the binary case than in the multiclass case.
From Gupta et al. [1]:
PSs and CIs are only ‘informative’ if the sets or intervals produced by them are small. To quantify this, we measure CIs using their width (denoted as $|C(\cdot)|$), and PSs using their diameter (defined as the width of the convex hull of the PS). For example, in the case of binary classification, the diameter of a PS is $1$ if the prediction set is $\{0, 1\}$, and $0$ otherwise (since $Y \in \{0, 1\}$ always holds, the set $\{0, 1\}$ is ‘uninformative’). A short CI such as $[0.39, 0.41]$ is more informative than a wider one such as $[0.3, 0.5]$.
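The two size measures described in the quoted passage can be sketched in a few lines of Python. The helper names `ps_diameter` and `ci_width` are hypothetical and chosen only for illustration; treating the empty set as having diameter 0 is an assumption, and the interval endpoints are illustrative values.

```python
def ps_diameter(prediction_set):
    """Diameter of a prediction set: width of the convex hull of its labels."""
    if not prediction_set:
        return 0  # assumption: an empty set is given diameter 0
    return max(prediction_set) - min(prediction_set)


def ci_width(interval):
    """Width of a confidence interval given as a (lower, upper) pair."""
    lower, upper = interval
    return upper - lower


# In binary classification only the full set {0, 1} has diameter 1.
print(ps_diameter({0, 1}))  # 1
print(ps_diameter({1}))     # 0

# A narrow interval is more informative than a wide one.
print(ci_width((0.39, 0.41)) < ci_width((0.3, 0.5)))  # True
```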
In short, here is what to remember about these concepts:
- Calibration is useful for transforming a score (typically given by an ML model) into the probability of making a good prediction.
- Set Prediction gives the set of likely predictions with a probabilistic guarantee that the true label is in this set.
- Probabilistic Prediction gives a confidence interval for the predictive distribution.
1. Set Prediction
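- Definition 1 (Prediction Set (PS) w.r.t. $f$) [1].
Fix a predictor $f: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$. Define the set of all subsets of $\mathcal{Y} = \{0, 1\}$, $\mathcal{L} = \{\{0\}, \{1\}, \{0, 1\}, \emptyset\}$. A function $S: [0, 1] \to \mathcal{L}$ is said to be a $(1 - \alpha)$-PS with respect to $f$ if:
$$P(Y \in S(f(X))) \geq 1 - \alpha$$
PSs are typically studied for larger output sets, such as $\mathcal{Y}_{regression} = \mathbb{R}$ or $\mathcal{Y}_{multiclass} = \{1, 2, \ldots, L\}$ with $L > 2$.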
See MapieClassifier to use a set predictor.
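To make the coverage requirement of Definition 1 concrete, here is a minimal simulation sketch. It does not use MapieClassifier: the threshold rule `set_predictor`, the threshold 0.2, and the synthetic data model (scores uniform on [0, 1], with the label equal to 1 with probability equal to the score) are all assumptions made for illustration.

```python
import random


def set_predictor(score, threshold=0.2):
    """Hypothetical S: keep each label whose estimated probability is >= threshold."""
    labels = set()
    if score >= threshold:
        labels.add(1)
    if 1.0 - score >= threshold:
        labels.add(0)
    return labels


def empirical_coverage(scores, labels, predictor):
    """Fraction of samples whose true label falls inside the predicted set."""
    hits = sum(1 for s, y in zip(scores, labels) if y in predictor(s))
    return hits / len(scores)


rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]
# Synthetic assumption: P(Y = 1 | f(X) = s) = s, i.e. a perfectly calibrated score.
labels = [1 if rng.random() < s else 0 for s in scores]

coverage = empirical_coverage(scores, labels, set_predictor)
print(coverage)  # the expected value under this synthetic model is 0.96
```

Under this synthetic model the rule is therefore a $(1 - \alpha)$-PS for any $\alpha \geq 0.04$, but it returns the uninformative set $\{0, 1\}$ for every score between 0.2 and 0.8.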
2. Probabilistic Prediction
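- Definition 2 (Confidence Interval (CI) w.r.t. $f$) [1].
Fix a predictor $f: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$. Let $\mathcal{I}$ denote the set of all subintervals of $[0, 1]$. A function $C: [0, 1] \to \mathcal{I}$ is said to be a $(1 - \alpha)$-CI with respect to $f$ if:
$$P(\mathbb{E}[Y \mid f(X)] \in C(f(X))) \geq 1 - \alpha$$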
In the framework of conformal prediction, the Venn predictor has this property.
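One simple way to build intervals in the spirit of Definition 2 is to bin the scores and apply Hoeffding's inequality within each bin. The sketch below is an illustration under stated assumptions, not the Venn predictor and not part of MAPIE's API. The function names, the fixed-width bins, and the synthetic data model are hypothetical. The intervals target the mean of $Y$ within each score bin, so they form a CI with respect to the binned score rather than $f$ itself.

```python
import math
import random


def fit_binned_ci(scores, labels, n_bins=10, alpha=0.1):
    """Per-bin (lower, upper) Hoeffding intervals for the mean of Y in each bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)  # fixed-width bin index
        sums[b] += y
        counts[b] += 1
    intervals = []
    for total, n in zip(sums, counts):
        if n == 0:
            intervals.append((0.0, 1.0))  # no data: fall back to the trivial interval
            continue
        mean = total / n
        # Hoeffding half-width, with a union bound over the n_bins bins
        half = math.sqrt(math.log(2 * n_bins / alpha) / (2 * n))
        intervals.append((max(0.0, mean - half), min(1.0, mean + half)))
    return intervals


def ci_predictor(score, intervals):
    """Hypothetical C: return the interval of the bin that contains the score."""
    b = min(int(score * len(intervals)), len(intervals) - 1)
    return intervals[b]


rng = random.Random(0)
scores = [rng.random() for _ in range(5000)]
# Synthetic assumption: a miscalibrated score with P(Y = 1 | f(X) = s) = s ** 2.
labels = [1 if rng.random() < s ** 2 else 0 for s in scores]

intervals = fit_binned_ci(scores, labels, n_bins=10, alpha=0.1)
print(ci_predictor(0.55, intervals))
```

With i.i.d. calibration data and bins fixed in advance, Hoeffding's inequality combined with the union bound implies that, with probability at least $1 - \alpha$ over the calibration sample, every non-empty bin's interval contains that bin's conditional mean. More bins give a finer description of the score but wider intervals, since each bin receives fewer points.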
3. Calibration
Usually, calibration is understood as perfect calibration, meaning $\mathbb{E}[Y \mid f(X)] = f(X)$ almost surely (see the calibration Theoretical Description). In practice, it is more reasonable to consider approximate calibration.
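- Definition 3 (Approximate calibration) [1].
Fix a predictor $f: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$. The predictor $f$ is $(\epsilon, \alpha)$-calibrated for some $\alpha \in (0, 1)$ and $\epsilon \in [0, 1]$ if with probability at least $1 - \alpha$:
$$|\mathbb{E}[Y \mid f(X)] - f(X)| \leq \epsilon$$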
See CalibratedClassifierCV or MapieCalibrator to use a calibrator.
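Approximate calibration can be illustrated with a small simulation in which the conditional mean of the label given the score is known by construction. Everything here is a hypothetical setup for illustration: the scores are uniform on [0, 1] and the assumed ground truth is that the label equals 1 with probability equal to the squared score, so the score overestimates the true probability. Neither CalibratedClassifierCV nor MapieCalibrator is used.

```python
import random


def true_conditional_mean(score):
    """Assumed ground truth E[Y | f(X) = score]; known only because data is synthetic."""
    return score ** 2


def calibration_probability(scores, epsilon):
    """Estimate P(|E[Y | f(X)] - f(X)| <= epsilon) over the distribution of scores."""
    within = sum(
        1 for s in scores if abs(true_conditional_mean(s) - s) <= epsilon
    )
    return within / len(scores)


rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]

p_tight = calibration_probability(scores, epsilon=0.1)
p_loose = calibration_probability(scores, epsilon=0.25)
print(p_tight, p_loose)
```

In this toy model the gap between the score and the true conditional mean is $s(1 - s)$, which never exceeds 0.25, so the score is $(0.25, \alpha)$-calibrated for every $\alpha$. For $\epsilon = 0.1$ the condition holds only for scores near 0 or 1, with probability of about 0.225, so the score is $(0.1, \alpha)$-calibrated only for $\alpha$ of roughly 0.775 or more.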
4. References
[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas. “Distribution-free binary classification: prediction sets, confidence intervals and calibration.” Advances in Neural Information Processing Systems 33 (2020): 3711-3723.