Theoretical Description
There are three main ways to handle uncertainty quantification in binary classification:
calibration (see the calibration Theoretical Description), confidence intervals (CIs) for the probability,
and prediction sets (see the classification Theoretical Description).
These three notions are tightly related for score-based classifiers, as shown in [1].
Prediction sets can be computed in the same way for multiclass and binary classification with
MapieClassifier, and the same theoretical guarantees hold.
Nevertheless, prediction sets are often much less informative in the binary case than in the multiclass case.
From Gupta et al. [1]:
PSs and CIs are only ‘informative’ if the sets or intervals produced by them are small. To quantify this, we measure CIs using their width (denoted as $|C(\cdot)|$), and PSs using their diameter (defined as the width of the convex hull of the PS). For example, in the case of binary classification, the diameter of a PS is $1$ if the prediction set is $\{0, 1\}$, and $0$ otherwise (since $Y \in \{0, 1\}$ always holds, the set $\{0, 1\}$ is ‘uninformative’). A short CI such as $[0.39, 0.41]$ is more informative than a wider one such as $[0.3, 0.5]$.
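The two informativeness measures in the quote can be sketched in a few lines of Python. The helper names `ps_diameter` and `ci_width` are hypothetical and chosen for illustration only; they are not part of MAPIE.

```python
def ps_diameter(prediction_set):
    """Width of the convex hull of a prediction set over the labels {0, 1}."""
    if not prediction_set:
        return 0  # empty set: no hull, counted as diameter 0 ("otherwise" case)
    return max(prediction_set) - min(prediction_set)


def ci_width(interval):
    """Width of a confidence interval given as a (lower, upper) pair."""
    lower, upper = interval
    return upper - lower


print(ps_diameter({0, 1}))  # 1: the full label set is uninformative
print(ps_diameter({1}))     # 0: a singleton is informative
print(ci_width((0.39, 0.41)) < ci_width((0.3, 0.5)))  # True
```

In the binary case the diameter can only take the values 0 and 1, which is one way to see why binary prediction sets carry less information than multiclass ones.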
In short, here is what to remember about these concepts:
- Calibration transforms a score (typically given by an ML model) into the probability of making a good prediction.
- Set Prediction gives the set of likely predictions with a probabilistic guarantee that the true label is in this set.
- Probabilistic Prediction gives a confidence interval for the predictive distribution.
1. Set Prediction
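- Definition 1 (Prediction Set (PS) w.r.t. $\hat{\mu}$) [1]. Fix a predictor $\hat{\mu}: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$. Define the set of all subsets of $\mathcal{Y} = \{0, 1\}$, $\mathcal{L} = \{\{0\}, \{1\}, \{0, 1\}, \emptyset\}$. A function $S: [0, 1] \to \mathcal{L}$ is said to be a $(1-\alpha)$-PS with respect to $\hat{\mu}$ if:

  $$P(Y \in S(\hat{\mu}(X))) \geq 1 - \alpha$$
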
PSs are typically studied for larger output sets, such as $\mathcal{Y}_{regression} = \mathbb{R}$ or $\mathcal{Y}_{multiclass} = \{1, 2, \ldots, L > 2\}$.
See MapieClassifier to use a set predictor.
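To make the coverage condition concrete, here is a minimal pure-Python sketch that builds a binary set predictor and checks its empirical coverage. It does not call MAPIE and is not MAPIE's implementation. It assumes one common construction (split conformal prediction with the nonconformity score "1 minus the predicted probability of the true label") and simulated data in which the score is uniform on [0, 1] and the label is Bernoulli with that score. All function names are hypothetical.

```python
import math
import random


def simulate(n, rng):
    """Draw (score, label) pairs; assumption: label ~ Bernoulli(score)."""
    pairs = []
    for _ in range(n):
        score = rng.random()
        label = 1 if rng.random() < score else 0
        pairs.append((score, label))
    return pairs


def prob_of_label(score, label):
    """Predicted probability of `label` when `score` estimates P(Y = 1)."""
    return score if label == 1 else 1.0 - score


def conformal_threshold(calibration, alpha):
    """Conformal quantile of the nonconformity scores on the calibration set."""
    scores = sorted(1.0 - prob_of_label(s, y) for s, y in calibration)
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # 1-indexed rank
    if rank > n:
        return 1.0  # too few calibration points: keep every label
    return scores[rank - 1]


def prediction_set(score, threshold):
    """S(mu_hat(x)): every label whose nonconformity is within the threshold."""
    return {y for y in (0, 1) if 1.0 - prob_of_label(score, y) <= threshold}


rng = random.Random(0)
alpha = 0.1
calibration = simulate(2000, rng)
test_points = simulate(5000, rng)

q_hat = conformal_threshold(calibration, alpha)
hits = sum(y in prediction_set(s, q_hat) for s, y in test_points)
coverage = hits / len(test_points)
print(f"threshold={q_hat:.3f}, empirical coverage={coverage:.3f}")
```

The empirical coverage should land close to the target level 1 - alpha, up to sampling noise. Scores near 0.5 receive the full set {0, 1}, which is exactly the uninformative case discussed above.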
2. Probabilistic Prediction
- Definition 2 (Confidence Interval (CI) w.r.t. $\hat{\mu}$) [1]. Fix a predictor $\hat{\mu}: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$. Let $\mathcal{I}$ denote the set of all subintervals of $[0, 1]$. A function $C: [0, 1] \to \mathcal{I}$ is said to be a $(1-\alpha)$-CI with respect to $\hat{\mu}$ if:

  $$P(\mathbb{E}[Y|\hat{\mu}(X)] \in C(\hat{\mu}(X))) \geq 1 - \alpha$$
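One simple way to obtain intervals of this kind is to discretize the score into bins and build a Hoeffding interval for the mean label in each bin. The sketch below is an illustration under stated assumptions, not the construction used by MAPIE or a verbatim algorithm from [1]: it targets the expectation of the label given the bin of the score, rather than given the exact score, and it splits alpha across bins with a union bound. The function names and the simulated data are hypothetical.

```python
import math
import random


def hoeffding_interval(labels, delta):
    """Two-sided Hoeffding interval for the mean of labels taking values in [0, 1]."""
    n = len(labels)
    mean = sum(labels) / n
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


def binned_intervals(scores, labels, n_bins, alpha):
    """One interval per score bin; a union bound splits alpha across the bins."""
    bins = [[] for _ in range(n_bins)]
    for score, label in zip(scores, labels):
        index = min(int(score * n_bins), n_bins - 1)
        bins[index].append(label)
    # An empty bin carries no information, so fall back to [0, 1].
    return [
        hoeffding_interval(ys, alpha / n_bins) if ys else (0.0, 1.0)
        for ys in bins
    ]


def interval_for(score, intervals):
    """C(mu_hat(x)): look up the interval of the bin that contains the score."""
    index = min(int(score * len(intervals)), len(intervals) - 1)
    return intervals[index]


# Simulated data (assumption): uniform scores, label ~ Bernoulli(score).
rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]
labels = [1 if rng.random() < s else 0 for s in scores]

intervals = binned_intervals(scores, labels, n_bins=5, alpha=0.1)
for b, (lower, upper) in enumerate(intervals):
    print(f"bin {b}: [{lower:.3f}, {upper:.3f}]")
```

More bins give a finer description of the score but fewer points per bin, hence wider and less informative intervals.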
3. Calibration
Usually, calibration is understood to mean perfect calibration (see the calibration Theoretical Description). In practice, it is more reasonable to consider approximate calibration.
- Definition 3 (Approximate calibration) [1]. Fix a predictor $\hat{\mu}: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$. The predictor $\hat{\mu}$ is $(\epsilon, \alpha)$-calibrated for some $\epsilon, \alpha \in [0, 1]$ if with probability at least $1 - \alpha$:

  $$|\mathbb{E}[Y|\hat{\mu}(X)] - \hat{\mu}(X)| \leq \epsilon$$
See CalibratedClassifierCV or MapieCalibrator to use a calibrator.
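As a rough illustration of approximate calibration, the sketch below estimates an empirical epsilon from binned data: it computes the gap between the mean label and the mean score in each bin, then returns the smallest gap such that bins at or below it hold at least a fraction 1 - alpha of the points. This is a binned proxy for the conditional expectation in Definition 3, not an exact check, and it does not use CalibratedClassifierCV or MapieCalibrator. The function names and the simulated data are hypothetical.

```python
import random


def binned_gaps(scores, labels, n_bins):
    """Per-bin (count, |mean label - mean score|) for the non-empty bins."""
    counts = [0] * n_bins
    score_sums = [0.0] * n_bins
    label_sums = [0.0] * n_bins
    for score, label in zip(scores, labels):
        index = min(int(score * n_bins), n_bins - 1)
        counts[index] += 1
        score_sums[index] += score
        label_sums[index] += label
    return [
        (n, abs(label_sums[b] / n - score_sums[b] / n))
        for b, n in enumerate(counts)
        if n > 0
    ]


def empirical_epsilon(gaps, alpha):
    """Smallest gap whose bins (at or below it) hold >= 1 - alpha of the points."""
    total = sum(n for n, _ in gaps)
    covered = 0
    for n, gap in sorted(gaps, key=lambda pair: pair[1]):
        covered += n
        if covered >= (1.0 - alpha) * total:
            return gap
    raise ValueError("no data to evaluate")


rng = random.Random(0)
scores = [rng.random() for _ in range(20000)]
# Assumption: a calibrated score (P(Y=1) = s) and a miscalibrated one (P(Y=1) = s**2).
calibrated = [1 if rng.random() < s else 0 for s in scores]
miscalibrated = [1 if rng.random() < s * s else 0 for s in scores]

eps_good = empirical_epsilon(binned_gaps(scores, calibrated, 10), alpha=0.1)
eps_bad = empirical_epsilon(binned_gaps(scores, miscalibrated, 10), alpha=0.1)
print(f"calibrated: eps={eps_good:.3f}, miscalibrated: eps={eps_bad:.3f}")
```

Rewriting the inequality of Definition 3 shows the link with Definition 2: if a predictor is $(\epsilon, \alpha)$-calibrated, the interval $[\hat{\mu}(X) - \epsilon, \hat{\mu}(X) + \epsilon]$, clipped to $[0, 1]$, contains $\mathbb{E}[Y|\hat{\mu}(X)]$ with probability at least $1 - \alpha$.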
In the conformal prediction (CP) framework, Venn predictors produce probability-type predictions for the labels of test objects. These predictions are guaranteed to be well calibrated under the standard assumption that the observations are generated independently from the same distribution [2].
4. References
[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas. “Distribution-free binary classification: prediction sets, confidence intervals and calibration.” Advances in Neural Information Processing Systems 33 (2020): 3711-3723.
[2] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic Learning in a Random World.” Springer Nature, 2022.