Theoretical Description
There are three main ways to handle uncertainty quantification in binary classification:
calibration (see the calibration Theoretical Description), confidence intervals (CIs) for the probability $P(Y \mid \hat{\mu}(X))$,
and prediction sets (see the classification Theoretical Description).
These three notions are tightly related for score-based classifiers, as shown in [1].
Prediction sets can be computed in the same way for multiclass and binary classification with
MapieClassifier, and the same theoretical guarantees hold.
Nevertheless, prediction sets are often much less informative in the binary case than in the multiclass case.
From Gupta et al. [1]:
PSs and CIs are only ‘informative’ if the sets or intervals produced by them are small. To quantify this, we measure CIs using their width (denoted as $|C(\cdot)|$), and PSs using their diameter (defined as the width of the convex hull of the PS). For example, in the case of binary classification, the diameter of a PS is $1$ if the prediction set is $\{0, 1\}$, and $0$ otherwise (since $Y \in \{0, 1\}$ always holds, the set $\{0, 1\}$ is ‘uninformative’). A short CI such as $[0.39, 0.41]$ is more informative than a wider one such as $[0.3, 0.5]$.
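The two size measures from the quote can be sketched in a few lines of Python. The helper names `ps_diameter` and `ci_width` are hypothetical and are not part of MAPIE. The interval endpoints are illustrative values, and treating the empty set as having diameter 0 is an assumption of this sketch.

```python
def ps_diameter(prediction_set):
    """Width of the convex hull of a prediction set of numeric labels."""
    if not prediction_set:
        return 0.0  # convention chosen for this sketch (assumption)
    return float(max(prediction_set) - min(prediction_set))


def ci_width(interval):
    """Width of a confidence interval given as a (lower, upper) pair."""
    lower, upper = interval
    return upper - lower


# In binary classification, only the full set {0, 1} has diameter 1.
print(ps_diameter({0, 1}), ps_diameter({0}), ps_diameter({1}))

# A narrow interval is more informative than a wide one.
print(ci_width((0.39, 0.41)) < ci_width((0.3, 0.5)))
```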
In short, here is what to remember about these concepts:
- Calibration is useful for transforming a score (typically given by an ML model) into the probability of making a correct prediction.
- Set Prediction gives the set of likely predictions with a probabilistic guarantee that the true label is in this set.
- Probabilistic Prediction gives a confidence interval for the predictive distribution.
1. Set Prediction
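- Definition 1 (Prediction Set (PS) w.r.t. $\hat{\mu}$) [1]. Fix a predictor $\hat{\mu}: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$.
Define the set of all subsets of $\mathcal{Y} = \{0, 1\}$, $\mathcal{L} = \{\{0\}, \{1\}, \{0, 1\}, \emptyset\}$.
A function $S: [0, 1] \to \mathcal{L}$ is said to be a $(1 - \alpha)$-PS with respect to $\hat{\mu}$ if:
$$P(Y \in S(\hat{\mu}(X))) \geq 1 - \alpha$$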
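PSs are typically studied for larger output sets, such as $\mathcal{Y}_{\text{regression}} = \mathbb{R}$ or $\mathcal{Y}_{\text{multiclass}} = \{1, 2, \ldots, L\}$ with $L > 2$.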
See MapieClassifier to use a set predictor.
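As an illustration of Definition 1, the following sketch estimates the coverage of a toy set predictor by simulation. It does not use MapieClassifier. The function `prediction_set`, the threshold of 0.2, the score grid, and the assumption that $P(Y = 1 \mid \hat{\mu}(X)) = \hat{\mu}(X)$ are all choices made for this example only.

```python
import random


def prediction_set(score, threshold=0.2):
    """Toy set predictor S: keep each label whose estimated probability is at least `threshold`."""
    labels = set()
    if 1.0 - score >= threshold:
        labels.add(0)
    if score >= threshold:
        labels.add(1)
    return labels


def empirical_coverage(n_samples=20000, seed=0):
    """Monte Carlo estimate of P(Y in S(mu_hat(X)))."""
    rng = random.Random(seed)
    scores = [0.1, 0.3, 0.5, 0.7, 0.9]
    covered = 0
    for _ in range(n_samples):
        score = rng.choice(scores)            # plays the role of mu_hat(X)
        y = 1 if rng.random() < score else 0  # assumption: P(Y = 1 | mu_hat(X)) = mu_hat(X)
        covered += y in prediction_set(score)
    return covered / n_samples


alpha = 0.1
coverage = empirical_coverage()
print(f"empirical coverage: {coverage:.3f}, target: {1 - alpha:.2f}")
```

Under these assumptions the exact coverage is 0.96: the two extreme scores yield singleton sets that are correct with probability 0.9, and the three middle scores yield the full set $\{0, 1\}$, which always covers. The toy predictor is therefore a $(1 - \alpha)$-PS for $\alpha = 0.1$, at the cost of being uninformative for most inputs.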
2. Probabilistic Prediction
- Definition 2 (Confidence Interval (CI) w.r.t. $\hat{\mu}$) [1]. Fix a predictor $\hat{\mu}: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$.
Let $\mathcal{I}$ denote the set of all subintervals of $[0, 1]$.
A function $C: [0, 1] \to \mathcal{I}$ is said to be a $(1 - \alpha)$-CI with respect to $\hat{\mu}$ if:
$$P(\mathbb{E}[Y \mid \hat{\mu}(X)] \in C(\hat{\mu}(X))) \geq 1 - \alpha$$
In the framework of conformal prediction, the Venn predictor has this property.
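To make Definition 2 concrete, here is a minimal sketch that builds one interval per score value from a calibration sample. It uses Hoeffding's inequality with a union bound, which is a simple substitute chosen for this example and is not the Venn predictor. The score grid, the sample size, and the ground truth `true_conditional_mean` are assumptions of the simulation.

```python
import math
import random


def hoeffding_interval(n_positive, n_total, delta):
    """Two-sided Hoeffding interval for a Bernoulli mean, clipped to [0, 1]."""
    if n_total == 0:
        return (0.0, 1.0)  # no data: only the trivial interval is valid
    mean = n_positive / n_total
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n_total))
    return (max(0.0, mean - half_width), min(1.0, mean + half_width))


def true_conditional_mean(score):
    """Assumed ground truth E[Y | mu_hat(X) = score]; the predictor is deliberately miscalibrated."""
    return score ** 2


rng = random.Random(0)
scores = [0.1, 0.3, 0.5, 0.7, 0.9]
alpha = 0.1

# Calibration sample: count positives and totals for each score value.
counts = {s: [0, 0] for s in scores}  # score -> [n_positive, n_total]
for _ in range(5000):
    s = rng.choice(scores)
    y = 1 if rng.random() < true_conditional_mean(s) else 0
    counts[s][0] += y
    counts[s][1] += 1

# Union bound over the score values, so that all intervals hold
# simultaneously with probability at least 1 - alpha.
delta = alpha / len(scores)
intervals = {s: hoeffding_interval(pos, tot, delta) for s, (pos, tot) in counts.items()}

for s in scores:
    low, high = intervals[s]
    inside = low <= true_conditional_mean(s) <= high
    print(f"mu_hat = {s:.1f}: C = [{low:.3f}, {high:.3f}], covered = {inside}")
```

The intervals are centred on the empirical label frequency rather than on $\hat{\mu}(X)$, so they can cover $\mathbb{E}[Y \mid \hat{\mu}(X)]$ even when the predictor itself is miscalibrated.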
3. Calibration
Usually, calibration is understood to mean perfect calibration (see the calibration Theoretical Description). In practice, it is more reasonable to consider approximate calibration.
- Definition 3 (Approximate calibration) [1].
Fix a predictor $\hat{\mu}: \mathcal{X} \to [0, 1]$ and let $(X, Y) \sim P$.
The predictor $\hat{\mu}$ is $(\epsilon, \alpha)$-calibrated for some $\epsilon, \alpha \in [0, 1]$ if, with probability at least $1 - \alpha$:
$$|\mathbb{E}[Y \mid \hat{\mu}(X)] - \hat{\mu}(X)| \leq \epsilon$$
See CalibratedClassifierCV or MapieCalibrator
to use a calibrator.
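Definition 3 can be checked empirically with a plug-in estimate, sketched below. The helper `calibrated_mass` is hypothetical and is not part of MAPIE or scikit-learn. It assumes the scores take only a few distinct values, so that $\mathbb{E}[Y \mid \hat{\mu}(X) = s]$ can be estimated by the mean label among samples with score $s$. The simulated data, where the true conditional mean is the squared score, is also an assumption.

```python
import random
from collections import defaultdict


def calibrated_mass(scores, labels, epsilon):
    """Plug-in estimate of P(|E[Y | mu_hat(X)] - mu_hat(X)| <= epsilon)."""
    totals = defaultdict(lambda: [0, 0])  # score -> [sum of labels, count]
    for s, y in zip(scores, labels):
        totals[s][0] += y
        totals[s][1] += 1
    # Count the samples whose score value has an estimated gap of at most epsilon.
    n_within = sum(
        count
        for s, (label_sum, count) in totals.items()
        if abs(label_sum / count - s) <= epsilon
    )
    return n_within / len(scores)


rng = random.Random(0)
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
scores = [rng.choice(grid) for _ in range(5000)]
# Assumption: the true conditional mean is score ** 2, so mu_hat is miscalibrated.
labels = [1 if rng.random() < s ** 2 else 0 for s in scores]

epsilon = 0.15
mass = calibrated_mass(scores, labels, epsilon)
print(f"estimated P(gap <= {epsilon}) = {mass:.2f}")
```

If the estimated mass is $m$, the data suggest that the predictor is roughly $(\epsilon, 1 - m)$-calibrated. This is only an estimate from a finite sample and carries no distribution-free guarantee.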
4. References
[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas. “Distribution-free binary classification: prediction sets, confidence intervals and calibration.” Advances in Neural Information Processing Systems 33 (2020): 3711-3723.