# Theoretical Description¶

There are mainly three ways to handle uncertainty quantification in binary classification: calibration (see Theoretical Description), confidence intervals (CIs) for the probability, and prediction sets (see Theoretical Description). These three notions are tightly related for score-based classifiers, as shown in [1].

Prediction sets can be computed in the same way for multiclass and binary classification with `MapieClassifier`, and the same theoretical guarantees hold.
Nevertheless, prediction sets are often much less informative in the binary case than in the multiclass case.

From Gupta et al [1]:

> PSs and CIs are only ‘informative’ if the sets or intervals produced by them are small. To quantify this, we measure CIs using their width (denoted as $|C(\cdot)|$), and PSs using their diameter (defined as the width of the convex hull of the PS). For example, in the case of binary classification, the diameter of a PS is $0$ if the prediction set is $\{0\}$ or $\{1\}$, and $1$ otherwise (since $Y \in \{0, 1\}$ always holds, the set $\{0, 1\}$ is ‘uninformative’). A short CI is more informative than a wider one.

In a few words, what you need to remember about these concepts:

- *Calibration* is useful for transforming a score (typically given by an ML model) into the probability of making a good prediction.
- *Set Prediction* gives the set of likely predictions with a probabilistic guarantee that the true label is in this set.
- *Probabilistic Prediction* gives a confidence interval for the predictive distribution.

## 1. Set Prediction¶

- Definition 1 (Prediction Set (PS) w.r.t. $P$) [1].
Fix a predictor $f$ and let $\alpha \in (0, 1)$. Define $2^{\mathcal{Y}}$, the set of all subsets of $\mathcal{Y}$. A function $S : \mathcal{X} \to 2^{\mathcal{Y}}$ is said to be a $(1 - \alpha)$-PS with respect to $P$ if:

$$P(Y \in S(X)) \geq 1 - \alpha$$

PSs are typically studied for larger output sets, such as $\mathcal{Y} = \{1, \ldots, L\}$ with $L > 2$ classes, or $\mathcal{Y} = \mathbb{R}$.

See `MapieClassifier` to use a set predictor.
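As an illustration of Definition 1, a split-conformal ("LAC"-style) set predictor can be sketched in a few lines of NumPy. The helper name `prediction_sets`, the synthetic oracle model, and the chosen $\alpha$ are our own illustration, not MAPIE's API:

```python
import numpy as np

def prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Split-conformal (LAC-style) set predictor: keep every label whose
    nonconformity score is below a threshold fit on calibration data."""
    n = len(cal_labels)
    # Nonconformity score: 1 - predicted probability of the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected quantile level ceil((n+1)(1-alpha))/n.
    level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    q_hat = np.quantile(scores, level, method="higher")
    return (1.0 - test_probs) <= q_hat  # boolean (n_test, n_classes) mask

# Toy demo with an oracle model: predicted probabilities equal P(Y=1|X).
rng = np.random.default_rng(0)
p1 = rng.uniform(size=3000)
y = (rng.uniform(size=3000) < p1).astype(int)
probs = np.column_stack([1.0 - p1, p1])
sets = prediction_sets(probs[:1000], y[:1000], probs[1000:], alpha=0.1)
coverage = sets[np.arange(2000), y[1000:]].mean()  # close to 0.9 by the guarantee
```

Note that in the binary case the produced set is often $\{0, 1\}$, which, as the quote above stresses, is uninformative.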

## 2. Probabilistic Prediction¶

- Definition 2 (Confidence Interval (CI) w.r.t. $P$) [1].
Fix a predictor $f$ and let $\alpha \in (0, 1)$. Let $\mathcal{I}$ denote the set of all subintervals of $[0, 1]$. A function $C : \mathcal{X} \to \mathcal{I}$ is said to be a $(1 - \alpha)$-CI with respect to $P$ if:

$$P(\mathbb{E}[Y \mid X] \in C(X)) \geq 1 - \alpha$$
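To make Definition 2 concrete: when held-out scores are grouped into a bin of similar values, a classical interval for the conditional probability of label 1 is the Wilson score interval (a distribution-dependent approximation, not the distribution-free construction of [1]; the helper name and numbers are illustrative only):

```python
import numpy as np
from statistics import NormalDist

def wilson_interval(successes, n, alpha=0.1):
    """(1 - alpha) Wilson score interval for a binomial proportion, e.g.
    for E[Y | f(X) in bin] estimated from n calibration points."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.645 for alpha = 0.1
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Example: 42 positives among 100 calibration points in a score bin.
lower, upper = wilson_interval(42, 100, alpha=0.1)
```

The interval shrinks as $n$ grows, matching the intuition from [1] that only short CIs are informative.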

## 3. Calibration¶

Usually, calibration is understood as perfect calibration, meaning $\mathbb{E}[Y \mid f(X)] = f(X)$ (see Theoretical Description). In practice, it is more reasonable to consider approximate calibration.

- Definition 3 (Approximate calibration) [1].
Fix a predictor $f : \mathcal{X} \to [0, 1]$ and let $\alpha \in (0, 1)$. The predictor $f$ is $(\varepsilon, \alpha)$-calibrated for some $\varepsilon \geq 0$ if, with probability at least $1 - \alpha$:

$$|\mathbb{E}[Y \mid f(X)] - f(X)| \leq \varepsilon$$

See `CalibratedClassifierCV` or `MapieCalibrator` to use a calibrator.
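The sigmoid ("Platt") recalibration performed by `CalibratedClassifierCV` with `method="sigmoid"` amounts to fitting a one-dimensional logistic regression on held-out scores. A minimal NumPy sketch (our own gradient-descent fit, not scikit-learn's implementation):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fit_sigmoid_calibration(scores, labels, lr=0.5, n_iter=2000):
    """Fit p = sigmoid(w * s + c) by gradient descent on the log-loss,
    a minimal stand-in for Platt scaling."""
    w, c = 1.0, 0.0
    for _ in range(n_iter):
        p = sigmoid(w * scores + c)
        w -= lr * np.mean((p - labels) * scores)
        c -= lr * np.mean(p - labels)
    return w, c

# Miscalibrated toy model: the raw output sigmoid(s) is reported, but the
# true P(Y=1|s) is sigmoid(2s - 1), so the raw probabilities are off.
rng = np.random.default_rng(1)
s = rng.normal(size=5000)
y = (rng.uniform(size=5000) < sigmoid(2 * s - 1)).astype(float)
w, c = fit_sigmoid_calibration(s, y)
brier_raw = np.mean((sigmoid(s) - y) ** 2)
brier_cal = np.mean((sigmoid(w * s + c) - y) ** 2)  # improves on brier_raw
```

The fitted $(w, c)$ recovers the true link $\approx (2, -1)$, and the Brier score drops accordingly.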

In the conformal prediction (CP) framework, it is worth noting that Venn predictors produce probability-type predictions for the labels of test objects that are guaranteed to be well calibrated under the standard assumption that the observations are generated independently from the same distribution [2].
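A toy inductive Venn predictor can make this concrete. Using a simple score-binning taxonomy (one common choice; the helper below is our own illustration, not the exact construction of [2]), each test object receives a *pair* of probabilities, one per hypothesized label:

```python
import numpy as np

def venn_predict(cal_scores, cal_labels, test_score, n_bins=10):
    """Inductive Venn predictor with a score-binning taxonomy: the
    category of an object is the bin its score falls into."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    cat = lambda x: np.clip(np.digitize(x, edges) - 1, 0, n_bins - 1)
    in_cat = cal_labels[cat(cal_scores) == cat(test_score)]
    n, k = len(in_cat), int(in_cat.sum())
    # Tentatively give the test object each label in turn and report the
    # frequency of label 1 within its (augmented) category.
    p0 = k / (n + 1)        # test object hypothesized to be labelled 0
    p1 = (k + 1) / (n + 1)  # test object hypothesized to be labelled 1
    return p0, p1

# Calibration scores that are themselves well-calibrated probabilities.
rng = np.random.default_rng(2)
cal_scores = rng.uniform(size=5000)
cal_labels = (rng.uniform(size=5000) < cal_scores).astype(int)
p0, p1 = venn_predict(cal_scores, cal_labels, test_score=0.75)
```

The multiprobability output $(p_0, p_1)$ always brackets a frequency estimate for the category, and the gap $p_1 - p_0 = 1/(n+1)$ shrinks as the category accumulates calibration examples.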

## References¶

[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas. “Distribution-free binary classification: prediction sets, confidence intervals, and calibration.” Advances in Neural Information Processing Systems 33 (2020): 3711-3723.

[2] Vovk, Vladimir, Alexander Gammerman, and Glenn Shafer. “Algorithmic Learning in a Random World.” Springer Nature, 2022.