Tutorial for classification

In this tutorial, we compare the prediction sets estimated by the conformal methods implemented in :class:mapie.regression.MapieClassifier on a toy two-dimensional dataset.

Throughout this tutorial, we will answer the following questions:

  • How does the number of classes in the prediction sets vary according to the significance level ?

  • Is the chosen conformal method well calibrated ?

  • What are the pros and cons of the conformal methods included in :class:mapie.regression.MapieClassifier ?

1. Conformal Prediction method using the softmax score of the true label

We will use MAPIE to estimate a prediction set of several classes such that the probability that the true label of a new test point is included in the prediction set is always higher than the target confidence level : :math:P(Y \in C) \geq 1 - \alpha. We start by using the softmax score output by the base classifier as the conformity score on a toy two-dimensional dataset. We estimate the prediction sets as follows :

  • First we generate a dataset with train, calibration and test, the model is fitted on the training set.

  • We set the conformal score :math:S_i = \hat{f}(X_{i})_{y_i} the softmax output of the true class for each sample in the calibration set.

  • Then we define :math:\hat{q} as being the :math:(n + 1) (\alpha) / n previous quantile of :math:S_{1}, ..., S_{n} (this is essentially the quantile :math:\alpha, but with a small sample correction).

  • Finally, for a new test data point (where :math:X_{n + 1} is known but :math:Y_{n + 1} is not), create a prediction set :math:C(X_{n+1}) = \{y: \hat{f}(X_{n+1})_{y} > \hat{q}\} which includes all the classes with a sufficiently high softmax output.

We use a two-dimensional toy dataset with three labels. The distribution of the data is a bivariate normal with diagonal covariance matrices for each label.

import numpy as np
from sklearn.model_selection import train_test_split
centers = [(0, 3.5), (-2, 0), (2, 0)]
covs = [np.eye(2), np.eye(2)*2, np.diag([5, 1])]
x_min, x_max, y_min, y_max, step = -6, 8, -6, 8, 0.1
n_samples = 1000
n_classes = 3
X = np.vstack([
    np.random.multivariate_normal(center, cov, n_samples)
    for center, cov in zip(centers, covs)
y = np.hstack([np.full(n_samples, i) for i in range(n_classes)])
X_train_cal, X_test, y_train_cal, y_test = train_test_split(X, y, test_size=0.2)
X_train, X_cal, y_train, y_cal = train_test_split(X_train_cal, y_train_cal, test_size=0.25)

xx, yy = np.meshgrid(
    np.arange(x_min, x_max, step), np.arange(x_min, x_max, step)
X_test_mesh = np.stack([xx.ravel(), yy.ravel()], axis=1)

Let’s see our training data.

import matplotlib.pyplot as plt
colors = {0: "#1f77b4", 1: "#ff7f0e", 2:  "#2ca02c", 3: "#d62728"}
y_train_col = list(map(colors.get, y_train))
fig = plt.figure()
    X_train[:, 0],
    X_train[:, 1],

We fit our training data with a Gaussian Naive Base estimator. And then we apply :class:mapie.classification.MapieClassifier in the calibration data with the method score to the estimator indicating that it has already been fitted with cv="prefit". We then estimate the prediction sets with differents alpha values with a fit and predict process.

from sklearn.naive_bayes import GaussianNB
from mapie.classification import MapieClassifier
from mapie.metrics import classification_coverage_score, classification_mean_width_score
clf = GaussianNB().fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_pred_proba = clf.predict_proba(X_test)
y_pred_proba_max = np.max(y_pred_proba, axis=1)
mapie_score = MapieClassifier(estimator=clf, cv="prefit", method="score")
mapie_score.fit(X_cal, y_cal)
alpha = [0.2, 0.1, 0.05]
y_pred_score, y_ps_score = mapie_score.predict(X_test_mesh, alpha=alpha)
  • y_pred_score: represents the prediction in the test set by the base estimator.

  • y_ps_score: the prediction sets estimated by MAPIE with the “score” method.

def plot_scores(n, alphas, scores, quantiles):
    colors = {0:"#1f77b4", 1:"#ff7f0e", 2:"#2ca02c"}
    fig = plt.figure(figsize=(7, 5))
    plt.hist(scores, bins="auto")
    for i, quantile in enumerate(quantiles):
             x = quantile,
             ls= "dashed",
             label=f"alpha = {alphas[i]}"
    plt.title("Distribution of scores")

Let’s see the distribution of the scores with the calculated quantiles.

scores = mapie_score.conformity_scores_
n = mapie_score.n_samples_val_
quantiles = mapie_score.quantiles_
plot_scores(n, alpha, scores, quantiles)

The estimated quantile increases with alpha. A high value of alpha can potentially lead to a high quantile which would not necessarily be reached by any class in uncertain areas, resulting in null regions.

We will now visualize the differences between the prediction sets of the different values of alpha.

def plot_results(alphas, X, y_pred, y_ps):
    tab10 = plt.cm.get_cmap('Purples', 4)
    colors = {0: "#1f77b4", 1: "#ff7f0e", 2:  "#2ca02c", 3: "#d62728"}
    y_pred_col = list(map(colors.get, y_pred))
    fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(10, 10))
    axs = {0: ax1, 1: ax2, 2:  ax3, 3: ax4}
        X[:, 0],
        X[:, 1],
    axs[0].set_title("Predicted labels")
    for i, alpha in enumerate(alphas):
        y_pi_sums = y_ps[:, :, i].sum(axis=1)
        num_labels = axs[i+1].scatter(
            X[:, 0],
            X[:, 1],
        cbar = plt.colorbar(num_labels, ax=axs[i+1])
        axs[i+1].set_title(f"Number of labels for alpha={alpha}")
plot_results(alpha, X_test_mesh, y_pred_score, y_ps_score)

When the class coverage is not large enough, the prediction sets can be empty when the model is uncertain at the border between two classes. The null region disappears for larger class coverages but ambiguous classification regions arise with several labels included in the prediction sets highlighting the uncertain behaviour of the base classifier.

Let’s now study the effective coverage and the mean prediction set widths as function of the :math:1-\alpha target coverage. To this aim, we use once again the .predict() method of :class:mapie.regression.MapieClassifier to estimate predictions sets on a large number of :math:\alpha values.

alpha2 = np.arange(0.02, 0.98, 0.02)
_, y_ps_score2 = mapie_score.predict(X_test, alpha=alpha2)
coverages_score = [
    classification_coverage_score(y_test, y_ps_score2[:, :, i])
    for i, _ in enumerate(alpha2)
widths_score = [
    classification_mean_width_score(y_ps_score2[:, :, i])
    for i, _ in enumerate(alpha2)
def plot_coverages_widths(alpha, coverage, width, method):
    fig, axs = plt.subplots(1, 2, figsize=(12, 5))
    axs[0].scatter(1 - alpha, coverage, label=method)
    axs[0].set_xlabel("1 - alpha")
    axs[0].set_ylabel("Coverage score")
    axs[0].plot([0, 1], [0, 1], label="x=y", color="black")
    axs[1].scatter(1 - alpha, width, label=method)
    axs[1].set_xlabel("1 - alpha")
    axs[1].set_ylabel("Average size of prediction sets")
plot_coverages_widths(alpha2, coverages_score, widths_score, "Score")

2. Conformal Prediction method using the cumulative softmax score

We saw in the previous section that the “score” method is well calibrated by providing accurate coverage levels. However, it tends to give null prediction sets for uncertain regions, especially when the :math:\alpha value is high. :class:mapie.classification.MapieClassifier includes another method, called Adaptive Prediction Set (APS), whose conformity score is the cumulated score of the softmax output until the true label is reached (see the theoretical description for more details). We will see in this Section that this method no longer estimates null prediction sets but by giving slightly bigger prediction sets.

Let’s visualize the prediction sets obtained with the APS method on the test set after fitting MAPIE on the calibration set.

mapie_aps = MapieClassifier(estimator=clf, cv="prefit", method="cumulated_score")
mapie_aps.fit(X_cal, y_cal)
alpha = [0.2, 0.1, 0.05]
y_pred_aps, y_ps_aps = mapie_aps.predict(X_test_mesh, alpha=alpha, include_last_label=True)
plot_results(alpha, X_test_mesh, y_pred_aps, y_ps_aps)

One can notice that the uncertain regions are emphasized by wider boundaries, but without null prediction sets with respect to the first “score” method.

_, y_ps_aps2 = mapie_aps.predict(X_test, alpha=alpha2, include_last_label="randomized")
coverages_aps = [
    classification_coverage_score(y_test, y_ps_aps2[:, :, i])
    for i, _ in enumerate(alpha2)
widths_aps = [
    classification_mean_width_score(y_ps_aps2[:, :, i])
    for i, _ in enumerate(alpha2)
plot_coverages_widths(alpha2, coverages_aps, widths_aps, "Score")

This method also gives accurate calibration plots, meaning that the effective coverage level is always very close to the target coverage, sometimes at the expense of slightly bigger prediction sets.