`mapie.risk_control`.BinaryClassificationController

class mapie.risk_control.BinaryClassificationController(predict_function: Callable[[ArrayLike], NDArray], risk: BinaryClassificationRisk, target_level: float, confidence_level: float = 0.9, best_predict_param_choice: Union[Literal['auto'], BinaryClassificationRisk] = 'auto')

Controls the risk or performance of a binary classifier.

BinaryClassificationController finds the decision thresholds of a binary classifier for which a risk is statistically guaranteed to be below a target level (the risk is “controlled”). It can also control a performance metric, such as precision, in which case the thresholds guarantee that the performance is above the target level.

Usage:

Instantiate a BinaryClassificationController, providing the predict_proba method of your binary classifier
Call the calibrate method to find the thresholds
Use the predict method to predict using the best threshold

Note: for a given model, calibration dataset, target level, and confidence level, there may not be any threshold controlling the risk.

Parameters

predict_function : Callable[[ArrayLike], NDArray]

The predict_proba method of a fitted binary classifier. Its output must have shape (len(X), 2).
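
As an illustration of why the (len(X), 2) shape matters, the sketch below shows how a decision threshold can turn such an output into binary labels. It is a minimal sketch, not MAPIE's code: toy_predict_proba and predict_at_threshold are hypothetical names, and it assumes that column 1 holds the positive-class probability and that the comparison is inclusive.

```python
import numpy as np

def toy_predict_proba(X):
    """Hypothetical stand-in for clf.predict_proba; returns shape (len(X), 2)."""
    X = np.asarray(X, dtype=float)
    # Logistic score on the first feature, used as the positive-class probability.
    p_pos = 1.0 / (1.0 + np.exp(-X[:, 0]))
    return np.column_stack([1.0 - p_pos, p_pos])

def predict_at_threshold(predict_function, X, threshold):
    """Label a sample positive when its positive-class probability reaches the threshold."""
    proba = predict_function(X)
    if proba.ndim != 2 or proba.shape[1] != 2:
        raise ValueError(f"expected shape (len(X), 2), got {proba.shape}")
    # Assumption: column 1 is the positive class and the comparison is >=.
    return (proba[:, 1] >= threshold).astype(int)

X_demo = np.array([[-2.0], [0.0], [2.0]])
labels_low = predict_at_threshold(toy_predict_proba, X_demo, threshold=0.3)
labels_high = predict_at_threshold(toy_predict_proba, X_demo, threshold=0.7)
```

Raising the threshold from 0.3 to 0.7 turns the middle sample (probability 0.5) from positive to negative, which is the lever the controller tunes.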

risk : BinaryClassificationRisk

The risk or performance metric to control. Valid options:

An existing risk defined in mapie.risk_control (e.g. precision, recall, accuracy, false_positive_rate)
A custom BinaryClassificationRisk instance

target_level : float

The maximum risk level (or minimum performance level). Must be between 0 and 1.
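
To make the "minimum performance level" reading concrete, the sketch below computes the empirical precision at a few thresholds on made-up calibration data and compares it with a target level. This is only an empirical check under assumed data and hypothetical helper names; it carries no statistical guarantee and is not how the controller certifies thresholds.

```python
import numpy as np

def empirical_precision(y_true, y_pred):
    """Precision = TP / (TP + FP); NaN when no sample is predicted positive."""
    predicted_pos = y_pred == 1
    if not predicted_pos.any():
        return float("nan")
    return float((y_true[predicted_pos] == 1).mean())

# Toy positive-class probabilities and labels (made up for illustration).
p_pos = np.array([0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9])
y_calib = np.array([0, 0, 1, 0, 0, 1, 1, 1])

target_level = 0.75
thresholds = np.array([0.3, 0.5, 0.65, 0.95])
precisions = np.array([
    empirical_precision(y_calib, (p_pos >= t).astype(int)) for t in thresholds
])
# NaN compares as False, so thresholds that predict no positives are excluded.
meets_target = precisions >= target_level
```

For a risk such as the false positive rate, the comparison would be reversed (risk at or below the target level).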

confidence_level : float, default=0.9

The confidence level with which the risk (or performance) is controlled. Must be between 0 and 1. See the documentation for detailed explanations.
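
The role of confidence_level can be illustrated with a simplified version of the Learn Then Test procedure cited in the References. The sketch below is not MAPIE's implementation: it assumes a risk that is a plain average of losses in [0, 1] (precision, being a ratio, does not fit this form directly), uses a Hoeffding p-value, and applies a Bonferroni correction across thresholds. All function names and numbers are hypothetical.

```python
import numpy as np

def hoeffding_p_value(empirical_risk, n, target_level):
    """p-value for H0: true risk > target_level, for losses bounded in [0, 1]."""
    gap = max(target_level - empirical_risk, 0.0)
    return float(np.exp(-2.0 * n * gap**2))

def ltt_valid_thresholds(losses_per_threshold, thresholds, target_level, confidence_level):
    """Keep the thresholds whose null hypothesis is rejected after Bonferroni correction."""
    delta = 1.0 - confidence_level
    n_tests = len(thresholds)
    valid = []
    for t, losses in zip(thresholds, losses_per_threshold):
        p = hoeffding_p_value(float(np.mean(losses)), len(losses), target_level)
        if p <= delta / n_tests:
            valid.append(t)
    return valid

# Toy 0/1 losses on 1000 calibration samples, with empirical risks 0.05, 0.18, 0.30.
n = 1000
losses_per_threshold = [
    np.r_[np.ones(k), np.zeros(n - k)] for k in (50, 180, 300)
]
thresholds = [0.8, 0.5, 0.2]

valid = ltt_valid_thresholds(
    losses_per_threshold, thresholds, target_level=0.2, confidence_level=0.9
)
```

In this toy setting only the threshold with empirical risk 0.05 is kept. An empirical risk of 0.18 is below the 0.2 target, but not by enough to be certified at a 0.9 confidence level with 1000 samples.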

best_predict_param_choice : Union["auto", BinaryClassificationRisk], default="auto"

How to select the best threshold from the valid thresholds that control the risk (or performance). The BinaryClassificationController will try to minimize (or maximize) a secondary objective. Valid options:

"auto" (default)
An existing risk defined in mapie.risk_control (e.g. precision, recall, accuracy, false_positive_rate)
A custom BinaryClassificationRisk instance
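
The selection step can be pictured as follows: among thresholds already known to be valid, keep the one with the best secondary objective. This is a minimal sketch under assumptions, not the library's logic: the helper names are hypothetical, recall is chosen arbitrarily as the secondary objective, the valid thresholds are made up, and returning None for an empty set is an assumption.

```python
import numpy as np

def empirical_recall(y_true, y_pred):
    """Recall = TP / (TP + FN); NaN when there are no actual positives."""
    actual_pos = y_true == 1
    if not actual_pos.any():
        return float("nan")
    return float((y_pred[actual_pos] == 1).mean())

def pick_best_threshold(valid_thresholds, p_pos, y_true, objective):
    """Return the valid threshold maximizing the objective, or None if there is none."""
    if len(valid_thresholds) == 0:
        return None
    scores = [objective(y_true, (p_pos >= t).astype(int)) for t in valid_thresholds]
    # np.argmax returns the first maximizer when several thresholds tie.
    return float(valid_thresholds[int(np.argmax(scores))])

# Toy positive-class probabilities and labels (made up for illustration).
p_pos = np.array([0.1, 0.2, 0.4, 0.45, 0.6, 0.7, 0.8, 0.9])
y_true = np.array([0, 0, 1, 0, 0, 1, 1, 1])

# Assumed output of a calibration step that controls precision.
valid_thresholds = np.array([0.65, 0.85])
best = pick_best_threshold(valid_thresholds, p_pos, y_true, empirical_recall)
```

Here both thresholds are assumed valid, and 0.65 is preferred because it recovers three of the four positives instead of one.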

References

Angelopoulos, Anastasios N., Stephen Bates, Emmanuel J. Candès, et al. “Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control.” (2022)

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from mapie.risk_control import BinaryClassificationController, precision

>>> X, y = make_classification(
...     n_features=2,
...     n_redundant=0,
...     n_informative=2,
...     n_clusters_per_class=1,
...     n_classes=2,
...     random_state=42,
...     class_sep=2.0
... )
>>> X_train, X_temp, y_train, y_temp = train_test_split(
...     X, y, test_size=0.4, random_state=42
... )
>>> X_calib, X_test, y_calib, y_test = train_test_split(
...     X_temp, y_temp, test_size=0.1, random_state=42
... )

>>> clf = LogisticRegression().fit(X_train, y_train)

>>> controller = BinaryClassificationController(
...     predict_function=clf.predict_proba,
...     risk=precision,
...     target_level=0.6
... )

>>> controller.calibrate(X_calib, y_calib)
>>> predictions = controller.predict(X_test)

Attributes

valid_predict_params (NDArray): The valid thresholds that control the risk (or performance). Use the calibrate method to compute them.
best_predict_param (Optional[float]): The best threshold that controls the risk (or performance). Use the calibrate method to compute it.

__init__(predict_function: Callable[[ArrayLike], NDArray], risk: BinaryClassificationRisk, target_level: float, confidence_level: float = 0.9, best_predict_param_choice: Union[Literal['auto'], BinaryClassificationRisk] = 'auto')

calibrate(X_calibrate: ArrayLike, y_calibrate: ArrayLike) → None

Calibrate the BinaryClassificationController. Sets attributes valid_predict_params and best_predict_param (if the risk or performance can be controlled at the target level).

Parameters

X_calibrate (ArrayLike): Features of the calibration set.
y_calibrate (ArrayLike): Binary labels of the calibration set.

Returns

None

predict(X_test: ArrayLike) → NDArray

Predict using predict_function at the best threshold.

Parameters

X_test (ArrayLike): Features.

Returns

NDArray: Predictions, of shape (n_samples,).

Raises

ValueError: If the calibrate method was not called, or if no valid thresholds were found during calibration.
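
Because predict raises when no threshold is available, calling code may want an explicit fallback. The sketch below shows one possible pattern; predict_or_fallback and the stub class are hypothetical and only rely on the documented behavior that predict raises ValueError in that situation.

```python
import numpy as np

def predict_or_fallback(controller, X, fallback_predict):
    """Return (predictions, controlled), using the fallback when predict raises."""
    try:
        return controller.predict(X), True
    except ValueError:
        # Documented cases: calibrate was not called, or no valid threshold was found.
        return fallback_predict(X), False

class _UncalibratedStub:
    """Hypothetical stand-in that mimics the documented failure mode."""

    def predict(self, X):
        raise ValueError("no valid threshold")

X_demo = np.zeros((3, 2))
preds, controlled = predict_or_fallback(
    _UncalibratedStub(), X_demo, lambda X: np.zeros(len(X), dtype=int)
)
```

The returned flag makes it explicit that fallback predictions carry no risk guarantee. Catching ValueError this broadly may also hide unrelated input errors, so a narrower check can be preferable in production code.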

Examples using `mapie.risk_control.BinaryClassificationController`

Use MAPIE to control the precision of a binary classifier