mapie.risk_control.BinaryClassificationController

class mapie.risk_control.BinaryClassificationController(predict_function: Callable[[ArrayLike], ndarray[tuple[Any, ...], dtype[_ScalarT]]], risk: BinaryClassificationRisk | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate'] | List[BinaryClassificationRisk] | List[Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate']] | List[BinaryClassificationRisk | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate']], target_level: float | List[float], confidence_level: float = 0.9, best_predict_param_choice: Literal['auto'] | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate'] | BinaryClassificationRisk = 'auto', list_predict_params: ndarray[tuple[Any, ...], dtype[_ScalarT]] = np.linspace(0, 0.99, 100))

Controls the risk or performance of a binary classifier.

BinaryClassificationController finds the decision thresholds of a binary classifier that statistically guarantee a risk to be below a target level (the risk is “controlled”). It can be used to control a performance metric as well, such as the precision. In that case, the thresholds guarantee that the performance is above a target level.

Usage:

  1. Instantiate a BinaryClassificationController, providing the predict_proba method of your binary classifier

  2. Call the calibrate method to find the thresholds

  3. Use the predict method to predict using the best threshold

Note: for a given model, calibration dataset, target level, and confidence level, there may not be any threshold controlling the risk.

Parameters:
predict_function : Callable[[ArrayLike], NDArray]

The predict_proba method of a fitted binary classifier. Its output must have shape (len(X), 2).

Or, in the general case of multi-dimensional parameters (thresholds), a function that takes (X, *params) and outputs 0 or 1. This is useful, for example, to ensemble multiple binary classifiers with a different threshold for each classifier. In that case, list_predict_params must be provided.
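
As a rough illustration of the two calling conventions described above, the sketch below uses NumPy only. The helper names (threshold_proba, ensemble_predict), the >= comparison, and the toy values are assumptions made for this example; they are not taken from MAPIE's implementation.

```python
import numpy as np


def threshold_proba(proba, threshold):
    """Turn predict_proba-style output of shape (n_samples, 2) into 0/1 labels.

    Assumption: a sample is predicted positive when the probability of
    class 1 is at least `threshold`.
    """
    proba = np.asarray(proba)
    return (proba[:, 1] >= threshold).astype(int)


def ensemble_predict(X, threshold_a, threshold_b):
    """Hypothetical general predict function with signature (X, *params).

    The two columns of X stand in for the scores of two classifiers; a
    sample is predicted positive only if both scores pass their threshold.
    """
    X = np.asarray(X)
    return ((X[:, 0] >= threshold_a) & (X[:, 1] >= threshold_b)).astype(int)


# One-dimensional case: a single threshold applied to class-1 probabilities.
proba = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
labels = threshold_proba(proba, threshold=0.7)

# Two-dimensional case: one threshold per (stand-in) classifier score.
scores = np.array([[0.9, 0.3], [0.8, 0.7], [0.1, 0.9]])
ensemble_labels = ensemble_predict(scores, 0.5, 0.6)
```

In the second form, each row of list_predict_params would supply one (threshold_a, threshold_b) pair.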

risk : Union[BinaryClassificationRisk, str, List[Union[BinaryClassificationRisk, str]]]

The risk or performance metric to control. Valid options:

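  • An existing risk defined in mapie.risk_control accessible through its string equivalent: “precision”, “recall”, “accuracy”, “fpr” for false positive rate, or “predicted_positive_fraction”. The class signature additionally lists “positive_predictive_value”, “negative_predictive_value”, and “abstention_rate” as accepted strings.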

  • A custom instance of BinaryClassificationRisk object

Can be a list of risks in the case of multi risk control.

target_level : Union[float, List[float]]

The maximum risk level (or minimum performance level). Must be between 0 and 1. Can be a list of target levels in the case of multi risk control (length should match the length of the risks list).

confidence_level : float, default=0.9

The confidence level with which the risk (or performance) is controlled. Must be between 0 and 1. See the documentation for detailed explanations.

best_predict_param_choice : Union["auto", BinaryClassificationRisk, str], default="auto"

How to select the best threshold from the valid thresholds that control the risk (or performance). The BinaryClassificationController will try to minimize (or maximize) a secondary objective. Valid options:

  • “auto” (default). For a single risk defined in mapie.risk_control, the secondary objective is chosen automatically. For multiple risks, the first risk in the list is used.

  • An existing risk defined in mapie.risk_control accessible through its string equivalent: “precision”, “recall”, “accuracy”, “fpr” for false positive rate, or “predicted_positive_fraction”.

  • A custom instance of BinaryClassificationRisk object
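
To make the idea of a secondary objective concrete, here is a minimal sketch of picking one threshold among those already found valid. The function pick_best_param and the toy recall values are hypothetical; the actual selection logic in MAPIE may differ.

```python
import numpy as np


def pick_best_param(valid_params, secondary_scores, maximize=True):
    """Return the valid parameter with the best secondary score, or None.

    `secondary_scores[i]` is the secondary objective (for example recall)
    measured at `valid_params[i]`.
    """
    valid_params = np.asarray(valid_params)
    secondary_scores = np.asarray(secondary_scores)
    if valid_params.size == 0:
        return None  # no parameter controls the risk at the requested level
    if maximize:
        idx = int(np.argmax(secondary_scores))
    else:
        idx = int(np.argmin(secondary_scores))
    return float(valid_params[idx])


# Toy case: three thresholds control precision; keep the one with best recall.
valid_thresholds = np.array([0.6, 0.7, 0.8])
recall_at_threshold = np.array([0.85, 0.75, 0.60])
best = pick_best_param(valid_thresholds, recall_at_threshold)
no_best = pick_best_param([], [])
```

Returning None for an empty set mirrors the Optional type of best_predict_param.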

list_predict_params : NDArray, default=np.linspace(0, 0.99, 100)

The set of parameters (noted λ in [1]) to consider for controlling the risk (or performance). When predict_function is a predict_proba method, the shape is (n_params,) and the parameter values are used to threshold the probabilities. When predict_function is a general function with multi-dimensional parameters (λ) that outputs 0 or 1, the shape is (n_params, params_dim). Note that the procedure becomes more conservative when len(list_predict_params) is large, because the number of tested parameters enters the Bonferroni correction [1]: fewer parameters are declared valid.
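
The effect of the grid size on the Bonferroni correction can be sketched as follows. This is a simplified illustration in the spirit of [1], assuming a Hoeffding-based p-value for a risk bounded in [0, 1]; the function names are hypothetical and MAPIE may compute its p-values differently.

```python
import math


def hoeffding_p_value(empirical_risk, target_risk, n_samples):
    """P-value for the null hypothesis "true risk > target_risk".

    Based on Hoeffding's inequality for a risk bounded in [0, 1].
    """
    gap = max(target_risk - empirical_risk, 0.0)
    return math.exp(-2.0 * n_samples * gap ** 2)


def bonferroni_valid(p_values, confidence_level):
    """Flag the parameters whose p-value passes the Bonferroni-corrected level."""
    delta = 1.0 - confidence_level
    per_test_level = delta / len(p_values)  # shrinks as the grid grows
    return [p <= per_test_level for p in p_values]


# Toy empirical risks for 4 candidate thresholds, 500 calibration samples.
empirical_risks = [0.30, 0.20, 0.12, 0.05]
p_values = [
    hoeffding_p_value(r, target_risk=0.2, n_samples=500) for r in empirical_risks
]
valid = bonferroni_valid(p_values, confidence_level=0.9)

# Same p-values, but padded to a grid of 100 candidates: the per-test level
# drops from 0.1 / 4 to 0.1 / 100, and the third candidate is no longer valid.
valid_large = bonferroni_valid(p_values + [1.0] * 96, confidence_level=0.9)
```

With 4 candidates the third threshold passes; with 100 candidates it does not, even though its p-value is unchanged.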

Attributes:
valid_predict_params : NDArray

The valid thresholds that control the risk (or performance). Use the calibrate method to compute these.

best_predict_param : Optional[Union[float, Tuple[float, …]]]

The best threshold that controls the risk (or performance). It is a tuple if multi-dimensional parameters are used. Use the calibrate method to compute it.

p_values : NDArray

P-values associated with each tested parameter in list_predict_params. In the multi-risk setting, the value corresponds to the maximum over the tested risks.
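
A minimal sketch of the multi-risk aggregation described above, assuming the per-risk p-values are stored with one row per risk and one column per tested parameter (this layout and the numbers are hypothetical):

```python
import numpy as np

# Hypothetical p-values: one row per risk, one column per tested parameter.
p_values_per_risk = np.array([
    [0.50, 0.020, 0.001],  # first risk, e.g. precision
    [0.01, 0.004, 0.300],  # second risk, e.g. recall
])

# Taking the maximum over risks means a parameter gets a small combined
# p-value only if its p-value is small for every risk.
combined = p_values_per_risk.max(axis=0)
```

Here the second parameter is the only one with a small p-value for both risks, so it has the smallest combined value.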

References

[1] Angelopoulos, Anastasios N., Stephen Bates, Emmanuel J. Candès, et al. “Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control.” (2022)

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from mapie.risk_control import BinaryClassificationController, precision
>>> X, y = make_classification(
...     n_features=2,
...     n_redundant=0,
...     n_informative=2,
...     n_clusters_per_class=1,
...     n_classes=2,
...     random_state=42,
...     class_sep=2.0
... )
>>> X_train, X_temp, y_train, y_temp = train_test_split(
...     X, y, test_size=0.4, random_state=42
... )
>>> X_calib, X_test, y_calib, y_test = train_test_split(
...     X_temp, y_temp, test_size=0.1, random_state=42
... )
>>> clf = LogisticRegression().fit(X_train, y_train)
>>> controller = BinaryClassificationController(
...     predict_function=clf.predict_proba,
...     risk=precision,
...     target_level=0.6
... )
>>> predictions = controller.calibrate(X_calib, y_calib).predict(X_test)

calibrate(X_calibrate: ArrayLike, y_calibrate: ArrayLike) -> BinaryClassificationController

Calibrate the BinaryClassificationController. Sets attributes valid_predict_params and best_predict_param (if the risk or performance can be controlled at the target level).

Parameters:
X_calibrate : ArrayLike

Features of the calibration set.

y_calibrate : ArrayLike

Binary labels of the calibration set.

Returns:
BinaryClassificationController

The calibrated controller instance.

predict(X_test: ArrayLike) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]

Predict using predict_function at the best threshold.

Parameters:
X_test : ArrayLike

Features of the samples to predict.

Returns:
NDArray

Predictions at the best threshold, as an NDArray of shape (n_samples,).

Raises:
ValueError

If the method .calibrate was not called, or if no valid thresholds were found during calibration.

Examples using mapie.risk_control.BinaryClassificationController

Precision control for a binary classifier

Control risk of a binary classifier with multiple prediction parameters

Control multiple risks of a binary classifier

Risk Control for LLM as a Judge with Abstention