`mapie.risk_control`.BinaryClassificationController

class mapie.risk_control.BinaryClassificationController(predict_function: Callable[[ArrayLike], ndarray[tuple[Any, ...], dtype[_ScalarT]]], risk: BinaryRisk | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate'] | List[BinaryRisk] | List[Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate']] | List[BinaryRisk | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate']], target_level: float | List[float], confidence_level: float = 0.9, best_predict_param_choice: Literal['auto'] | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate'] | BinaryRisk = 'auto', list_predict_params: ndarray[tuple[Any, ...], dtype[_ScalarT]] = np.linspace(0, 0.99, 100), fwer_method: Literal['bonferroni', 'fixed_sequence', 'bonferroni_holm', 'split_fixed_sequence'] | FWERProcedure = 'bonferroni_holm')[source]

Controls the risk or performance of a binary classifier.

BinaryClassificationController finds the decision thresholds of a binary classifier that statistically guarantee a risk to be below a target level (the risk is “controlled”). It can be used to control a performance metric as well, such as the precision. In that case, the thresholds guarantee that the performance is above a target level.

Usage:

Instantiate a BinaryClassificationController, providing the predict_proba method of your binary classifier
Call the calibrate method to find the thresholds
Use the predict method to predict using the best threshold

Note: for a given model, calibration dataset, target level, and confidence level, there may not be any threshold controlling the risk.

Parameters:

predict_functionCallable[[ArrayLike], NDArray]

predict_proba method of a fitted binary classifier. Its output must have shape (len(X), 2).

Or, in the general case of multi-dimensional parameters (thresholds), a function that takes (X, *params) and outputs 0 or 1. This can be useful, for example, to ensemble multiple binary classifiers with a different threshold for each classifier. In that case, list_predict_params must be provided.
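
As a rough illustration of the general case, the sketch below builds a two-threshold ensemble function and a matching parameter grid. The names ensemble_predict, clf_a, clf_b and param_grid are hypothetical, and the rule "predict 1 only if both classifiers exceed their threshold" is just one possible choice, not something prescribed by the library.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Toy data and two fitted binary classifiers (illustrative only).
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
clf_a = LogisticRegression().fit(X, y)
clf_b = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)


def ensemble_predict(X, threshold_a, threshold_b):
    """Hypothetical general predict function taking (X, *params).

    Returns 1 only when both classifiers' positive-class probabilities
    reach their own threshold, and 0 otherwise.
    """
    proba_a = clf_a.predict_proba(X)[:, 1]
    proba_b = clf_b.predict_proba(X)[:, 1]
    return ((proba_a >= threshold_a) & (proba_b >= threshold_b)).astype(int)


# Candidate parameters with shape (n_params, params_dim) = (25, 2).
grid = np.linspace(0.1, 0.9, 5)
param_grid = np.array([(a, b) for a in grid for b in grid])

# Evaluate the function for the first candidate pair of thresholds.
preds = ensemble_predict(X, *param_grid[0])
```

An array shaped like param_grid is what list_predict_params is described as expecting when predict_function is a general function of this kind.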

riskUnion[BinaryRisk, str, List[BinaryRisk, str]]

The risk or performance metric to control. Valid options:

An existing risk defined in mapie.risk_control accessible through its string equivalent: “precision”, “recall”, “accuracy”, “fpr” for false positive rate, or “predicted_positive_fraction”.
A custom instance of BinaryRisk object

Can be a list of risks in the case of multi risk control.

target_levelUnion[float, List[float]]

The maximum risk level (or minimum performance level). Must be between 0 and 1. Can be a list of target levels in the case of multi risk control (length should match the length of the risks list).

confidence_levelfloat, default=0.9

The confidence level with which the risk (or performance) is controlled. Must be between 0 and 1. See the documentation for detailed explanations.
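
To give some intuition for the confidence level, here is a minimal sketch of how one candidate threshold could be tested in the Learn Then Test framework [1], assuming a binary (0/1) loss and the exact binomial tail p-value. The numbers are made up, and the p-value that MAPIE actually computes may differ.

```python
from scipy.stats import binom

confidence_level = 0.9
delta = 1 - confidence_level  # tolerated probability of a wrong guarantee

target_level = 0.2  # maximum tolerated risk (illustrative)
n = 500             # calibration set size (illustrative)
n_errors = 80       # observed errors at this threshold, empirical risk 0.16

# P-value for the null hypothesis "true risk > target_level":
# probability of observing at most n_errors errors if the risk were
# exactly target_level.
p_value = binom.cdf(n_errors, n, target_level)

# Without any multiple-testing correction, the threshold would be
# declared valid when its p-value does not exceed delta.
is_controlled = p_value <= delta
```

When many thresholds are tested at once, delta is not used directly: the family-wise error rate procedure selected by fwer_method decides which p-values are small enough.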

best_predict_param_choiceUnion[“auto”, BinaryRisk, str], default=”auto”

How to select the best threshold from the valid thresholds that control the risk (or performance). The BinaryClassificationController will try to minimize (or maximize) a secondary objective. Valid options:

“auto” (default). For mono risk defined in mapie.risk_control, an automatic choice is made. For multi risk, we use the first risk in the list.
An existing risk defined in mapie.risk_control accessible through its string equivalent: “precision”, “recall”, “accuracy”, “fpr” for false positive rate, or “predicted_positive_fraction”.
A custom instance of BinaryRisk object

list_predict_paramsNDArray, default=np.linspace(0, 0.99, 100)

The set of parameters (denoted λ in [1]) to consider for controlling the risk (or performance). When predict_function is a predict_proba method, the shape is (n_params,) and the parameter values are used to threshold the probabilities. When predict_function is a general function with multi-dimensional parameters (λ) that outputs 0 or 1, the shape is (n_params, params_dim). Note that statistical power degrades when len(list_predict_params) is large, because the number of tested parameters enters the Bonferroni correction [1].
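
The effect of the grid size can be made concrete with a small sketch, assuming a plain Bonferroni correction in which each parameter is tested at level (1 - confidence_level) / n_params. The dictionary per_test_levels is a name introduced here for illustration.

```python
import numpy as np

confidence_level = 0.9
delta = 1 - confidence_level

# Per-parameter significance level under a plain Bonferroni correction,
# for grids of increasing size.
per_test_levels = {}
for n_params in (10, 100, 1000):
    list_predict_params = np.linspace(0, 0.99, n_params)
    per_test_levels[n_params] = delta / len(list_predict_params)
```

The finer the grid, the smaller the level each threshold must pass, so fewer thresholds tend to be declared valid.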

fwer_method{“bonferroni”, “bonferroni_holm”, “fixed_sequence”, “split_fixed_sequence”} or FWERProcedure instance, default=”bonferroni_holm”

Method used to control the family-wise error rate (FWER).

Supported methods:

“bonferroni”: Classical Bonferroni correction. It is valid in all settings but can be conservative, especially when the number of tested parameters is large.
“fixed_sequence”: Fixed Sequence Testing (FST) with a single start. However, users can use multi-start by instantiating FWERFixedSequenceTesting with any desired number of starts and passing the instance to control_fwer.
“bonferroni_holm”: Sequential Graphical Testing corresponding to the Bonferroni–Holm procedure. This is the default method and is suitable for general settings.
“split_fixed_sequence”: Split Fixed Sequence Testing (SFST).
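
To illustrate why Bonferroni–Holm can be less conservative than plain Bonferroni, the sketch below implements textbook versions of both procedures in NumPy. The functions bonferroni_reject and holm_reject are written for this example and are not MAPIE's implementation. Here, "rejecting" a hypothesis corresponds to declaring a parameter valid.

```python
import numpy as np


def bonferroni_reject(p_values, delta):
    """Reject every hypothesis whose p-value is at most delta / m."""
    p_values = np.asarray(p_values)
    return p_values <= delta / p_values.size


def holm_reject(p_values, delta):
    """Textbook Holm step-down procedure.

    P-values are visited in increasing order; the k-th smallest
    (k = 0, 1, ...) is compared with delta / (m - k), and the procedure
    stops at the first failure.
    """
    p_values = np.asarray(p_values)
    m = p_values.size
    reject = np.zeros(m, dtype=bool)
    for rank, idx in enumerate(np.argsort(p_values)):
        if p_values[idx] <= delta / (m - rank):
            reject[idx] = True
        else:
            break
    return reject


p_values = np.array([0.001, 0.02, 0.03, 0.5])  # made-up p-values
delta = 0.1

bonferroni_valid = bonferroni_reject(p_values, delta)  # [True, True, False, False]
holm_valid = holm_reject(p_values, delta)              # [True, True, True, False]
```

On these p-values, Holm accepts one more parameter than Bonferroni while controlling the same family-wise error rate.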

Attributes:

valid_predict_paramsNDArray: The valid thresholds that control the risk (or performance). Use the calibrate method to compute these.
best_predict_paramOptional[Union[float, Tuple[float, …]]]: The best threshold that controls the risk (or performance). It is a tuple if multi-dimensional parameters are used. Use the calibrate method to compute it.
p_valuesNDArray: P-values associated with each tested parameter in list_predict_params. In the multi-risk setting, the value corresponds to the maximum over the tested risks.
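
The multi-risk combination described for p_values can be pictured with a short sketch. The array p_per_risk and its values are made up for illustration; only the "maximum over the tested risks" rule comes from the description above.

```python
import numpy as np

# Rows: controlled risks, columns: tested parameters (made-up values).
p_per_risk = np.array([
    [0.01, 0.04, 0.30],
    [0.02, 0.20, 0.05],
])

# One p-value per tested parameter: the maximum over the risks, so a
# parameter looks valid only if every risk is controlled.
combined_p_values = p_per_risk.max(axis=0)  # [0.02, 0.20, 0.30]
```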

References

[1] Angelopoulos, Anastasios N., Stephen Bates, Emmanuel J. Candès, et al. “Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control.” (2022)

Examples

>>> import numpy as np
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from mapie.risk_control import BinaryClassificationController, precision

>>> X, y = make_classification(
...     n_features=2,
...     n_redundant=0,
...     n_informative=2,
...     n_clusters_per_class=1,
...     n_classes=2,
...     random_state=42,
...     class_sep=2.0
... )
>>> X_train, X_temp, y_train, y_temp = train_test_split(
...     X, y, test_size=0.4, random_state=42
... )
>>> X_calib, X_test, y_calib, y_test = train_test_split(
...     X_temp, y_temp, test_size=0.1, random_state=42
... )

>>> clf = LogisticRegression().fit(X_train, y_train)

>>> controller = BinaryClassificationController(
...     predict_function=clf.predict_proba,
...     risk=precision,
...     target_level=0.6
... )

>>> predictions = controller.calibrate(X_calib, y_calib).predict(X_test)

__init__(predict_function: Callable[[ArrayLike], ndarray[tuple[Any, ...], dtype[_ScalarT]]], risk: BinaryRisk | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate'] | List[BinaryRisk] | List[Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate']] | List[BinaryRisk | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate']], target_level: float | List[float], confidence_level: float = 0.9, best_predict_param_choice: Literal['auto'] | Literal['precision', 'recall', 'accuracy', 'fpr', 'predicted_positive_fraction', 'positive_predictive_value', 'negative_predictive_value', 'abstention_rate'] | BinaryRisk = 'auto', list_predict_params: ndarray[tuple[Any, ...], dtype[_ScalarT]] = np.linspace(0, 0.99, 100), fwer_method: Literal['bonferroni', 'fixed_sequence', 'bonferroni_holm', 'split_fixed_sequence'] | FWERProcedure = 'bonferroni_holm')[source]

calibrate(X_calibrate: ArrayLike, y_calibrate: ArrayLike) → BinaryClassificationController[source]

Calibrate the BinaryClassificationController. Sets attributes valid_predict_params and best_predict_param (if the risk or performance can be controlled at the target level).

Parameters:

X_calibrateArrayLike: Features of the calibration set.
y_calibrateArrayLike: Binary labels of the calibration set.

Returns:

BinaryClassificationController: The calibrated controller instance.

Notes

When using fwer_method=”split_fixed_sequence”, the learning step must be performed separately on independent data:

bcc.learn_fixed_sequence_order(X_learn, y_learn)
bcc.calibrate(X_calibrate, y_calibrate)

Using the same data for both steps would invalidate guarantees.

learn_fixed_sequence_order(X_learn: ArrayLike, y_learn: ArrayLike, beta_grid: ndarray[tuple[Any, ...], dtype[_ScalarT]] = np.logspace(-25, 0, 1000), binary: bool = False) → BinaryClassificationController[source]

Learn an ordered sequence of prediction parameters for split fixed-sequence FWER control.

This method performs the learning step of split fixed-sequence testing. It must be called before calibrate when fwer_method=”split_fixed_sequence”.

The data provided here must be independent from the calibration data used later in calibrate. Using the same data would invalidate the statistical guarantees.

A typical workflow is to split your calibration dataset:

one subset for learning the parameter order
one subset for calibration
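
A minimal sketch of such a split, using scikit-learn's train_test_split. The 50/50 ratio and the name X_holdout are assumptions made for illustration; bcc stands for a controller created with fwer_method="split_fixed_sequence", and the two calls on it are shown only as comments.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Held-out data not used to fit the classifier (toy data here).
X_holdout, y_holdout = make_classification(n_samples=400, random_state=0)

# Disjoint subsets: one to learn the parameter order, one to calibrate.
X_learn, X_calibrate, y_learn, y_calibrate = train_test_split(
    X_holdout, y_holdout, test_size=0.5, random_state=0
)

# Then, with a controller instance bcc (not created in this sketch):
# bcc.learn_fixed_sequence_order(X_learn, y_learn)
# bcc.calibrate(X_calibrate, y_calibrate)
```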

For each value in beta_grid, the parameter whose p-value vector is closest to the constant vector beta is selected. Duplicate parameters are removed while preserving order, yielding a deterministic testing sequence.
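
The selection rule above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the library's code: "closest" is taken to mean Euclidean distance, p_values has shape (n_params, n_risks) with made-up values, and order_params_by_beta is a hypothetical helper.

```python
import numpy as np


def order_params_by_beta(p_values, beta_grid):
    """Return parameter indices in the order they are first selected.

    For each beta, pick the parameter whose p-value vector is closest
    (Euclidean distance, an assumption) to the constant vector
    (beta, ..., beta); keep only the first occurrence of each index.
    """
    order = []
    for beta in beta_grid:
        distances = np.linalg.norm(p_values - beta, axis=1)
        idx = int(np.argmin(distances))
        if idx not in order:
            order.append(idx)
    return order


# Three candidate parameters, two risks (made-up p-values).
p_values = np.array([
    [0.5, 0.4],
    [1e-6, 1e-5],
    [0.01, 0.02],
])
beta_grid = np.logspace(-25, 0, 1000)

sequence = order_params_by_beta(p_values, beta_grid)  # [1, 2, 0]
```

The parameter with the smallest p-values comes first in the sequence, which matches the idea that small values of beta favour parameters with stronger evidence.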

Parameters:

X_learnArrayLike: Features used only to learn the parameter order.
y_learnArrayLike: Binary labels associated with X_learn.
beta_gridNDArray, default=np.logspace(-25, 0, 1000): Grid of target p-values used to construct the ordering. Smaller values prioritize parameters with stronger evidence.
binarybool, default=False: Whether the loss associated with the controlled risk is binary.

Returns:

BinaryClassificationController: The controller instance with the learned sequence of ordered prediction parameters.

Notes

This method does NOT perform risk control. It only determines an order of parameters. Statistical guarantees are provided later when calling calibrate.

predict(X_test: ArrayLike) → ndarray[tuple[Any, ...], dtype[_ScalarT]][source]