`mapie.risk_control`.BinaryClassificationRisk

class mapie.risk_control.BinaryClassificationRisk(risk_occurrence: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], risk_condition: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], higher_is_better: bool)

Define a risk (or a performance metric) to be used with the BinaryClassificationController. Predefined instances are available: see mapie.risk_control.precision, mapie.risk_control.recall, mapie.risk_control.accuracy, mapie.risk_control.false_positive_rate, and mapie.risk_control.predicted_positive_fraction.

Here, a binary classification risk (or performance metric) is defined by an occurrence and a condition. Take precision as an example. Precision is the number of true positives divided by the total number of predicted positives. In other words, precision is the fraction of correct predictions (occurrence) among the predictions that are positive (condition). Programmatically, precision = sum((y_pred == y_true) & (y_pred == 1)) / sum(y_pred == 1). Because precision is a performance metric rather than a risk, higher_is_better must be set to True. See the implementation of precision in mapie.risk_control.
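The occurrence/condition decomposition of precision can be illustrated with plain NumPy. This is a minimal sketch on made-up labels, not MAPIE's own implementation; the variable names `occurrence` and `condition` are chosen here for illustration only.

```python
import numpy as np

# Toy labels (illustrative data, not from the MAPIE docs).
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1])

# Occurrence: the prediction is correct.
occurrence = y_pred == y_true
# Condition: the prediction is positive.
condition = y_pred == 1

# Count occurrences only where the condition holds, then normalize
# by the number of samples satisfying the condition.
precision = np.sum(occurrence & condition) / np.sum(condition)
print(precision)
```

Four samples are predicted positive and three of those predictions are correct, so the ratio is 3/4.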

Note: any risk or performance metric that can be defined as sum(occurrence if condition) / sum(condition) can be theoretically controlled with the BinaryClassificationController, thanks to the LearnThenTest framework [1] and the binary Hoeffding-Bentkus p-values implemented in MAPIE.

Note: by definition, the value of the risk (or performance metric) here is always between 0 and 1.
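As a rough illustration of the general form sum(occurrence if condition) / sum(condition), the sketch below expresses two other metrics the same way. The helper `ratio_metric` is hypothetical and written only for this example; the occurrence and condition choices shown are one plausible formulation and are not claimed to match MAPIE's predefined instances line for line.

```python
import numpy as np


def ratio_metric(occurrence, condition):
    """Hypothetical helper: sum(occurrence if condition) / sum(condition).

    Both arguments are boolean arrays of the same shape.
    """
    return float(np.sum(occurrence & condition) / np.sum(condition))


# Toy labels (illustrative data).
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1])

# Recall: correct predictions among truly positive samples.
recall = ratio_metric(y_pred == y_true, y_true == 1)

# False positive rate: positive predictions among truly negative samples.
false_positive_rate = ratio_metric(y_pred == 1, y_true == 0)

print(recall, false_positive_rate)
```

Because the numerator counts a subset of the samples counted in the denominator, both values necessarily fall between 0 and 1.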

Parameters:

risk_occurrence (Callable[[int, int], bool]): A function defining the occurrence of the risk for a given sample. Must take y_true and y_pred as input and return a boolean.
risk_condition (Callable[[int, int], bool]): A function defining the condition of the risk for a given sample. Must take y_true and y_pred as input and return a boolean.
higher_is_better (bool): Whether this BinaryClassificationRisk instance is a risk (higher_is_better=False) or a performance metric (higher_is_better=True).

Attributes:

higher_is_better (bool): See the higher_is_better parameter.

References

[1] Angelopoulos, Anastasios N., Stephen Bates, Emmanuel J. Candès, et al. “Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control.” (2022)

__init__(risk_occurrence: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], risk_condition: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], higher_is_better: bool)

get_value_and_effective_sample_size(y_true: ndarray[tuple[Any, ...], dtype[_ScalarT]], y_pred: ndarray[tuple[Any, ...], dtype[_ScalarT]]) → Tuple[float, int]

Computes the value of a risk given an array of ground truth labels and the corresponding predictions. Also returns the number of samples used to compute that value (the effective sample size).

The effective sample size can differ from the total number of samples. For example, in the case of precision, only the samples with positive predictions are used.

In the case of a performance metric (higher_is_better=True), this function returns 1 - perf_value, so that the returned value can always be read as a risk.
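The behaviour described above can be approximated with the following sketch. It is a standalone NumPy re-creation under stated assumptions, not MAPIE's source code: the function name `value_and_effective_sample_size` and its extra arguments are hypothetical, and the handling of an empty condition (returning NaN and 0) is an assumption, since the documented behaviour for that case is not given here.

```python
import numpy as np


def value_and_effective_sample_size(
    y_true, y_pred, occurrence, condition, higher_is_better
):
    """Hypothetical re-creation of the documented behaviour."""
    cond = condition(y_true, y_pred)
    # Effective sample size: number of samples satisfying the condition.
    n_effective = int(np.sum(cond))
    if n_effective == 0:
        # Assumption: the real method may handle this case differently.
        return float("nan"), 0
    value = float(np.sum(occurrence(y_true, y_pred) & cond) / n_effective)
    if higher_is_better:
        # Performance metrics are converted into a risk.
        value = 1.0 - value
    return value, n_effective


# Toy labels (illustrative data).
y_true = np.array([1, 0, 1, 1, 0, 1])
y_pred = np.array([1, 1, 1, 0, 0, 1])

# Precision: correct predictions among positive predictions.
risk_value, n_eff = value_and_effective_sample_size(
    y_true,
    y_pred,
    occurrence=lambda yt, yp: yp == yt,
    condition=lambda yt, yp: yp == 1,
    higher_is_better=True,
)
print(risk_value, n_eff)
```

Here only the four positively predicted samples count toward the effective sample size, and the precision of 0.75 is reported as a risk of 0.25.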

Parameters:

y_true (NDArray): NDArray of ground truth labels, of shape (n_samples,), with values in {0, 1}.
y_pred (NDArray): NDArray of predictions, of shape (n_samples,), with values in {0, 1}.

Returns: