mapie.risk_control.BinaryClassificationRisk

class mapie.risk_control.BinaryClassificationRisk(risk_occurrence: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], risk_condition: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], higher_is_better: bool)

Define a risk (or a performance metric) to be used with the BinaryClassificationController. Predefined instances are available: mapie.risk_control.precision, mapie.risk_control.recall, mapie.risk_control.accuracy, mapie.risk_control.false_positive_rate, and mapie.risk_control.predicted_positive_fraction.

Here, a binary classification risk (or performance metric) is defined by an occurrence and a condition. Take precision as an example. Precision is the number of true positives divided by the total number of predicted positives. In other words, precision is the average rate of correct predictions (occurrence) among the samples whose prediction is positive (condition). Programmatically, precision = sum(y_pred == y_true where y_pred == 1) / sum(y_pred == 1). Because precision is a performance metric rather than a risk, higher_is_better must be set to True. See the implementation of precision in mapie.risk_control.
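The occurrence/condition decomposition of precision can be illustrated with a minimal pure-Python sketch. The helper name conditional_rate and the toy labels are assumptions made for illustration only; this is not MAPIE's implementation, and it evaluates the two functions sample by sample rather than on arrays.

```python
def conditional_rate(y_true, y_pred, occurrence, condition):
    """Return sum(occurrence if condition) / sum(condition) over all samples.

    `conditional_rate` is a hypothetical helper, not part of MAPIE.
    """
    # Keep only the samples for which the condition holds.
    selected = [(t, p) for t, p in zip(y_true, y_pred) if condition(t, p)]
    if not selected:
        raise ValueError("condition never met: the metric is undefined")
    # Average the occurrence over the selected samples.
    return sum(occurrence(t, p) for t, p in selected) / len(selected)


# Toy data (assumed for illustration).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

# Precision: correct prediction (occurrence) given a positive prediction (condition).
precision = conditional_rate(
    y_true,
    y_pred,
    occurrence=lambda t, p: p == t,
    condition=lambda t, p: p == 1,
)
print(precision)  # 3 correct among 4 predicted positives -> 0.75
```

Four samples are predicted positive and three of them are correct, so the ratio is 3/4.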

Note: any risk or performance metric that can be defined as sum(occurrence if condition) / sum(condition) can be theoretically controlled with the BinaryClassificationController, thanks to the LearnThenTest framework [1] and the binary Hoeffding-Bentkus p-values implemented in MAPIE.
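As a sketch of how other common metrics fit the same sum(occurrence if condition) / sum(condition) form, the snippet below writes recall, false positive rate, and accuracy as (occurrence, condition) pairs. The pairs follow the standard textbook definitions and are not taken from MAPIE's source; the names METRICS and evaluate, and the toy data, are hypothetical.

```python
# Each metric is an (occurrence, condition) pair of per-sample predicates.
# These pairs follow textbook definitions; they are assumptions, not MAPIE code.
METRICS = {
    "recall": (lambda t, p: p == t, lambda t, p: t == 1),
    "false_positive_rate": (lambda t, p: p == 1, lambda t, p: t == 0),
    "accuracy": (lambda t, p: p == t, lambda t, p: True),
}


def evaluate(name, y_true, y_pred):
    """Compute sum(occurrence if condition) / sum(condition) for one metric."""
    occurrence, condition = METRICS[name]
    hits = 0  # samples where the condition and the occurrence both hold
    total = 0  # samples where the condition holds
    for t, p in zip(y_true, y_pred):
        if condition(t, p):
            total += 1
            hits += int(occurrence(t, p))
    if total == 0:
        raise ValueError(f"{name} is undefined: condition never met")
    return hits / total


# Toy data (assumed for illustration).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]

results = {name: evaluate(name, y_true, y_pred) for name in METRICS}
print(results)
```

Recall and accuracy are performance metrics (higher is better), whereas the false positive rate is a risk (lower is better), which is what the higher_is_better flag distinguishes.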

Note: by definition, the value of the risk (or performance metric) here is always between 0 and 1.

Parameters:
risk_occurrence : Callable[[int, int], bool]

A function defining the occurrence of the risk for a given sample. Must take y_true and y_pred as input and return a boolean.

risk_condition : Callable[[int, int], bool]

A function defining the condition of the risk for a given sample. Must take y_true and y_pred as input and return a boolean.

higher_is_better : bool

Whether this BinaryClassificationRisk instance is a risk (higher_is_better=False) or a performance metric (higher_is_better=True).

Attributes:
higher_is_better : bool

See Parameters.

References

[1] Angelopoulos, Anastasios N., Stephen Bates, Emmanuel J. Candès, et al. “Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control.” (2022)

__init__(risk_occurrence: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], risk_condition: Callable[[ndarray[tuple[Any, ...], dtype[integer]], ndarray[tuple[Any, ...], dtype[integer]]], ndarray[tuple[Any, ...], dtype[bool]]], higher_is_better: bool)

get_value_and_effective_sample_size(y_true: ndarray[tuple[Any, ...], dtype[_ScalarT]], y_pred: ndarray[tuple[Any, ...], dtype[_ScalarT]]) → Tuple[float, int]

Computes the value of a risk given an array of ground truth labels and the corresponding predictions. Also returns the number of samples used to compute that value.

That number can be different from the total number of samples. For example, in the case of precision, only the samples with positive predictions are used.

In the case of a performance metric, this function returns 1 - perf_value.

Parameters:
y_true : NDArray

NDArray of ground truth labels, of shape (n_samples,), with values in {0, 1}

y_pred : NDArray

NDArray of predictions, of shape (n_samples,), with values in {0, 1}

Returns:
Tuple[float, int]

A tuple containing the value of the risk between 0 and 1, and the number of effective samples used to compute that value (between 1 and n_samples).

If the risk is not defined (condition never met), the value is set to 1, and the number of effective samples is set to -1.
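The return conventions described above can be summarized in a minimal pure-Python sketch. It mirrors only the documented behavior (risk value in [0, 1], 1 - perf_value for performance metrics, and (1, -1) when the condition is never met); the function name value_and_effective_sample_size and the toy data are hypothetical, and the actual MAPIE method operates on NumPy arrays.

```python
from typing import Callable, Sequence, Tuple


def value_and_effective_sample_size(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    occurrence: Callable[[int, int], bool],
    condition: Callable[[int, int], bool],
    higher_is_better: bool,
) -> Tuple[float, int]:
    """Sketch of the documented return contract (not MAPIE's implementation)."""
    # Only samples that satisfy the condition count toward the estimate.
    kept = [(t, p) for t, p in zip(y_true, y_pred) if condition(t, p)]
    n_effective = len(kept)
    if n_effective == 0:
        # Undefined risk: value set to 1, effective sample size set to -1.
        return 1.0, -1
    value = sum(occurrence(t, p) for t, p in kept) / n_effective
    if higher_is_better:
        # A performance metric is converted to a risk: 1 - perf_value.
        value = 1.0 - value
    return value, n_effective


# Toy data (assumed for illustration).
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]


def is_correct(t, p):
    return p == t


def is_predicted_positive(t, p):
    return p == 1


# Precision is a performance metric, so the returned value is 1 - precision.
precision_risk = value_and_effective_sample_size(
    y_true, y_pred, is_correct, is_predicted_positive, higher_is_better=True
)
print(precision_risk)  # (0.25, 4): precision is 0.75 over 4 predicted positives

# With no positive predictions, precision is undefined.
undefined = value_and_effective_sample_size(
    y_true, [0] * 6, is_correct, is_predicted_positive, higher_is_better=True
)
print(undefined)  # (1.0, -1)
```

Returning the effective sample size alongside the value matters because the statistical guarantee depends on how many samples actually entered the estimate, which for precision is the number of positive predictions rather than n_samples.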

Examples using mapie.risk_control.BinaryClassificationRisk

Precision control for a binary classifier

Control multiple risks of a binary classifier

Risk Control for LLM as a Judge with Abstention