mapie.multi_label_classification.MapieMultiLabelClassifier

class mapie.multi_label_classification.MapieMultiLabelClassifier(estimator: Optional[sklearn.base.ClassifierMixin] = None, metric_control: Optional[str] = 'recall', method: Optional[str] = None, n_jobs: Optional[int] = None, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None, verbose: int = 0)

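Prediction sets for multilabel classification.

This class implements two conformal prediction methods for estimating prediction sets in multilabel classification: conformal risk control (CRC) [2] and risk-controlling prediction sets (RCPS) [1]. Under the hypothesis of exchangeability, it guarantees that the controlled metric is at least 1 - alpha, where alpha is a user-specified parameter; equivalently, the risk (1 minus the metric) is at most alpha. For CRC the guarantee holds in expectation; for RCPS it holds with probability at least 1 - delta. By default the controlled metric is the recall (metric_control="recall"); precision can be controlled instead with the Learn Then Test (LTT) procedure [3].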
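As an illustration of a recall-based risk, the plain-Python sketch below computes the loss of one sample as 1 minus the recall of its prediction set. It assumes, since this page does not say so, that the set keeps every label whose predicted probability is at least a threshold `lam`; `prediction_set` and `recall_loss` are hypothetical helpers, not part of the MAPIE API.

```python
from typing import List, Sequence


def prediction_set(probas: Sequence[float], lam: float) -> List[bool]:
    """Hypothetical set rule: keep label k when its probability is >= lam."""
    return [p >= lam for p in probas]


def recall_loss(y_true: Sequence[int], pred_set: Sequence[bool]) -> float:
    """1 - recall of the prediction set for a single sample."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0  # assumption: a sample with no positive label cannot miss any
    hits = sum(1 for y, s in zip(y_true, pred_set) if y == 1 and s)
    return 1.0 - hits / n_pos


probas = [0.9, 0.2, 0.6]  # predicted probability of each of the 3 labels
y_true = [1, 0, 1]        # true labels 0 and 2 are present
print(recall_loss(y_true, prediction_set(probas, 0.7)))  # 0.5 (label 2 is missed)
print(recall_loss(y_true, prediction_set(probas, 0.5)))  # 0.0 (both labels kept)
```

Lowering the threshold can only add labels to the set, so the recall loss never increases; CRC and RCPS both rely on this kind of monotonicity when calibrating the threshold.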

Parameters
estimator: Optional[ClassifierMixin]

Any fitted multi-label classifier with a scikit-learn API (i.e. with fit, predict, and predict_proba methods). If None, a scikit-learn LogisticRegression instance is used by default.

By default None.

n_jobs: Optional[int]

Number of jobs for parallel processing using joblib via the "loky" backend. For the moment, parallel processing is disabled. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. None is a marker for "unset" that is interpreted as n_jobs=1 (sequential execution).

By default None.
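The n_jobs convention above follows joblib. Its arithmetic can be sketched in plain Python as below; `effective_n_jobs` is a hypothetical helper written for illustration (not a MAPIE function). It mirrors the rules stated in this entry plus two assumptions: n_jobs=0 is rejected, and the result never drops below one worker.

```python
import os
from typing import Optional


def effective_n_jobs(n_jobs: Optional[int], n_cpus: Optional[int] = None) -> int:
    """Number of workers implied by the n_jobs convention (illustrative only)."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs is None:
        return 1  # unset: sequential execution
    if n_jobs == 0:
        raise ValueError("n_jobs=0 is not meaningful")  # assumption
    if n_jobs < 0:
        # -1 -> all CPUs, -2 -> all CPUs but one, ...; floor of 1 is an assumption
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


print(effective_n_jobs(-2, n_cpus=8))  # 7
```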

random_state: Optional[Union[int, RandomState]]

Pseudo random number generator state used for random uniform sampling to evaluate quantiles and prediction sets. Pass an int for reproducible output across multiple function calls.

By default None.

verbose: int, optional

The verbosity level, used with joblib for parallel processing. For the moment, parallel processing is disabled. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported. Above 50, the output is sent to stdout.

By default 0.

References

[1] Bates, S., Angelopoulos, A. N., Lei, L., Malik, J., & Jordan, M. I. (2021). Distribution-free, risk-controlling prediction sets. CoRR, abs/2101.02703. URL https://arxiv.org/abs/2101.02703.

[2] Angelopoulos, A. N., Bates, S., Fisch, A., Lei, L., & Schuster, T. (2022). Conformal risk control.

[3] Angelopoulos, A. N., Bates, S., Candès, E. J., Jordan, M. I., & Lei, L. (2021). Learn then test: Calibrating predictive algorithms to achieve risk control.

Examples

>>> import numpy as np
>>> from sklearn.multioutput import MultiOutputClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from mapie.multi_label_classification import MapieMultiLabelClassifier
>>> X_toy = np.arange(4).reshape(-1, 1)
>>> y_toy = np.stack([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
>>> clf = MultiOutputClassifier(LogisticRegression()).fit(X_toy, y_toy)
>>> mapie = MapieMultiLabelClassifier(estimator=clf).fit(X_toy, y_toy)
>>> _, y_pi_mapie = mapie.predict(X_toy, alpha=0.3)
>>> print(y_pi_mapie[:, :, 0])
[[ True False  True]
 [ True False  True]
 [False  True False]
 [False  True False]]

Attributes
valid_methods: List[str]

List of all valid methods: either CRC or RCPS.

valid_bounds: List[Union[str, None]]

List of all valid bound computations, for RCPS only.

single_estimator_: sklearn.ClassifierMixin

Estimator fitted on the whole training set.

n_lambdas: int

Number of thresholds on which we compute the risk.

lambdas: NDArray

Array with all the values of lambda.

risks: ArrayLike of shape (n_samples_cal, n_lambdas)

The risk for each calibration observation and each threshold.

r_hat: ArrayLike of shape (n_lambdas,)

Average risk for each lambda.
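To show how risks (one row per calibration sample, one column per threshold) and r_hat relate, here is a plain-Python sketch. It assumes a recall loss, a set rule that keeps label k when its probability is at least lambda, and an evenly spaced grid of n_lambdas thresholds on [0, 1]; the grid, the set rule and the helper names are illustrative assumptions, not MAPIE's internals.

```python
from typing import List, Sequence


def risk_matrix(y_cal: Sequence[Sequence[int]],
                probas_cal: Sequence[Sequence[float]],
                lambdas: Sequence[float]) -> List[List[float]]:
    """risks[i][j]: recall loss of calibration sample i at threshold lambdas[j]."""
    risks = []
    for y, p in zip(y_cal, probas_cal):
        n_pos = sum(y)
        row = []
        for lam in lambdas:
            # labels kept by the hypothetical set rule that are truly present
            hits = sum(1 for yk, pk in zip(y, p) if yk == 1 and pk >= lam)
            row.append(1.0 - hits / n_pos if n_pos else 0.0)
        risks.append(row)
    return risks


def mean_risk(risks: List[List[float]]) -> List[float]:
    """r_hat[j]: average of column j, i.e. the empirical risk at lambdas[j]."""
    n = len(risks)
    return [sum(row[j] for row in risks) / n for j in range(len(risks[0]))]


n_lambdas = 5
lambdas = [j / (n_lambdas - 1) for j in range(n_lambdas)]  # 0.0, 0.25, ..., 1.0
y_cal = [[1, 0, 1], [0, 1, 0]]
probas_cal = [[0.9, 0.2, 0.6], [0.3, 0.4, 0.1]]
r_hat = mean_risk(risk_matrix(y_cal, probas_cal, lambdas))
print(r_hat)  # [0.0, 0.0, 0.5, 0.75, 1.0]
```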

r_hat_plus: ArrayLike of shape (n_lambdas)

Upper confidence bound for each lambda, computed with different bounds (see predict). Only relevant when method="rcps".
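For losses bounded in [0, 1], the simplest upper confidence bound is Hoeffding's: with probability at least 1 - delta, the true risk is below r_hat + sqrt(log(1/delta) / (2 n)). The sketch below uses that textbook form; whether MAPIE's bound options use exactly this expression or a tighter variant from [1] is not stated on this page, and `hoeffding_ucb` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def hoeffding_ucb(r_hat: Sequence[float], n_samples: int, delta: float) -> List[float]:
    """Hoeffding upper confidence bound on each mean risk (losses in [0, 1])."""
    if not 0 < delta < 1:
        raise ValueError("delta must be in (0, 1)")
    margin = math.sqrt(math.log(1 / delta) / (2 * n_samples))
    return [r + margin for r in r_hat]


r_hat_plus = hoeffding_ucb([0.0, 0.1, 0.3], n_samples=200, delta=0.1)
```

The margin shrinks like 1/sqrt(n) and grows as delta decreases, which is why a lower delta yields more conservative thresholds.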

lambdas_star: ArrayLike of shape (n_lambdas)

Optimal threshold for a given alpha.
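The conformal risk control rule [2] for choosing a threshold can be sketched as follows for a single alpha. The sketch assumes that lambdas is increasing, that the risk does not decrease as lambda grows (higher threshold, smaller sets), and that losses lie in [0, max_loss]; it takes the largest lambda whose inflated empirical risk n/(n+1) * r_hat + max_loss/(n+1) stays at most alpha. It is the textbook CRC rule written for illustration, not a copy of MAPIE's code, and `select_lambda_crc` is a hypothetical name.

```python
from typing import Sequence


def select_lambda_crc(lambdas: Sequence[float], r_hat: Sequence[float],
                      n_samples: int, alpha: float, max_loss: float = 1.0) -> float:
    """Largest threshold whose CRC-corrected empirical risk is <= alpha."""
    best = lambdas[0]  # fallback: smallest threshold, i.e. the largest sets
    for lam, r in zip(lambdas, r_hat):
        corrected = n_samples / (n_samples + 1) * r + max_loss / (n_samples + 1)
        if corrected > alpha:
            break  # the risk only grows with lambda, so stop at the first violation
        best = lam
    return best


lambdas = [0.0, 0.25, 0.5, 0.75, 1.0]
r_hat = [0.0, 0.1, 0.25, 0.6, 1.0]
print(select_lambda_crc(lambdas, r_hat, n_samples=99, alpha=0.3))  # 0.5
```

RCPS follows the same scan but compares the upper confidence bound r_hat_plus, rather than the corrected mean, with alpha.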

valid_index: List[List[Any]]

List of lists of all indices that satisfy family-wise error rate (FWER) control. This attribute is computed only when the user wants to control the precision score (metric_control="precision"), since precision is controlled with the Learn Then Test (LTT) procedure [3]. Contains n_alpha lists (see predict).
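The Learn Then Test idea [3] behind valid_index can be illustrated by treating each threshold as a test of the null hypothesis "risk > alpha" and keeping the thresholds whose null is rejected under family-wise error control at level delta. The sketch below assumes Hoeffding p-values and a Bonferroni correction, which is one valid instance of LTT; it is not necessarily the p-value or correction MAPIE uses, and both helpers are hypothetical. Here r_hat would be the empirical risk 1 - precision at each threshold.

```python
import math
from typing import List, Sequence


def hoeffding_p_value(r_hat: float, n_samples: int, alpha: float) -> float:
    """p-value for the null 'true risk > alpha', for losses in [0, 1]."""
    gap = max(alpha - r_hat, 0.0)
    return math.exp(-2 * n_samples * gap ** 2)


def ltt_valid_index(r_hat: Sequence[float], n_samples: int,
                    alpha: float, delta: float) -> List[int]:
    """Indices of thresholds certified by a Bonferroni-corrected test at level delta."""
    n_tests = len(r_hat)
    return [j for j, r in enumerate(r_hat)
            if hoeffding_p_value(r, n_samples, alpha) <= delta / n_tests]


r_hat = [0.5, 0.3, 0.1, 0.05, 0.2]  # empirical 1 - precision per threshold
print(ltt_valid_index(r_hat, n_samples=500, alpha=0.2, delta=0.1))  # [2, 3]
```

Calling the function once per alpha value would produce one list per alpha, matching the n_alpha lists described above.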

sigma_init: Optional[float]

First variance in the sigma_hat array. The default value is the same as in the paper implementation [1].

__init__(estimator: Optional[sklearn.base.ClassifierMixin] = None, metric_control: Optional[str] = 'recall', method: Optional[str] = None, n_jobs: Optional[int] = None, random_state: Optional[Union[int, numpy.random.mtrand.RandomState]] = None, verbose: int = 0) -> None
fit(X: ArrayLike, y: ArrayLike, calib_size: Optional[float] = 0.3) -> mapie.multi_label_classification.MapieMultiLabelClassifier

Fit the base estimator, or use the already fitted base estimator, and compute the risks on the calibration data.

Parameters
X: ArrayLike of shape (n_samples, n_features)

Training data.

y: NDArray of shape (n_samples, n_classes)

Training labels.

calib_size: Optional[float]

Fraction of X used as the calibration set when estimator is None and a LogisticRegression has to be fitted on the remaining samples.

By default 0.3.

Returns
MapieMultiLabelClassifier

The model itself.
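When estimator is None, calib_size decides how X is divided between fitting the default classifier and computing the risks. The plain-Python sketch below illustrates such a split; it assumes a seeded random shuffle and a held-out size rounded up (the convention scikit-learn's train_test_split applies to a float test_size). `split_train_calib` is a hypothetical helper, not a MAPIE function.

```python
import math
import random
from typing import List, Optional, Tuple


def split_train_calib(n_samples: int, calib_size: float = 0.3,
                      seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and hold out a calib_size fraction for calibration."""
    if not 0 < calib_size < 1:
        raise ValueError("calib_size must be in (0, 1)")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # an integer seed makes the split reproducible
    n_calib = math.ceil(calib_size * n_samples)  # held-out size rounded up (assumption)
    return indices[n_calib:], indices[:n_calib]


train_idx, calib_idx = split_train_calib(10, calib_size=0.3, seed=0)
print(len(train_idx), len(calib_idx))  # 7 3
```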

partial_fit(X: ArrayLike, y: ArrayLike, _refit: Optional[bool] = False) -> mapie.multi_label_classification.MapieMultiLabelClassifier

Fit the base estimator or use the fitted base estimator on batch data. Each call computes the risks on the new batch and concatenates them with the risks computed during previous calls.

Parameters
X: ArrayLike of shape (n_samples, n_features)

Training data.

y: NDArray of shape (n_samples, n_classes)

Training labels.

_refit: bool

Whether or not to refit from scratch.

By default False.

Returns
MapieMultiLabelClassifier

The model itself.
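The concatenation behaviour of partial_fit can be sketched with a small accumulator: each batch appends its rows to the stored risks, and the average risk is recomputed over everything seen so far. `RiskAccumulator` is a hypothetical stand-in written for illustration, and the assumption that refit discards previously stored risks is ours, inferred from "refit from scratch".

```python
from typing import List, Sequence


class RiskAccumulator:
    """Hypothetical stand-in for the risk bookkeeping described for partial_fit."""

    def __init__(self) -> None:
        self.risks: List[List[float]] = []  # one row per calibration sample

    def partial_fit(self, batch_risks: Sequence[Sequence[float]],
                    refit: bool = False) -> "RiskAccumulator":
        if refit:
            self.risks = []  # assumption: refitting discards earlier batches
        self.risks.extend(list(row) for row in batch_risks)
        return self

    @property
    def r_hat(self) -> List[float]:
        """Column-wise mean of all stored risks (one value per threshold)."""
        n = len(self.risks)
        return [sum(col) / n for col in zip(*self.risks)]


acc = RiskAccumulator().partial_fit([[0.0, 1.0]]).partial_fit([[0.5, 1.0]])
print(acc.r_hat)  # [0.25, 1.0]
```

Because the rows are concatenated, feeding the same samples in one batch or in several gives the same average risk.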

predict(X: ArrayLike, alpha: Optional[Union[float, Iterable[float]]] = None, delta: Optional[float] = None, bound: Optional[str] = None) -> Union[NDArray, Tuple[NDArray, NDArray]]

Compute prediction sets on new samples for the target risk level(s) alpha. The prediction sets for a given alpha are deduced from the risks computed during fitting.

Parameters
X: ArrayLike of shape (n_samples, n_features)

Samples for which prediction sets are computed.

alpha: Optional[Union[float, Iterable[float]]]

Can be a float, a list of floats, or an ArrayLike of floats. Between 0 and 1, alpha is the target risk level, i.e. the complement of the target level of the controlled metric (for instance a target recall of 1 - alpha). Lower alpha produces larger (more conservative) prediction sets. By default None.

delta: Optional[float]

Can be a float or None; it cannot be None when method="rcps". Between 0 and 1, delta sets the confidence level 1 - delta at which the Upper Confidence Bound of the average risk is computed. Lower delta produces larger (more conservative) prediction sets. By default None.

bound: Optional[Union[str, None]]

Method used to compute the Upper Confidence Bound of the average risk. Only necessary with the RCPS method. By default None.

Returns
Union[NDArray, Tuple[NDArray, NDArray]]
  • NDArray of shape (n_samples, n_classes) if alpha is None.
  • Tuple[NDArray, NDArray] of shapes (n_samples, n_classes) and (n_samples, n_classes, n_alpha) if alpha is not None.
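The second array returned when alpha is given is indexed as [sample, label, alpha]. The sketch below shows how such an array can be laid out from one threshold per alpha value, under the assumption (not stated in this reference) that a label enters the set when its predicted probability is at least the selected threshold. The thresholds are made-up values and `prediction_sets` is a hypothetical helper.

```python
from typing import List, Sequence


def prediction_sets(probas: Sequence[Sequence[float]],
                    lambda_stars: Sequence[float]) -> List[List[List[bool]]]:
    """Boolean sets indexed as [sample][label][alpha], one threshold per alpha."""
    return [[[p >= lam for lam in lambda_stars] for p in row] for row in probas]


probas = [[0.9, 0.2, 0.6], [0.3, 0.4, 0.1]]  # 2 samples, 3 labels
# Two made-up thresholds, e.g. for alpha=0.3 and a smaller alpha=0.1
y_ps = prediction_sets(probas, lambda_stars=[0.5, 0.25])
print([s[0] for s in y_ps[0]])  # [True, False, True]: sample 0, first alpha
```

The smaller threshold (here standing for the smaller alpha) yields supersets of the sets obtained with the larger one, consistent with lower alpha producing larger prediction sets.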

set_fit_request(*, calib_size: Union[bool, None, str] = '$UNCHANGED$') -> mapie.multi_label_classification.MapieMultiLabelClassifier

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
calib_size: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for calib_size parameter in fit.

Returns
self: object

The updated object.
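The four request options listed for set_fit_request can be modelled as a small decision function that returns the keyword arguments a meta-estimator would forward. This is a hypothetical plain-Python sketch of the documented semantics, using calib_size as the example parameter; `route_param` is not a scikit-learn function, and scikit-learn's real router raises its own exception type rather than the plain ValueError used here.

```python
from typing import Dict, Optional, Union

Request = Optional[Union[bool, str]]


def route_param(name: str, request: Request,
                provided: Dict[str, object]) -> Dict[str, object]:
    """Keyword arguments forwarded for one parameter under a given request."""
    if request is True:
        # requested: forward it if the user provided it, otherwise ignore silently
        return {name: provided[name]} if name in provided else {}
    if request is False:
        return {}  # not requested: never forwarded
    if request is None:
        if name in provided:
            raise ValueError(f"{name} was passed but is not requested")
        return {}
    # str: the user supplies the value under this alias instead of the original name
    return {name: provided[request]} if request in provided else {}


print(route_param("calib_size", "cal_frac", {"cal_frac": 0.2}))  # {'calib_size': 0.2}
```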

set_partial_fit_request(*, _refit: Union[bool, None, str] = '$UNCHANGED$') -> mapie.multi_label_classification.MapieMultiLabelClassifier

Request metadata passed to the partial_fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to partial_fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
_refit: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for _refit parameter in partial_fit.

Returns
self: object

The updated object.

set_predict_request(*, alpha: Union[bool, None, str] = '$UNCHANGED$', bound: Union[bool, None, str] = '$UNCHANGED$', delta: Union[bool, None, str] = '$UNCHANGED$') -> mapie.multi_label_classification.MapieMultiLabelClassifier

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to predict.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
alpha: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for alpha parameter in predict.

bound: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for bound parameter in predict.

delta: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for delta parameter in predict.

Returns
self: object

The updated object.

set_score_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') -> mapie.multi_label_classification.MapieMultiLabelClassifier

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in scikit-learn version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns
self: object

The updated object.
