mapie.risk_control.PrecisionRecallController
- class mapie.risk_control.PrecisionRecallController(estimator: ClassifierMixin | None = None, metric_control: str | None = 'recall', method: str | None = None, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0)
Prediction sets for multilabel-classification.
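This class implements conformal risk-control methods for estimating prediction sets for multilabel-classification: conformal risk control (“crc”) and risk-controlling prediction sets (“rcps”) for recall, and learn then test (“ltt”) for precision. Under the hypothesis of exchangeability, it guarantees that the controlled metric (recall or precision) is at least 1 - alpha, i.e. that the associated risk (1 - recall or 1 - precision) is at most alpha, where alpha is a user-specified parameter. The guarantee holds in expectation for CRC, and with probability at least 1 - delta for RCPS and LTT.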
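As a minimal sketch of the two risks, assuming that true labels and prediction sets are both boolean masks of shape (n_samples, n_classes), the per-sample recall and precision losses are 1 - recall and 1 - precision. The helper names are hypothetical, and so is the convention that a sample with no true label (or an empty prediction set) gets zero loss; MAPIE may handle these edge cases differently.

```python
import numpy as np


def recall_loss(y_true, y_set):
    """Hypothetical helper: per-sample 1 - recall, boolean arrays (n_samples, n_classes)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_set = np.asarray(y_set, dtype=bool)
    n_true = y_true.sum(axis=1)
    n_hit = (y_true & y_set).sum(axis=1)
    # Assumed convention: a sample without any true label has recall 1 (zero loss).
    recall = np.divide(n_hit, n_true, out=np.ones(n_true.shape), where=n_true > 0)
    return 1.0 - recall


def precision_loss(y_true, y_set):
    """Hypothetical helper: per-sample 1 - precision, boolean arrays (n_samples, n_classes)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_set = np.asarray(y_set, dtype=bool)
    n_pred = y_set.sum(axis=1)
    n_hit = (y_true & y_set).sum(axis=1)
    # Assumed convention: an empty prediction set has precision 1 (zero loss).
    precision = np.divide(n_hit, n_pred, out=np.ones(n_pred.shape), where=n_pred > 0)
    return 1.0 - precision


y_true = np.array([[1, 0, 1], [0, 1, 0]])
y_set = np.array([[1, 1, 0], [0, 1, 1]])
print(recall_loss(y_true, y_set))
print(precision_loss(y_true, y_set))
```

Adding labels to a set can only lower the recall loss, whereas it can raise or lower the precision loss. This is one reason why recall and precision are controlled with different procedures.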
- Parameters:
- estimator: Optional[ClassifierMixin]
Any fitted multi-label classifier with scikit-learn API (i.e. with fit, predict, and predict_proba methods). If None, a sklearn LogisticRegression instance is used by default. By default None.
- metric_control: Optional[str]
Metric to control. Either “recall” or “precision”. By default “recall”.
- method: Optional[str]
Method to use for the prediction sets. If metric_control is “recall”, then the method can be either “crc” (default) or “rcps”. If metric_control is “precision”, then the method used to control the precision is “ltt”.
- n_jobs: Optional[int]
Number of jobs for parallel processing using joblib via the “loky” backend. For the moment, parallel processing is disabled. If -1, all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) CPUs are used. None is a marker for unset that will be interpreted as n_jobs=1 (sequential execution). By default None.
- random_state: Optional[Union[int, RandomState]]
Pseudo random number generator state used for random uniform sampling to evaluate quantiles and prediction sets. Pass an int for reproducible output across multiple function calls. By default None.
- verbose: int, optional
The verbosity level, used with joblib for parallel processing. For the moment, parallel processing is disabled. The frequency of the messages increases with the verbosity level. If it is more than 10, all iterations are reported. Above 50, the output is sent to stdout. By default 0.
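The “crc” method can be sketched in numpy as follows. This sketch rests on several assumptions: a prediction set keeps every label whose predicted probability is at least a threshold λ, so the recall loss is non-decreasing in λ; thresholds come from a regular grid; and the loss is bounded by B = 1. Following the conformal risk control rule of [2], it keeps the largest λ on the grid such that n/(n+1) · R̂(λ) + B/(n+1) ≤ α. The grid, the threshold convention and the helper names are illustrative and are not taken from MAPIE's internals.

```python
import numpy as np


def recall_risks(y_true, y_proba, lambdas):
    """Hypothetical helper: risk matrix (n_samples, n_lambdas) of 1 - recall."""
    y_true = np.asarray(y_true, dtype=bool)
    y_proba = np.asarray(y_proba, dtype=float)
    # One prediction set per threshold: shape (n_samples, n_classes, n_lambdas).
    y_sets = y_proba[:, :, None] >= lambdas[None, None, :]
    n_true = y_true.sum(axis=1)[:, None]
    n_hit = (y_true[:, :, None] & y_sets).sum(axis=1)
    recall = np.divide(n_hit, n_true, out=np.ones(n_hit.shape), where=n_true > 0)
    return 1.0 - recall


def crc_threshold(risks, lambdas, alpha, B=1.0):
    """Largest lambda meeting the CRC condition (lambdas sorted increasingly)."""
    n = risks.shape[0]
    r_hat = risks.mean(axis=0)  # average risk for each lambda
    ok = n / (n + 1) * r_hat + B / (n + 1) <= alpha
    if not ok.any():
        # No threshold on the grid qualifies: fall back to the largest sets.
        return lambdas.min()
    return lambdas[ok].max()


lambdas = np.linspace(0, 1, 11)
y_true = np.array([[1, 0], [0, 1], [1, 1], [1, 0]])
y_proba = np.array([[0.93, 0.22], [0.33, 0.82], [0.64, 0.74], [0.57, 0.41]])
risks = recall_risks(y_true, y_proba, lambdas)
lambda_star = crc_threshold(risks, lambdas, alpha=0.45)
print(lambda_star)
```

The average risk is non-decreasing in λ, so the thresholds that satisfy the condition form a prefix of the grid. Taking their maximum gives the smallest prediction sets that still meet the target.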
- Attributes:
- valid_methods: List[str]
List of all valid methods. Either CRC or RCPS
- valid_bounds: List[Union[str, None]]
List of all valid bound computation methods, used by RCPS only.
- single_estimator_: sklearn.ClassifierMixin
Estimator fitted on the whole training set.
- n_lambdas: int
Number of thresholds on which we compute the risk.
- lambdas: NDArray
Array with all the values of lambda.
- risks: ArrayLike of shape (n_samples_cal, n_lambdas)
The risk for each observation for each threshold.
- r_hat: ArrayLike of shape (n_lambdas)
Average risk for each lambda.
- r_hat_plus: ArrayLike of shape (n_lambdas)
Upper confidence bound for each lambda, computed with different bounds (see predict). Only relevant when method=“rcps”.
- lambdas_star: ArrayLike of shape (n_lambdas)
Optimal threshold for a given alpha.
- valid_index: List[List[Any]]
List of lists of all indices that satisfy family-wise error rate (FWER) control. This attribute is computed only when the user wants to control the precision score, i.e. when metric_control=“precision”, as it uses the learn then test (LTT) procedure. Contains n_alpha lists (see predict).
- sigma_init: Optional[float]
First variance in the sigma_hat array. The default value is the same as in the paper implementation [1].
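The valid_index attribute comes from the learn then test procedure [3]. A minimal sketch for a single alpha follows. It tests, for each threshold, the null hypothesis “the true risk at this λ exceeds α” with a Hoeffding p-value for losses in [0, 1], then applies a Bonferroni correction so that the FWER is at most delta. The choice of p-value and correction is an assumption, and MAPIE may use different ones. The helper name is also hypothetical, and it returns one list, whereas the attribute holds one list per alpha.

```python
import numpy as np


def ltt_valid_index(risks, alpha, delta):
    """Hypothetical helper: indices of thresholds certified with risk <= alpha."""
    n, n_lambdas = risks.shape
    r_hat = risks.mean(axis=0)
    # Hoeffding p-value for H0: "true risk at this lambda > alpha" (losses in [0, 1]).
    p_values = np.exp(-2 * n * np.clip(alpha - r_hat, 0, None) ** 2)
    # Bonferroni correction: reject H0 when p <= delta / n_lambdas (FWER <= delta).
    return np.flatnonzero(p_values <= delta / n_lambdas).tolist()


risks = np.zeros((200, 3))
risks[:10, 0] = 1   # average risk 0.05
risks[:40, 1] = 1   # average risk 0.20
risks[:100, 2] = 1  # average risk 0.50
valid_index = ltt_valid_index(risks, alpha=0.3, delta=0.1)
print(valid_index)
```

Thresholds whose empirical risk is close to alpha are not certified, because their p-values stay above the corrected level. Thresholds whose empirical risk exceeds alpha are never certified.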
References
[1] Bates, S., Angelopoulos, A. N., Lei, L., Malik, J., & Jordan, M. I. (2021). Distribution-free, risk-controlling prediction sets. CoRR, abs/2101.02703. URL https://arxiv.org/abs/2101.02703
[2] Angelopoulos, A. N., Bates, S., Fisch, A., Lei, L., & Schuster, T. (2022). Conformal risk control.
[3] Angelopoulos, A. N., Bates, S., Candès, E. J., Jordan, M. I., & Lei, L. (2021). Learn then test: Calibrating predictive algorithms to achieve risk control.
Examples
>>> import numpy as np
>>> from sklearn.multioutput import MultiOutputClassifier
>>> from sklearn.linear_model import LogisticRegression
>>> from mapie.risk_control import PrecisionRecallController
>>> X_toy = np.arange(4).reshape(-1, 1)
>>> y_toy = np.stack([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]])
>>> clf = MultiOutputClassifier(LogisticRegression()).fit(X_toy, y_toy)
>>> mapie = PrecisionRecallController(estimator=clf).fit(X_toy, y_toy)
>>> _, y_pi_mapie = mapie.predict(X_toy, alpha=0.3)
>>> print(y_pi_mapie[:, :, 0])
[[ True False  True]
 [ True False  True]
 [False  True False]
 [False  True False]]
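The printed sets can be checked by hand. The snippet below uses only numpy. It copies the prediction sets printed above for alpha=0.3 and computes their recall against y_toy. It evaluates the sets on the same toy data used for fitting, so it only illustrates the computation and does not test the guarantee on new samples.

```python
import numpy as np

y_toy = np.stack([[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]).astype(bool)
# Prediction sets copied from the printed output of the example (alpha=0.3).
y_sets = np.array([
    [True, False, True],
    [True, False, True],
    [False, True, False],
    [False, True, False],
])
# Recall per sample: true labels covered by the set / number of true labels.
recall = (y_toy & y_sets).sum(axis=1) / y_toy.sum(axis=1)
print(recall)
print(recall.mean())  # 0.875
```

Only the third sample misses one of its two true labels. The average recall is therefore 0.875, which is above the target 1 - alpha = 0.7.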
- __init__(estimator: ClassifierMixin | None = None, metric_control: str | None = 'recall', method: str | None = None, n_jobs: int | None = None, random_state: int | RandomState | None = None, verbose: int = 0) -> None
- fit(X: ArrayLike, y: ArrayLike, calib_size: float | None = 0.3) -> PrecisionRecallController
Fit the base estimator (or use the fitted base estimator) and compute risks.
- Parameters:
- X: ArrayLike of shape (n_samples, n_features)
Training data.
- y: NDArray of shape (n_samples, n_classes)
Training labels.
- calib_size: Optional[float]
Fraction of X used as the calibration set when estimator is None and a LogisticRegression has to be fitted on the remaining data. By default 0.3.
- Returns:
- PrecisionRecallController
The model itself.
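When estimator is None, X is split into a training part and a calibration part whose relative size is calib_size. The sketch below shows one such split. It assumes a random permutation seeded by random_state, with the calibration size rounded up. The helper split_train_calib is hypothetical, and MAPIE may use a different splitter.

```python
import numpy as np


def split_train_calib(n_samples, calib_size=0.3, random_state=1):
    """Hypothetical helper: return (train_idx, calib_idx) for a random split."""
    rng = np.random.default_rng(random_state)
    perm = rng.permutation(n_samples)
    # Number of calibration samples, rounded up.
    n_calib = int(np.ceil(calib_size * n_samples))
    return perm[n_calib:], perm[:n_calib]


train_idx, calib_idx = split_train_calib(10, calib_size=0.3, random_state=1)
print(len(train_idx), len(calib_idx))
```

The base model would be fitted on train_idx only. The risks are computed on calib_idx, so that the calibration points are not used to train the classifier.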
- partial_fit(X: ArrayLike, y: ArrayLike, _refit: bool | None = False) -> PrecisionRecallController
Fit the base estimator or use the fitted base estimator on batch data to compute risks. All the computed risks will be concatenated each time the partial_fit method is called.
- Parameters:
- X: ArrayLike of shape (n_samples, n_features)
Training data.
- y: NDArray of shape (n_samples, n_classes)
Training labels.
- _refit: bool
Whether or not to refit from scratch.
By default False.
- Returns:
- PrecisionRecallController
The model itself.
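Because partial_fit concatenates the risks computed on each batch, the average risk after several calls equals the average over all calibration samples seen so far. The numpy sketch below illustrates that bookkeeping. RiskAccumulator is a hypothetical stand-in for the estimator's internal state and takes precomputed risk rows instead of (X, y).

```python
import numpy as np


class RiskAccumulator:
    """Hypothetical stand-in for the risk bookkeeping done by partial_fit."""

    def __init__(self):
        self.risks = None
        self.r_hat = None

    def partial_fit(self, batch_risks, _refit=False):
        batch_risks = np.asarray(batch_risks, dtype=float)
        if _refit or self.risks is None:
            # Start from scratch with this batch only.
            self.risks = batch_risks
        else:
            # Append the new rows (samples) to the previously computed risks.
            self.risks = np.concatenate([self.risks, batch_risks], axis=0)
        self.r_hat = self.risks.mean(axis=0)  # average risk for each lambda
        return self


rng = np.random.default_rng(0)
all_risks = rng.random((10, 4))  # 10 calibration samples, 4 thresholds
acc = RiskAccumulator().partial_fit(all_risks[:6]).partial_fit(all_risks[6:])
print(acc.risks.shape)
```

Keeping the per-sample rows, rather than only a running mean, leaves room for bounds that need more than the mean, such as variance-based ones.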
- predict(X: ArrayLike, alpha: float | Iterable[float] | None = None, delta: float | None = None, bound: str | None = None) -> NDArray | Tuple[NDArray, NDArray]
Prediction sets on new samples based on the target risk level. Prediction sets for a given alpha are deduced from the computed risks.
- Parameters:
- X: ArrayLike of shape (n_samples, n_features)
New samples on which prediction sets are computed.
- alpha: Optional[Union[float, Iterable[float]]]
The target risk level. Can be a float, a list of floats, or an ArrayLike of floats, between 0 and 1. Lower alpha produces larger (more conservative) prediction sets. By default None (which means alpha=0.1).
- delta: Optional[float]
Can be a float or None. If using method=“rcps”, it cannot be set to None. Between 0 and 1, delta sets the confidence level 1 - delta at which the Upper Confidence Bound of the average risk is computed. Lower delta produces larger (more conservative) prediction sets. By default None.
- bound: Optional[Union[str, None]]
Method used to compute the Upper Confidence Bound of the average risk. Only necessary with the RCPS method. By default None.
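With method=“rcps”, the threshold is chosen from an upper confidence bound (r_hat_plus) of the average risk at confidence level 1 - delta [1]. The sketch below uses the Hoeffding bound r̂(λ) + sqrt(log(1/δ) / (2n)) for losses in [0, 1]. It does not restate which values bound accepts. It assumes that thresholds are sorted increasingly and that the risk is non-decreasing in λ, and it keeps the largest λ such that the bound at that λ and at every smaller λ is at most alpha. The helper name is hypothetical.

```python
import numpy as np


def rcps_threshold(risks, lambdas, alpha, delta):
    """Hypothetical helper: RCPS-style threshold with a Hoeffding bound."""
    n = risks.shape[0]
    r_hat = risks.mean(axis=0)
    # Hoeffding upper confidence bound at level 1 - delta (losses in [0, 1]).
    r_hat_plus = r_hat + np.sqrt(np.log(1 / delta) / (2 * n))
    # Keep the prefix of thresholds whose bounds are all <= alpha.
    valid = np.cumprod(r_hat_plus <= alpha).astype(bool)
    if not valid.any():
        return None  # no threshold on the grid is certified
    return lambdas[valid].max()


lambdas = np.array([0.1, 0.2, 0.3, 0.4])
risks = np.zeros((200, 4))
risks[:10, 1] = 1   # average risk 0.05
risks[:20, 2] = 1   # average risk 0.10
risks[:60, 3] = 1   # average risk 0.30
lambda_star = rcps_threshold(risks, lambdas, alpha=0.2, delta=0.1)
print(lambda_star)
```

With n = 200 and delta = 0.1, the Hoeffding margin is about 0.076. Lowering delta widens this margin, so fewer thresholds are certified and the sets become larger, as the delta description states.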
- Returns:
- Union[NDArray, Tuple[NDArray, NDArray]]
- NDArray of shape (n_samples,) if alpha is None.
- Tuple[NDArray, NDArray] of shapes (n_samples, n_classes) and (n_samples, n_classes, n_alpha) if alpha is not None.
- set_fit_request(*, calib_size: bool | None | str = '$UNCHANGED$') -> PrecisionRecallController
Configure whether metadata should be requested to be passed to the fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- calib_size: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for calib_size parameter in fit.
- Returns:
- self: object
The updated object.
- set_partial_fit_request(*, _refit: bool | None | str = '$UNCHANGED$') -> PrecisionRecallController
Configure whether metadata should be requested to be passed to the partial_fit method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to partial_fit if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to partial_fit.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- _refit: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for _refit parameter in partial_fit.
- Returns:
- self: object
The updated object.
- set_predict_request(*, alpha: bool | None | str = '$UNCHANGED$', bound: bool | None | str = '$UNCHANGED$', delta: bool | None | str = '$UNCHANGED$') -> PrecisionRecallController
Configure whether metadata should be requested to be passed to the predict method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to predict.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- alpha: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for alpha parameter in predict.
- bound: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for bound parameter in predict.
- delta: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for delta parameter in predict.
- Returns:
- self: object
The updated object.
- set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> PrecisionRecallController
Configure whether metadata should be requested to be passed to the score method.
Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.
The options for each parameter are:
- True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
- False: metadata is not requested and the meta-estimator will not pass it to score.
- None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
- str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
Added in version 1.3.
- Parameters:
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns:
- self: object
The updated object.
Examples using mapie.risk_control.PrecisionRecallController
Tutorial for recall and precision control for multi-label classification