mapie.classification.CrossConformalClassifier
- class mapie.classification.CrossConformalClassifier(estimator: ClassifierMixin = LogisticRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseClassificationScore] = 'lac', cv: Union[int, BaseCrossValidator] = 5, n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None)
Computes prediction sets using the cross conformal classification technique:
The fit_conformalize method estimates the uncertainty of the base classifier in a cross-validation style. It fits the base classifier on folds of the dataset and computes conformity scores on the out-of-fold data.
The predict_set method predicts labels and sets of labels.
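The out-of-fold scoring step described above can be illustrated with scikit-learn alone. This is a minimal sketch, not MAPIE's internal implementation: it assumes the "lac" conformity score is one minus the predicted probability of the true label, and it uses `cross_val_predict` as a stand-in for the per-fold fitting that `fit_conformalize` performs.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict

X, y = make_classification(n_samples=200, random_state=0)

# Out-of-fold probabilities: each row is predicted by a model
# that was fitted without that row.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
oof_proba = cross_val_predict(
    LogisticRegression(), X, y, cv=cv, method="predict_proba"
)

# LAC-style conformity score (assumed definition for this sketch):
# 1 - probability assigned to the true label.
conformity_scores = 1.0 - oof_proba[np.arange(len(y)), y]
```

Low scores mean the out-of-fold model was confident in the true label; the distribution of these scores is what a conformal method uses to calibrate prediction sets.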
- Parameters
- estimator: ClassifierMixin, default=LogisticRegression()
The base classifier used to predict labels.
- confidence_level: Union[float, List[float]], default=0.9
The confidence level(s) for the prediction sets, indicating the desired coverage probability of the prediction sets. If a float is provided, it represents a single confidence level. If a list, multiple prediction sets for each specified confidence level are returned.
- conformity_score: Union[str, BaseClassificationScore], default="lac"
The method used to compute conformity scores. Valid options:
"lac"
"aps"
any subclass of BaseClassificationScore, including a custom score inheriting from it
- cv: Union[int, BaseCrossValidator], default=5
The cross-validator used to compute conformity scores. Valid options:
an integer, to specify the number of folds
any sklearn.model_selection.BaseCrossValidator suitable for classification, or a custom cross-validator inheriting from it
Main variants in the cross conformal setting are:
sklearn.model_selection.KFold (vanilla cross conformal)
sklearn.model_selection.LeaveOneOut (jackknife)
- n_jobs: Optional[int], default=None
The number of jobs to run in parallel when applicable.
- verbose: int, default=0
Controls the verbosity level. Higher values produce more detailed output.
- random_state: Optional[Union[int, np.random.RandomState]], default=None
A seed or random state instance to ensure reproducibility in any random operations within the classifier.
Examples
>>> from mapie.classification import CrossConformalClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.neighbors import KNeighborsClassifier

>>> X_full, y_full = make_classification(n_samples=500)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)

>>> mapie_classifier = CrossConformalClassifier(
...     estimator=KNeighborsClassifier(),
...     confidence_level=0.95,
...     cv=10
... ).fit_conformalize(X, y)

>>> predicted_labels, predicted_sets = mapie_classifier.predict_set(X_test)
- __init__(estimator: ClassifierMixin = LogisticRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseClassificationScore] = 'lac', cv: Union[int, BaseCrossValidator] = 5, n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None) -> None
- fit_conformalize(X: ArrayLike, y: ArrayLike, groups: Optional[ArrayLike] = None, fit_params: Optional[dict] = None, predict_params: Optional[dict] = None) -> CrossConformalClassifier
Estimates the uncertainty of the base classifier in a cross-validation style: fits the base classifier on different folds of the dataset and computes conformity scores on the corresponding out-of-fold data.
- Parameters
- X: ArrayLike
Features
- y: ArrayLike
Targets
- groups: Optional[ArrayLike] of shape (n_samples,), default=None
Groups to pass to the cross-validator.
- fit_params: Optional[dict], default=None
Parameters to pass to the fit method of the base classifier.
- predict_params: Optional[dict], default=None
Parameters to pass to the predict and predict_proba methods of the base classifier. These parameters will also be used in the predict_set and predict methods of this CrossConformalClassifier.
- Returns
- Self
This CrossConformalClassifier instance, fitted and conformalized.
- predict(X: ArrayLike) -> NDArray
For each sample in X, returns the label predicted by the base classifier.
- Parameters
- X: ArrayLike
Features
- Returns
- NDArray
Array of predicted labels, with shape (n_samples,).
- predict_set(X: ArrayLike, conformity_score_params: Optional[dict] = None, agg_scores: str = 'mean') -> Tuple[NDArray, NDArray]
For each sample in X, predicts a label (using the base classifier), and a set of labels.
If several confidence levels were provided during initialisation, several sets will be predicted for each sample. See the return signature.
- Parameters
- X: ArrayLike
Features
- conformity_score_params: Optional[dict], default=None
Parameters specific to conformity scores, used at prediction time.
The only example for now is include_last_label, available for aps and raps conformity scores. For detailed information on include_last_label, see the docstring of conformity_scores.sets.aps.APSConformityScore.get_prediction_sets().
- agg_scores: str, default="mean"
How to aggregate conformity scores.
Each classifier fitted on different folds of the dataset is used to produce conformity scores on the test data. The agg_scores parameter controls how those scores are aggregated. Valid options:
"mean", takes the mean of scores.
"crossval", compares the scores between all training data and each test point for each label to estimate if the label must be included in the prediction set. Follows algorithm 2 of Classification with Valid and Adaptive Coverage (Romano+2020).
- Returns
- Tuple[NDArray, NDArray]
Two arrays:
Prediction labels, of shape (n_samples,)
Prediction sets, of shape (n_samples, n_class, n_confidence_levels)
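To make the three-dimensional layout of the prediction sets concrete, the sketch below computes empirical coverage and average set size per confidence level with NumPy. The array is a hand-built stand-in for the second return value, and it assumes the sets are encoded as boolean membership masks (True when a label belongs to the set); the variable names are illustrative.

```python
import numpy as np

# Toy stand-in for predicted_sets: 4 samples, 3 classes, 2 confidence levels.
# Boolean membership is an assumption of this sketch.
predicted_sets = np.zeros((4, 3, 2), dtype=bool)
predicted_sets[:, :, 0] = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
predicted_sets[:, :, 1] = [[1, 1, 0], [0, 1, 0], [0, 1, 1], [1, 0, 1]]
y_true = np.array([0, 1, 1, 2])

# For each sample and level, is the true label inside the set? Shape (4, 2).
covered = predicted_sets[np.arange(len(y_true)), y_true, :]

coverage = covered.mean(axis=0)                      # -> [0.5, 1.0]
set_sizes = predicted_sets.sum(axis=1).mean(axis=0)  # -> [1.0, 1.75]
```

The same indexing applies to a real `predict_set` output: the last axis selects the confidence level, and summing over the class axis gives the size of each set.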