mapie.classification.CrossConformalClassifier

class mapie.classification.CrossConformalClassifier(estimator: ClassifierMixin = LogisticRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseClassificationScore] = 'lac', cv: Union[int, BaseCrossValidator] = 5, n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None)

Computes prediction sets using the cross conformal classification technique:

  1. The fit_conformalize method estimates the uncertainty of the base classifier in a cross-validation style. It fits the base classifier on folds of the dataset and computes conformity scores on the out-of-fold data.

  2. The predict_set method predicts labels and sets of labels.
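The cross-validation step of fit_conformalize can be illustrated with plain scikit-learn. The sketch below is not MAPIE's implementation: it assumes the LAC score (one minus the predicted probability of the true label) and a shuffled KFold, and the helper `cross_conformal_scores` is hypothetical. Each sample receives exactly one conformity score, computed by the fold model that never saw it during training.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold


def cross_conformal_scores(estimator, X, y, n_splits=5, random_state=None):
    """Hypothetical helper: fit one clone of `estimator` per fold and return
    LAC conformity scores computed on each fold's held-out samples."""
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    scores = np.empty(len(y), dtype=float)
    fold_ids = np.empty(len(y), dtype=int)
    models = []
    for k, (train_idx, test_idx) in enumerate(kfold.split(X)):
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])
        # Column index of each true label in predict_proba's output
        label_cols = np.searchsorted(model.classes_, y[test_idx])
        # LAC score: 1 - probability assigned to the true label
        scores[test_idx] = 1.0 - proba[np.arange(len(test_idx)), label_cols]
        fold_ids[test_idx] = k
        models.append(model)
    return scores, fold_ids, models


X, y = make_classification(n_samples=200, random_state=0)
scores, fold_ids, models = cross_conformal_scores(
    LogisticRegression(), X, y, random_state=0
)
```

Keeping the fold index of every sample matters later: the "crossval" aggregation in predict_set compares each training score with the score produced by that same fold model on the test point.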

Parameters
estimator: ClassifierMixin, default=LogisticRegression()

The base classifier used to predict labels.

confidence_level: Union[float, List[float]], default=0.9

The confidence level(s) for the prediction sets, indicating the desired coverage probability of the prediction sets. If a float is provided, it represents a single confidence level. If a list is provided, a separate prediction set is returned for each confidence level.
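To see why each confidence level yields its own prediction set, consider how a level is typically turned into a score threshold. This sketch assumes the standard split-conformal corrected quantile, ceil((n + 1) * level) / n, applied to a vector of conformity scores. MAPIE's internal computation (in particular with agg_scores="crossval") may differ, and `conformal_thresholds` is a hypothetical helper.

```python
import numpy as np


def conformal_thresholds(scores, confidence_levels):
    """Hypothetical helper: one score threshold per confidence level, using
    the finite-sample-corrected quantile ceil((n + 1) * level) / n."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    levels = np.atleast_1d(confidence_levels)
    # Corrected quantile levels, capped at 1.0 for very high confidence
    q_levels = np.minimum(np.ceil((n + 1) * levels) / n, 1.0)
    return np.quantile(scores, q_levels, method="higher")


scores = np.linspace(0.0, 1.0, 101)  # toy conformity scores
thresholds = conformal_thresholds(scores, [0.8, 0.9, 0.95])
```

A label whose score is at most a given threshold enters the set for that level, so higher confidence levels produce larger (or equal) sets.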

conformity_score: Union[str, BaseClassificationScore], default="lac"

The method used to compute conformity scores. Valid options:

  • “lac”

  • “aps”

  • Any instance of a BaseClassificationScore subclass, including a custom score function

See Theoretical Description.
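The two built-in scores can be computed directly from a matrix of predicted probabilities. This is a minimal sketch of the usual definitions, assuming integer labels 0..n_classes-1 that match the probability columns; it shows only the non-randomized form of APS, and the helper names are hypothetical rather than MAPIE functions.

```python
import numpy as np


def lac_scores(proba, y):
    """LAC: one minus the probability of the true label."""
    return 1.0 - proba[np.arange(len(y)), y]


def aps_scores(proba, y):
    """APS (non-randomized): total probability mass of all labels ranked
    at least as high as the true label."""
    order = np.argsort(-proba, axis=1)  # labels by decreasing probability
    sorted_proba = np.take_along_axis(proba, order, axis=1)
    cumsum = np.cumsum(sorted_proba, axis=1)
    # Position of the true label within each sorted row
    rank = np.argmax(order == y[:, None], axis=1)
    return cumsum[np.arange(len(y)), rank]


proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
y = np.array([0, 1])
lac = lac_scores(proba, y)  # approx. [0.3, 0.7]
aps = aps_scores(proba, y)  # approx. [0.7, 0.9]
```

LAC only looks at the true label's probability, while APS also accounts for how much mass sits on labels ranked above it, which tends to give larger sets on harder samples.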

cv: Union[int, BaseCrossValidator], default=5

The cross-validator used to compute conformity scores. Valid options:

  • integer, to specify the number of folds

  • any sklearn.model_selection.BaseCrossValidator suitable for classification, or a custom cross-validator inheriting from it.

Main variants in the cross conformal setting are:

  • sklearn.model_selection.KFold (vanilla cross conformal)

  • sklearn.model_selection.LeaveOneOut (jackknife)
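The two variants differ mainly in how many base classifiers are fitted. The sketch below only inspects the scikit-learn splitters themselves; passing such an object as cv=... to CrossConformalClassifier is the usage the parameter description suggests, but that call is not exercised here.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X = np.arange(20).reshape(10, 2)

# Vanilla cross conformal: each sample is held out exactly once across 5 folds
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
# Jackknife: one fold per sample, so one model is fitted per training point
loo = LeaveOneOut()

n_kfold_models = kfold.get_n_splits(X)  # 5
n_loo_models = loo.get_n_splits(X)      # 10
held_out = np.concatenate([test for _, test in kfold.split(X)])
```

LeaveOneOut fits as many models as there are training samples, so it is usually only practical on small datasets or with cheap base classifiers.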

n_jobs: Optional[int], default=None

The number of jobs to run in parallel when applicable.

verbose: int, default=0

Controls the verbosity level. Higher values produce more detailed output.

random_state: Optional[Union[int, np.random.RandomState]], default=None

A seed or random state instance to ensure reproducibility in any random operations within the classifier.

Examples

>>> from mapie.classification import CrossConformalClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.neighbors import KNeighborsClassifier
>>> X_full, y_full = make_classification(n_samples=500)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)
>>> mapie_classifier = CrossConformalClassifier(
...     estimator=KNeighborsClassifier(),
...     confidence_level=0.95,
...     cv=10
... ).fit_conformalize(X, y)
>>> predicted_labels, predicted_sets = mapie_classifier.predict_set(X_test)

__init__(estimator: ClassifierMixin = LogisticRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseClassificationScore] = 'lac', cv: Union[int, BaseCrossValidator] = 5, n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None) -> None
fit_conformalize(X: ArrayLike, y: ArrayLike, groups: Optional[ArrayLike] = None, fit_params: Optional[dict] = None, predict_params: Optional[dict] = None) -> CrossConformalClassifier

Estimates the uncertainty of the base classifier in a cross-validation style: fits the base classifier on different folds of the dataset and computes conformity scores on the corresponding out-of-fold data.

Parameters
X: ArrayLike

Features, of shape (n_samples, n_features).

y: ArrayLike

Target labels, of shape (n_samples,).

groups: Optional[ArrayLike] of shape (n_samples,), default=None

Groups to pass to the cross-validator.
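Groups only matter for group-aware splitters. The sketch below shows what such a splitter does with them, using scikit-learn's GroupKFold directly; it assumes that fit_conformalize forwards groups to the splitter's split() method, as scikit-learn's own cross-validation utilities do. The grouping itself is hypothetical (for example, several samples per patient).

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.zeros((12, 3))
y = np.tile([0, 1], 6)
# Hypothetical grouping: 4 groups (e.g. patients) with 3 samples each
groups = np.repeat([0, 1, 2, 3], 3)

cv = GroupKFold(n_splits=4)
for train_idx, test_idx in cv.split(X, y, groups=groups):
    # No group ever appears on both sides of a split, so out-of-fold
    # conformity scores are never computed on a group seen during fitting
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])
```

Keeping correlated samples in the same fold prevents optimistic out-of-fold scores, which would otherwise make the prediction sets too small.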

fit_params: Optional[dict], default=None

Parameters to pass to the fit method of the base classifier.

predict_params: Optional[dict], default=None

Parameters to pass to the predict and predict_proba methods of the base classifier. These parameters will also be used in the predict_set and predict methods of this CrossConformalClassifier.

Returns
Self

This CrossConformalClassifier instance, fitted and conformalized.

predict(X: ArrayLike) -> NDArray

For each sample in X, returns the label predicted by the base classifier.

Parameters
X: ArrayLike

Features

Returns
NDArray

Array of predicted labels, with shape (n_samples,).

predict_set(X: ArrayLike, conformity_score_params: Optional[dict] = None, agg_scores: str = 'mean') -> Tuple[NDArray, NDArray]

For each sample in X, predicts a label (using the base classifier) and a set of labels.

If several confidence levels were provided during initialisation, one set per confidence level is predicted for each sample; see Returns below.

Parameters
X: ArrayLike

Features

conformity_score_params: Optional[dict], default=None

Parameters specific to conformity scores, used at prediction time.

Currently the only such parameter is include_last_label, available for the aps and raps conformity scores. For details on include_last_label, see the docstring of conformity_scores.sets.aps.APSConformityScore.get_prediction_sets().
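The effect of include_last_label can be illustrated on a single sample. This sketch assumes a non-randomized APS set built by adding labels in order of decreasing probability until the cumulative mass reaches a threshold; it covers only the boolean behaviour, and `aps_prediction_set` is a hypothetical helper, not the method referenced above. In MAPIE the option would be passed as a dictionary, for example conformity_score_params={"include_last_label": False}.

```python
import numpy as np


def aps_prediction_set(proba_row, threshold, include_last_label=True):
    """Hypothetical, non-randomized APS set for one sample.

    Labels are added in order of decreasing probability until the
    cumulative probability reaches `threshold`. `include_last_label`
    decides whether the label that crosses the threshold is kept."""
    order = np.argsort(-proba_row)
    cumsum = np.cumsum(proba_row[order])
    # Number of labels whose cumulative mass stays strictly below threshold
    n_below = int(np.sum(cumsum < threshold))
    n_keep = min(n_below + 1, len(order)) if include_last_label else n_below
    return sorted(order[:n_keep].tolist())


proba_row = np.array([0.5, 0.3, 0.2])
with_last = aps_prediction_set(proba_row, threshold=0.7)
without_last = aps_prediction_set(proba_row, 0.7, include_last_label=False)
```

Here the cumulative probabilities are 0.5, 0.8 and 1.0, so label 1 is the one that crosses 0.7: it is kept when include_last_label is True and dropped otherwise. Dropping it yields smaller sets but can lose coverage, and can even produce empty sets.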

agg_scores: str, default="mean"

How to aggregate conformity scores.

Each classifier fitted on a different fold of the dataset produces conformity scores on the test data. The agg_scores parameter controls how those scores are aggregated. Valid options:

  • “mean”, takes the mean of scores.

  • “crossval”, for each test point and each label, compares the label's conformity score with the out-of-fold scores of all training points to decide whether the label is included in the prediction set. This follows Algorithm 2 of Classification with Valid and Adaptive Coverage (Romano et al., 2020).
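Both aggregation strategies can be sketched with NumPy, given test scores from every fold model and the out-of-fold training scores. This is a simplified transcription under stated assumptions, not MAPIE's code: "mean" is modelled as averaging each label's score over the fold models and thresholding with a corrected quantile, and "crossval" as the CV+ rule from Romano et al. (2020). Tie handling and other details in MAPIE may differ, and all names are hypothetical.

```python
import numpy as np


def aggregate_mean(test_scores, train_scores, confidence_level):
    """'mean'-style sketch: average each label's score over the fold models,
    then compare with one corrected quantile of the out-of-fold scores.

    test_scores: (n_folds, n_test, n_classes); train_scores: (n_train,)"""
    n = len(train_scores)
    q = min(np.ceil((n + 1) * confidence_level) / n, 1.0)
    threshold = np.quantile(train_scores, q, method="higher")
    return test_scores.mean(axis=0) <= threshold


def aggregate_crossval(test_scores, train_scores, fold_ids, confidence_level):
    """'crossval'-style sketch (CV+ rule): keep label y for test point x if
    #{i : E_i < s(x, y; model of fold k(i))} < confidence_level * (n + 1)."""
    n = len(train_scores)
    # Score of each (test point, label) under the model that held out sample i
    per_train = test_scores[fold_ids]  # (n_train, n_test, n_classes)
    count = (train_scores[:, None, None] < per_train).sum(axis=0)
    return count < confidence_level * (n + 1)


rng = np.random.default_rng(0)
n_folds, n_train, n_test, n_classes = 3, 30, 4, 3
train_scores = rng.uniform(size=n_train)
fold_ids = np.arange(n_train) % n_folds
test_scores = rng.uniform(size=(n_folds, n_test, n_classes))

sets_mean = aggregate_mean(test_scores, train_scores, 0.9)
sets_crossval = aggregate_crossval(test_scores, train_scores, fold_ids, 0.9)
```

Both functions return a boolean (n_test, n_classes) mask. "mean" is cheaper because it uses a single threshold, whereas "crossval" compares every training score with the matching fold model's score on each test point.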

Returns
Tuple[NDArray, NDArray]

Two arrays:

  • Prediction labels, of shape (n_samples,)

  • Prediction sets, of shape (n_samples, n_class, n_confidence_levels)
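The prediction-set array is a boolean mask indexed by sample, class and confidence level. The sketch below works on a hand-written stand-in for that output (hypothetical values for 3 samples, 2 classes and 2 confidence levels such as [0.8, 0.95]) and assumes class columns follow the sorted label order, as in scikit-learn's classes_.

```python
import numpy as np

# Hypothetical predicted_sets: shape (n_samples, n_class, n_confidence_levels)
predicted_sets = np.array([
    [[True, True], [False, True]],
    [[False, False], [True, True]],
    [[True, True], [True, True]],
])
y_true = np.array([1, 1, 0])

# Sets for the second confidence level only: shape (n_samples, n_class)
sets_at_second_level = predicted_sets[:, :, 1]
# Number of labels in each set: shape (n_samples, n_confidence_levels)
set_sizes = predicted_sets.sum(axis=1)
# Fraction of samples whose true label lies in its set, per confidence level
coverage = predicted_sets[np.arange(len(y_true)), y_true, :].mean(axis=0)
```

Comparing the empirical coverage of each level with its requested confidence level, together with the average set size, is a common way to check the sets on held-out data.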

Examples using mapie.classification.CrossConformalClassifier