`mapie.classification.CrossConformalClassifier`

class mapie.classification.CrossConformalClassifier(estimator: ClassifierMixin = LogisticRegression(), confidence_level: float | Iterable[float] = 0.9, conformity_score: str | BaseClassificationScore = 'lac', cv: int | BaseCrossValidator = 5, n_jobs: int | None = None, verbose: int = 0, random_state: int | RandomState | None = None)

Computes prediction sets using the cross conformal classification technique:

The fit_conformalize method estimates the uncertainty of the base classifier in a cross-validation style. It fits the base classifier on folds of the dataset and computes conformity scores on the out-of-fold data.
The predict_set method predicts labels and sets of labels.
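
The cross-validation step described above can be illustrated without MAPIE. The following is a minimal sketch, not the library's implementation: the toy centroid "classifier", the helper names (`fit_centroids`, `predict_proba`, `out_of_fold_scores`) and the interleaved fold assignment are all assumptions made for illustration. It uses a LAC-style score, taken here to be one minus the probability assigned to the true label.

```python
import math

def fit_centroids(X, y):
    """Toy 'classifier' (hypothetical): store the mean feature value of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    logits = {label: -(x - c) ** 2 for label, c in centroids.items()}
    top = max(logits.values())
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

def out_of_fold_scores(X, y, n_folds):
    """Return one score (1 - p(true label)) per sample, computed out of fold."""
    n = len(X)
    scores = [None] * n
    for k in range(n_folds):
        held_out = [i for i in range(n) if i % n_folds == k]
        in_fold = [i for i in range(n) if i % n_folds != k]
        # Fit on the in-fold samples only.
        model = fit_centroids([X[i] for i in in_fold], [y[i] for i in in_fold])
        # Score each held-out sample with the model that never saw it.
        for i in held_out:
            proba = predict_proba(model, X[i])
            scores[i] = 1.0 - proba.get(y[i], 0.0)
    return scores

# Two well-separated classes on a single feature.
X = [0.1, 0.3, 0.2, 0.4, 2.1, 1.9, 2.3, 2.0, 0.0, 2.2]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1]
scores = out_of_fold_scores(X, y, n_folds=5)
```

Each sample receives exactly one score, produced by a model that was not trained on it, which is what makes the scores usable for calibration.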

Parameters:

estimator: ClassifierMixin, default=LogisticRegression()

The base classifier used to predict labels.

confidence_level: Union[float, List[float]], default=0.9

The confidence level(s) for the prediction sets, that is, the desired coverage probability. If a float is provided, it represents a single confidence level. If a list is provided, one prediction set is returned for each specified confidence level.

conformity_score: Union[str, BaseClassificationScore], default="lac"

The method used to compute conformity scores. Valid options:

"lac"
"aps"
Any subclass of BaseClassificationScore, to provide a custom score function

See [theoretical description (classification)](../theory/classification.md).

cv: Union[int, BaseCrossValidator], default=5

The cross-validator used to compute conformity scores. Valid options:

integer, to specify the number of folds
any sklearn.model_selection.BaseCrossValidator suitable for classification, or a custom cross-validator inheriting from it.

The main variants in the cross conformal setting are:

sklearn.model_selection.KFold (vanilla cross conformal)
sklearn.model_selection.LeaveOneOut (jackknife)

n_jobs: Optional[int], default=None

The number of jobs to run in parallel when applicable.

verbose: int, default=0

Controls the verbosity level. Higher values produce more detailed output.

random_state: Optional[Union[int, np.random.RandomState]], default=None

A seed or random state instance to ensure reproducibility in any random operations within the classifier.
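
To illustrate how confidence_level relates to the size of the prediction sets, here is a minimal standalone sketch. It is not MAPIE's code: the helper names, the hypothetical scores and probabilities, and the use of the common `ceil((n + 1) * level)` finite-sample rank are assumptions, and the library's exact computation may differ.

```python
import math

def conformal_quantile(scores, confidence_level):
    """Score at rank ceil((n + 1) * level) among the sorted conformity scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * confidence_level)
    if rank > n:
        return float("inf")  # too few scores for this level: keep every label
    return sorted(scores)[rank - 1]

def lac_prediction_set(proba, threshold):
    """Keep every label whose LAC-style score 1 - p(label) is within the threshold."""
    return {label for label, p in proba.items() if 1.0 - p <= threshold}

# Hypothetical conformity scores and class probabilities for one test sample.
conformity_scores = [0.05, 0.10, 0.12, 0.20, 0.40, 0.45, 0.50, 0.70, 0.80]
test_proba = {"a": 0.62, "b": 0.33, "c": 0.05}

# One prediction set per requested confidence level.
sets_by_level = {
    level: lac_prediction_set(test_proba, conformal_quantile(conformity_scores, level))
    for level in (0.5, 0.75, 0.95)
}
```

In this toy setting the sets are nested: a higher confidence level raises the threshold, so the set can only grow.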

Examples

>>> from mapie.classification import CrossConformalClassifier
>>> from sklearn.datasets import make_classification
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.neighbors import KNeighborsClassifier

>>> X_full, y_full = make_classification(n_samples=500)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)

>>> mapie_classifier = CrossConformalClassifier(
...     estimator=KNeighborsClassifier(),
...     confidence_level=0.95,
...     cv=10
... ).fit_conformalize(X, y)

>>> predicted_labels, predicted_sets = mapie_classifier.predict_set(X_test)

__init__(estimator: ClassifierMixin = LogisticRegression(), confidence_level: float | Iterable[float] = 0.9, conformity_score: str | BaseClassificationScore = 'lac', cv: int | BaseCrossValidator = 5, n_jobs: int | None = None, verbose: int = 0, random_state: int | RandomState | None = None) → None

property conformity_scores: ndarray[tuple[Any, ...], dtype[_ScalarT]]

Returns the conformity scores computed by the fit_conformalize method, on the out-of-fold predictions produced during cross-validation.

Returns:

NDArray: Array of conformity scores, with shape (n_samples,).

fit_conformalize(X: ArrayLike, y: ArrayLike, groups: ArrayLike | None = None, fit_params: dict | None = None, predict_params: dict | None = None) → CrossConformalClassifier

Estimates the uncertainty of the base classifier in a cross-validation style: fits the base classifier on different folds of the dataset and computes conformity scores on the corresponding out-of-fold data.

If called on an instance that has already been fitted, a UserWarning is emitted and the previously computed conformity scores are discarded before the new fit. Call reset() explicitly to suppress the warning.

Parameters:

X: ArrayLike: Features
y: ArrayLike: Targets
groups: Optional[ArrayLike] of shape (n_samples,), default=None: Groups to pass to the cross-validator.
fit_params: Optional[dict], default=None: Parameters to pass to the fit method of the base classifier.
predict_params: Optional[dict], default=None: Parameters to pass to the predict and predict_proba methods of the base classifier. These parameters will also be used in the predict_set and predict methods of this CrossConformalClassifier.

Returns:

Self: This CrossConformalClassifier instance, fitted and conformalized.

predict(X: ArrayLike) → ndarray[tuple[Any, ...], dtype[_ScalarT]]

For each sample in X, returns the label predicted by the base classifier.

Parameters:

X: ArrayLike: Features

Returns:

NDArray: Array of predicted labels, with shape (n_samples,).

predict_set(X: ArrayLike, conformity_score_params: dict | None = None, agg_scores: str = 'mean') → Tuple[ndarray[tuple[Any, ...], dtype[_ScalarT]], ndarray[tuple[Any, ...], dtype[_ScalarT]]]

For each sample in X, predicts a label (using the base classifier), and a set of labels.

If several confidence levels were provided during initialisation, several sets will be predicted for each sample. See the return signature.

Parameters:

X: ArrayLike

Features

conformity_score_params: Optional[dict], default=None

Parameters specific to conformity scores, used at prediction time.

The only example for now is include_last_label, available for aps and raps conformity scores. For detailed information on include_last_label, see the docstring of APSConformityScore.get_prediction_sets.

agg_scores: str, default="mean"

How to aggregate conformity scores.

Each classifier fitted on different folds of the dataset is used to produce conformity scores on the test data. The agg_scores parameter controls how those scores are aggregated. Valid options:

"mean", takes the mean of the scores.
"crossval", compares the scores between all training data and each test point for each label to decide whether the label must be included in the prediction set. Follows Algorithm 2 of "Classification with Valid and Adaptive Coverage" (Romano et al., 2020).
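
The two aggregation options can be contrasted with a small standalone sketch. It is written under stated assumptions and is not MAPIE's implementation: the function names and toy numbers are hypothetical, the "mean" branch compares averaged test scores to a `ceil((n + 1) * level)` rank threshold, and the "crossval" branch follows the counting rule of the CV+ construction in Romano et al. (2020) without its randomization and tie-handling details.

```python
import math

def mean_aggregation_set(train_scores, test_scores_per_fold, confidence_level):
    """Average the per-fold test scores of each label, then threshold them."""
    n = len(train_scores)
    rank = math.ceil((n + 1) * confidence_level)
    threshold = float("inf") if rank > n else sorted(train_scores)[rank - 1]
    n_folds = len(test_scores_per_fold)
    mean_scores = {
        label: sum(fold[label] for fold in test_scores_per_fold) / n_folds
        for label in test_scores_per_fold[0]
    }
    return {label for label, s in mean_scores.items() if s <= threshold}

def crossval_aggregation_set(train_scores, fold_of, test_scores_per_fold, confidence_level):
    """Keep a label unless too many training scores fall strictly below its test score."""
    n = len(train_scores)
    kept = set()
    for label in test_scores_per_fold[0]:
        # Compare each training score with the test score from the same fold model.
        n_smaller = sum(
            1 for i in range(n)
            if train_scores[i] < test_scores_per_fold[fold_of[i]][label]
        )
        if n_smaller < confidence_level * (n + 1):
            kept.add(label)
    return kept

# Hypothetical out-of-fold scores for six training points split over two folds.
train_scores = [0.10, 0.20, 0.30, 0.15, 0.25, 0.50]
fold_of = [0, 0, 0, 1, 1, 1]
# Hypothetical scores of one test point, per fold model and per candidate label.
test_scores_per_fold = [
    {"a": 0.12, "b": 0.45, "c": 0.90},
    {"a": 0.18, "b": 0.35, "c": 0.95},
]

mean_set = mean_aggregation_set(train_scores, test_scores_per_fold, 0.8)
crossval_set = crossval_aggregation_set(train_scores, fold_of, test_scores_per_fold, 0.8)
```

On this toy input both rules keep the same labels. They can disagree in general, because "crossval" compares every training score against the test score from its own fold model instead of against a single pooled threshold.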

Returns: