mapie.calibration.TopLabelCalibrator

class mapie.calibration.TopLabelCalibrator(estimator: ClassifierMixin | None = None, calibrator: str | RegressorMixin | None = None, cv: str | None = 'split')

Top-label calibration for multi-class problems. For each sample, only the score of the predicted class (the class with the highest score) is calibrated, conditionally on both that score and the predicted class; see Section 2 of [1].
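To make the "top label" concrete, here is a minimal NumPy sketch of how the predicted class and its score can be read off a matrix of uncalibrated scores. The arrays `scores` and `classes` are made-up illustration data, not MAPIE internals, and the sketch does not reproduce the library's implementation.

```python
import numpy as np

# Hypothetical uncalibrated scores, shape (n_samples, n_classes).
scores = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.1, 0.8],
])
# Hypothetical class names, one per column of `scores`.
classes = np.array([0, 1, 2])

top_index = np.argmax(scores, axis=1)  # column holding the highest score
top_label = classes[top_index]         # predicted class for each sample
top_score = np.max(scores, axis=1)     # the only score that gets calibrated

print(top_label)
print(top_score)
```

Top-label calibration then adjusts `top_score` given `top_label`; the remaining scores in each row are left out of the procedure.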

Parameters:
estimator: Optional[ClassifierMixin]

Any classifier with scikit-learn API (i.e. with fit, predict, and predict_proba methods), by default None. If None, estimator defaults to a LogisticRegression instance.

calibrator: Optional[Union[str, RegressorMixin]]

Any calibrator with scikit-learn API (i.e. with fit and predict methods), or the name of a built-in calibrator given as a string. If None, calibrator defaults to the string “sigmoid”.

By default None.

cv: Optional[str]

The cross-validation strategy used to compute scores:

  • “split”, performs a standard splitting into a calibration and a test set.

  • “prefit”, assumes that estimator has been fitted already. All the data that are provided in the fit method are then used to calibrate the predictions through the score computation.

By default “split”.

Attributes:
classes_: NDArray

Array with the name of each class.

n_classes_: int

Number of classes that are in the training dataset.

uncalib_pred: NDArray

Array of the uncalibrated predictions set by the estimator.

single_estimator_: ClassifierMixin

Classifier fitted on the training data.

calibrators: Dict[Union[int, str], RegressorMixin]

Dictionary of all the fitted calibrators.
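As a rough illustration of what a dictionary of per-class calibrators can look like, the sketch below fits one calibrator per predicted class on the samples whose top label is that class. Everything here is an assumption made for illustration: `MeanCalibrator` is a deliberately trivial stand-in (it is not the “sigmoid” calibrator), the data is invented, and keying the dictionary by class label is only suggested by the attribute's type hint.

```python
import numpy as np

class MeanCalibrator:
    """Hypothetical stand-in: predicts the observed accuracy, ignoring the score."""

    def fit(self, score, correct):
        self.rate_ = float(np.mean(correct))
        return self

    def predict(self, score):
        return np.full(len(score), self.rate_)

# Invented calibration data: predicted class, its score, and the true label.
top_label = np.array([0, 0, 1, 1, 1, 2])
top_score = np.array([0.9, 0.8, 0.6, 0.7, 0.5, 0.9])
y_true = np.array([0, 1, 1, 1, 0, 2])

calibrators = {}
for label in np.unique(top_label):
    mask = top_label == label
    # Binary target: was the top-label prediction correct?
    correct = (y_true[mask] == label).astype(float)
    calibrators[int(label)] = MeanCalibrator().fit(top_score[mask], correct)

rates = {label: cal.rate_ for label, cal in calibrators.items()}
print(sorted(calibrators))
```

Each entry is fitted only on the rows where its class was predicted, which is what lets the calibration depend on both the score and the class.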

References

[1] Gupta, Chirag, and Aaditya K. Ramdas. “Top-label calibration and multiclass-to-binary reductions.” arXiv preprint arXiv:2107.08353 (2021).

Examples

>>> import numpy as np
>>> from mapie.calibration import TopLabelCalibrator
>>> X_toy = np.arange(9).reshape(-1, 1)
>>> y_toy = np.stack([0, 0, 1, 0, 1, 2, 1, 2, 2])
>>> mapie = TopLabelCalibrator().fit(X_toy, y_toy, random_state=20)
>>> y_calib = mapie.predict_proba(X_toy)
>>> print(y_calib)
[[0.84......        nan        nan]
 [0.75......        nan        nan]
 [0.62......        nan        nan]
 [       nan 0.33......        nan]
 [       nan 0.33......        nan]
 [       nan 0.33......        nan]
 [       nan        nan 0.33......]
 [       nan        nan 0.54......]
 [       nan        nan 0.66......]]
__init__(estimator: ClassifierMixin | None = None, calibrator: str | RegressorMixin | None = None, cv: str | None = 'split') -> None
fit(X: ArrayLike, y: ArrayLike, sample_weight: ndarray[tuple[Any, ...], dtype[_ScalarT]] | None = None, calib_size: float | None = 0.33, random_state: int | RandomState | None = None, shuffle: bool | None = True, stratify: ArrayLike | None = None, **fit_params) -> TopLabelCalibrator

Calibrate the estimator on given datasets, according to the chosen method.

Parameters:
X: ArrayLike of shape (n_samples, n_features)

Training data.

y: ArrayLike of shape (n_samples,)

Training labels.

sample_weight: Optional[ArrayLike] of shape (n_samples,)

Sample weights for fitting the out-of-fold models. If None, then samples are equally weighted. Note that the sample weights are used only for training, not for the calibration procedure. By default None.

calib_size: Optional[float]

If cv == “split”, the calibration dataset is created by splitting the data provided to fit, with calib_size giving the proportion held out for calibration. By default 0.33.

random_state: int, RandomState instance or None, default=None

See sklearn.model_selection.train_test_split documentation. Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

shuffle: bool, default=True

See sklearn.model_selection.train_test_split documentation. Whether or not to shuffle the data before splitting. If shuffle=False, then stratify must be None.

stratify: array-like, default=None

See sklearn.model_selection.train_test_split documentation. If not None, data is split in a stratified fashion, using this as the class label.

**fit_params: dict

Additional fit parameters.

Returns:
TopLabelCalibrator

The model itself.
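The interplay of calib_size, random_state, and shuffle can be pictured with a small NumPy sketch of a train/calibration split. The helper `split_indices` is hypothetical and is not part of MAPIE or scikit-learn; rounding the calibration size up is an assumption modeled on how sklearn.model_selection.train_test_split treats a fractional test size, and the shuffled order will not match what train_test_split produces for the same seed.

```python
import math
import numpy as np

def split_indices(n_samples, calib_size=0.33, random_state=None, shuffle=True):
    """Hypothetical helper: return (train_idx, calib_idx) for a holdout split."""
    # Assumption: the calibration share is rounded up to a whole sample count.
    n_calib = math.ceil(calib_size * n_samples)
    n_train = n_samples - n_calib
    indices = np.arange(n_samples)
    if shuffle:
        # A fixed integer seed makes the permutation reproducible.
        rng = np.random.RandomState(random_state)
        indices = rng.permutation(n_samples)
    return indices[:n_train], indices[n_train:]

train_idx, calib_idx = split_indices(9, calib_size=0.33, random_state=20)
print(len(train_idx), len(calib_idx))  # 6 3
```

With shuffle=False the first rows are used for training and the last rows for calibration, which is why an ordered dataset usually needs shuffling or stratification to keep every class in both parts.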

property is_fitted

Returns True if the estimator is fitted.

predict(X: ArrayLike) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]

Predict the class of the estimator after calibration. Note that in the top-label setting, this class does not change.

Parameters:
X: ArrayLike of shape (n_samples, n_features)

Test data.

Returns:
NDArray of shape (n_samples,)

The predicted class for each sample, i.e. the class with the highest score.

predict_proba(X: ArrayLike) -> ndarray[tuple[Any, ...], dtype[_ScalarT]]

Prediction of the calibrated scores using fitted classifier and calibrator.

Parameters:
X: ArrayLike of shape (n_samples, n_features)

Test data.

Returns:
NDArray of shape (n_samples, n_classes)

The calibrated score at the position of each sample's highest score, and NaN at every other position in that row (as in the example output above).
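The layout of the returned array can be mimicked with plain NumPy. The sketch below assumes the layout printed in the class example, where non-top positions hold nan, and uses invented values for `top_index` and `calibrated_top`; it illustrates how to read such an array and is not the library's code.

```python
import numpy as np

classes = np.array([0, 1, 2])                         # hypothetical class names
top_index = np.array([0, 0, 1, 2])                    # column of the top label per sample
calibrated_top = np.array([0.84, 0.62, 0.33, 0.66])   # invented calibrated scores

# Start from an all-NaN array and fill only the top-label position of each row.
proba = np.full((len(top_index), len(classes)), np.nan)
proba[np.arange(len(top_index)), top_index] = calibrated_top

# NaN-aware reductions recover the predicted class and its calibrated score.
recovered_label = classes[np.nanargmax(proba, axis=1)]
recovered_score = np.nanmax(proba, axis=1)
print(recovered_label)  # [0 0 1 2]
```

Because each row has a single non-NaN entry, ordinary reductions such as `proba.sum(axis=1)` return nan; the NaN-aware variants are the ones to use when post-processing.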

set_fit_request(*, calib_size: bool | None | str = '$UNCHANGED$', random_state: bool | None | str = '$UNCHANGED$', sample_weight: bool | None | str = '$UNCHANGED$', shuffle: bool | None | str = '$UNCHANGED$', stratify: bool | None | str = '$UNCHANGED$') -> TopLabelCalibrator

Configure whether metadata should be requested to be passed to the fit method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
calib_size: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for calib_size parameter in fit.

random_state: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for random_state parameter in fit.

sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in fit.

shuffle: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for shuffle parameter in fit.

stratify: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for stratify parameter in fit.

Returns:
self: object

The updated object.

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') -> TopLabelCalibrator

Configure whether metadata should be requested to be passed to the score method.

Note that this method is only relevant when this estimator is used as a sub-estimator within a meta-estimator and metadata routing is enabled with enable_metadata_routing=True (see sklearn.set_config()). Please check the User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to score.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Parameters:
sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED

Metadata routing for sample_weight parameter in score.

Returns:
self: object

The updated object.