mapie.calibration.MapieCalibrator

class mapie.calibration.MapieCalibrator(estimator: Optional[sklearn.base.ClassifierMixin] = None, method: str = 'top_label', calibrator: Optional[Union[str, sklearn.base.RegressorMixin]] = None, cv: Optional[str] = 'split')

Calibration for multi-class problems.

This class calibrates the scores of a classifier. Top-label calibration [1] is currently the only supported method.

Parameters

estimator: Optional[ClassifierMixin]

Any classifier with scikit-learn API (i.e. with fit, predict, and predict_proba methods), by default None. If None, estimator defaults to a LogisticRegression instance.

method: Optional[str]

Calibration method. Choose among:

- “top_label”: calibrates the score of the class with the highest score, given both that score and the predicted class; see Section 2 of [1].

By default “top_label”.
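To make the idea concrete, here is a minimal NumPy sketch of top-label calibration. It is not MAPIE's implementation: it swaps in plain histogram binning as the calibrator, and the helper names `fit_top_label_bins` and `predict_top_label` are hypothetical. For each predicted class, it looks only at the samples whose highest score falls on that class and estimates how often that prediction is correct within each score bin.

```python
import numpy as np


def fit_top_label_bins(proba, y, n_bins=5):
    """Fit one histogram-binning calibrator per predicted (top) label.

    proba: (n_samples, n_classes) uncalibrated scores.
    y: (n_samples,) true labels encoded as 0..n_classes-1.
    Returns {label: (bin_edges, per_bin_accuracy)}.
    """
    top_label = proba.argmax(axis=1)
    top_score = proba.max(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    calibrators = {}
    for label in np.unique(top_label):
        mask = top_label == label
        # Binary target: was the top-label prediction correct?
        correct = (y[mask] == label).astype(float)
        # Interior edges give bin indices in 0..n_bins-1.
        bins = np.digitize(top_score[mask], edges[1:-1])
        accuracy = np.full(n_bins, np.nan)
        for b in range(n_bins):
            in_bin = bins == b
            if in_bin.any():
                accuracy[b] = correct[in_bin].mean()
        calibrators[label] = (edges, accuracy)
    return calibrators


def predict_top_label(proba, calibrators):
    """Return calibrated top-label scores; all other entries are NaN."""
    top_label = proba.argmax(axis=1)
    top_score = proba.max(axis=1)
    out = np.full(proba.shape, np.nan)
    for label, (edges, accuracy) in calibrators.items():
        rows = np.flatnonzero(top_label == label)
        bins = np.digitize(top_score[rows], edges[1:-1])
        out[rows, label] = accuracy[bins]
    return out


# Toy scores (made up for illustration) and true labels.
proba = np.array([
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
])
y = np.array([0, 1, 1, 1, 2, 0])

calibrators = fit_top_label_bins(proba, y, n_bins=2)
calibrated = predict_top_label(proba, calibrators)
print(calibrated)
```

Each row keeps a single calibrated value, placed in the column of its predicted class, which mirrors the layout of the `predict_proba` output in the example further down.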

calibrator: Optional[Union[str, RegressorMixin]]

Any calibrator with scikit-learn API (i.e. with fit and predict methods), or the name of a built-in calibrator given as a string. If None, the “sigmoid” calibrator is used.

By default None.

cv: Optional[str]

The cross-validation strategy used to compute scores:

- “split”: splits the data provided to the fit method into a training set and a calibration set.
- “prefit”: assumes that estimator has already been fitted. All the data provided to the fit method are then used to calibrate the predictions through the score computation.

By default “split”.

References

[1] Gupta, Chirag, and Aaditya K. Ramdas. “Top-label calibration and multiclass-to-binary reductions.” arXiv preprint arXiv:2107.08353 (2021).

Examples

>>> import numpy as np
>>> from mapie.calibration import MapieCalibrator
>>> X_toy = np.arange(9).reshape(-1, 1)
>>> y_toy = np.stack([0, 0, 1, 0, 1, 2, 1, 2, 2])
>>> mapie = MapieCalibrator().fit(X_toy, y_toy, random_state=20)
>>> y_calib = mapie.predict_proba(X_toy)
>>> print(y_calib)
[[0.84900723        nan        nan]
 [0.75432411        nan        nan]
 [0.62285341        nan        nan]
 [       nan 0.33333333        nan]
 [       nan 0.33333333        nan]
 [       nan 0.33333333        nan]
 [       nan        nan 0.33333002]
 [       nan        nan 0.54326683]
 [       nan        nan 0.66666124]]

Attributes

valid_methods: List[str]: List of all valid methods.
classes_: NDArray: Array with the name of each class.
n_classes_: int: Number of classes that are in the training dataset.
uncalib_pred: NDArray: Array of the uncalibrated predictions made by the estimator.
single_estimator_: ClassifierMixin: Classifier fitted on the training data.
calibrators: Dict[Union[int, str], RegressorMixin]: Dictionary of all the fitted calibrators.

__init__(estimator: Optional[sklearn.base.ClassifierMixin] = None, method: str = 'top_label', calibrator: Optional[Union[str, sklearn.base.RegressorMixin]] = None, cv: Optional[str] = 'split') → None

fit(X: ArrayLike, y: ArrayLike, sample_weight: Optional[NDArray] = None, calib_size: Optional[float] = 0.33, random_state: Optional[Union[numpy.random.RandomState, int]] = None, shuffle: Optional[bool] = True, stratify: Optional[ArrayLike] = None) → mapie.calibration.MapieCalibrator

Calibrate the estimator on given datasets, according to the chosen method.

Parameters

X: ArrayLike of shape (n_samples, n_features): Training data.
y: ArrayLike of shape (n_samples,): Training labels.
sample_weight: Optional[ArrayLike] of shape (n_samples,): Sample weights for fitting the estimator. If None, samples are equally weighted. Note that the sample weights are used only for training, not for the calibration procedure. By default None.
calib_size: Optional[float]: If cv == “split”, the calibration dataset is created from X and y with the split defined by calib_size. By default 0.33.
random_state: int, RandomState instance or None, default=None: See sklearn.model_selection.train_test_split documentation. Controls the shuffling applied to the data before the split. Pass an int for reproducible output across multiple function calls.
shuffle: bool, default=True: See sklearn.model_selection.train_test_split documentation. Whether or not to shuffle the data before splitting. If shuffle=False, then stratify must be None.
stratify: array-like, default=None: See sklearn.model_selection.train_test_split documentation. If not None, data is split in a stratified fashion, using this as the class labels.

Returns

MapieCalibrator: The model itself.
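The split parameters of fit refer to sklearn.model_selection.train_test_split, so their effect can be previewed by calling that function directly. The sketch below assumes that calib_size plays the role of test_size; this mapping is an assumption for illustration, not something the documentation above states explicitly.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Same toy data as in the class example.
X = np.arange(9).reshape(-1, 1)
y = np.array([0, 0, 1, 0, 1, 2, 1, 2, 2])

# Assumption: calib_size is forwarded as test_size.
calib_size = 0.33

X_train, X_calib, y_train, y_calib = train_test_split(
    X, y,
    test_size=calib_size,
    random_state=20,   # an int makes the shuffle reproducible
    shuffle=True,      # stratify requires shuffle=True
    stratify=y,        # keep class proportions in both parts
)
print(len(X_train), len(X_calib))

# Calling again with the same random_state yields the same split.
_, X_calib_again, _, _ = train_test_split(
    X, y, test_size=calib_size, random_state=20, shuffle=True, stratify=y
)
print(np.array_equal(X_calib, X_calib_again))
```

With nine samples and a fraction of 0.33, three samples are held out, and stratifying on y places one sample of each class in the held-out part.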

predict(X: ArrayLike) → NDArray

Predict the class of each sample after calibration. Note that in the top-label setting, the predicted class is the same as the one predicted by the uncalibrated estimator.

Parameters

X: ArrayLike of shape (n_samples, n_features): Test data.

Returns

NDArray of shape (n_samples,): The predicted class, i.e. the class with the highest score.

predict_proba(X: ArrayLike) → NDArray

Predict the calibrated scores using the fitted classifier and calibrators.

Parameters

X: ArrayLike of shape (n_samples, n_features): Test data.

Returns

NDArray of shape (n_samples, n_classes): The calibrated score of the top label in each row, at the position of that label. As the class example shows, every other position in the row is NaN.
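In the example output shown in the Examples section, each row holds one calibrated score and nan elsewhere, so NaN-aware NumPy functions are a convenient way to read it. The sketch below copies that output into an array by hand rather than calling MAPIE; the variable names are illustrative.

```python
import numpy as np

nan = np.nan
# Output of predict_proba copied from the class example.
y_calib = np.array([
    [0.84900723, nan, nan],
    [0.75432411, nan, nan],
    [0.62285341, nan, nan],
    [nan, 0.33333333, nan],
    [nan, 0.33333333, nan],
    [nan, 0.33333333, nan],
    [nan, nan, 0.33333002],
    [nan, nan, 0.54326683],
    [nan, nan, 0.66666124],
])

# Column of the only non-NaN entry in each row, i.e. the top label.
# Note: np.nanargmax raises ValueError if a row is entirely NaN.
top_label = np.nanargmax(y_calib, axis=1)

# Calibrated score attached to that top label.
top_score = np.nanmax(y_calib, axis=1)

# Optional dense view with NaN replaced by 0.0, for code that rejects NaN.
dense = np.nan_to_num(y_calib, nan=0.0)

print(top_label)
print(top_score)
```

Replacing NaN by zero is only a convenience for downstream code; the rows of the resulting array do not sum to one and should not be read as full probability vectors.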
