mapie.calibration.MapieCalibrator
- class mapie.calibration.MapieCalibrator(estimator: Optional[sklearn.base.ClassifierMixin] = None, method: str = 'top_label', calibrator: Optional[Union[str, sklearn.base.RegressorMixin]] = None, cv: Optional[str] = 'split')
Calibration for multi-class problems.
This class implements calibration methods for classifier scores. Currently, only top-label calibration [1] is supported.
- Parameters
- estimator: Optional[ClassifierMixin]
Any classifier with scikit-learn API (i.e. with fit, predict, and predict_proba methods). If None, estimator defaults to a LogisticRegression instance. By default None.
- method: Optional[str]
Calibration method to use. Choose among:
“top_label”, performs a calibration on the class with the highest score, given both the score and the class; see section 2 of [1].
By default “top_label”.
- calibrator: Optional[Union[str, RegressorMixin]]
Any calibrator with scikit-learn API (i.e. with fit, predict, and predict_proba methods). If None, calibrator defaults to “sigmoid”. By default None.
- cv: Optional[str]
The cross-validation strategy used to compute scores. Choose among:
“split”, performs a standard split into a training set and a calibration set.
“prefit”, assumes that estimator has already been fitted. All the data provided in the fit method are then used to calibrate the predictions through the score computation.
By default “split”.
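To make the “top_label” option more concrete, here is a minimal sketch of the quantities that top-label calibration works with: the predicted (top) class of each sample, its score, and a binary target saying whether that top class was correct, grouped by predicted class. This is an illustration under stated assumptions, not MAPIE's implementation: the helper name `top_label_groups`, the array `proba`, and the convention that column index equals class label are all hypothetical.

```python
import numpy as np


def top_label_groups(proba, y_true):
    """Group top-label scores by predicted class.

    Illustrative helper, not part of MAPIE. Assumes that column j of
    `proba` holds the score of class label j.
    """
    top_class = np.argmax(proba, axis=1)  # predicted class per sample
    top_score = np.max(proba, axis=1)     # score of that predicted class
    groups = {}
    for c in np.unique(top_class):
        mask = top_class == c
        # Binary target: 1 when the predicted class is the true label.
        targets = (y_true[mask] == c).astype(int)
        groups[int(c)] = (top_score[mask], targets)
    return groups


# Hypothetical uncalibrated scores for four samples and three classes.
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
])
y_true = np.array([0, 1, 1, 2])

groups = top_label_groups(proba, y_true)
for label, (scores, targets) in groups.items():
    print(label, scores, targets)
```

A one-dimensional calibrator (for instance a sigmoid fit) could then be trained on each (scores, targets) pair, one per predicted class.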
References
[1] Gupta, Chirag, and Aaditya K. Ramdas. “Top-label calibration and multiclass-to-binary reductions.” arXiv preprint arXiv:2107.08353 (2021).
Examples
>>> import numpy as np
>>> from mapie.calibration import MapieCalibrator
>>> X_toy = np.arange(9).reshape(-1, 1)
>>> y_toy = np.stack([0, 0, 1, 0, 1, 2, 1, 2, 2])
>>> mapie = MapieCalibrator().fit(X_toy, y_toy, random_state=20)
>>> y_calib = mapie.predict_proba(X_toy)
>>> print(y_calib)
[[0.84900723        nan        nan]
 [0.75432411        nan        nan]
 [0.62285341        nan        nan]
 [       nan 0.33333333        nan]
 [       nan 0.33333333        nan]
 [       nan 0.33333333        nan]
 [       nan        nan 0.33333002]
 [       nan        nan 0.54326683]
 [       nan        nan 0.66666124]]
- Attributes
- valid_methods: List[str]
List of all valid methods.
- classes_: NDArray
Array with the name of each class.
- n_classes_: int
Number of classes that are in the training dataset.
- uncalib_pred: NDArray
Array of the uncalibrated predictions set by the estimator.
- single_estimator_: ClassifierMixin
Classifier fitted on the training data.
- calibrators: Dict[Union[int, str], RegressorMixin]
Dictionary of all the fitted calibrators.
- __init__(estimator: Optional[sklearn.base.ClassifierMixin] = None, method: str = 'top_label', calibrator: Optional[Union[str, sklearn.base.RegressorMixin]] = None, cv: Optional[str] = 'split') -> None
- fit(X: ArrayLike, y: ArrayLike, sample_weight: Optional[NDArray] = None, calib_size: Optional[float] = 0.33, random_state: Optional[Union[numpy.random.RandomState, int]] = None, shuffle: Optional[bool] = True, stratify: Optional[ArrayLike] = None) -> mapie.calibration.MapieCalibrator
Calibrate the estimator on given datasets, according to the chosen method.
- Parameters
- X: ArrayLike of shape (n_samples, n_features)
Training data.
- y: ArrayLike of shape (n_samples,)
Training labels.
- sample_weight: Optional[ArrayLike] of shape (n_samples,)
Sample weights for fitting the out-of-fold models. If None, then samples are equally weighted. Note that the sample weights are only used for training, not for the calibration procedure. By default None.
- calib_size: Optional[float]
If cv == “split” and X_calib and y_calib are not defined, then the calibration dataset is created with the split defined by calib_size. By default 0.33.
- random_state: int, RandomState instance or None, default is None
See sklearn.model_selection.train_test_split documentation. Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- shuffle: bool, default=True
See sklearn.model_selection.train_test_split documentation. Whether or not to shuffle the data before splitting. If shuffle=False, then stratify must be None.
- stratify: array-like, default=None
See sklearn.model_selection.train_test_split documentation. If not None, data is split in a stratified fashion, using this as the class labels.
- Returns
- MapieCalibrator
The model itself.
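The roles of calib_size, random_state, and shuffle can be pictured with a small NumPy sketch of a hold-out split. This is not MAPIE's or scikit-learn's code (the parameter descriptions above point to sklearn.model_selection.train_test_split for the actual behaviour). The helper `split_for_calibration` is hypothetical, it ignores stratify and sample_weight, and the rounding rule for the calibration set size is an assumption.

```python
import numpy as np


def split_for_calibration(X, y, calib_size=0.33, random_state=None, shuffle=True):
    """Hold out a fraction of the data for calibration.

    Hypothetical helper for illustration only. Assumes 0 < calib_size < 1
    and rounds the calibration set size up.
    """
    n_samples = len(X)
    n_calib = int(np.ceil(calib_size * n_samples))
    indices = np.arange(n_samples)
    if shuffle:
        # A fixed integer seed makes the permutation reproducible.
        rng = np.random.default_rng(random_state)
        indices = rng.permutation(n_samples)
    train_idx, calib_idx = indices[:-n_calib], indices[-n_calib:]
    return X[train_idx], y[train_idx], X[calib_idx], y[calib_idx]


X_toy = np.arange(9).reshape(-1, 1)
y_toy = np.array([0, 0, 1, 0, 1, 2, 1, 2, 2])

X_train, y_train, X_calib, y_calib = split_for_calibration(
    X_toy, y_toy, calib_size=0.33, random_state=20
)
print(X_train.shape, X_calib.shape)
```

In this picture, the classifier would be fitted on the training part and the calibrator on the held-out part, so the two never see the same samples.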
- predict(X: ArrayLike) -> NDArray
Predict the class of the estimator after calibration. Note that in the top-label setting, this class does not change.
- Parameters
- X: ArrayLike of shape (n_samples, n_features)
Test data.
- Returns
- NDArray of shape (n_samples,)
The predicted class for each sample, derived from the scores.
- predict_proba(X: ArrayLike) -> NDArray
Predict the calibrated scores using the fitted classifier and calibrators.
- Parameters
- X: ArrayLike of shape (n_samples, n_features)
Test data.
- Returns
- NDArray of shape (n_samples, n_classes)
The calibrated score of the top-label class for each sample. Every other position in the row is filled with nan, as in the output shown in the Examples section.
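Because only one entry per row carries a calibrated score, downstream code usually needs to pull that entry out. The sketch below assumes the layout printed in the Examples section earlier on this page, where the remaining positions are nan. The array `y_calib` here is hypothetical data with rounded values, not real output.

```python
import numpy as np

# Hypothetical array with one calibrated score per row and nan elsewhere.
y_calib = np.array([
    [0.85, np.nan, np.nan],
    [np.nan, 0.33, np.nan],
    [np.nan, np.nan, 0.67],
])

# Column index of the only non-nan entry in each row.
top_column = np.nanargmax(y_calib, axis=1)
# Calibrated score of the top-label class for each sample.
top_score = np.nanmax(y_calib, axis=1)
# Sanity check: how many entries per row are actually filled.
n_filled = np.sum(~np.isnan(y_calib), axis=1)

print(top_column, top_score, n_filled)
```

Using the nan-aware NumPy functions avoids the pitfall that plain `np.max` or `np.argmax` would propagate or mis-handle the nan entries.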