mapie.calibration.MapieCalibrator
- class mapie.calibration.MapieCalibrator(estimator: Optional[ClassifierMixin] = None, method: str = 'top_label', calibrator: Optional[Union[str, RegressorMixin]] = None, cv: Optional[str] = 'split')
Calibration for multi-class problems.
This class calibrates the scores predicted by a classifier. Top-label calibration [1] is currently the only supported method.
- Parameters
- estimator: Optional[ClassifierMixin]
Any classifier with scikit-learn API (i.e. with fit, predict, and predict_proba methods). If None, estimator defaults to a LogisticRegression instance. By default None.
- method: Optional[str]
The only valid method is “top_label”. It calibrates the score of the class with the highest score, conditioning on both that score and the predicted class; see section 2 of [1].
By default “top_label”.
- calibrator: Optional[Union[str, RegressorMixin]]
Any calibrator with scikit-learn API (i.e. with fit, predict, and predict_proba methods). If None, calibrator defaults to “sigmoid”. By default None.
- cv: Optional[str]
The cross-validation strategy used to compute scores:
“split” performs a standard split into a training set and a calibration set.
“prefit” assumes that estimator has already been fitted. All the data provided to the fit method are then used to calibrate the predictions through the score computation.
By default “split”.
References
[1] Gupta, Chirag, and Aaditya K. Ramdas. “Top-label calibration and multiclass-to-binary reductions.” arXiv preprint arXiv:2107.08353 (2021).
Examples
>>> import numpy as np
>>> from mapie.calibration import MapieCalibrator
>>> X_toy = np.arange(9).reshape(-1, 1)
>>> y_toy = np.stack([0, 0, 1, 0, 1, 2, 1, 2, 2])
>>> mapie = MapieCalibrator().fit(X_toy, y_toy, random_state=20)
>>> y_calib = mapie.predict_proba(X_toy)
>>> print(y_calib)
[[0.84......        nan        nan]
 [0.75......        nan        nan]
 [0.62......        nan        nan]
 [       nan 0.33......        nan]
 [       nan 0.33......        nan]
 [       nan 0.33......        nan]
 [       nan        nan 0.33......]
 [       nan        nan 0.54......]
 [       nan        nan 0.66......]]
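As a rough illustration of what the “top_label” method does, the sketch below fits one calibrator per predicted class, using the maximum score as input and whether that prediction was correct as target. It is a minimal sketch and not MAPIE's implementation: the helper names fit_top_label_calibrators and predict_top_label are hypothetical, the toy scores are made up, and scikit-learn's IsotonicRegression is swapped in as the per-class calibrator.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression


def fit_top_label_calibrators(proba, y_true):
    """Fit one calibrator per predicted (top) class. Hypothetical helper."""
    top_class = np.argmax(proba, axis=1)
    top_score = np.max(proba, axis=1)
    calibrators = {}
    for k in np.unique(top_class):
        mask = top_class == k
        # Binary target: was the top-label prediction correct?
        correct = (y_true[mask] == k).astype(float)
        calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
        calibrators[int(k)] = calibrator.fit(top_score[mask], correct)
    return calibrators


def predict_top_label(proba, calibrators):
    """Return calibrated top-label scores, NaN elsewhere. Hypothetical helper."""
    top_class = np.argmax(proba, axis=1)
    top_score = np.max(proba, axis=1)
    calibrated = np.full(proba.shape, np.nan)
    for k, calibrator in calibrators.items():
        mask = top_class == k
        if mask.any():
            calibrated[mask, k] = calibrator.predict(top_score[mask])
    return calibrated


# Made-up uncalibrated scores for six samples and three classes.
proba = np.array([
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
    [0.1, 0.1, 0.8],
])
y_true = np.array([0, 1, 1, 1, 2, 0])

calibrators = fit_top_label_calibrators(proba, y_true)
calibrated = predict_top_label(proba, calibrators)
print(calibrated)
```

Only the column of the predicted class is filled in each row, so the predicted class itself is unchanged by the calibration.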
- Attributes
- valid_methods: List[str]
List of all valid methods.
- classes_: NDArray
Array with the name of each class.
- n_classes_: int
Number of classes that are in the training dataset.
- uncalib_pred: NDArray
Array of the uncalibrated predictions set by the estimator.
- single_estimator_: ClassifierMixin
Classifier fitted on the training data.
- calibrators: Dict[Union[int, str], RegressorMixin]
Dictionary of all the fitted calibrators.
- __init__(estimator: Optional[ClassifierMixin] = None, method: str = 'top_label', calibrator: Optional[Union[str, RegressorMixin]] = None, cv: Optional[str] = 'split') -> None
- fit(X: ArrayLike, y: ArrayLike, sample_weight: Optional[NDArray] = None, calib_size: Optional[float] = 0.33, random_state: Optional[Union[int, RandomState]] = None, shuffle: Optional[bool] = True, stratify: Optional[ArrayLike] = None, **fit_params) -> MapieCalibrator
Calibrate the estimator on given datasets, according to the chosen method.
- Parameters
- X: ArrayLike of shape (n_samples, n_features)
Training data.
- y: ArrayLike of shape (n_samples,)
Training labels.
- sample_weight: Optional[ArrayLike] of shape (n_samples,)
Sample weights for fitting the out-of-fold models. If None, then samples are equally weighted. Note that the sample weights are used only for training, not for the calibration procedure. By default None.
- calib_size: Optional[float]
If cv == “split” and X_calib and y_calib are not defined, then the calibration dataset is created with the split defined by calib_size.
- random_state: int, RandomState instance or None, default is None
See the sklearn.model_selection.train_test_split documentation. Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.
- shuffle: bool, default=True
See the sklearn.model_selection.train_test_split documentation. Whether or not to shuffle the data before splitting. If shuffle=False, then stratify must be None.
- stratify: array-like, default=None
See the sklearn.model_selection.train_test_split documentation. If not None, data is split in a stratified fashion, using this as the class labels.
- **fit_params: dict
Additional fit parameters.
- Returns
- MapieCalibrator
The model itself.
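With cv=“split”, the parameters calib_size, random_state, shuffle and stratify are documented by reference to sklearn.model_selection.train_test_split. The sketch below shows what such a split looks like on the toy data from the class example, assuming that calib_size plays the role of test_size. How MapieCalibrator calls the function internally is not shown here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data from the class-level example.
X_toy = np.arange(9).reshape(-1, 1)
y_toy = np.array([0, 0, 1, 0, 1, 2, 1, 2, 2])

# Assumption: calib_size is forwarded as test_size.
X_train, X_calib, y_train, y_calib = train_test_split(
    X_toy,
    y_toy,
    test_size=0.33,    # default calib_size
    random_state=20,   # an int gives a reproducible split
    shuffle=True,      # must stay True if stratify is used
    stratify=None,
)

# 9 samples with test_size=0.33 give 6 training rows and 3 calibration rows.
print(X_train.shape, X_calib.shape)
```

Under this assumption, the classifier would be fitted on the training part and the calibrator on the held-out part, which is consistent with sample_weight applying only to training.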
- predict(X: ArrayLike) -> NDArray
Predict the class of the estimator after calibration. Note that in the top-label setting, this class does not change.
- Parameters
- X: ArrayLike of shape (n_samples, n_features)
Test data.
- Returns
- NDArray of shape (n_samples,)
The predicted class for each sample, obtained from the scores.
- predict_proba(X: ArrayLike) -> NDArray
Prediction of the calibrated scores using fitted classifier and calibrator.
- Parameters
- X: ArrayLike of shape (n_samples, n_features)
Test data.
- Returns
- NDArray of shape (n_samples, n_classes)
The calibrated score at the position of the max score for each sample, and NaN at every other position in that row (as shown in the class example).
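In the class-level example, the returned array holds nan at every position other than the top label. The sketch below shows one way to post-process an array with that layout using NumPy. The values are hypothetical; only the layout follows the example.

```python
import numpy as np

# Hypothetical array laid out like the predict_proba output:
# one calibrated score per row, NaN everywhere else.
y_calib = np.array([
    [0.84, np.nan, np.nan],
    [np.nan, 0.33, np.nan],
    [np.nan, np.nan, 0.66],
])

# Column index of the top label in each row (NaN entries are ignored).
top_class = np.nanargmax(y_calib, axis=1)

# Calibrated score of the top label in each row.
top_score = np.nanmax(y_calib, axis=1)

# Dense variant with NaN replaced by 0.0, if downstream code cannot handle NaN.
dense = np.nan_to_num(y_calib, nan=0.0)
```

The entries of top_class are column positions. Assuming the columns follow the order of classes_, indexing classes_ with them recovers the original labels.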
- set_fit_request(*, calib_size: Union[bool, None, str] = '$UNCHANGED$', random_state: Union[bool, None, str] = '$UNCHANGED$', sample_weight: Union[bool, None, str] = '$UNCHANGED$', shuffle: Union[bool, None, str] = '$UNCHANGED$', stratify: Union[bool, None, str] = '$UNCHANGED$') -> MapieCalibrator
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
- calib_size: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for calib_size parameter in fit.
- random_state: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for random_state parameter in fit.
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in fit.
- shuffle: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for shuffle parameter in fit.
- stratify: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for stratify parameter in fit.
- Returns
- self: object
The updated object.
- set_score_request(*, sample_weight: Union[bool, None, str] = '$UNCHANGED$') -> MapieCalibrator
Request metadata passed to the score method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the scikit-learn User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
- sample_weight: str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED
Metadata routing for sample_weight parameter in score.
- Returns
- self: object
The updated object.