mapie.regression.MapieQuantileRegressor

class mapie.regression.MapieQuantileRegressor(estimator: Optional[Union[sklearn.base.RegressorMixin, sklearn.pipeline.Pipeline, List[Union[sklearn.base.RegressorMixin, sklearn.pipeline.Pipeline]]]] = None, method: str = 'quantile', cv: Optional[str] = None, alpha: float = 0.1)

This class implements the conformalized quantile regression strategy as proposed by Romano et al. (2019) to make conformal predictions. The only valid method is "quantile" and the only valid cv is "split".

Parameters
estimator: Optional[RegressorMixin]

Any regressor with scikit-learn API (i.e. with fit and predict methods). If None, estimator defaults to a QuantileRegressor instance.

By default None.

method: str

Method to choose for prediction. For this class, the only valid method is "quantile".

By default "quantile".

cv: Optional[str]

The cross-validation strategy for computing conformity scores. Only a split strategy is implemented, because the method needs both a training set and a separate calibration set.

By default None.

alpha: float

Between 0.0 and 1.0, represents the risk level of the confidence interval. Lower values of alpha produce larger (more conservative) prediction intervals. alpha is the complement of the target coverage level: alpha = 0.1 targets 90% coverage.

By default 0.1.
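The mapping from alpha to the three quantile levels used by this class (listed under estimators_) and to the target coverage can be spelled out in a few lines. This is a minimal sketch; quantile_levels is a hypothetical helper, not part of MAPIE.

```python
def quantile_levels(alpha: float) -> dict:
    """Hypothetical helper: quantile levels implied by a given alpha."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return {
        "lower": alpha / 2,            # level of the lower quantile estimator
        "upper": 1 - alpha / 2,        # level of the upper quantile estimator
        "median": 0.5,                 # level of the point (median) estimator
        "target_coverage": 1 - alpha,  # alpha is the complement of coverage
    }


levels = quantile_levels(0.1)
```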

References

Yaniv Romano, Evan Patterson and Emmanuel J. Candès. “Conformalized Quantile Regression.” Advances in Neural Information Processing Systems 32 (2019).

Examples

>>> import numpy as np
>>> from mapie.regression import MapieQuantileRegressor
>>> X_train = np.array([[0], [1], [2], [3], [4], [5]])
>>> y_train = np.array([5, 7.5, 9.5, 10.5, 12.5, 15])
>>> X_calib = np.array([[0], [1], [2], [3], [4], [5], [6], [7], [8], [9]])
>>> y_calib = np.array([5, 7, 9, 4, 8, 1, 5, 7.5, 9.5, 12])
>>> mapie_reg = MapieQuantileRegressor().fit(
...     X_train,
...     y_train,
...     X_calib=X_calib,
...     y_calib=y_calib
... )
>>> y_pred, y_pis = mapie_reg.predict(X_train)
>>> print(y_pis[:, :, 0])
[[-8.16666667 19.        ]
 [-6.33333333 20.83333333]
 [-4.5        22.66666667]
 [-2.66666667 24.5       ]
 [-0.83333333 26.33333333]
 [ 1.         28.16666667]]
>>> print(y_pred)
[ 5.  7.  9. 11. 13. 15.]

Attributes
valid_methods_: List[str]

List of all valid methods.

single_estimator_: RegressorMixin

Estimator fitted on the whole training set.

estimators_: List[RegressorMixin]

The three fitted quantile estimators, in this order:
  • [0]: Estimator with quantile value of alpha/2

  • [1]: Estimator with quantile value of 1 - alpha/2

  • [2]: Estimator with quantile value of 0.5

conformity_scores_: NDArray of shape (n_calib_samples, 3)

Conformity scores between y_calib and y_pred.

  • [:, 0]: scores of y_calib against the predictions of the estimator with quantile alpha/2

  • [:, 1]: scores of y_calib against the predictions of the estimator with quantile 1 - alpha/2

  • [:, 2]: element-wise maximum of the first two scores
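A minimal numpy sketch of how the three columns relate to each other, assuming the sign convention of Romano et al. (2019), where a positive score means that y_calib falls outside the corresponding predicted quantile. cqr_conformity_scores is a hypothetical helper; MAPIE's own code and internal array layout may differ.

```python
import numpy as np


def cqr_conformity_scores(y_calib, y_pred_low, y_pred_up):
    """Hypothetical helper: (n_calib_samples, 3) array of CQR scores."""
    y_calib = np.asarray(y_calib, dtype=float)
    scores = np.empty((y_calib.shape[0], 3))
    # Column 0: how far y_calib falls below the alpha/2 quantile prediction
    scores[:, 0] = np.asarray(y_pred_low, dtype=float) - y_calib
    # Column 1: how far y_calib rises above the 1 - alpha/2 quantile prediction
    scores[:, 1] = y_calib - np.asarray(y_pred_up, dtype=float)
    # Column 2: element-wise maximum of the first two columns
    scores[:, 2] = np.maximum(scores[:, 0], scores[:, 1])
    return scores


# Toy calibration targets and quantile predictions, made up for illustration
y_calib = np.array([5.0, 7.0, 9.0])
low = np.array([4.0, 8.0, 6.0])
up = np.array([6.0, 10.0, 8.5])
scores = cqr_conformity_scores(y_calib, low, up)
```

Taking the maximum makes a single score positive whenever the target lies outside the interval on either side.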

n_calib_samples: int

Number of samples in the calibration dataset.

__init__(estimator: Optional[Union[sklearn.base.RegressorMixin, sklearn.pipeline.Pipeline, List[Union[sklearn.base.RegressorMixin, sklearn.pipeline.Pipeline]]]] = None, method: str = 'quantile', cv: Optional[str] = None, alpha: float = 0.1) -> None
fit(X: ArrayLike, y: ArrayLike, sample_weight: Optional[ArrayLike] = None, X_calib: Optional[ArrayLike] = None, y_calib: Optional[ArrayLike] = None, calib_size: Optional[float] = 0.3, random_state: Optional[Union[numpy.random.mtrand.RandomState, int]] = None, shuffle: Optional[bool] = True, stratify: Optional[ArrayLike] = None) -> mapie.regression.quantile_regression.MapieQuantileRegressor

Fit the estimators and compute the conformity scores used for prediction intervals. The clones of the estimator for the different quantile levels are stored in the estimators_ attribute in the order alpha/2, 1 - alpha/2, 0.5. The conformity scores of the first two estimators on the calibration set, together with their element-wise maximum, are stored in the conformity_scores_ attribute.

Parameters
X: ArrayLike of shape (n_samples, n_features)

Training data.

y: ArrayLike of shape (n_samples,)

Training labels.

sample_weight: Optional[ArrayLike] of shape (n_samples,)

Sample weights for fitting the estimators. If None, samples are equally weighted. If some weights are null, their corresponding observations are removed before fitting and hence have no residuals. If weights are non-uniform, residuals are still uniformly weighted. Note that sample weights apply only to training, not to the calibration procedure.

By default None.
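The rule that null weights remove their observations before fitting can be pictured as a boolean mask. This sketch only illustrates the documented behavior; drop_null_weights is a hypothetical helper, not MAPIE's implementation.

```python
import numpy as np


def drop_null_weights(X, y, sample_weight=None):
    """Hypothetical helper: keep only observations with a non-zero weight."""
    X = np.asarray(X)
    y = np.asarray(y)
    if sample_weight is None:
        return X, y, None  # all samples are equally weighted
    w = np.asarray(sample_weight, dtype=float)
    keep = w != 0  # observations with a null weight are removed before fitting
    return X[keep], y[keep], w[keep]


X = np.array([[0], [1], [2], [3]])
y = np.array([5.0, 7.5, 9.5, 10.5])
X_fit, y_fit, w_fit = drop_null_weights(X, y, [1.0, 0.0, 2.0, 1.0])
```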

X_calib: Optional[ArrayLike] of shape (n_calib_samples, n_features)

Calibration data.

y_calib: Optional[ArrayLike] of shape (n_calib_samples,)

Calibration labels.

calib_size: Optional[float]

If X_calib and y_calib are not provided, the calibration set is split off from the training data (X, y), and calib_size sets the proportion of samples assigned to it.

By default 0.3.

random_state: Optional[Union[int, np.random.RandomState]]

As in sklearn.model_selection.train_test_split: controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls. See the scikit-learn Glossary.

By default None.

shuffle: Optional[bool]

As in sklearn.model_selection.train_test_split: whether or not to shuffle the data before splitting. If shuffle=False then stratify must be None.

By default True.

stratify: Optional[ArrayLike]

As in sklearn.model_selection.train_test_split: if not None, data is split in a stratified fashion, using this as the class labels. Read more in the scikit-learn User Guide.

By default None.
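When X_calib and y_calib are omitted, the split parameters above decide which samples go to calibration. The sketch below is a numpy-only stand-in for that split, written under the assumption that calib_size plays the role of test_size in train_test_split (which rounds a fractional test count up and, without shuffling, keeps the last samples for the test part). split_train_calib is hypothetical, ignores stratify, and uses numpy's default_rng, so a given random_state will not reproduce scikit-learn's split.

```python
import numpy as np


def split_train_calib(X, y, calib_size=0.3, random_state=None, shuffle=True):
    """Hypothetical numpy-only stand-in for the train/calibration split."""
    X = np.asarray(X)
    y = np.asarray(y)
    n = X.shape[0]
    # Like a float test_size in train_test_split: round the calibration count up
    n_calib = int(np.ceil(calib_size * n))
    indices = np.arange(n)
    if shuffle:
        indices = np.random.default_rng(random_state).permutation(n)
    # Without shuffling, the last n_calib samples form the calibration set
    train_idx, calib_idx = indices[: n - n_calib], indices[n - n_calib:]
    # Same output order as train_test_split: X_train, X_calib, y_train, y_calib
    return X[train_idx], X[calib_idx], y[train_idx], y[calib_idx]


X = np.arange(10).reshape(-1, 1)
y = np.arange(10, dtype=float)
X_train, X_calib, y_train, y_calib = split_train_calib(
    X, y, calib_size=0.5, random_state=0
)
```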

Returns
MapieQuantileRegressor

The model itself.

predict(X: ArrayLike, ensemble: bool = False, alpha: Optional[Union[float, Iterable[float]]] = None, symmetry: Optional[bool] = True) -> Union[NDArray, Tuple[NDArray, NDArray]]

Predict target on new samples with confidence intervals. The conformity scores from the calibration set and the predictions of the fitted estimator clones are central to the computation. Prediction intervals for a given alpha are derived from the quantile regressions at levels alpha/2 and 1 - alpha/2, each bound being shifted by a constant computed from the calibration conformity scores.
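The symmetric conformalization step of Romano et al. (2019) can be sketched with numpy as follows, assuming the max-score column of conformity_scores_ and the finite-sample rank ceil((n + 1)(1 - alpha)). cqr_interval is a hypothetical helper; MAPIE's own computation may differ in details.

```python
import numpy as np


def cqr_interval(y_pred_low, y_pred_up, max_scores, alpha=0.1):
    """Hypothetical sketch of the symmetric CQR adjustment (Romano et al., 2019)."""
    max_scores = np.asarray(max_scores, dtype=float)
    n = max_scores.shape[0]
    # Finite-sample corrected rank of the score used as the correction constant
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("too few calibration samples for this alpha")
    correction = np.sort(max_scores)[k - 1]  # k-th smallest score
    # The constant may be negative, in which case the interval shrinks
    lower = np.asarray(y_pred_low, dtype=float) - correction
    upper = np.asarray(y_pred_up, dtype=float) + correction
    # Stack into an (n_samples, 2, 1) array: [:, 0, :] lower, [:, 1, :] upper
    return np.stack([lower, upper], axis=1)[:, :, np.newaxis]


# Toy values, made up for illustration
y_pis = cqr_interval([4.0, 6.0], [8.0, 10.0], max_scores=[3.0, 1.0, 2.0], alpha=0.5)
```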

Parameters
X: ArrayLike of shape (n_samples, n_features)

Test data.

ensemble: bool

Not used by MapieQuantileRegressor: this parameter has no effect in this method.

By default False.

alpha: Optional[Union[float, Iterable[float]]]

For MapieQuantileRegressor, alpha has to be set through the alpha argument of the class constructor, not in predict.

By default None.

symmetry: Optional[bool]

Whether to compute a single quantile of the maximum of the two conformity scores and apply it to both bounds (True), or to compute a separate quantile for each score and apply it to its own bound (False).

By default True.
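The two settings of symmetry can be contrasted in a short sketch. It assumes an (n, 3) score array laid out as in conformity_scores_ and, in the asymmetric case, splits alpha evenly between the two sides, as in the asymmetric variant of Romano et al. (2019). conformal_corrections is a hypothetical helper, not MAPIE's code.

```python
import numpy as np


def conformal_corrections(scores, alpha=0.1, symmetry=True):
    """Hypothetical sketch: (lower, upper) corrections from an (n, 3) score array."""
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]

    def kth_smallest(values, level):
        # ceil((n + 1) * level)-th smallest value (finite-sample correction)
        k = int(np.ceil((n + 1) * level))
        if k > n:
            raise ValueError("too few calibration samples for this alpha")
        return np.sort(values)[k - 1]

    if symmetry:
        # One constant from the max-score column, shared by both bounds
        q = kth_smallest(scores[:, 2], 1 - alpha)
        return q, q
    # One constant per bound, each computed at level 1 - alpha/2
    return (kth_smallest(scores[:, 0], 1 - alpha / 2),
            kth_smallest(scores[:, 1], 1 - alpha / 2))


# Toy scores, made up for illustration
scores = np.array([[-1.0, -1.0, -1.0],
                   [1.0, -3.0, 1.0],
                   [-3.0, 0.5, 0.5]])
sym = conformal_corrections(scores, alpha=0.5, symmetry=True)
asym = conformal_corrections(scores, alpha=0.5, symmetry=False)
```

The asymmetric option can adapt each bound separately, at the cost of estimating each constant at the stricter level 1 - alpha/2.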

Returns
Union[NDArray, Tuple[NDArray, NDArray]]
  • NDArray of shape (n_samples,) if alpha is None.

  • Tuple[NDArray, NDArray] of shapes (n_samples,) and (n_samples, 2, n_alpha) if alpha is not None.

    • [:, 0, :]: Lower bound of the prediction interval.

    • [:, 1, :]: Upper bound of the prediction interval.
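With this layout, the bounds for the i-th alpha are y_pis[:, 0, i] and y_pis[:, 1, i]. As an illustration, the empirical coverage on held-out data can be computed from them; empirical_coverage is a hypothetical helper.

```python
import numpy as np


def empirical_coverage(y_true, y_pis, alpha_index=0):
    """Hypothetical helper: fraction of y_true inside the intervals for one alpha."""
    y_true = np.asarray(y_true, dtype=float)
    y_pis = np.asarray(y_pis, dtype=float)
    lower = y_pis[:, 0, alpha_index]  # [:, 0, :] holds the lower bounds
    upper = y_pis[:, 1, alpha_index]  # [:, 1, :] holds the upper bounds
    return float(np.mean((y_true >= lower) & (y_true <= upper)))


# Toy intervals of shape (n_samples=4, 2, n_alpha=1), made up for illustration
y_pis = np.array([[[0.0], [2.0]],
                  [[1.0], [3.0]],
                  [[2.0], [4.0]],
                  [[3.0], [5.0]]])
coverage = empirical_coverage([1.0, 4.0, 3.0, 6.0], y_pis)
```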

Examples using mapie.regression.MapieQuantileRegressor