mapie.regression.ConformalizedQuantileRegressor

class mapie.regression.ConformalizedQuantileRegressor(estimator: Optional[Union[RegressorMixin, Pipeline, List[Union[RegressorMixin, Pipeline]]]] = None, confidence_level: float = 0.9, prefit: bool = False)

Computes prediction intervals using the conformalized quantile regression technique:

  1. The fit method fits three models to the training data using the provided regressor: a model to predict the target, and models to predict upper and lower quantiles around the target.

  2. The conformalize method estimates the uncertainty of the quantile models using the conformalization set.

  3. The predict_interval method computes prediction points and intervals.

Parameters
estimator : Union[RegressorMixin, Pipeline, List[Union[RegressorMixin, Pipeline]]]

The regressor used to predict points and quantiles.

When prefit=False (default), a single regressor that supports the quantile loss must be passed. Valid options:

  • sklearn.linear_model.QuantileRegressor

  • sklearn.ensemble.GradientBoostingRegressor

  • sklearn.ensemble.HistGradientBoostingRegressor

  • lightgbm.LGBMRegressor

When prefit=True, a list of three fitted quantile regressors predicting the lower, upper, and median quantiles must be passed (in that order). These quantiles must be:

  • lower quantile = (1 - confidence_level) / 2

  • upper quantile = (1 + confidence_level) / 2

  • median quantile = 0.5
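
As a quick check of the formulas above, the three quantile levels can be derived from confidence_level as in the minimal sketch below. The helper name quantile_levels is hypothetical and not part of MAPIE, and the range check is an assumption; only the three formulas and their order come from the list above.

```python
from typing import Tuple


def quantile_levels(confidence_level: float) -> Tuple[float, float, float]:
    """Return (lower, upper, median) quantile levels, in the order listed for prefit=True."""
    # Assumption: a confidence level only makes sense strictly between 0 and 1.
    if not 0.0 < confidence_level < 1.0:
        raise ValueError("confidence_level must lie strictly between 0 and 1")
    lower = (1 - confidence_level) / 2
    upper = (1 + confidence_level) / 2
    median = 0.5
    return lower, upper, median


# For the default confidence level, the levels are approximately 0.05, 0.95, and 0.5.
lower, upper, median = quantile_levels(0.9)
print(lower, upper, median)
```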

confidence_level : float, default=0.9

The confidence level of the prediction intervals, i.e. their desired coverage probability.

prefit : bool, default=False

If True, three fitted quantile regressors must be provided, and the fit method must be skipped.

If False, the three regressors will be fitted during the fit method.

Examples

>>> from mapie.regression import ConformalizedQuantileRegressor
>>> from mapie.utils import train_conformalize_test_split
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import QuantileRegressor
>>> X, y = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> (
...     X_train, X_conformalize, X_test,
...     y_train, y_conformalize, y_test
... ) = train_conformalize_test_split(
...     X, y, train_size=0.6, conformalize_size=0.2, test_size=0.2, random_state=1
... )
>>> mapie_regressor = ConformalizedQuantileRegressor(
...     estimator=QuantileRegressor(),
...     confidence_level=0.95,
... ).fit(X_train, y_train).conformalize(X_conformalize, y_conformalize)
>>> predicted_points, predicted_intervals = mapie_regressor.predict_interval(X_test)
__init__(estimator: Optional[Union[RegressorMixin, Pipeline, List[Union[RegressorMixin, Pipeline]]]] = None, confidence_level: float = 0.9, prefit: bool = False) -> None
conformalize(X_conformalize: ArrayLike, y_conformalize: ArrayLike, predict_params: Optional[dict] = None) -> ConformalizedQuantileRegressor

Estimates the uncertainty of the quantile regressors by computing conformity scores on the conformalization set.

Parameters
X_conformalize : ArrayLike

Features of the conformalization set.

y_conformalize : ArrayLike

Targets of the conformalization set.

predict_params : Optional[dict], default=None

Parameters to pass to the predict method of the regressors. These parameters will also be used in the predict_interval and predict methods of this ConformalizedQuantileRegressor.

Returns
Self

The ConformalizedQuantileRegressor instance.
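
To make the notion of a conformity score concrete, here is a minimal NumPy sketch of the score commonly used in conformalized quantile regression: the signed distance by which each target falls outside the predicted quantile band. It illustrates the general technique under stated assumptions and is not MAPIE's internal code; cqr_scores and conformal_quantile are hypothetical names, and the rank-based quantile is one common finite-sample convention.

```python
import math

import numpy as np


def cqr_scores(y, lower_pred, upper_pred):
    """Signed distance of each target to the band [lower_pred, upper_pred].

    Positive when the target lies outside the band, negative when inside.
    """
    y, lower_pred, upper_pred = map(np.asarray, (y, lower_pred, upper_pred))
    return np.maximum(lower_pred - y, y - upper_pred)


def conformal_quantile(scores, confidence_level):
    """Score at 1-based rank ceil((n + 1) * confidence_level), or inf if out of range."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    rank = math.ceil((n + 1) * confidence_level)
    if rank > n:
        # Too few conformalization samples for this confidence level.
        return math.inf
    return float(scores[rank - 1])


# Toy conformalization set with hand-written quantile predictions.
y_conf = np.array([1.0, 2.0, 3.0, 4.0])
lower = np.array([0.5, 2.5, 2.0, 3.0])
upper = np.array([1.5, 3.5, 3.5, 3.5])

scores = cqr_scores(y_conf, lower, upper)
correction = conformal_quantile(scores, confidence_level=0.5)
```

In this sketch, the resulting constant would widen the band when it is positive and shrink it when it is negative. With only four samples, a high confidence level makes the rank exceed the sample size, so the helper returns infinity.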

fit(X_train: ArrayLike, y_train: ArrayLike, fit_params: Optional[dict] = None) -> ConformalizedQuantileRegressor

Fits three models using the regressor provided at initialisation:

  • a model to predict the target

  • a model to predict the upper quantile of the target

  • a model to predict the lower quantile of the target

Parameters
X_train : ArrayLike

Training data features.

y_train : ArrayLike

Training data targets.

fit_params : Optional[dict], default=None

Parameters to pass to the fit method of the regressors.

Returns
Self

The fitted ConformalizedQuantileRegressor instance.
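
The idea of fitting three models from a single regressor can be sketched with scikit-learn alone. The example below is an illustration, not MAPIE's implementation: fit_quantile_models is a hypothetical helper, it assumes the model for the target is the median quantile model (as in the prefit case), and it relies on GradientBoostingRegressor selecting its quantile through loss="quantile" and alpha.

```python
import numpy as np
from sklearn.base import clone
from sklearn.ensemble import GradientBoostingRegressor


def fit_quantile_models(base_estimator, X, y, confidence_level=0.9):
    """Fit lower, upper, and median quantile models from one unfitted regressor."""
    levels = [
        (1 - confidence_level) / 2,  # lower quantile
        (1 + confidence_level) / 2,  # upper quantile
        0.5,                         # median, used here for point predictions
    ]
    # Each model is an independent clone configured for one quantile level.
    return [
        clone(base_estimator).set_params(loss="quantile", alpha=q).fit(X, y)
        for q in levels
    ]


# Synthetic data: a linear signal with Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)

lower_model, upper_model, median_model = fit_quantile_models(
    GradientBoostingRegressor(n_estimators=20, random_state=0), X, y
)
```

A list built this way, in lower, upper, median order, has the layout that the estimator parameter expects when prefit=True.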

predict(X: ArrayLike) -> NDArray

Predicts points.

Parameters
X : ArrayLike

Features.

Returns
NDArray

Array of point predictions with shape (n_samples,).

predict_interval(X: ArrayLike, minimize_interval_width: bool = False, allow_infinite_bounds: bool = False, symmetric_correction: bool = False) -> Tuple[NDArray, NDArray]

Predicts points (using the base regressor) and intervals.

The returned NDArray containing the prediction intervals is of shape (n_samples, 2, 1). The third dimension is unnecessary, but kept for consistency with the other conformal regression methods available in MAPIE.

Parameters
X : ArrayLike

Features.

minimize_interval_width : bool, default=False

If True, attempts to minimize the interval width.

allow_infinite_bounds : bool, default=False

If True, allows prediction intervals with infinite bounds.

symmetric_correction : bool, default=False

To produce prediction intervals, the conformalized quantile regression technique corrects the predictions of the upper and lower quantile regressors by adding a constant.

If symmetric_correction is set to False, this constant is different for the upper and the lower quantile predictions. If set to True, this constant is the same for both.
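
The difference between the two settings can be sketched with NumPy as follows. This follows a common formulation of conformalized quantile regression and is an assumption about the mechanics, not a copy of MAPIE's implementation; corrections and _rank_quantile are hypothetical names, and splitting the miscoverage evenly between the two sides is one conventional choice.

```python
import math

import numpy as np


def _rank_quantile(scores, level):
    """Score at 1-based rank ceil((n + 1) * level) of the sorted scores, or inf."""
    scores = np.sort(np.asarray(scores, dtype=float))
    rank = math.ceil((scores.size + 1) * level)
    return math.inf if rank > scores.size else float(scores[rank - 1])


def corrections(y, lower_pred, upper_pred, confidence_level, symmetric):
    """Return (c_lower, c_upper) for the interval [lower_pred - c_lower, upper_pred + c_upper]."""
    y, lower_pred, upper_pred = map(np.asarray, (y, lower_pred, upper_pred))
    lower_scores = lower_pred - y  # positive when the target is below the lower band
    upper_scores = y - upper_pred  # positive when the target is above the upper band
    if symmetric:
        # One constant shared by both bounds.
        c = _rank_quantile(np.maximum(lower_scores, upper_scores), confidence_level)
        return c, c
    # One constant per bound; the miscoverage is split evenly between the sides.
    side_level = (1 + confidence_level) / 2
    return (
        _rank_quantile(lower_scores, side_level),
        _rank_quantile(upper_scores, side_level),
    )


# Toy conformalization set with a constant predicted band [-1, 1].
y_conf = np.array([-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.2, 1.4, 1.6])
lower = np.full(9, -1.0)
upper = np.full(9, 1.0)

sym = corrections(y_conf, lower, upper, confidence_level=0.5, symmetric=True)
asym = corrections(y_conf, lower, upper, confidence_level=0.5, symmetric=False)
```

On this toy data the targets stray further below the band than above it, so the asymmetric variant widens the lower bound more than the upper one, while the symmetric variant applies a single constant to both.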

Returns
Tuple[NDArray, NDArray]

Two arrays:

  • Prediction points, of shape (n_samples,)

  • Prediction intervals, of shape (n_samples, 2, 1)
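
Because of the trailing singleton dimension, the interval array usually needs slicing before use. The sketch below shows one way to extract the bounds and compute empirical coverage. The helper names are hypothetical, the input array is a hand-written stand-in for the second value returned by predict_interval, and the code assumes that index 0 on the second axis holds the lower bound and index 1 the upper bound.

```python
import numpy as np


def split_intervals(predicted_intervals):
    """Return (lower, upper) 1-D arrays from an array of shape (n_samples, 2, 1)."""
    intervals = np.asarray(predicted_intervals)
    if intervals.ndim != 3 or intervals.shape[1:] != (2, 1):
        raise ValueError(f"expected shape (n_samples, 2, 1), got {intervals.shape}")
    # Assumption: axis 1 is ordered as (lower bound, upper bound).
    return intervals[:, 0, 0], intervals[:, 1, 0]


def empirical_coverage(y_true, predicted_intervals):
    """Fraction of targets that fall inside their prediction interval."""
    lower, upper = split_intervals(predicted_intervals)
    y_true = np.asarray(y_true)
    return float(np.mean((lower <= y_true) & (y_true <= upper)))


# Stand-in for the intervals returned by predict_interval on four test samples.
fake_intervals = np.array(
    [[[0.0], [1.0]], [[1.0], [3.0]], [[2.0], [2.5]], [[-1.0], [0.0]]]
)
y_test = np.array([0.5, 3.5, 2.2, -0.5])

coverage = empirical_coverage(y_test, fake_intervals)  # 3 of 4 targets are covered
```

Comparing this empirical coverage with confidence_level on a held-out test set is a simple sanity check of the fitted intervals.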