mapie.regression.JackknifeAfterBootstrapRegressor

class mapie.regression.JackknifeAfterBootstrapRegressor(estimator: RegressorMixin = LinearRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseRegressionScore] = 'absolute', method: str = 'plus', resampling: Union[int, Subsample] = 30, aggregation_method: str = 'mean', n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None)

Computes prediction intervals using the jackknife-after-bootstrap technique:

  1. The fit_conformalize method estimates the uncertainty of the base regressor using bootstrap sampling. It fits the base regressor on samples of the dataset and computes conformity scores on the out-of-sample data.

  2. The predict_interval method computes point predictions and prediction intervals.
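The first step can be illustrated with a small NumPy-only sketch. It is a simplified illustration of the out-of-bag idea, not MAPIE's implementation: the least-squares helpers `fit_linear` and `predict_linear`, the toy data, and the use of a mean over out-of-bag predictions are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption for illustration): y = 2 * x + noise
n_samples, n_resamples = 60, 30
X = rng.uniform(-1.0, 1.0, size=(n_samples, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=n_samples)


def fit_linear(X_train, y_train):
    """Hypothetical helper: least-squares fit with an intercept."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef


def predict_linear(coef, X_new):
    """Hypothetical helper: predictions from the fitted coefficients."""
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef


# oob_preds[b, i] holds the prediction for sample i from the model fitted on
# bootstrap b, and stays NaN when sample i was drawn into bootstrap b.
oob_preds = np.full((n_resamples, n_samples), np.nan)
for b in range(n_resamples):
    idx = rng.integers(0, n_samples, size=n_samples)  # sample with replacement
    coef = fit_linear(X[idx], y[idx])
    oob = np.ones(n_samples, dtype=bool)
    oob[idx] = False                                   # out-of-bag mask
    oob_preds[b, oob] = predict_linear(coef, X[oob])

# Keep samples that were out-of-bag at least once, aggregate their
# out-of-bag predictions, and compute absolute-residual conformity scores.
has_oob = ~np.isnan(oob_preds).all(axis=0)
aggregated = np.nanmean(oob_preds[:, has_oob], axis=0)
conformity_scores = np.abs(y[has_oob] - aggregated)
```

Each score is computed only from models that never saw the corresponding sample, which is what lets the scores stand in for errors on unseen data.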

Parameters
estimator : RegressorMixin, default=LinearRegression()

The base regressor used to predict points.

confidence_level : Union[float, List[float]], default=0.9

The confidence level(s) for the prediction intervals, that is, their desired coverage probability. If a float is provided, it represents a single confidence level. If a list is provided, one prediction interval is returned for each specified confidence level.

conformity_score : Union[str, BaseRegressionScore], default="absolute"

The method used to compute conformity scores.

Valid options:

  • “absolute”

  • “gamma”

  • The corresponding subclasses of BaseRegressionScore

A custom score function inheriting from BaseRegressionScore may also be provided.

See the theoretical description of conformity scores.
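As a rough intuition for the two named options, the sketch below contrasts an absolute residual with a residual scaled by the prediction. The function names are hypothetical, and the formulas are a simplification: MAPIE's own BaseRegressionScore subclasses may treat sign and symmetry differently.

```python
import numpy as np


def absolute_score(y_true, y_pred):
    """Illustrative absolute residual |y - y_hat| (hypothetical helper)."""
    return np.abs(y_true - y_pred)


def relative_score(y_true, y_pred):
    """Illustrative residual scaled by the prediction, in the spirit of the
    "gamma" option. Assumes strictly positive predictions."""
    return np.abs(y_true - y_pred) / y_pred


y_true = np.array([10.0, 100.0])
y_pred = np.array([8.0, 80.0])

abs_scores = absolute_score(y_true, y_pred)   # array([ 2., 20.])
rel_scores = relative_score(y_true, y_pred)   # array([0.25, 0.25])
```

The absolute score grows with the scale of the target, whereas the scaled score treats a 20% miss the same way at any magnitude, which tends to yield intervals whose width depends on the predicted value.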

method : str, default="plus"

The method used to compute prediction intervals. Options are:

  • “plus”: Based on the conformity scores from each bootstrap sample and the testing prediction.

  • “minmax”: Based on the minimum and maximum conformity scores from each bootstrap sample.

Note: The “base” method does not appear in the conformal inference literature on jackknife-after-bootstrap strategies, so it is not provided here.
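The difference between the two options can be sketched for a single test point with the textbook jackknife+ and jackknife-minmax constructions. This is an illustration under stated assumptions, not MAPIE's code: `jab_intervals`, `loo_preds` (the prediction at the test point from the aggregate that excludes training sample i) and `scores` (absolute residuals) are hypothetical names.

```python
import math

import numpy as np


def jab_intervals(loo_preds, scores, confidence_level):
    """Return ((lower, upper) for "plus", (lower, upper) for "minmax")
    at one test point, using textbook finite-sample quantile ranks."""
    n = len(scores)
    alpha = 1.0 - confidence_level
    k_low = math.floor(alpha * (n + 1))        # 1-based rank, lower bound
    k_up = math.ceil((1.0 - alpha) * (n + 1))  # 1-based rank, upper bound
    if k_low < 1 or k_up > n:
        raise ValueError("too few samples for this confidence level")

    # "plus": quantiles of prediction -/+ score, taken pair by pair
    lower_plus = np.sort(loo_preds - scores)[k_low - 1]
    upper_plus = np.sort(loo_preds + scores)[k_up - 1]

    # "minmax": extreme predictions widened by a quantile of the scores
    q = np.sort(scores)[k_up - 1]
    lower_minmax = loo_preds.min() - q
    upper_minmax = loo_preds.max() + q
    return (lower_plus, upper_plus), (lower_minmax, upper_minmax)


rng = np.random.default_rng(0)
loo_preds = 5.0 + rng.normal(scale=0.05, size=100)
scores = np.abs(rng.normal(scale=0.5, size=100))
plus, minmax = jab_intervals(loo_preds, scores, confidence_level=0.9)
```

In this construction the "minmax" interval always contains the "plus" interval, so it is the more conservative of the two.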

resampling : Union[int, Subsample], default=30

Number of bootstrap resamples, or a Subsample instance for a custom sampling strategy.

aggregation_method : str, default="mean"

Aggregation method for predictions across bootstrap samples. Options:

  • “mean”

  • “median”
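A minimal NumPy illustration of the two aggregation options, using a made-up matrix of per-bootstrap predictions (rows are bootstrap models, columns are test samples):

```python
import numpy as np

# Hypothetical predictions from 3 bootstrap models for 2 test samples
bootstrap_preds = np.array([
    [1.0, 2.0],
    [2.0, 4.0],
    [9.0, 6.0],
])

mean_agg = bootstrap_preds.mean(axis=0)          # array([4., 4.])
median_agg = np.median(bootstrap_preds, axis=0)  # array([2., 4.])
```

The first column shows why the choice matters: one outlying bootstrap prediction (9.0) pulls the mean up to 4.0 while the median stays at 2.0.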

n_jobs : Optional[int], default=None

The number of jobs to run in parallel when applicable.

verbose : int, default=0

Controls the verbosity level. Higher values increase the output details.

random_state : Optional[Union[int, np.random.RandomState]], default=None

A seed or random state instance to ensure reproducibility in any random operations within the regressor.

Examples

>>> from mapie.regression import JackknifeAfterBootstrapRegressor
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> X_full, y_full = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)
>>> mapie_regressor = JackknifeAfterBootstrapRegressor(
...     estimator=Ridge(),
...     confidence_level=0.95,
...     resampling=25,
... ).fit_conformalize(X, y)
>>> predicted_points, predicted_intervals = mapie_regressor.predict_interval(X_test)
__init__(estimator: RegressorMixin = LinearRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseRegressionScore] = 'absolute', method: str = 'plus', resampling: Union[int, Subsample] = 30, aggregation_method: str = 'mean', n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None) -> None
fit_conformalize(X: ArrayLike, y: ArrayLike, fit_params: Optional[dict] = None, predict_params: Optional[dict] = None) -> JackknifeAfterBootstrapRegressor

Estimates the uncertainty of the base regressor using bootstrap sampling: fits the base regressor on (potentially overlapping) samples of the dataset, and computes conformity scores on the corresponding out-of-sample data.

Parameters
X : ArrayLike

Features. The same data is used both to fit the base regressor and to compute conformity scores.

y : ArrayLike

Targets. The same data is used both to fit the base regressor and to compute conformity scores.

fit_params : Optional[dict], default=None

Parameters to pass to the fit method of the base regressor.

predict_params : Optional[dict], default=None

Parameters to pass to the predict method of the base regressor. These parameters will also be used in the predict_interval and predict methods of this JackknifeAfterBootstrapRegressor.

Returns
Self

This JackknifeAfterBootstrapRegressor instance, fitted and conformalized.

predict(X: ArrayLike, ensemble: bool = True) -> NDArray

Predicts points.

By default, points are predicted using an aggregation. See the ensemble parameter.

Parameters
X : ArrayLike

Data features for generating point predictions.

ensemble : bool, default=True

If True, a predicted point is an aggregation of the predictions of the regressors trained on each bootstrap sample. This aggregation depends on the aggregation_method provided during initialisation. If False, a point is predicted using the regressor trained on the entire dataset.

Returns
NDArray

Array of point predictions, with shape (n_samples,).

predict_interval(X: ArrayLike, ensemble: bool = True, minimize_interval_width: bool = False, allow_infinite_bounds: bool = False) -> Tuple[NDArray, NDArray]

Predicts points and intervals.

If several confidence levels were provided during initialisation, several intervals will be predicted for each sample. See the return signature.

By default, points are predicted using an aggregation. See the ensemble parameter.

Parameters
X : ArrayLike

Test data for prediction intervals.

ensemble : bool, default=True

If True, a predicted point is an aggregation of the predictions of the regressors trained on each bootstrap sample. This aggregation depends on the aggregation_method provided during initialisation.

If False, a point is predicted using the regressor trained on the entire dataset.

minimize_interval_width : bool, default=False

If True, attempts to minimize the interval width.

allow_infinite_bounds : bool, default=False

If True, allows prediction intervals with infinite bounds.
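To see where infinite bounds can come from, the sketch below uses the textbook finite-sample conformal quantile rank, ceil(confidence_level * (n + 1)). When that rank exceeds the number of available conformity scores, no finite score can serve as the quantile. Both helper names are hypothetical, and whether MAPIE uses exactly this rank, or how it behaves when allow_infinite_bounds is False, is not stated here.

```python
import math


def upper_quantile_rank(n_scores, confidence_level):
    """1-based rank of the score used as the conformal quantile
    (textbook formula, assumed for illustration)."""
    return math.ceil(confidence_level * (n_scores + 1))


def bound_is_infinite(n_scores, confidence_level):
    """True when the required rank exceeds the number of scores."""
    return upper_quantile_rank(n_scores, confidence_level) > n_scores


few_scores = bound_is_infinite(n_scores=10, confidence_level=0.95)
many_scores = bound_is_infinite(n_scores=100, confidence_level=0.95)
```

With 10 scores a 95% level asks for the 11th smallest score, which does not exist, so `few_scores` is True. With 100 scores the 96th is available and `many_scores` is False.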

Returns
Tuple[NDArray, NDArray]

Two arrays:

  • Prediction points, of shape (n_samples,)

  • Prediction intervals, of shape (n_samples, 2, n_confidence_levels)
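The three-dimensional interval array can be unpacked as in the sketch below, which builds a synthetic array of the documented shape rather than calling the regressor. The convention that index 0 on the second axis is the lower bound and index 1 the upper bound is assumed here, and all values are made up.

```python
import numpy as np

# Synthetic stand-ins for the two returned arrays: 4 samples, 2 levels
points = np.array([1.0, 2.0, 3.0, 4.0])
half_widths = np.array([0.5, 1.0])  # one hypothetical width per level
intervals = np.stack(
    [points[:, None] - half_widths, points[:, None] + half_widths],
    axis=1,
)  # shape (n_samples, 2, n_confidence_levels) == (4, 2, 2)

lower = intervals[:, 0, :]  # lower bounds, shape (4, 2)
upper = intervals[:, 1, :]  # upper bounds, shape (4, 2)

# Empirical coverage per confidence level on made-up targets
y_test = np.array([1.2, 2.8, 3.1, 5.5])
covered = (lower <= y_test[:, None]) & (y_test[:, None] <= upper)
coverage_per_level = covered.mean(axis=0)  # array([0.5 , 0.75])
```

Selecting `intervals[:, :, k]` gives the (n_samples, 2) bounds for the k-th confidence level, in the order the levels were passed at initialisation.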