mapie.regression.JackknifeAfterBootstrapRegressor
- class mapie.regression.JackknifeAfterBootstrapRegressor(estimator: RegressorMixin = LinearRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseRegressionScore] = 'absolute', method: str = 'plus', resampling: Union[int, Subsample] = 30, aggregation_method: str = 'mean', n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None)
Computes prediction intervals using the jackknife-after-bootstrap technique:
The fit_conformalize method estimates the uncertainty of the base regressor using bootstrap sampling. It fits the base regressor on samples of the dataset and computes conformity scores on the out-of-sample data.
The predict_interval method computes prediction points and intervals.
- Parameters
- estimator : RegressorMixin, default=LinearRegression()
The base regressor used to predict points.
- confidence_level : Union[float, List[float]], default=0.9
The confidence level(s) for the prediction intervals, indicating the desired coverage probability of the prediction intervals. If a float is provided, it represents a single confidence level. If a list, multiple prediction intervals for each specified confidence level are returned.
- conformity_score : Union[str, BaseRegressionScore], default=“absolute”
The method used to compute conformity scores.
Valid options:
“absolute”
“gamma”
The corresponding subclasses of BaseRegressionScore
A custom score function inheriting from BaseRegressionScore may also be provided.
See theoretical_description_conformity_scores.
- method : str, default=“plus”
The method used to compute prediction intervals. Options are:
“plus”: Based on the conformity scores from each bootstrap sample and the testing prediction.
“minmax”: Based on the minimum and maximum conformity scores from each bootstrap sample.
Note: The “base” method is not mentioned in the conformal inference literature for Jackknife after bootstrap strategies, hence not provided here.
- resampling : Union[int, Subsample], default=30
Number of bootstrap resamples, or an instance of Subsample for a custom sampling strategy.
- aggregation_method : str, default=“mean”
Aggregation method for predictions across bootstrap samples. Options:
“mean”
“median”
- n_jobs : Optional[int], default=None
The number of jobs to run in parallel when applicable.
- verbose : int, default=0
Controls the verbosity level. Higher values produce more detailed output.
- random_state : Optional[Union[int, np.random.RandomState]], default=None
A seed or random state instance to ensure reproducibility in any random operations within the regressor.
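The difference between the “plus” and “minmax” options of the method parameter can be pictured with the standard jackknife+ and jackknife-minmax formulas. The sketch below is an illustration only and is not MAPIE's implementation: the arrays `residuals` and `preds` are hypothetical stand-ins for the out-of-sample absolute conformity scores and for the aggregated out-of-sample predictions at one test point.

```python
import numpy as np

rng = np.random.default_rng(0)
n, alpha = 50, 0.1  # alpha = 1 - confidence level

# Hypothetical inputs (assumptions, not MAPIE attributes):
residuals = np.abs(rng.normal(size=n))       # R_i, out-of-sample absolute residuals
preds = 2.0 + rng.normal(scale=0.1, size=n)  # predictions at x from models that excluded sample i

# Ranks of the order statistics used as finite-sample quantiles
k_hi = int(np.ceil((1 - alpha) * (n + 1)))
k_lo = int(np.floor(alpha * (n + 1)))

# "plus": quantiles of prediction -/+ residual, taken jointly per sample
plus_lower = np.sort(preds - residuals)[k_lo - 1]
plus_upper = np.sort(preds + residuals)[k_hi - 1]

# "minmax": extreme predictions widened by the residual quantile
q = np.sort(residuals)[k_hi - 1]
minmax_lower = preds.min() - q
minmax_upper = preds.max() + q

print(f"plus:   [{plus_lower:.3f}, {plus_upper:.3f}]")
print(f"minmax: [{minmax_lower:.3f}, {minmax_upper:.3f}]")
```

Under these formulas the “minmax” interval always contains the “plus” interval, so it is the more conservative of the two.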
Examples
>>> from mapie.regression import JackknifeAfterBootstrapRegressor
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> X_full, y_full = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)
>>> mapie_regressor = JackknifeAfterBootstrapRegressor(
...     estimator=Ridge(),
...     confidence_level=0.95,
...     resampling=25,
... ).fit_conformalize(X, y)
>>> predicted_points, predicted_intervals = mapie_regressor.predict_interval(X_test)
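The example above requests a single confidence level. To get a feel for what passing several levels means, the sketch below turns one set of absolute conformity scores into one interval per level. It uses a simplified, split-conformal-style quantile rather than the jackknife-after-bootstrap computation, and the `scores` array and the `point` value are made-up stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = np.abs(rng.normal(size=200))  # stand-in absolute conformity scores
confidence_levels = [0.8, 0.9, 0.95]
n = scores.size

# Finite-sample corrected quantile level for each confidence level, capped at 1
levels = [min(1.0, np.ceil((n + 1) * cl) / n) for cl in confidence_levels]
half_widths = np.array(
    [np.quantile(scores, q, method="higher") for q in levels]
)

point = 1.0  # a hypothetical point prediction
intervals = np.stack([point - half_widths, point + half_widths])
print(intervals.shape)  # one (lower, upper) column per confidence level
```

A higher confidence level selects a larger score quantile, so the intervals are nested: each one contains those requested at lower levels.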
- __init__(estimator: RegressorMixin = LinearRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseRegressionScore] = 'absolute', method: str = 'plus', resampling: Union[int, Subsample] = 30, aggregation_method: str = 'mean', n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None) -> None
- fit_conformalize(X: ArrayLike, y: ArrayLike, fit_params: Optional[dict] = None, predict_params: Optional[dict] = None) -> JackknifeAfterBootstrapRegressor
Estimates the uncertainty of the base regressor using bootstrap sampling: fits the base regressor on (potentially overlapping) samples of the dataset, and computes conformity scores on the corresponding out-of-sample data.
- Parameters
- X : ArrayLike
Features.
- y : ArrayLike
Targets.
- fit_params : Optional[dict], default=None
Parameters to pass to the fit method of the base regressor.
- predict_params : Optional[dict], default=None
Parameters to pass to the predict method of the base regressor. These parameters will also be used in the predict_interval and predict methods of this JackknifeAfterBootstrapRegressor.
- Returns
- Self
This JackknifeAfterBootstrapRegressor instance, fitted and conformalized.
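The bootstrap step described for fit_conformalize can be sketched by hand with NumPy. This is a minimal illustration under stated assumptions, not the library's code: it uses a plain least-squares fit as the base regressor, mean aggregation, and absolute residuals as conformity scores, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 40, 30  # dataset size and number of bootstrap resamples
X = rng.normal(size=(n, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=n)

coefs = np.empty((B, X.shape[1]))
in_bag = np.zeros((B, n), dtype=bool)
for b in range(B):
    idx = rng.integers(0, n, size=n)  # draw n indices with replacement
    in_bag[b, idx] = True             # remember which samples this model saw
    coefs[b], *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)

preds = X @ coefs.T  # shape (n, B): every model's prediction for every sample
oob = ~in_bag.T      # shape (n, B): True where model b did not see sample i

# Aggregate, for each sample, only the models that never saw it
counts = oob.sum(axis=1)
oob_pred = np.where(oob, preds, 0.0).sum(axis=1) / np.maximum(counts, 1)

# Absolute conformity scores, keeping samples with at least one out-of-sample model
scores = np.abs(y - oob_pred)[counts > 0]
print(scores.shape)
```

Because each score comes from models that did not train on the corresponding sample, the scores behave like held-out residuals without setting aside a separate conformalization set.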
- predict(X: ArrayLike, ensemble: bool = True) -> NDArray
Predicts points.
By default, points are predicted using an aggregation. See the ensemble parameter.
- Parameters
- X : ArrayLike
Data features for generating point predictions.
- ensemble : bool, default=True
If True, a predicted point is an aggregation of the predictions of the regressors trained on each bootstrap sample. This aggregation depends on the aggregation_method provided during initialisation. If False, a point is predicted using the regressor trained on the entire data.
- Returns
- NDArray
Array of point predictions, with shape (n_samples,).
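What the aggregation behind ensemble=True amounts to can be shown on a small array. The values below are invented for illustration: `bootstrap_preds` is a hypothetical matrix of per-regressor predictions, and the two reductions correspond to the “mean” and “median” choices of aggregation_method.

```python
import numpy as np

# Hypothetical predictions of 5 bootstrap-fitted regressors for 3 test samples,
# shape (n_samples, n_resamplings). The last regressor is a deliberate outlier.
bootstrap_preds = np.array([
    [1.0, 1.2, 0.9, 1.1, 5.0],
    [2.0, 2.1, 1.9, 2.0, 6.0],
    [3.0, 2.9, 3.1, 3.0, 7.0],
])

mean_points = bootstrap_preds.mean(axis=1)          # "mean" aggregation
median_points = np.median(bootstrap_preds, axis=1)  # "median" aggregation

print(mean_points)
print(median_points)
```

The median is less sensitive to a single badly fitted resample, whereas the mean is pulled towards it. With ensemble=False no aggregation takes place and the single regressor trained on the entire data is used instead.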
- predict_interval(X: ArrayLike, ensemble: bool = True, minimize_interval_width: bool = False, allow_infinite_bounds: bool = False) -> Tuple[NDArray, NDArray]
Predicts points and intervals.
If several confidence levels were provided during initialisation, several intervals will be predicted for each sample. See the return signature.
By default, points are predicted using an aggregation. See the ensemble parameter.
- Parameters
- X : ArrayLike
Test data for prediction intervals.
- ensemble : bool, default=True
If True, a predicted point is an aggregation of the predictions of the regressors trained on each bootstrap sample. This aggregation depends on the aggregation_method provided during initialisation. If False, a point is predicted using the regressor trained on the entire data.
- minimize_interval_width : bool, default=False
If True, attempts to minimize the interval width.
- allow_infinite_bounds : bool, default=False
If True, allows prediction intervals with infinite bounds.
- Returns
- Tuple[NDArray, NDArray]
Two arrays:
Prediction points, of shape (n_samples,)
Prediction intervals, of shape (n_samples, 2, n_confidence_levels)
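An array of the documented shape (n_samples, 2, n_confidence_levels) can be sliced per confidence level to check empirical coverage. The sketch below builds a synthetic stand-in array instead of calling the regressor, and it assumes that index 0 on the second axis holds the lower bound and index 1 the upper bound. That ordering is not stated in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 100
confidence_levels = [0.8, 0.95]

# Synthetic stand-ins (assumptions): targets, point predictions, interval half-widths
y_test = rng.normal(size=n_samples)
points = np.zeros(n_samples)
half_widths = np.array([1.28, 1.96])  # one arbitrary half-width per confidence level

# Assemble an array of shape (n_samples, 2, n_confidence_levels)
intervals = np.stack(
    [points[:, None] - half_widths, points[:, None] + half_widths], axis=1
)

coverages = []
for k, level in enumerate(confidence_levels):
    lower, upper = intervals[:, 0, k], intervals[:, 1, k]  # assumed ordering
    coverage = np.mean((y_test >= lower) & (y_test <= upper))
    coverages.append(coverage)
    print(f"level={level}: empirical coverage={coverage:.2f}")
```

Comparing each empirical coverage with its requested confidence level on held-out data is a simple sanity check on the returned intervals.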