mapie.regression.CrossConformalRegressor
- class mapie.regression.CrossConformalRegressor(estimator: RegressorMixin = LinearRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseRegressionScore] = 'absolute', method: str = 'plus', cv: Union[int, BaseCrossValidator] = 5, n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None)
Computes prediction intervals using the cross conformal regression technique:
The fit_conformalize method estimates the uncertainty of the base regressor in a cross-validation style: it fits the base regressor on folds of the dataset and computes conformity scores on the out-of-fold data.
The predict_interval method computes point predictions and prediction intervals.
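The two-step workflow described above can be illustrated without MAPIE. The NumPy sketch below is an illustration under stated assumptions, not MAPIE's implementation: it uses ordinary least squares as a stand-in base regressor, absolute residuals as conformity scores, and centres the intervals on a model refitted on all training data, in the spirit of the "base" method. The helper names (fit_least_squares, predict_least_squares, cross_conformal_base) are hypothetical.

```python
import numpy as np


def fit_least_squares(X, y):
    """Fit ordinary least squares with an intercept and return the coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict_least_squares(coef, X):
    """Predict with coefficients returned by fit_least_squares."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


def cross_conformal_base(X, y, X_test, confidence_level=0.9, n_folds=5, seed=0):
    """Hypothetical cross conformal sketch with absolute-residual scores."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Randomly assign each training sample to one of n_folds balanced folds
    fold_of = rng.permutation(n) % n_folds
    scores = np.empty(n)
    for k in range(n_folds):
        held_out = fold_of == k
        # Fit on the other folds, then score the held-out fold
        coef = fit_least_squares(X[~held_out], y[~held_out])
        scores[held_out] = np.abs(y[held_out] - predict_least_squares(coef, X[held_out]))
    # Finite-sample rank ceil((n + 1) * level), capped at n in this sketch
    rank = min(int(np.ceil((n + 1) * confidence_level)), n)
    q = np.sort(scores)[rank - 1]
    # Intervals centred on a model refitted on all training data
    y_hat = predict_least_squares(fit_least_squares(X, y), X_test)
    return y_hat, np.stack([y_hat - q, y_hat + q], axis=1)


rng = np.random.default_rng(42)
X = rng.normal(size=(300, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=300)
X_test = rng.normal(size=(5, 2))
points, intervals = cross_conformal_base(X, y, X_test)
print(points.shape, intervals.shape)  # (5,) (5, 2)
```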
- Parameters
- estimator : RegressorMixin, default=LinearRegression()
The base regressor used to predict points.
- confidence_level : Union[float, List[float]], default=0.9
The confidence level(s) of the prediction intervals, i.e. the desired coverage probability. If a float is provided, it represents a single confidence level. If a list is provided, one prediction interval per specified confidence level is returned for each sample.
- conformity_score : Union[str, BaseRegressionScore], default="absolute"
The method used to compute conformity scores. Valid options are "absolute", "gamma", or the corresponding subclasses of BaseRegressionScore.
A custom score function inheriting from BaseRegressionScore may also be provided.
- method : str, default="plus"
The method used to compute prediction intervals. Options are:
"base": based on the conformity scores from each fold.
"plus": based on the conformity scores from each fold and the test set predictions.
"minmax": based on the conformity scores from each fold and the test set predictions, using the minimum and maximum of the fold models' predictions.
- cv : Union[int, BaseCrossValidator], default=5
The cross-validator used to compute conformity scores. Valid options:
an integer, to specify the number of folds;
any sklearn.model_selection.BaseCrossValidator suitable for regression, or a custom cross-validator inheriting from it.
Main variants in the cross conformal setting are:
sklearn.model_selection.KFold (vanilla cross conformal)
sklearn.model_selection.LeaveOneOut (jackknife)
- n_jobs : Optional[int], default=None
The number of jobs to run in parallel when applicable.
- verbose : int, default=0
Controls the verbosity level. Higher values produce more detailed output.
- random_state : Optional[Union[int, np.random.RandomState]], default=None
A seed or random state instance that makes any random operations within the regressor reproducible.
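The difference between the "plus" and "minmax" methods can be sketched for a single test point, following the CV+ and jackknife-minmax constructions from the conformal prediction literature; MAPIE's internals may differ in details such as the exact quantile ranks. The helper plus_and_minmax_bounds is hypothetical and takes alpha = 1 - confidence_level directly. Here fold_preds holds each fold model's prediction at the test point, fold_of_train the fold index of each training sample, and residuals the out-of-fold absolute residuals.

```python
import numpy as np


def _kth_smallest(values, k):
    """k-th smallest value (1-indexed); -inf if k < 1, +inf if k > len(values)."""
    if k < 1:
        return -np.inf
    if k > len(values):
        return np.inf
    return np.sort(values)[k - 1]


def plus_and_minmax_bounds(fold_preds, fold_of_train, residuals, alpha=0.1):
    """Hypothetical sketch of "plus" and "minmax" bounds for one test point."""
    n = len(residuals)
    # Finite-sample ranks; the tiny tolerance absorbs floating-point error
    k_lo = int(np.floor(alpha * (n + 1) + 1e-9))
    k_hi = int(np.ceil((1 - alpha) * (n + 1) - 1e-9))
    # "plus": pair each residual with the prediction of the fold model
    # that did NOT see that training sample
    mu = fold_preds[fold_of_train]
    plus = (_kth_smallest(mu - residuals, k_lo), _kth_smallest(mu + residuals, k_hi))
    # "minmax": widen the most extreme fold predictions by one residual quantile
    q = _kth_smallest(residuals, k_hi)
    minmax = (fold_preds.min() - q, fold_preds.max() + q)
    return plus, minmax


fold_preds = np.array([1.0, 2.0, 3.0])   # 3 fold models, one test point
fold_of_train = np.repeat([0, 1, 2], 3)  # 9 training samples, 3 per fold
residuals = np.arange(1, 10) / 10        # 0.1, 0.2, ..., 0.9
plus, minmax = plus_and_minmax_bounds(fold_preds, fold_of_train, residuals, alpha=0.2)
print([round(float(v), 6) for v in plus])    # [0.8, 3.8]
print([round(float(v), 6) for v in minmax])  # [0.2, 3.8]
```

With these made-up numbers, "minmax" gives a lower (more conservative) lower bound, because the residual quantile is applied to the most extreme fold prediction rather than to each residual's own fold model.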
Examples
>>> from mapie.regression import CrossConformalRegressor
>>> from sklearn.datasets import make_regression
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.linear_model import Ridge
>>> X_full, y_full = make_regression(n_samples=500, n_features=2, noise=1.0)
>>> X, X_test, y, y_test = train_test_split(X_full, y_full)
>>> mapie_regressor = CrossConformalRegressor(
...     estimator=Ridge(),
...     confidence_level=0.95,
...     cv=10
... ).fit_conformalize(X, y)
>>> predicted_points, predicted_intervals = mapie_regressor.predict_interval(X_test)
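The cv parameter accepts either an integer number of folds, as with cv=10 in the example above, or a cross-validator object such as LeaveOneOut. The NumPy sketch below shows what those choices mean for the splits, assuming contiguous, unshuffled folds in which the first n_samples % n_folds folds get one extra sample; this is intended to mirror how sklearn.model_selection.KFold sizes its folds with shuffle=False, and says nothing about whether MAPIE shuffles. kfold_indices is a hypothetical helper.

```python
import numpy as np


def kfold_indices(n_samples, n_folds):
    """Yield (train_idx, test_idx) for contiguous, unshuffled K-fold splits."""
    fold_sizes = np.full(n_folds, n_samples // n_folds)
    fold_sizes[: n_samples % n_folds] += 1  # first folds absorb the remainder
    all_idx = np.arange(n_samples)
    start = 0
    for size in fold_sizes:
        test_idx = all_idx[start:start + size]
        train_idx = np.concatenate([all_idx[:start], all_idx[start + size:]])
        yield train_idx, test_idx
        start += size


# 3 folds on 10 samples: fold sizes 4, 3, 3 in this sketch
print([len(test) for _, test in kfold_indices(10, 3)])  # [4, 3, 3]
# As many folds as samples is leave-one-out (the jackknife)
print([test.tolist() for _, test in kfold_indices(4, 4)])  # [[0], [1], [2], [3]]
```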
- __init__(estimator: RegressorMixin = LinearRegression(), confidence_level: Union[float, Iterable[float]] = 0.9, conformity_score: Union[str, BaseRegressionScore] = 'absolute', method: str = 'plus', cv: Union[int, BaseCrossValidator] = 5, n_jobs: Optional[int] = None, verbose: int = 0, random_state: Optional[Union[int, RandomState]] = None) → None
- fit_conformalize(X: ArrayLike, y: ArrayLike, groups: Optional[ArrayLike] = None, fit_params: Optional[dict] = None, predict_params: Optional[dict] = None) → CrossConformalRegressor
Estimates the uncertainty of the base regressor in a cross-validation style: fits the base regressor on different folds of the dataset and computes conformity scores on the corresponding out-of-fold data.
- Parameters
- X : ArrayLike
Features
- y : ArrayLike
Targets
- groups : Optional[ArrayLike] of shape (n_samples,), default=None
Group labels for the samples, passed to the cross-validator.
- fit_params : Optional[dict], default=None
Parameters to pass to the fit method of the base regressor.
- predict_params : Optional[dict], default=None
Parameters to pass to the predict method of the base regressor. These parameters are also used by the predict_interval and predict methods of this CrossConformalRegressor.
- Returns
- Self
This CrossConformalRegressor instance, fitted and conformalized.
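Passing groups matters when samples are not independent, for example several measurements from the same patient: a group-aware cross-validator such as sklearn.model_selection.GroupKFold keeps each group inside a single fold, so a sample's out-of-fold score is never computed by a model that saw other members of its group. The NumPy sketch below illustrates only that constraint, using a simple round-robin rule over the sorted group labels; it is not GroupKFold's actual balancing strategy, and group_fold_labels is hypothetical.

```python
import numpy as np


def group_fold_labels(groups, n_folds):
    """Assign a fold to each sample so that no group is split across folds."""
    unique_groups, group_idx = np.unique(groups, return_inverse=True)
    # Hypothetical rule: round-robin over the sorted unique group labels
    fold_of_group = np.arange(len(unique_groups)) % n_folds
    return fold_of_group[group_idx]


groups = np.array(["a", "a", "b", "c", "c", "c", "d"])
labels = group_fold_labels(groups, n_folds=2)
print(labels.tolist())  # [0, 0, 1, 0, 0, 0, 1]
```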
- predict(X: ArrayLike, aggregate_predictions: Optional[str] = 'mean') → NDArray
Predicts points.
By default, points are predicted using an aggregation. See the aggregate_predictions parameter.
- Parameters
- X : ArrayLike
Features
- aggregate_predictions : Optional[str], default="mean"
The method used to predict a point. Options:
None: a point is predicted using the regressor trained on the entire data.
"mean": averages the predictions of the regressors trained on each cross-validation fold.
"median": aggregates (using the median) the predictions of the regressors trained on each cross-validation fold.
- Returns
- NDArray
Array of point predictions, of shape (n_samples,).
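The three aggregate_predictions options can be mimicked with plain NumPy, assuming the per-fold predictions are arranged as an array of shape (n_samples, n_folds) and the full-data model's predictions as shape (n_samples,); aggregate is a hypothetical helper, not part of MAPIE.

```python
import numpy as np


def aggregate(fold_predictions, full_model_predictions, how="mean"):
    """Hypothetical aggregation of per-fold predictions into one point per sample."""
    if how is None:
        # Ignore the fold models and use the model trained on all the data
        return full_model_predictions
    if how == "mean":
        return fold_predictions.mean(axis=1)
    if how == "median":
        return np.median(fold_predictions, axis=1)
    raise ValueError(f"Unknown aggregation: {how!r}")


fold_predictions = np.array([[1.0, 2.0, 9.0],
                             [4.0, 4.0, 4.0]])
full = np.array([2.5, 4.0])
print(aggregate(fold_predictions, full, "mean").tolist())    # [4.0, 4.0]
print(aggregate(fold_predictions, full, "median").tolist())  # [2.0, 4.0]
print(aggregate(fold_predictions, full, None).tolist())      # [2.5, 4.0]
```

The first row shows why "median" can be preferable: a single outlying fold model (predicting 9.0) pulls the mean up to 4.0 but leaves the median at 2.0.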
- predict_interval(X: ArrayLike, aggregate_predictions: Optional[str] = 'mean', minimize_interval_width: bool = False, allow_infinite_bounds: bool = False) → Tuple[NDArray, NDArray]
Predicts points and intervals.
If several confidence levels were provided during initialisation, several intervals will be predicted for each sample. See the return signature.
By default, points are predicted using an aggregation. See the aggregate_predictions parameter.
- Parameters
- X : ArrayLike
Features
- aggregate_predictions : Optional[str], default="mean"
The method used to predict a point. Options:
None: a point is predicted using the regressor trained on the entire data.
"mean": averages the predictions of the regressors trained on each cross-validation fold.
"median": aggregates (using the median) the predictions of the regressors trained on each cross-validation fold.
- minimize_interval_width : bool, default=False
If True, attempts to minimize the interval width.
- allow_infinite_bounds : bool, default=False
If True, allows prediction intervals with infinite bounds.
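One way to "attempt to minimize the interval width" is to stop splitting the miscoverage budget alpha = 1 - confidence_level evenly between the two tails, and instead search, on signed residuals y - y_pred, for the split that gives the narrowest interval. The grid-search sketch below illustrates that idea only; whether and how MAPIE does this internally is an assumption here, and narrowest_interval is hypothetical.

```python
import numpy as np


def narrowest_interval(signed_scores, confidence_level=0.9, n_grid=101):
    """Return (q_low, q_high) so that [y_pred + q_low, y_pred + q_high] is as
    narrow as possible while leaving out a total fraction alpha of the scores."""
    alpha = 1 - confidence_level
    # Fraction of alpha spent on the lower tail; the rest goes to the upper tail
    betas = np.linspace(0.0, alpha, n_grid)
    lows = np.quantile(signed_scores, betas)
    highs = np.quantile(signed_scores, np.clip(1 - alpha + betas, 0.0, 1.0))
    best = np.argmin(highs - lows)
    return lows[best], highs[best]


# Hypothetical, strongly right-skewed residuals y - y_pred
scores = np.concatenate([np.zeros(90), np.arange(1.0, 11.0)])
q_low, q_high = narrowest_interval(scores, confidence_level=0.9)
symmetric_width = np.quantile(scores, 0.95) - np.quantile(scores, 0.05)
print(round(float(q_high - q_low), 6), round(float(symmetric_width), 6))  # 0.1 5.05
```

On skewed residuals like these, spending the whole budget on the long upper tail gives a much narrower interval than the symmetric alpha/2 split.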
- Returns
- Tuple[NDArray, NDArray]
Two arrays:
Prediction points, of shape (n_samples,)
Prediction intervals, of shape (n_samples, 2, n_confidence_levels)
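With several confidence levels, each level maps to its own conformity-score quantile, and the intervals for all levels are stacked along the last axis. The NumPy sketch below builds such an array by hand from hypothetical absolute-residual scores, using the standard finite-sample rank ceil((n + 1) * level) and returning +inf when that rank exceeds the number of scores. It assumes lower bounds sit at index 0 and upper bounds at index 1 of the second axis, and that the last axis follows the order in which the levels were given; conformal_quantile and stack_intervals are hypothetical helpers, not MAPIE internals.

```python
import numpy as np


def conformal_quantile(sorted_scores, level):
    """The ceil((n + 1) * level)-th smallest score, or +inf if that rank exceeds n."""
    n = len(sorted_scores)
    rank = int(np.ceil((n + 1) * level))
    return np.inf if rank > n else sorted_scores[rank - 1]


def stack_intervals(points, scores, confidence_levels):
    """Array of shape (n_samples, 2, n_confidence_levels): lower, then upper bounds."""
    sorted_scores = np.sort(scores)
    q = np.array([conformal_quantile(sorted_scores, lvl) for lvl in confidence_levels])
    lower = points[:, None] - q  # shape (n_samples, n_levels)
    upper = points[:, None] + q
    return np.stack([lower, upper], axis=1)


points = np.array([1.0, 2.0, 3.0])
scores = np.arange(1.0, 20.0)  # 19 hypothetical absolute residuals: 1, 2, ..., 19
intervals = stack_intervals(points, scores, [0.5, 0.9, 0.99])
print(intervals.shape)              # (3, 2, 3)
print(intervals[:, 0, 1].tolist())  # lower bounds at level 0.9: [-17.0, -16.0, -15.0]
print(intervals[:, 1, 1].tolist())  # upper bounds at level 0.9: [19.0, 20.0, 21.0]
print(intervals[0, :, 2].tolist())  # level 0.99 needs rank 20 > 19: [-inf, inf]
```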