.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples_regression/1-quickstart/plot_prefit.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_regression_1-quickstart_plot_prefit.py:


===========================================================================
Example use of the prefit parameter with neural networks and LGBM Regressor
===========================================================================

:class:`~mapie.regression.MapieRegressor` and
:class:`~mapie.quantile_regression.MapieQuantileRegressor` can be used to
calibrate the uncertainties of large models for which cross-validation is
too expensive. Neural networks, for instance, typically rely on a single
validation set.

In this example, we first fit a neural network on the training set.
We then compute residuals on a separate calibration set with the
`cv="prefit"` parameter. Finally, we evaluate the model with prediction
intervals on a test set. We also show how to use the prefit option with
the conformalized quantile regressor.

.. GENERATED FROM PYTHON SOURCE LINES 18-37

.. code-block:: default


    import warnings

    import numpy as np
    import scipy.stats
    from lightgbm import LGBMRegressor
    from matplotlib import pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    from mapie._typing import NDArray
    from mapie.metrics import regression_coverage_score
    from mapie.regression import MapieQuantileRegressor, MapieRegressor

    warnings.filterwarnings("ignore")

    alpha = 0.1

.. GENERATED FROM PYTHON SOURCE LINES 38-44

1. Generate dataset
-----------------------------------------------------------------------------

We start by defining a function that we will use to generate data.
We then add random noise to the y values.
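
For additive Gaussian noise with standard deviation :math:`\sigma`, the
ideal prediction interval at level :math:`1 - \alpha` is known in closed
form, which gives a reference against which the calibrated intervals can be
compared. Writing :math:`\Phi` for the standard normal CDF:

.. math::

    y = f(x) + \varepsilon, \quad \varepsilon \sim \mathcal{N}(0, \sigma^2)
    \quad \Longrightarrow \quad
    \mathbb{P}\left(|y - f(x)| \le z_{1-\alpha/2}\,\sigma\right) = 1 - \alpha,
    \qquad z_{1-\alpha/2} = \Phi^{-1}(1 - \alpha/2).

With :math:`\alpha = 0.1`, :math:`z_{0.95} \approx 1.645`, so for
:math:`\sigma = 0.1` the true 90% interval is roughly :math:`f(x) \pm 0.16`.
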
Then we split the dataset into training, calibration and test sets.

.. GENERATED FROM PYTHON SOURCE LINES 44-66

.. code-block:: default


    def f(x: NDArray) -> NDArray:
        """Polynomial function used to generate one-dimensional data."""
        return np.array(5 * x + 5 * x**4 - 9 * x**2)


    # Generate data
    sigma = 0.1
    n_samples = 10000
    X = np.linspace(0, 1, n_samples)
    y = f(X) + np.random.normal(0, sigma, n_samples)

    # Train/calibration/test split (80% / 10% / 10%)
    X_train_cal, X_test, y_train_cal, y_test = train_test_split(
        X, y, test_size=1 / 10
    )
    X_train, X_cal, y_train, y_cal = train_test_split(
        X_train_cal, y_train_cal, test_size=1 / 9
    )

.. GENERATED FROM PYTHON SOURCE LINES 67-78

2. Pre-train models
-----------------------------------------------------------------------------

For this example, we train an :class:`~sklearn.neural_network.MLPRegressor`
for :class:`~mapie.regression.MapieRegressor`, and several LGBMRegressor
models with a quantile objective for
:class:`~mapie.quantile_regression.MapieQuantileRegressor`, since quantile
estimators are required to perform conformalized quantile regression.
Note that the three quantile estimators must be trained at the quantile
levels ``(α/2, 1-(α/2), 0.5)``, i.e. for the lower bound, the upper bound
and the median, and passed to MapieQuantileRegressor in this order.

.. GENERATED FROM PYTHON SOURCE LINES 78-95

.. code-block:: default


    # Train an MLPRegressor for MapieRegressor
    est_mlp = MLPRegressor(activation="relu", random_state=1)
    est_mlp.fit(X_train.reshape(-1, 1), y_train)

    # Train LGBMRegressor models for MapieQuantileRegressor:
    # lower quantile, upper quantile and median, in this order
    list_estimators_cqr = []
    for alpha_ in [alpha / 2, (1 - (alpha / 2)), 0.5]:
        estimator_ = LGBMRegressor(
            objective='quantile',
            alpha=alpha_,
        )
        estimator_.fit(X_train.reshape(-1, 1), y_train)
        list_estimators_cqr.append(estimator_)

.. GENERATED FROM PYTHON SOURCE LINES 96-102

3. Using MAPIE to calibrate the models
-----------------------------------------------------------------------------

We now calibrate the models with MAPIE. To this end, we set
`cv="prefit"` so that MAPIE uses the models we have already trained
instead of refitting them.
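
Conceptually, once the models are fixed, calibration with `cv="prefit"`
amounts to split conformal prediction: conformity scores are computed on the
calibration set, and a finite-sample-corrected empirical quantile of those
scores sets how far the intervals extend. The NumPy sketch below illustrates
this idea under simplifying assumptions. The helpers ``conformal_quantile``,
``split_conformal_interval``, ``cqr_interval`` and ``empirical_coverage`` are
hypothetical and are not part of MAPIE's API, and MAPIE's own implementation
may differ in details such as the quantile convention. The stand-in
"pre-trained" models are simply the noiseless function, with a deliberately
too-narrow band playing the role of the quantile models.

.. code-block:: python

    import numpy as np


    def conformal_quantile(scores, alpha):
        """Return the ceil((n + 1) * (1 - alpha))-th smallest score, i.e. the
        finite-sample-corrected quantile used in split conformal prediction."""
        scores = np.sort(np.asarray(scores))
        n = scores.size
        k = int(np.ceil((n + 1) * (1 - alpha)))
        if k > n:
            # Too few calibration points for this alpha: unbounded interval.
            return np.inf
        return scores[k - 1]


    def split_conformal_interval(y_pred_cal, y_cal, y_pred_test, alpha):
        """Symmetric interval around point predictions, from absolute residuals."""
        q = conformal_quantile(np.abs(y_cal - y_pred_cal), alpha)
        return y_pred_test - q, y_pred_test + q


    def cqr_interval(lo_cal, hi_cal, y_cal, lo_test, hi_test, alpha):
        """Conformalized quantile regression: shift both quantile bounds by the
        calibrated score max(lo - y, y - hi), widening or shrinking the band."""
        scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
        q = conformal_quantile(scores, alpha)
        return lo_test - q, hi_test + q


    def empirical_coverage(y, lower, upper):
        """Fraction of targets that fall inside [lower, upper]."""
        return np.mean((y >= lower) & (y <= upper))


    def f(x):
        """Polynomial from the example, used here as the true function."""
        return 5 * x + 5 * x**4 - 9 * x**2


    rng = np.random.default_rng(0)
    sigma, alpha = 0.1, 0.1
    x_cal = rng.uniform(0, 1, 1000)
    x_test = rng.uniform(0, 1, 1000)
    y_cal = f(x_cal) + rng.normal(0, sigma, x_cal.size)
    y_test = f(x_test) + rng.normal(0, sigma, x_test.size)

    # Hypothetical "pre-trained" point model: the noiseless function itself.
    lo, hi = split_conformal_interval(f(x_cal), y_cal, f(x_test), alpha)

    # Hypothetical "pre-trained" quantile models: a deliberately too-narrow band.
    half_band = 0.05
    lo_cqr, hi_cqr = cqr_interval(
        f(x_cal) - half_band, f(x_cal) + half_band, y_cal,
        f(x_test) - half_band, f(x_test) + half_band, alpha,
    )

    print(f"split conformal coverage: {empirical_coverage(y_test, lo, hi):.3f}")
    print(f"CQR coverage: {empirical_coverage(y_test, lo_cqr, hi_cqr):.3f}")

With 1,000 calibration points, both intervals should reach an empirical
coverage close to the target :math:`1 - \alpha = 0.9`. For CQR, the score
``max(lo - y, y - hi)`` is positive for points falling outside the band, so a
band that is too narrow gets widened during calibration.
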
We then predict on the test set and evaluate the coverage of the
prediction intervals.

.. GENERATED FROM PYTHON SOURCE LINES 102-125

.. code-block:: default


    # Calibrate the MLP's uncertainties on the calibration set
    mapie = MapieRegressor(est_mlp, cv="prefit")
    mapie.fit(X_cal.reshape(-1, 1), y_cal)

    # Predict on the test set and compute the effective coverage
    y_pred, y_pis = mapie.predict(X_test.reshape(-1, 1), alpha=alpha)
    coverage = regression_coverage_score(y_test, y_pis[:, 0, 0], y_pis[:, 1, 0])

    # Calibrate the LGBM quantile models on the calibration set
    # (alpha must match the quantile levels used to train the estimators)
    mapie_cqr = MapieQuantileRegressor(
        list_estimators_cqr, cv="prefit", alpha=alpha
    )
    mapie_cqr.fit(X_cal.reshape(-1, 1), y_cal)

    # Predict on the test set and compute the effective coverage
    y_pred_cqr, y_pis_cqr = mapie_cqr.predict(X_test.reshape(-1, 1))
    coverage_cqr = regression_coverage_score(
        y_test, y_pis_cqr[:, 0, 0], y_pis_cqr[:, 1, 0]
    )

.. GENERATED FROM PYTHON SOURCE LINES 126-133

4. Plots
-----------------------------------------------------------------------------

To visualize these results, we plot the predictions of each model together
with its prediction intervals: the multi-layer perceptron (MLP) calibrated
with :class:`~mapie.regression.MapieRegressor`, and the LGBMRegressor models
calibrated with :class:`~mapie.quantile_regression.MapieQuantileRegressor`.

.. GENERATED FROM PYTHON SOURCE LINES 133-204

.. code-block:: default


    # Plot obtained prediction intervals on testing set
    # Half-width of the true two-sided (1 - alpha) interval for Gaussian noise
    theoretical_semi_width = scipy.stats.norm.ppf(1 - alpha / 2) * sigma
    y_test_theoretical = f(X_test)
    order = np.argsort(X_test)

    plt.figure(figsize=(8, 8))
    plt.plot(
        X_test[order],
        y_pred[order],
        label="Predictions MLP",
        color="green"
    )
    plt.fill_between(
        X_test[order],
        y_pis[:, 0, 0][order],
        y_pis[:, 1, 0][order],
        alpha=0.4,
        label="Prediction intervals MLP",
        color="green"
    )
    plt.plot(
        X_test[order],
        y_pred_cqr[order],
        label="Predictions LGBM",
        color="blue"
    )
    plt.fill_between(
        X_test[order],
        y_pis_cqr[:, 0, 0][order],
        y_pis_cqr[:, 1, 0][order],
        alpha=0.4,
        label="Prediction intervals LGBM",
        color="blue"
    )
    plt.title(
        f"Target and effective coverages for:\n"
        f"MLP with MapieRegressor alpha={alpha}: "
        f"({1 - alpha:.3f}, {coverage:.3f})\n"
        f"LGBM with MapieQuantileRegressor alpha={alpha}: "
        f"({1 - alpha:.3f}, {coverage_cqr:.3f})"
    )
    plt.scatter(X_test, y_test, color="red", alpha=0.7, label="Test data", s=2)
    plt.plot(
        X_test[order],
        y_test_theoretical[order],
        color="gray",
        label="True function",
    )
    plt.plot(
        X_test[order],
        y_test_theoretical[order] - theoretical_semi_width,
        color="gray",
        ls="--",
        label="True prediction intervals",
    )
    plt.plot(
        X_test[order],
        y_test_theoretical[order] + theoretical_semi_width,
        color="gray",
        ls="--",
    )
    plt.xlabel("x")
    plt.ylabel("y")
    plt.legend(
        loc='upper center', bbox_to_anchor=(0.5, -0.05),
        fancybox=True, shadow=True, ncol=3
    )
    plt.show()

.. image-sg:: /examples_regression/1-quickstart/images/sphx_glr_plot_prefit_001.png
   :alt: Target and effective coverages for:  MLP with MapieRegressor alpha=0.1: (0.900, 0.920) LGBM with MapieQuantileRegressor alpha=0.1: (0.900, 0.906)
   :srcset: /examples_regression/1-quickstart/images/sphx_glr_plot_prefit_001.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes 0.928 seconds)