.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples_regression/4-tutorials/plot_cqr_tutorial.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_regression_4-tutorials_plot_cqr_tutorial.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_regression_4-tutorials_plot_cqr_tutorial.py:


====================================================
Tutorial for conformalized quantile regression (CQR)
====================================================

We will use the scikit-learn California housing dataset as the basis for
comparing the different methods available in MAPIE. Two classes will
be used: :class:`~mapie.quantile_regression.MapieQuantileRegressor` for CQR
and :class:`~mapie.regression.MapieRegressor` for the other methods.

For this example, the estimator will be :class:`~lightgbm.LGBMRegressor` with
``objective="quantile"``, because CQR requires the base estimator to be a
quantile regressor.

For the conformalized quantile regression (CQR), we will use a split-conformal
method, meaning that we will split the training set into a training set and a
calibration set. This means using
:class:`~mapie.quantile_regression.MapieQuantileRegressor` with ``cv="split"``
and the ``alpha`` parameter set at initialization. Recall that ``alpha`` is
``1 - target coverage``.
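
The calibration step behind split-conformal CQR can be sketched with NumPy
alone. The snippet below is a minimal illustration under stated assumptions,
not MAPIE's implementation: the data are synthetic, the lower and upper
"quantile predictions" are hypothetical stand-ins that are deliberately too
narrow, and the helper name ``cqr_calibrate`` is introduced only for this
example.

.. code-block:: python

    import numpy as np


    def cqr_calibrate(y_calib, q_low_calib, q_high_calib, alpha):
        """Return the CQR correction computed on a calibration set.

        The conformity score of a point measures how far it falls outside
        [q_low, q_high]; it is negative when the point falls inside.
        """
        scores = np.maximum(q_low_calib - y_calib, y_calib - q_high_calib)
        n = len(scores)
        # Rank of the finite-sample corrected (1 - alpha) quantile.
        k = int(np.ceil((n + 1) * (1 - alpha)))
        return np.sort(scores)[min(k, n) - 1]


    rng = np.random.default_rng(0)
    alpha = 0.2  # target coverage is 1 - alpha = 0.8

    # Synthetic data: the noise grows with x, and the hypothetical quantile
    # predictions q_low / q_high are too narrow on purpose.
    x = rng.uniform(1, 10, size=4000)
    y = x + rng.normal(scale=0.3 * x)
    q_low, q_high = x - 0.2 * x, x + 0.2 * x

    calib, test = slice(0, 2000), slice(2000, None)
    correction = cqr_calibrate(y[calib], q_low[calib], q_high[calib], alpha)

    # The same correction is applied on both sides of the raw interval.
    lower = q_low[test] - correction
    upper = q_high[test] + correction
    coverage = np.mean((y[test] >= lower) & (y[test] <= upper))
    print(f"correction={correction:.3f}, coverage={coverage:.3f}")

The correction is positive here because the raw intervals are too narrow; it
would be negative if they were too wide.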

The other conformal methods are selected with the ``method`` parameter of
:class:`~mapie.regression.MapieRegressor`, while the ``cv`` parameter sets the
cross-validation strategy. With this class, ``cv=-1`` selects a
"leave-one-out" strategy, whereas a positive integer gives the number of
folds of a cross-validation strategy.
Note that for the jackknife+ after bootstrap, we need to use the
class :class:`~mapie.subsample.Subsample`. For these methods, the ``alpha``
parameter is passed to ``predict``.
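
To make the role of the number of folds more concrete, here is a rough
NumPy sketch of the CV+ construction. It is an illustration under
assumptions, not MAPIE's code: the base model is an ordinary least-squares
fit on synthetic data, and the function name ``cv_plus_intervals`` is
hypothetical.

.. code-block:: python

    import numpy as np


    def cv_plus_intervals(X, y, X_new, n_folds, alpha, rng):
        """CV+ intervals around an ordinary least-squares model."""
        n = len(y)
        fold_of = rng.permutation(n) % n_folds  # random, balanced fold labels
        lower_terms = np.empty((n, len(X_new)))
        upper_terms = np.empty((n, len(X_new)))
        for k in range(n_folds):
            held_out = fold_of == k
            # Fit on every fold except k, then score the held-out points.
            coef, *_ = np.linalg.lstsq(X[~held_out], y[~held_out], rcond=None)
            residuals = np.abs(y[held_out] - X[held_out] @ coef)
            preds_new = X_new @ coef
            lower_terms[held_out] = preds_new[None, :] - residuals[:, None]
            upper_terms[held_out] = preds_new[None, :] + residuals[:, None]
        lower_terms.sort(axis=0)
        upper_terms.sort(axis=0)
        # Order statistics used for the lower and upper bounds.
        k_low = max(1, int(np.floor(alpha * (n + 1))))
        k_up = min(n, int(np.ceil((1 - alpha) * (n + 1))))
        return lower_terms[k_low - 1], upper_terms[k_up - 1]


    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=2500)
    X = np.column_stack([np.ones_like(x), x])  # intercept and slope columns
    y = 1.0 + 2.0 * x + rng.normal(size=x.size)
    X_train, y_train, X_test, y_test = X[:500], y[:500], X[500:], y[500:]

    # n_folds=10 mirrors cv=10; n_folds=len(y_train) would be leave-one-out.
    low, up = cv_plus_intervals(
        X_train, y_train, X_test, n_folds=10, alpha=0.2, rng=rng
    )
    cov = np.mean((low <= y_test) & (y_test <= up))
    print("coverage:", cov)

Setting the number of folds equal to the number of training points gives the
leave-one-out variant, at the cost of one model fit per training point.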

.. GENERATED FROM PYTHON SOURCE LINES 31-55

.. code-block:: default


    import warnings

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from lightgbm import LGBMRegressor
    from matplotlib.offsetbox import AnnotationBbox, TextArea
    from matplotlib.ticker import FormatStrFormatter
    from scipy.stats import randint, uniform
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import KFold, RandomizedSearchCV, train_test_split

    from mapie.metrics import (regression_coverage_score,
                               regression_mean_width_score)
    from mapie.regression import MapieQuantileRegressor, MapieRegressor
    from mapie.subsample import Subsample

    random_state = 18
    rng = np.random.default_rng(random_state)
    round_to = 3

    warnings.filterwarnings("ignore")


.. GENERATED FROM PYTHON SOURCE LINES 56-65

1. Data
--------------------------------------------------------------------------
The target variable of this dataset is the median house value for the
California districts. This dataset is composed of 8 features, including
variables such as the age of the house, the median income of the
neighborhood, the average number of rooms or bedrooms, and the location in
latitude and longitude. In total there are around 20k observations.
As the value is expressed in hundreds of thousands of dollars, we will
multiply it by 100 to express it in thousands of dollars for better
visualization (note that this will not affect the results).
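
As a quick sanity check of that last remark, the sketch below rescales
made-up targets and made-up interval bounds by the same factor. The values
are synthetic and the ``coverage`` helper is hypothetical; the point is only
that coverage is unchanged while the widths are multiplied by the factor.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    # Made-up targets and made-up prediction-interval bounds around them.
    y = rng.uniform(0.5, 5.0, size=1000)
    lower = y + rng.normal(-0.5, 0.4, size=1000)
    upper = lower + 1.0


    def coverage(y, lower, upper):
        """Fraction of targets that fall inside their interval."""
        return np.mean((lower <= y) & (y <= upper))


    scale = 100  # same factor as the one applied to the target above
    print(coverage(y, lower, upper))
    print(coverage(scale * y, scale * lower, scale * upper))
    print(np.mean(upper - lower), np.mean(scale * upper - scale * lower))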

.. GENERATED FROM PYTHON SOURCE LINES 65-71

.. code-block:: default


    data = fetch_california_housing(as_frame=True)
    X = pd.DataFrame(data=data.data, columns=data.feature_names)
    y = pd.DataFrame(data=data.target) * 100


.. GENERATED FROM PYTHON SOURCE LINES 72-74

Let's visualize the dataset by showing the correlations between the
variables, target included.

.. GENERATED FROM PYTHON SOURCE LINES 74-81

.. code-block:: default


    df = pd.concat([X, y], axis=1)
    pear_corr = df.corr(method='pearson')
    pear_corr.style.background_gradient(cmap='Greens', axis=0)


.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style type="text/css">
    #T_d69eb_row0_col0, #T_d69eb_row1_col1, #T_d69eb_row2_col2, #T_d69eb_row3_col3, #T_d69eb_row4_col4, #T_d69eb_row5_col5, #T_d69eb_row6_col6, #T_d69eb_row7_col7, #T_d69eb_row8_col8 {
      background-color: #00441b;
      color: #f1f1f1;
    }
    #T_d69eb_row0_col1 {
      background-color: #e3f4de;
      color: #000000;
    }
    #T_d69eb_row0_col2 {
      background-color: #92d28f;
      color: #000000;
    }
    #T_d69eb_row0_col3 {
      background-color: #f5fbf3;
      color: #000000;
    }
    #T_d69eb_row0_col4 {
      background-color: #cbebc5;
      color: #000000;
    }
    #T_d69eb_row0_col5 {
      background-color: #f1faee;
      color: #000000;
    }
    #T_d69eb_row0_col6 {
      background-color: #8ace88;
      color: #000000;
    }
    #T_d69eb_row0_col7 {
      background-color: #7fc97f;
      color: #000000;
    }
    #T_d69eb_row0_col8 {
      background-color: #289049;
      color: #f1f1f1;
    }
    #T_d69eb_row1_col0, #T_d69eb_row1_col2, #T_d69eb_row1_col3, #T_d69eb_row1_col4, #T_d69eb_row4_col1, #T_d69eb_row6_col7, #T_d69eb_row6_col8, #T_d69eb_row7_col6, #T_d69eb_row8_col5 {
      background-color: #f7fcf5;
      color: #000000;
    }
    #T_d69eb_row1_col5 {
      background-color: #f2faef;
      color: #000000;
    }
    #T_d69eb_row1_col6, #T_d69eb_row3_col7 {
      background-color: #79c67a;
      color: #000000;
    }
    #T_d69eb_row1_col7, #T_d69eb_row4_col6 {
      background-color: #90d18d;
      color: #000000;
    }
    #T_d69eb_row1_col8 {
      background-color: #cfecc9;
      color: #000000;
    }
    #T_d69eb_row2_col0 {
      background-color: #98d594;
      color: #000000;
    }
    #T_d69eb_row2_col1, #T_d69eb_row4_col0 {
      background-color: #e7f6e3;
      color: #000000;
    }
    #T_d69eb_row2_col3 {
      background-color: #05712f;
      color: #f1f1f1;
    }
    #T_d69eb_row2_col4 {
      background-color: #daf0d4;
      color: #000000;
    }
    #T_d69eb_row2_col5, #T_d69eb_row3_col5 {
      background-color: #f5fbf2;
      color: #000000;
    }
    #T_d69eb_row2_col6 {
      background-color: #65bd6f;
      color: #f1f1f1;
    }
    #T_d69eb_row2_col7 {
      background-color: #80ca80;
      color: #000000;
    }
    #T_d69eb_row2_col8 {
      background-color: #c4e8bd;
      color: #000000;
    }
    #T_d69eb_row3_col0 {
      background-color: #f0f9ec;
      color: #000000;
    }
    #T_d69eb_row3_col1 {
      background-color: #dbf1d5;
      color: #000000;
    }
    #T_d69eb_row3_col2 {
      background-color: #016e2d;
      color: #f1f1f1;
    }
    #T_d69eb_row3_col4 {
      background-color: #d9f0d3;
      color: #000000;
    }
    #T_d69eb_row3_col6 {
      background-color: #6dc072;
      color: #000000;
    }
    #T_d69eb_row3_col8, #T_d69eb_row7_col3, #T_d69eb_row7_col8 {
      background-color: #ebf7e7;
      color: #000000;
    }
    #T_d69eb_row4_col2 {
      background-color: #edf8ea;
      color: #000000;
    }
    #T_d69eb_row4_col3 {
      background-color: #f6fcf4;
      color: #000000;
    }
    #T_d69eb_row4_col5, #T_d69eb_row7_col0 {
      background-color: #eaf7e6;
      color: #000000;
    }
    #T_d69eb_row4_col7 {
      background-color: #66bd6f;
      color: #f1f1f1;
    }
    #T_d69eb_row4_col8, #T_d69eb_row5_col8 {
      background-color: #e8f6e4;
      color: #000000;
    }
    #T_d69eb_row5_col0 {
      background-color: #e5f5e1;
      color: #000000;
    }
    #T_d69eb_row5_col1 {
      background-color: #caeac3;
      color: #000000;
    }
    #T_d69eb_row5_col2 {
      background-color: #e5f5e0;
      color: #000000;
    }
    #T_d69eb_row5_col3 {
      background-color: #eef8ea;
      color: #000000;
    }
    #T_d69eb_row5_col4 {
      background-color: #bde5b6;
      color: #000000;
    }
    #T_d69eb_row5_col6, #T_d69eb_row5_col7 {
      background-color: #7ac77b;
      color: #000000;
    }
    #T_d69eb_row6_col0 {
      background-color: #f2faf0;
      color: #000000;
    }
    #T_d69eb_row6_col1 {
      background-color: #cbeac4;
      color: #000000;
    }
    #T_d69eb_row6_col2 {
      background-color: #cdecc7;
      color: #000000;
    }
    #T_d69eb_row6_col3 {
      background-color: #e2f4dd;
      color: #000000;
    }
    #T_d69eb_row6_col4, #T_d69eb_row7_col1 {
      background-color: #e0f3db;
      color: #000000;
    }
    #T_d69eb_row6_col5, #T_d69eb_row7_col5 {
      background-color: #f4fbf1;
      color: #000000;
    }
    #T_d69eb_row7_col2 {
      background-color: #e8f6e3;
      color: #000000;
    }
    #T_d69eb_row7_col4 {
      background-color: #b6e2af;
      color: #000000;
    }
    #T_d69eb_row8_col0 {
      background-color: #2a924a;
      color: #f1f1f1;
    }
    #T_d69eb_row8_col1 {
      background-color: #b5e1ae;
      color: #000000;
    }
    #T_d69eb_row8_col2 {
      background-color: #c3e7bc;
      color: #000000;
    }
    #T_d69eb_row8_col3 {
      background-color: #f3faf0;
      color: #000000;
    }
    #T_d69eb_row8_col4 {
      background-color: #d1edcb;
      color: #000000;
    }
    #T_d69eb_row8_col6 {
      background-color: #97d492;
      color: #000000;
    }
    #T_d69eb_row8_col7 {
      background-color: #84cc83;
      color: #000000;
    }
    </style>
    <table id="T_d69eb_">
      <thead>
        <tr>
          <th class="blank level0" >&nbsp;</th>
          <th class="col_heading level0 col0" >MedInc</th>
          <th class="col_heading level0 col1" >HouseAge</th>
          <th class="col_heading level0 col2" >AveRooms</th>
          <th class="col_heading level0 col3" >AveBedrms</th>
          <th class="col_heading level0 col4" >Population</th>
          <th class="col_heading level0 col5" >AveOccup</th>
          <th class="col_heading level0 col6" >Latitude</th>
          <th class="col_heading level0 col7" >Longitude</th>
          <th class="col_heading level0 col8" >MedHouseVal</th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th id="T_d69eb_level0_row0" class="row_heading level0 row0" >MedInc</th>
          <td id="T_d69eb_row0_col0" class="data row0 col0" >1.000000</td>
          <td id="T_d69eb_row0_col1" class="data row0 col1" >-0.119034</td>
          <td id="T_d69eb_row0_col2" class="data row0 col2" >0.326895</td>
          <td id="T_d69eb_row0_col3" class="data row0 col3" >-0.062040</td>
          <td id="T_d69eb_row0_col4" class="data row0 col4" >0.004834</td>
          <td id="T_d69eb_row0_col5" class="data row0 col5" >0.018766</td>
          <td id="T_d69eb_row0_col6" class="data row0 col6" >-0.079809</td>
          <td id="T_d69eb_row0_col7" class="data row0 col7" >-0.015176</td>
          <td id="T_d69eb_row0_col8" class="data row0 col8" >0.688075</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row1" class="row_heading level0 row1" >HouseAge</th>
          <td id="T_d69eb_row1_col0" class="data row1 col0" >-0.119034</td>
          <td id="T_d69eb_row1_col1" class="data row1 col1" >1.000000</td>
          <td id="T_d69eb_row1_col2" class="data row1 col2" >-0.153277</td>
          <td id="T_d69eb_row1_col3" class="data row1 col3" >-0.077747</td>
          <td id="T_d69eb_row1_col4" class="data row1 col4" >-0.296244</td>
          <td id="T_d69eb_row1_col5" class="data row1 col5" >0.013191</td>
          <td id="T_d69eb_row1_col6" class="data row1 col6" >0.011173</td>
          <td id="T_d69eb_row1_col7" class="data row1 col7" >-0.108197</td>
          <td id="T_d69eb_row1_col8" class="data row1 col8" >0.105623</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row2" class="row_heading level0 row2" >AveRooms</th>
          <td id="T_d69eb_row2_col0" class="data row2 col0" >0.326895</td>
          <td id="T_d69eb_row2_col1" class="data row2 col1" >-0.153277</td>
          <td id="T_d69eb_row2_col2" class="data row2 col2" >1.000000</td>
          <td id="T_d69eb_row2_col3" class="data row2 col3" >0.847621</td>
          <td id="T_d69eb_row2_col4" class="data row2 col4" >-0.072213</td>
          <td id="T_d69eb_row2_col5" class="data row2 col5" >-0.004852</td>
          <td id="T_d69eb_row2_col6" class="data row2 col6" >0.106389</td>
          <td id="T_d69eb_row2_col7" class="data row2 col7" >-0.027540</td>
          <td id="T_d69eb_row2_col8" class="data row2 col8" >0.151948</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row3" class="row_heading level0 row3" >AveBedrms</th>
          <td id="T_d69eb_row3_col0" class="data row3 col0" >-0.062040</td>
          <td id="T_d69eb_row3_col1" class="data row3 col1" >-0.077747</td>
          <td id="T_d69eb_row3_col2" class="data row3 col2" >0.847621</td>
          <td id="T_d69eb_row3_col3" class="data row3 col3" >1.000000</td>
          <td id="T_d69eb_row3_col4" class="data row3 col4" >-0.066197</td>
          <td id="T_d69eb_row3_col5" class="data row3 col5" >-0.006181</td>
          <td id="T_d69eb_row3_col6" class="data row3 col6" >0.069721</td>
          <td id="T_d69eb_row3_col7" class="data row3 col7" >0.013344</td>
          <td id="T_d69eb_row3_col8" class="data row3 col8" >-0.046701</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row4" class="row_heading level0 row4" >Population</th>
          <td id="T_d69eb_row4_col0" class="data row4 col0" >0.004834</td>
          <td id="T_d69eb_row4_col1" class="data row4 col1" >-0.296244</td>
          <td id="T_d69eb_row4_col2" class="data row4 col2" >-0.072213</td>
          <td id="T_d69eb_row4_col3" class="data row4 col3" >-0.066197</td>
          <td id="T_d69eb_row4_col4" class="data row4 col4" >1.000000</td>
          <td id="T_d69eb_row4_col5" class="data row4 col5" >0.069863</td>
          <td id="T_d69eb_row4_col6" class="data row4 col6" >-0.108785</td>
          <td id="T_d69eb_row4_col7" class="data row4 col7" >0.099773</td>
          <td id="T_d69eb_row4_col8" class="data row4 col8" >-0.024650</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row5" class="row_heading level0 row5" >AveOccup</th>
          <td id="T_d69eb_row5_col0" class="data row5 col0" >0.018766</td>
          <td id="T_d69eb_row5_col1" class="data row5 col1" >0.013191</td>
          <td id="T_d69eb_row5_col2" class="data row5 col2" >-0.004852</td>
          <td id="T_d69eb_row5_col3" class="data row5 col3" >-0.006181</td>
          <td id="T_d69eb_row5_col4" class="data row5 col4" >0.069863</td>
          <td id="T_d69eb_row5_col5" class="data row5 col5" >1.000000</td>
          <td id="T_d69eb_row5_col6" class="data row5 col6" >0.002366</td>
          <td id="T_d69eb_row5_col7" class="data row5 col7" >0.002476</td>
          <td id="T_d69eb_row5_col8" class="data row5 col8" >-0.023737</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row6" class="row_heading level0 row6" >Latitude</th>
          <td id="T_d69eb_row6_col0" class="data row6 col0" >-0.079809</td>
          <td id="T_d69eb_row6_col1" class="data row6 col1" >0.011173</td>
          <td id="T_d69eb_row6_col2" class="data row6 col2" >0.106389</td>
          <td id="T_d69eb_row6_col3" class="data row6 col3" >0.069721</td>
          <td id="T_d69eb_row6_col4" class="data row6 col4" >-0.108785</td>
          <td id="T_d69eb_row6_col5" class="data row6 col5" >0.002366</td>
          <td id="T_d69eb_row6_col6" class="data row6 col6" >1.000000</td>
          <td id="T_d69eb_row6_col7" class="data row6 col7" >-0.924664</td>
          <td id="T_d69eb_row6_col8" class="data row6 col8" >-0.144160</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row7" class="row_heading level0 row7" >Longitude</th>
          <td id="T_d69eb_row7_col0" class="data row7 col0" >-0.015176</td>
          <td id="T_d69eb_row7_col1" class="data row7 col1" >-0.108197</td>
          <td id="T_d69eb_row7_col2" class="data row7 col2" >-0.027540</td>
          <td id="T_d69eb_row7_col3" class="data row7 col3" >0.013344</td>
          <td id="T_d69eb_row7_col4" class="data row7 col4" >0.099773</td>
          <td id="T_d69eb_row7_col5" class="data row7 col5" >0.002476</td>
          <td id="T_d69eb_row7_col6" class="data row7 col6" >-0.924664</td>
          <td id="T_d69eb_row7_col7" class="data row7 col7" >1.000000</td>
          <td id="T_d69eb_row7_col8" class="data row7 col8" >-0.045967</td>
        </tr>
        <tr>
          <th id="T_d69eb_level0_row8" class="row_heading level0 row8" >MedHouseVal</th>
          <td id="T_d69eb_row8_col0" class="data row8 col0" >0.688075</td>
          <td id="T_d69eb_row8_col1" class="data row8 col1" >0.105623</td>
          <td id="T_d69eb_row8_col2" class="data row8 col2" >0.151948</td>
          <td id="T_d69eb_row8_col3" class="data row8 col3" >-0.046701</td>
          <td id="T_d69eb_row8_col4" class="data row8 col4" >-0.024650</td>
          <td id="T_d69eb_row8_col5" class="data row8 col5" >-0.023737</td>
          <td id="T_d69eb_row8_col6" class="data row8 col6" >-0.144160</td>
          <td id="T_d69eb_row8_col7" class="data row8 col7" >-0.045967</td>
          <td id="T_d69eb_row8_col8" class="data row8 col8" >1.000000</td>
        </tr>
      </tbody>
    </table>

    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 82-83

Now let's visualize a histogram of the price of the houses.

.. GENERATED FROM PYTHON SOURCE LINES 83-93

.. code-block:: default


    fig, axs = plt.subplots(1, 1, figsize=(5, 5))
    axs.hist(y, bins=50)
    axs.set_xlabel("Median price of houses")
    axs.set_title("Histogram of house prices")
    axs.xaxis.set_major_formatter(FormatStrFormatter('%.0f' + "k"))
    plt.show()


.. image-sg:: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_001.png
   :alt: Histogram of house prices
   :srcset: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_001.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 94-97

Let's now split the dataset into a training set and a test set. For CQR,
the calibration set, which is used for calibrating the prediction intervals,
is split off from the training set later, when fitting MAPIE.

.. GENERATED FROM PYTHON SOURCE LINES 97-106

.. code-block:: default


    X_train, X_test, y_train, y_test = train_test_split(
        X,
        y['MedHouseVal'],
        random_state=random_state
    )
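
The resulting three-way partition for CQR can be pictured with plain index
shuffling. This is only a sketch: it assumes scikit-learn's default test
fraction of 25% and the 30% calibration fraction used later in this
tutorial, and it does not reproduce the exact rows that scikit-learn or
MAPIE select.

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(18)
    n_samples = 20640  # number of rows in the California housing dataset

    indices = rng.permutation(n_samples)
    # Held out for the final evaluation.
    n_test = int(np.ceil(0.25 * n_samples))
    # Taken from the remaining rows for calibration.
    n_calib = int(np.ceil(0.3 * (n_samples - n_test)))

    test_idx = indices[:n_test]
    calib_idx = indices[n_test:n_test + n_calib]
    train_idx = indices[n_test + n_calib:]
    print(len(train_idx), len(calib_idx), len(test_idx))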


.. GENERATED FROM PYTHON SOURCE LINES 107-114

2. Optimizing estimator
--------------------------------------------------------------------------
Before estimating uncertainties, let's start by optimizing the base model
in order to reduce our prediction error. We will use the
:class:`~lightgbm.LGBMRegressor` in the quantile setting. The optimization
is performed using :class:`~sklearn.model_selection.RandomizedSearchCV`
to find the optimal model to predict the house prices.

.. GENERATED FROM PYTHON SOURCE LINES 114-140

.. code-block:: default


    estimator = LGBMRegressor(
        objective='quantile',
        alpha=0.5,
        random_state=random_state,
        verbose=-1
    )
    params_distributions = dict(
        num_leaves=randint(low=10, high=50),
        max_depth=randint(low=3, high=20),
        n_estimators=randint(low=50, high=100),
        learning_rate=uniform()
    )
    optim_model = RandomizedSearchCV(
        estimator,
        param_distributions=params_distributions,
        n_jobs=-1,
        n_iter=10,
        cv=KFold(n_splits=5, shuffle=True),
        random_state=random_state
    )
    optim_model.fit(X_train, y_train)
    estimator = optim_model.best_estimator_
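
For intuition about the quantile setting, the loss it relies on is commonly
called the pinball loss. The sketch below is an illustration on synthetic
data, with a hypothetical ``pinball_loss`` helper written in NumPy; it
checks that, among constant predictions, the loss is smallest close to the
empirical quantile. With a level of 0.5, as used above, this corresponds to
the median.

.. code-block:: python

    import numpy as np


    def pinball_loss(y_true, y_pred, quantile):
        """Average pinball (quantile) loss at the given quantile level."""
        diff = y_true - y_pred
        return np.mean(np.maximum(quantile * diff, (quantile - 1) * diff))


    rng = np.random.default_rng(0)
    y = rng.exponential(scale=1.0, size=10_000)  # synthetic skewed target

    # Scan constant predictions on a grid and keep the one with lowest loss.
    candidates = np.linspace(0.0, 3.0, 301)
    best_constants = {}
    for q in (0.1, 0.5, 0.9):
        losses = [pinball_loss(y, c, q) for c in candidates]
        best_constants[q] = candidates[int(np.argmin(losses))]
        print(
            f"q={q}: best constant={best_constants[q]:.2f}, "
            f"empirical quantile={np.quantile(y, q):.2f}"
        )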


.. GENERATED FROM PYTHON SOURCE LINES 141-152

3. Comparison of MAPIE methods
--------------------------------------------------------------------------
We will now compare the different methods available in MAPIE for
uncertainty quantification in regression settings. For this tutorial we
will compare the "naive", "jackknife plus after bootstrap", "cv plus" and
"conformalized quantile regression" methods. Please have a look at the
theoretical description of the documentation for more details on these methods.

We also create two functions, one to sort the dataset in increasing values
of ``y_test`` and a plotting function, so that we can plot all predictions
and prediction intervals for different conformal methods.

.. GENERATED FROM PYTHON SOURCE LINES 152-231

.. code-block:: default


    def sort_y_values(y_test, y_pred, y_pis):
        """
        Sorting the dataset in order to make plots using the fill_between function.
        """
        indices = np.argsort(y_test)
        y_test_sorted = np.array(y_test)[indices]
        y_pred_sorted = y_pred[indices]
        y_lower_bound = y_pis[:, 0, 0][indices]
        y_upper_bound = y_pis[:, 1, 0][indices]
        return y_test_sorted, y_pred_sorted, y_lower_bound, y_upper_bound


    def plot_prediction_intervals(
        title,
        axs,
        y_test_sorted,
        y_pred_sorted,
        lower_bound,
        upper_bound,
        coverage,
        width,
        num_plots_idx
    ):
        """
        Plot of the prediction intervals for each different conformal
        method.
        """
        axs.yaxis.set_major_formatter(FormatStrFormatter('%.0f' + "k"))
        axs.xaxis.set_major_formatter(FormatStrFormatter('%.0f' + "k"))

        lower_bound_ = np.take(lower_bound, num_plots_idx)
        y_pred_sorted_ = np.take(y_pred_sorted, num_plots_idx)
        y_test_sorted_ = np.take(y_test_sorted, num_plots_idx)

        error = y_pred_sorted_-lower_bound_

        warning1 = y_test_sorted_ > y_pred_sorted_+error
        warning2 = y_test_sorted_ < y_pred_sorted_-error
        warnings = warning1 + warning2
        axs.errorbar(
            y_test_sorted_[~warnings],
            y_pred_sorted_[~warnings],
            yerr=np.abs(error[~warnings]),
            capsize=5, marker="o", elinewidth=2, linewidth=0,
            label="Inside prediction interval"
            )
        axs.errorbar(
            y_test_sorted_[warnings],
            y_pred_sorted_[warnings],
            yerr=np.abs(error[warnings]),
            capsize=5, marker="o", elinewidth=2, linewidth=0, color="red",
            label="Outside prediction interval"
            )
        axs.scatter(
            y_test_sorted_[warnings],
            y_test_sorted_[warnings],
            marker="*", color="green",
            label="True value"
        )
        axs.set_xlabel("True house prices in $")
        axs.set_ylabel("Prediction of house prices in $")
        ab = AnnotationBbox(
            TextArea(
                f"Coverage: {np.round(coverage, round_to)}\n"
                + f"Interval width: {np.round(width, round_to)}"
            ),
            xy=(np.min(y_test_sorted_)*3, np.max(y_pred_sorted_+error)*0.95),
            )
        lims = [
            np.min([axs.get_xlim(), axs.get_ylim()]),  # min of both axes
            np.max([axs.get_xlim(), axs.get_ylim()]),  # max of both axes
        ]
        axs.plot(lims, lims, '--', alpha=0.75, color="black", label="x=y")
        axs.add_artist(ab)
        axs.set_title(title, fontweight='bold')


.. GENERATED FROM PYTHON SOURCE LINES 232-250

We now use MAPIE to return the predictions and prediction intervals.
We will use ``α=0.2``, which means a target coverage of 0.8.
Recall that this parameter needs to be set at initialization for
:class:`~mapie.quantile_regression.MapieQuantileRegressor`, whereas for
:class:`~mapie.regression.MapieRegressor` it needs to be passed to
``predict``.
Note that for CQR, there are two options for ``cv``:

* ``cv="split"`` (the default), the split-conformal approach where MAPIE
  trains the model on a training set and then calibrates on the calibration
  set.
* ``cv="prefit"``, meaning that you train your models beforehand with the
  correct quantile values (they must be given in the following order:
  ``(α/2, 1-(α/2), 0.5)``) and pass them to MAPIE as an iterable
  object. (Check the examples for how to use prefit in MAPIE.)

Additionally, note that
:class:`~mapie.quantile_regression.MapieQuantileRegressor` only accepts a
predefined list of models (``quantile_estimator_params``) and that we will
use symmetrical residuals.
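
As a small aid for the prefit case, the quantile levels can be derived from
``α`` as sketched below. This assumes the usual CQR convention of a lower
level ``α/2``, an upper level ``1-(α/2)`` and the median; the helper name
``prefit_quantile_levels`` is hypothetical and not part of MAPIE.

.. code-block:: python

    def prefit_quantile_levels(alpha):
        """Quantile levels (lower, upper, median) for a coverage of 1 - alpha."""
        if not 0 < alpha < 1:
            raise ValueError("alpha must lie strictly between 0 and 1")
        return alpha / 2, 1 - alpha / 2, 0.5


    lower_q, upper_q, median_q = prefit_quantile_levels(0.2)
    print(lower_q, upper_q, median_q)  # 0.1 0.9 0.5

Each level would then be used to train one quantile model before handing the
three fitted models to MAPIE in that order.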

.. GENERATED FROM PYTHON SOURCE LINES 250-297

.. code-block:: default


    STRATEGIES = {
        "naive": {"method": "naive"},
        "cv_plus": {"method": "plus", "cv": 10},
        "jackknife_plus_ab": {"method": "plus", "cv": Subsample(n_resamplings=50)},
        "cqr": {"method": "quantile", "cv": "split", "alpha": 0.2},
    }
    y_pred, y_pis = {}, {}
    y_test_sorted, y_pred_sorted, lower_bound, upper_bound = {}, {}, {}, {}
    coverage, width = {}, {}
    for strategy, params in STRATEGIES.items():
        if strategy == "cqr":
            mapie = MapieQuantileRegressor(estimator, **params)
            mapie.fit(
                X_train,
                y_train,
                calib_size=0.3,
                random_state=random_state
            )
            y_pred[strategy], y_pis[strategy] = mapie.predict(X_test)
        else:
            mapie = MapieRegressor(
                estimator,
                test_size=0.3,
                random_state=random_state,
                **params
            )
            mapie.fit(X_train, y_train)
            y_pred[strategy], y_pis[strategy] = mapie.predict(X_test, alpha=0.2)
        (
            y_test_sorted[strategy],
            y_pred_sorted[strategy],
            lower_bound[strategy],
            upper_bound[strategy]
        ) = sort_y_values(y_test, y_pred[strategy], y_pis[strategy])
        coverage[strategy] = regression_coverage_score(
            y_test,
            y_pis[strategy][:, 0, 0],
            y_pis[strategy][:, 1, 0]
            )
        width[strategy] = regression_mean_width_score(
            y_pis[strategy][:, 0, 0],
            y_pis[strategy][:, 1, 0]
            )
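
The two metrics computed in the loop can be read as follows. The functions
below are NumPy stand-ins written for illustration, assuming the usual
definitions (fraction of covered points and average interval width); they
are not the MAPIE functions themselves, and the numbers are made up.

.. code-block:: python

    import numpy as np


    def coverage_score(y_true, y_low, y_up):
        """Fraction of true values that fall inside [y_low, y_up]."""
        return np.mean((y_low <= y_true) & (y_true <= y_up))


    def mean_width_score(y_low, y_up):
        """Average width of the prediction intervals."""
        return np.mean(np.abs(y_up - y_low))


    # Made-up values: the third point falls outside its interval.
    y_true = np.array([100.0, 200.0, 300.0, 400.0])
    y_low = np.array([80.0, 150.0, 310.0, 350.0])
    y_up = np.array([120.0, 250.0, 390.0, 450.0])

    print(coverage_score(y_true, y_low, y_up))  # 0.75
    print(mean_width_score(y_low, y_up))        # 80.0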


.. GENERATED FROM PYTHON SOURCE LINES 298-300

We will now proceed to the plotting stage. Note that we only plot 2% of the
observations in order not to crowd the plot too much.

.. GENERATED FROM PYTHON SOURCE LINES 300-333

.. code-block:: default


    perc_obs_plot = 0.02
    num_plots = rng.choice(
        len(y_test), int(perc_obs_plot*len(y_test)), replace=False
        )
    fig, axs = plt.subplots(2, 2, figsize=(15, 13))
    coords = [axs[0, 0], axs[0, 1], axs[1, 0], axs[1, 1]]
    for strategy, coord in zip(STRATEGIES.keys(), coords):
        plot_prediction_intervals(
            strategy,
            coord,
            y_test_sorted[strategy],
            y_pred_sorted[strategy],
            lower_bound[strategy],
            upper_bound[strategy],
            coverage[strategy],
            width[strategy],
            num_plots
            )
    lines_labels = [ax.get_legend_handles_labels() for ax in fig.axes]
    lines, labels = [sum(_, []) for _ in zip(*lines_labels)]
    plt.legend(
        lines[:4], labels[:4],
        loc='upper center',
        bbox_to_anchor=(0, -0.15),
        fancybox=True,
        shadow=True,
        ncol=2
    )
    plt.show()


.. image-sg:: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_002.png
   :alt: naive, cv_plus, jackknife_plus_ab, cqr
   :srcset: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_002.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 334-338

We notice that the prediction intervals of the conformalized quantile
regression are more adaptive, while the other methods have a fixed
interval width. Indeed, the CQR prediction intervals widen as the prices
get larger.

.. GENERATED FROM PYTHON SOURCE LINES 338-398

.. code-block:: default


    def get_coverages_widths_by_bins(
        want,
        y_test,
        y_pred,
        lower_bound,
        upper_bound,
        STRATEGIES,
        bins
    ):
        """
        Given the results from MAPIE, this function splits the data
        according to the test values into bins and calculates coverage
        or width per bin.
        """
        cuts = []
        cuts_ = pd.qcut(y_test["naive"], bins).unique()[:-1]
        for item in cuts_:
            cuts.append(item.left)
        cuts.append(cuts_[-1].right)
        cuts.append(np.max(y_test["naive"])+1)
        recap = {}
        for i in range(len(cuts) - 1):
            cut1, cut2 = cuts[i], cuts[i+1]
            name = f"[{np.round(cut1, 0)}, {np.round(cut2, 0)}]"
            recap[name] = []
            for strategy in STRATEGIES:
                indices = np.where(
                    (y_test[strategy] > cut1) * (y_test[strategy] <= cut2)
                    )
                y_test_trunc = np.take(y_test[strategy], indices)
                y_low_ = np.take(lower_bound[strategy], indices)
                y_high_ = np.take(upper_bound[strategy], indices)
                if want == "coverage":
                    recap[name].append(regression_coverage_score(
                        y_test_trunc[0],
                        y_low_[0],
                        y_high_[0]
                    ))
                elif want == "width":
                    recap[name].append(
                        regression_mean_width_score(y_low_[0], y_high_[0])
                    )
        recap_df = pd.DataFrame(recap, index=STRATEGIES)
        return recap_df


    bins = list(np.arange(0, 1, 0.1))
    binned_data = get_coverages_widths_by_bins(
        "coverage",
        y_test_sorted,
        y_pred_sorted,
        lower_bound,
        upper_bound,
        STRATEGIES,
        bins
    )
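
The idea of computing coverage per quantile bin can also be shown in a more
compact form. The sketch below is a simplified NumPy variant, not the
function above: the data are synthetic, the intervals have a fixed width on
purpose, and ``coverage_by_quantile_bin`` is a hypothetical helper.

.. code-block:: python

    import numpy as np


    def coverage_by_quantile_bin(y_true, y_low, y_up, n_bins):
        """Coverage computed separately inside each quantile bin of y_true."""
        edges = np.quantile(y_true, np.linspace(0, 1, n_bins + 1))
        # Interior edges only, so every point gets a bin in 0..n_bins-1.
        bin_ids = np.digitize(y_true, edges[1:-1], right=True)
        covered = (y_low <= y_true) & (y_true <= y_up)
        return np.array([covered[bin_ids == b].mean() for b in range(n_bins)])


    rng = np.random.default_rng(0)
    # Synthetic setting: the noise grows with the signal, but the intervals
    # keep the same width everywhere.
    signal = rng.uniform(50, 500, size=5000)
    y_true = signal + rng.normal(scale=0.2 * signal)
    y_low, y_up = signal - 80, signal + 80

    per_bin = coverage_by_quantile_bin(y_true, y_low, y_up, n_bins=5)
    overall = np.mean((y_low <= y_true) & (y_true <= y_up))
    print(np.round(per_bin, 2), round(float(overall), 2))

In this synthetic setting, fixed-width intervals cover the low bins more
often than needed and the high bins less often, even when the overall
coverage looks reasonable.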


.. GENERATED FROM PYTHON SOURCE LINES 399-402

To confirm these insights, we will now plot the conditional coverage and
the interval width over bins of house prices defined by quantiles.

.. GENERATED FROM PYTHON SOURCE LINES 402-414

.. code-block:: default


    binned_data.T.plot.bar(figsize=(12, 4))
    plt.axhline(0.80, ls="--", color="k")
    plt.ylabel("Conditional coverage")
    plt.xlabel("Binned house prices")
    plt.xticks(rotation=345)
    plt.ylim(0.3, 1.0)
    plt.legend(loc=[1, 0])
    plt.show()


.. image-sg:: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_003.png
   :alt: plot cqr tutorial
   :srcset: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_003.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 415-422

What we observe from these results is that none of the methods seems to
have conditional coverage at the target ``1 - α``. However, we can
clearly notice that the CQR seems to better adapt to large prices. Its
conditional coverage is closer to the target coverage not only for higher
prices, but also for lower prices where the other methods have a higher
coverage than needed. This will very likely have an impact on the widths
of the intervals.

.. GENERATED FROM PYTHON SOURCE LINES 422-443

.. code-block:: default


    binned_data = get_coverages_widths_by_bins(
        "width",
        y_test_sorted,
        y_pred_sorted,
        lower_bound,
        upper_bound,
        STRATEGIES,
        bins
    )


    binned_data.T.plot.bar(figsize=(12, 4))
    plt.ylabel("Interval width")
    plt.xlabel("Binned house prices")
    plt.xticks(rotation=350)
    plt.legend(loc=[1, 0])
    plt.show()


.. image-sg:: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_004.png
   :alt: plot cqr tutorial
   :srcset: /examples_regression/4-tutorials/images/sphx_glr_plot_cqr_tutorial_004.png
   :class: sphx-glr-single-img


.. GENERATED FROM PYTHON SOURCE LINES 444-449

The interval widths confirm what was observed in the previous graphs. We
can again see that the prediction intervals get wider as the price of the
houses increases. Interestingly, the prediction intervals are shorter when
the estimator is more certain.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  18.778 seconds)


.. _sphx_glr_download_examples_regression_4-tutorials_plot_cqr_tutorial.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_cqr_tutorial.py <plot_cqr_tutorial.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_cqr_tutorial.ipynb <plot_cqr_tutorial.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_