.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples_regression/1-quickstart/plot_timeseries_example.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_examples_regression_1-quickstart_plot_timeseries_example.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_regression_1-quickstart_plot_timeseries_example.py:


=======================================================
Estimating prediction intervals of time series forecast
=======================================================
This example uses :class:`~mapie.regression.MapieRegressor` to estimate
prediction intervals associated with time series forecast. We use the
standard cross-validation approach to estimate conformity scores and associated
prediction intervals.

We use here the Victoria electricity demand dataset used in the book
"Forecasting: Principles and Practice" by R. J. Hyndman and G. Athanasopoulos.
The electricity demand features daily and weekly seasonalities and is impacted
by the temperature, considered here as a exogeneous variable.

The data is modelled by a Random Forest model with a
:class:`~sklearn.model_selection.RandomizedSearchCV` using a sequential
:class:`~sklearn.model_selection.TimeSeriesSplit` cross validation, in which
the training set is prior to the validation set.
The best model is then feeded into :class:`~mapie.regression.MapieRegressor`
to estimate the associated prediction intervals.
We consider the standard CV+ resampling method.

We wish to emphasize one main limitation with this example.
We use a standard cross-validation in Mapie to estimate the prediction
intervals, through the `sklearn.model_selection.KFold()` object.
Residuals are therefore estimated using models trained on data with higher
indices than the validation data, which is inappropriate for time-series data.
However, using a `sklearn.model_selection.TimeSeriesSplit` cross validation
object for estimating the residuals breaks the theoretical guarantees of the
Jackknife+ and CV+ methods.

.. GENERATED FROM PYTHON SOURCE LINES 32-115


.. image-sg:: /examples_regression/1-quickstart/images/sphx_glr_plot_timeseries_example_001.png
   :alt: plot timeseries example
   :srcset: /examples_regression/1-quickstart/images/sphx_glr_plot_timeseries_example_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 Out:

 .. code-block:: none

    Coverage and prediction interval width mean for CV+: 0.812, 0.978


|

.. code-block:: default

    import pandas as pd
    from matplotlib import pylab as plt
    from scipy.stats import randint
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit

    from mapie.metrics import (regression_coverage_score,
                               regression_mean_width_score)
    from mapie.regression import MapieRegressor

    # Load input data and feature engineering
    demand_df = pd.read_csv(
        "../../data/demand_temperature.csv", parse_dates=True, index_col=0
    )
    demand_df["Date"] = pd.to_datetime(demand_df.index)
    demand_df["Weekofyear"] = demand_df.Date.dt.isocalendar().week.astype("int64")
    demand_df["Weekday"] = demand_df.Date.dt.isocalendar().day.astype("int64")
    demand_df["Hour"] = demand_df.index.hour

    # Train/validation/test split
    num_test_steps = 24 * 7 * 2
    demand_train = demand_df.iloc[:-num_test_steps, :].copy()
    demand_test = demand_df.iloc[-num_test_steps:, :].copy()
    X_train = demand_train.loc[:, ["Weekofyear", "Weekday", "Hour", "Temperature"]]
    y_train = demand_train["Demand"]
    X_test = demand_test.loc[:, ["Weekofyear", "Weekday", "Hour", "Temperature"]]
    y_test = demand_test["Demand"]

    # CV parameter search
    n_iter = 10
    n_splits = 5
    tscv = TimeSeriesSplit(n_splits=n_splits)
    random_state = 59
    rf_model = RandomForestRegressor(random_state=random_state)
    rf_params = {"max_depth": randint(2, 30), "n_estimators": randint(10, 1e3)}
    cv_obj = RandomizedSearchCV(
        rf_model,
        param_distributions=rf_params,
        n_iter=n_iter,
        cv=tscv,
        scoring="neg_root_mean_squared_error",
        random_state=random_state,
        verbose=0,
        n_jobs=-1,
    )
    cv_obj.fit(X_train, y_train)
    best_est = cv_obj.best_estimator_

    # Estimate prediction intervals on test set with best estimator
    # Here, a non-nested CV approach is used for the sake of computational
    # time, but a nested CV approach is preferred.
    # See the dedicated example in the gallery for more information.
    alpha = 0.1
    mapie = MapieRegressor(
        best_est, method="plus", cv=n_splits, agg_function="median", n_jobs=-1
    )
    mapie.fit(X_train, y_train)
    y_pred, y_pis = mapie.predict(X_test, alpha=alpha)
    coverage = regression_coverage_score(y_test, y_pis[:, 0, 0], y_pis[:, 1, 0])
    width = regression_mean_width_score(y_pis[:, 0, 0], y_pis[:, 1, 0])

    # Print results
    print(
        "Coverage and prediction interval width mean for CV+: "
        f"{coverage:.3f}, {width:.3f}"
    )

    # Plot estimated prediction intervals on test set
    fig = plt.figure(figsize=(15, 5))
    ax = fig.add_subplot(1, 1, 1)
    ax.set_ylabel("Hourly demand (GW)")
    ax.plot(demand_test.Demand, lw=2, label="Test data", c="C1")
    ax.plot(demand_test.index, y_pred, lw=2, c="C2", label="Predictions")
    ax.fill_between(
        demand_test.index,
        y_pis[:, 0, 0],
        y_pis[:, 1, 0],
        color="C2",
        alpha=0.2,
        label="CV+ PIs",
    )
    ax.legend()
    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  41.073 seconds)


.. _sphx_glr_download_examples_regression_1-quickstart_plot_timeseries_example.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example


  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_timeseries_example.py <plot_timeseries_example.py>`


  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_timeseries_example.ipynb <plot_timeseries_example.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_