.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "examples_exchangeability_testing/1-quickstart/plot_risk_monitoring.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_examples_exchangeability_testing_1-quickstart_plot_risk_monitoring.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_examples_exchangeability_testing_1-quickstart_plot_risk_monitoring.py:


Detect harmful shifts with RiskMonitoring
=========================================

This quickstart demonstrates how to use `RiskMonitoring` to track the
misclassification risk of a deployed binary classifier on an online stream.

The workflow is simple:

1. estimate an acceptable risk level (the *monitoring threshold*) on clean
   reference data,
2. update the online risk as new labeled observations become available,
3. flag a *harmful shift* once the lower confidence bound on the online
   risk exceeds the threshold.

Instead of directly comparing risk estimates, computing confidence bounds
provides statistical guarantees that account for evaluation uncertainty [1].

A limitation of the current approach is that (at least some) labeled data
is necessary to update the online risk estimate. For scenarios with scarce
or no labels, please refer to the extensions [2] and [3] respectively.

References
----------
- [1] Aleksandr Podkopaev and Aaditya Ramdas. Tracking the risk of a
  deployed model and detecting harmful distribution shifts.
  International Conference on Learning Representations, 2022.
- [2] Zhang et al. Prediction-Powered Risk Monitoring of Deployed Models
  for Detecting Harmful Distribution Shifts. arXiv:2602.02229, 2026.
- [3] Amoukou et al. Sequential harmful shift detection without labels.
  Advances in Neural Information Processing Systems, 2024.

.. GENERATED FROM PYTHON SOURCE LINES 35-41

Estimate the monitoring threshold
---------------------------------

We first fit a classifier on clean training data, then estimate the
monitoring threshold on a clean reference set. `risk="accuracy"` means
that `RiskMonitoring` tracks the misclassification risk `1 - accuracy`.

.. GENERATED FROM PYTHON SOURCE LINES 41-73

.. code-block:: Python


    from sklearn.linear_model import LogisticRegression

    from mapie._example_utils import generate_gaussian_stream, plot_monitoring_results
    from mapie.exchangeability_testing import RiskMonitoring

    random_state = 42
    batch_size = 25
    prop_shift = 0.5

    X_train, y_train = generate_gaussian_stream(
        shift_type="stable",
        random_state=random_state,
    )
    X_reference, y_reference = generate_gaussian_stream(
        shift_type="stable",
        random_state=random_state + 1,
    )

    clf = LogisticRegression(random_state=random_state)
    clf.fit(X_train, y_train)

    monitor_no_shift = RiskMonitoring(risk="accuracy")
    monitor_no_shift.compute_threshold(y_reference, clf.predict(X_reference))
    threshold = monitor_no_shift.threshold

    print(
        "Reference upper bound on the misclassification risk: "
        f"{monitor_no_shift.reference_risk_upper_bound:.3f}"
    )
    print(f"Monitoring threshold: {threshold:.3f}")


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Reference upper bound on the misclassification risk: 0.080
    Monitoring threshold: 0.130


.. GENERATED FROM PYTHON SOURCE LINES 74-80

Monitor a stable stream
-----------------------

Now we monitor a stable online stream. Since the data distribution does
not change, the online lower confidence bound should remain well below
the monitoring threshold.

.. GENERATED FROM PYTHON SOURCE LINES 80-105

.. code-block:: Python


    X_online_no_shift, y_online_no_shift = generate_gaussian_stream(
        n_samples=800,
        shift_type="stable",
        random_state=random_state + 2,
    )

    for start in range(0, len(X_online_no_shift), batch_size):
        stop = start + batch_size
        y_pred_batch = clf.predict(X_online_no_shift[start:stop])
        monitor_no_shift.update(y_online_no_shift[start:stop], y_pred_batch)

    print(
        "No-shift scenario - harmful shift detected? "
        f"{monitor_no_shift.harmful_shift_detected}"
    )

    plot_monitoring_results(
        X_online_no_shift,
        y_online_no_shift,
        monitor_no_shift,
        threshold,
        title="Stable online stream",
    )


.. image-sg:: /examples_exchangeability_testing/1-quickstart/images/sphx_glr_plot_risk_monitoring_001.png
   :alt: Stable online stream, Online monitoring
   :srcset: /examples_exchangeability_testing/1-quickstart/images/sphx_glr_plot_risk_monitoring_001.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    No-shift scenario - harmful shift detected? False


.. GENERATED FROM PYTHON SOURCE LINES 106-115

Monitor an abrupt shift
-----------------------

Now let us see what happens when an abrupt shift occurs in the middle of
the stream. The data distribution changes suddenly, and the lower
confidence bound eventually crosses the threshold. Note that there is no
need to call `compute_threshold` again: we can reuse the threshold
computed earlier (this also shows that any custom threshold can be passed
directly at instantiation).

.. GENERATED FROM PYTHON SOURCE LINES 115-144

.. code-block:: Python


    X_online_abrupt, y_online_abrupt = generate_gaussian_stream(
        n_samples=800,
        shift_type="abrupt",
        prop_shift=prop_shift,
        random_state=random_state + 3,
    )
    shift_start_abrupt = int(len(y_online_abrupt) * (1 - prop_shift))

    monitor_abrupt = RiskMonitoring(risk="accuracy", threshold=threshold)
    for start in range(0, len(X_online_abrupt), batch_size):
        stop = start + batch_size
        y_pred_batch = clf.predict(X_online_abrupt[start:stop])
        monitor_abrupt.update(y_online_abrupt[start:stop], y_pred_batch)

    print(
        "Abrupt-shift scenario - harmful shift detected? "
        f"{monitor_abrupt.harmful_shift_detected}"
    )

    plot_monitoring_results(
        X_online_abrupt,
        y_online_abrupt,
        monitor_abrupt,
        threshold,
        title="Abrupt shift stream",
        shift_start=shift_start_abrupt,
    )


.. image-sg:: /examples_exchangeability_testing/1-quickstart/images/sphx_glr_plot_risk_monitoring_002.png
   :alt: Abrupt shift stream, Online monitoring
   :srcset: /examples_exchangeability_testing/1-quickstart/images/sphx_glr_plot_risk_monitoring_002.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Abrupt-shift scenario - harmful shift detected? True


.. GENERATED FROM PYTHON SOURCE LINES 145-151

Monitor a slow drift
--------------------

Finally, we create a slow drift. In that case the distribution evolves
progressively, so the harmful shift is typically detected later than with
an abrupt change.

.. GENERATED FROM PYTHON SOURCE LINES 151-180

.. code-block:: Python


    X_online_slow, y_online_slow = generate_gaussian_stream(
        n_samples=800,
        shift_type="slow",
        prop_shift=prop_shift,
        random_state=random_state + 4,
    )
    shift_start_slow = int(len(y_online_slow) * (1 - prop_shift))

    monitor_slow = RiskMonitoring(risk="accuracy", threshold=threshold)
    for start in range(0, len(X_online_slow), batch_size):
        stop = start + batch_size
        y_pred_batch = clf.predict(X_online_slow[start:stop])
        monitor_slow.update(y_online_slow[start:stop], y_pred_batch)

    print(
        "Slow-shift scenario - harmful shift detected? "
        f"{monitor_slow.harmful_shift_detected}"
    )

    plot_monitoring_results(
        X_online_slow,
        y_online_slow,
        monitor_slow,
        threshold,
        title="Slow shift stream",
        shift_start=shift_start_slow,
    )


.. image-sg:: /examples_exchangeability_testing/1-quickstart/images/sphx_glr_plot_risk_monitoring_003.png
   :alt: Slow shift stream, Online monitoring
   :srcset: /examples_exchangeability_testing/1-quickstart/images/sphx_glr_plot_risk_monitoring_003.png
   :class: sphx-glr-single-img


.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Slow-shift scenario - harmful shift detected? True


.. GENERATED FROM PYTHON SOURCE LINES 181-187

Interpret the scenarios
-----------------------

As expected, `RiskMonitoring` correctly does not fire any alarm on the
stable stream, detects the abrupt shift shortly after it occurs, and
eventually detects the slow drift once enough evidence has accumulated.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 3.857 seconds)


.. _sphx_glr_download_examples_exchangeability_testing_1-quickstart_plot_risk_monitoring.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_risk_monitoring.ipynb <plot_risk_monitoring.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_risk_monitoring.py <plot_risk_monitoring.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_risk_monitoring.zip <plot_risk_monitoring.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_