mapie.utils.train_conformalize_test_split

mapie.utils.train_conformalize_test_split(X: ndarray[tuple[Any, ...], dtype[_ScalarT]], y: ndarray[tuple[Any, ...], dtype[_ScalarT]], train_size: float | int, conformalize_size: float | int, test_size: float | int, random_state: int | None = None, shuffle: bool = True) Tuple[ndarray[tuple[Any, ...], dtype[_ScalarT]], ndarray[tuple[Any, ...], dtype[_ScalarT]], ndarray[tuple[Any, ...], dtype[_ScalarT]], ndarray[tuple[Any, ...], dtype[_ScalarT]], ndarray[tuple[Any, ...], dtype[_ScalarT]], ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]

Split arrays or matrices into train, conformalization and test subsets.

Utility similar to sklearn.model_selection.train_test_split for splitting data into 3 sets.

We advise to give the major part of the data points to the train set and at least 200 data points to the conformalization set.

Parameters:
Xindexable with same type and length / shape[0] than “y”

Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

yindexable with same type and length / shape[0] than “X”

Allowed inputs are lists, numpy arrays, scipy-sparse matrices or pandas dataframes.

train_sizefloat or int

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the train split. If int, represents the absolute number of train samples.

conformalize_sizefloat or int

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the conformalize split. If int, represents the absolute number of conformalize samples.

test_sizefloat or int

If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.

random_stateint, RandomState instance or None, default=None

Controls the shuffling applied to the data before applying the split. Pass an int for reproducible output across multiple function calls.

shufflebool, default=True

Whether or not to shuffle the data before splitting.

Returns:
X_train, X_conformalize, X_test, y_train, y_conformalize, y_test

6 array-like splits of inputs. output types are the same as the input types.

Examples

>>> import numpy as np
>>> from sklearn.datasets import make_regression
>>> from mapie.utils import train_conformalize_test_split
>>> X, y = np.arange(10).reshape((5, 2)), range(5)
>>> X
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> list(y)
[0, 1, 2, 3, 4]
>>> (
...     X_train, X_conformalize, X_test,
...     y_train, y_conformalize, y_test
... ) = train_conformalize_test_split(
...     X, y, train_size=0.6, conformalize_size=0.2, test_size=0.2, random_state=1
... )
>>> X_train
array([[8, 9],
       [0, 1],
       [6, 7]])
>>> X_conformalize
array([[2, 3]])
>>> X_test
array([[4, 5]])
>>> y_train
[4, 0, 3]
>>> y_conformalize
[1]
>>> y_test
[2]

Examples using mapie.utils.train_conformalize_test_split

Use MAPIE to plot prediction intervals

Use MAPIE to plot prediction intervals

Use MAPIE with a pre-trained model

Use MAPIE with a pre-trained model

Conformal Predictive Distribution with MAPIE

Conformal Predictive Distribution with MAPIE

The symmetric_correction parameter of ConformalizedQuantileRegressor

The symmetric_correction parameter of ConformalizedQuantileRegressor

Focus on residual normalised score

Focus on residual normalised score

Use MAPIE to plot prediction sets

Use MAPIE to plot prediction sets

Set prediction example in the binary classification setting

Set prediction example in the binary classification setting

Use MAPIE to control the precision of a binary classifier

Use MAPIE to control the precision of a binary classifier

Use MAPIE to control risk of a binary classifier with multiple prediction parameters

Use MAPIE to control risk of a binary classifier with multiple prediction parameters

Use MAPIE to control multiple risks of a binary classifier

Use MAPIE to control multiple risks of a binary classifier

Tutorial: how to ensure fairness across groups with Mondrian

Tutorial: how to ensure fairness across groups with Mondrian