xai

Tools to explain aspects of a model.

compute_partial_dependence(pred_fun, X, features, grid=10, weights=None, n_max=1000, rng=None)

Compute partial dependence.

This is a fast brute-force method to compute partial dependence values for the given grid.

Parameters:

Name Type Description Default
pred_fun callable

Prediction function, such that pred_fun(X) gives predicted values.

required
X array-like of shape (n_obs, n_features)

The dataframe or array of features to be passed to the model predict function.

required
features int or str

Column index or column name of the feature in X.

required
grid Series or int

Values of the feature specified by features for which to compute partial dependence. If an integer is specified, a grid of grid points of the given feature is constructed automatically using binning.

10
weights array-like of shape (n_obs) or None

Case weights. If given, partial dependence values are computed as weighted averages of the predictions with these weights.

None
n_max int or None

The number of rows to subsample from X. This speeds up computation, in particular for slow predict functions.

1000
rng (Generator, int or None)

The random number generator. The input is internally wrapped by np.random.default_rng(rng).

None

Returns:

Type Description
np.ndarray of shape (n_grid,)

Partial dependence values for the grid.
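The brute-force approach can be sketched as follows. This is a minimal illustration, not the library's implementation: the function name is made up, X is assumed to be a numeric array indexed by column position, and subsampling and weights are omitted.

```python
import numpy as np

def partial_dependence_sketch(pred_fun, X, feature_index, grid):
    """Brute-force partial dependence: for each grid value, override the
    feature column with that value for all rows and average predictions."""
    X = np.asarray(X, dtype=float)
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_index] = value  # set the feature to the grid value everywhere
        pd_values.append(pred_fun(X_mod).mean())
    return np.array(pd_values)
```

For a linear model, the resulting curve is linear in the grid values, which makes the sketch easy to check by hand.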

compute_permutation_importance(pred_fun, X, y, features=None, scoring_function=SquaredError(), weights=None, n_repeats=5, n_max=10000, scoring_orientation='smaller_is_better', rng=None)

Compute permutation feature importance.

This function calculates permutation feature importance for features and/or feature groups according to the idea in [Breiman] and [Fisher].

For each feature (group), permutation importance measures how much the model performance worsens when shuffling the values of that feature (group) before calculating predictions. The idea is that if a feature is important, then shuffling its values will lead to a large drop in model performance. Shuffling is done n_repeats times, and mean differences and mean ratios are returned along with their standard errors.

Note that the model is never retrained during this process.
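The core loop can be sketched as follows. This is an illustrative simplification, not the library's code: the function name is invented, squared error stands in for the configurable scoring function, and only mean score differences (no ratios, standard errors, weights, or subsampling) are computed.

```python
import numpy as np

def permutation_importance_sketch(pred_fun, X, y, n_repeats=5, rng=None):
    """Shuffle each column n_repeats times and record how much the
    mean squared error increases relative to the unshuffled baseline."""
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    baseline = np.mean((y - pred_fun(X)) ** 2)  # score on intact data
    result = {}
    for j in range(X.shape[1]):
        diffs = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # permute one feature, keep the rest intact
            score = np.mean((y - pred_fun(X_perm)) ** 2)
            diffs.append(score - baseline)
        result[j] = float(np.mean(diffs))
    return result  # mean score difference per feature index
```

A feature the model ignores yields a difference of exactly zero, since shuffling it leaves the predictions unchanged.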

Parameters:

Name Type Description Default
pred_fun callable

A callable to get predictions, i.e. pred_fun(X).

required
X array-like of shape (n_obs, n_features)

The dataframe or array of features to be passed to the model predict function.

required
y ArrayLike

1D array of shape (n_observations,) containing the target values.

required
features Optional[Union[list, tuple, set, dict]]

Iterable of feature names/indices of features in X. The default None will use all features in X. Can also be a dictionary with lists of feature names/indices as values. The keys of the dictionary are used as feature group names. Example: {"x1": ["x1"], "x2": ["x2"], "size": ["x1", "x2"]}. Passing a dictionary is also useful if you want to represent feature indices of a numpy array as strings. Example: {"area": 0, "age": 1}.

None
scoring_function callable

A scoring function with signature roughly fun(y_obs, y_pred, weights) -> float.

SquaredError()
weights array-like of shape (n_obs) or None

Case weights passed to the scoring_function.

None
n_repeats int

Number of times to repeat the permutation for each feature group.

5
n_max int or None

Maximum number of observations used. If the number of observations is greater than n_max, a random subset of size n_max will be drawn from X, y, (and weights). Pass None for no subsampling.

10_000
scoring_orientation str

Direction of scoring function. Use "smaller_is_better" if smaller values are better (e.g., average losses), or "greater_is_better" if greater values are better (e.g., R-squared).

"smaller_is_better"
rng (Generator, int or None)

The random number generator used for shuffling values and for subsampling n_max rows. The input is internally wrapped by np.random.default_rng(rng).

None

Returns:

Name Type Description
df DataFrame

A DataFrame with one row per feature (group) and the following columns:

  • feature: Feature name or feature group name.
  • difference_mean: Mean of the score differences.
  • difference_stderr: Standard error, i.e. standard deviation of difference_mean. (None if n_repeats = 1.)
  • ratio_mean: Mean of the score ratios.
  • ratio_stderr: Standard error, i.e. standard deviation of ratio_mean. (None if n_repeats = 1.)
References
[Breiman]

Breiman, L. (2001). "Random Forests". Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324

[Fisher]

Fisher, A., Rudin, C., and Dominici, F. (2019). "All Models Are Wrong, but Many Are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously". Journal of Machine Learning Research, 20(177), 1-81.

Examples:

>>> import numpy as np
>>> import polars as pl
>>> from sklearn.linear_model import LinearRegression
>>> # Create a synthetic dataset
>>> rng = np.random.default_rng(1)
>>> n = 1000
>>> X = pl.DataFrame(
...     {
...         "rooms": rng.choice([2.5, 3.5, 4.5], n),
...         "area": rng.uniform(30, 120, n),
...         "age": rng.uniform(0, 100, n),
...     }
... )
>>> y = X["area"] + 20 * X["rooms"] + rng.normal(0, 10, n)
>>> model = LinearRegression()
>>> _ = model.fit(X, y)
>>> perm_importance = compute_permutation_importance(
...     pred_fun=model.predict,
...     X=X,
...     y=y,
...     rng=1,
... )
>>> perm_importance
shape: (3, 5)
┌─────────┬─────────────────┬───────────────────┬────────────┬──────────────┐
│ feature ┆ difference_mean ┆ difference_stderr ┆ ratio_mean ┆ ratio_stderr │
│ ---     ┆ ---             ┆ ---               ┆ ---        ┆ ---          │
│ str     ┆ f64             ┆ f64               ┆ f64        ┆ f64          │
╞═════════╪═════════════════╪═══════════════════╪════════════╪══════════════╡
│ rooms   ┆ 524.213195      ┆ 8.813555          ┆ 6.263515   ┆ 0.088495     │
│ area    ┆ 1328.885114     ┆ 15.924463         ┆ 14.343058  ┆ 0.159894     │
│ age     ┆ 0.174047        ┆ 0.090023          ┆ 1.001748   ┆ 0.000904     │
└─────────┴─────────────────┴───────────────────┴────────────┴──────────────┘

Using feature subsets

>>> perm_importance = compute_permutation_importance(
...     pred_fun=model.predict,
...     X=X,
...     y=y,
...     features=["area", "age"],
...     rng=1,
... )

Using feature groups

>>> perm_importance = compute_permutation_importance(
...     pred_fun=model.predict,
...     X=X,
...     y=y,
...     features={"size": ["area", "rooms"], "age": "age"},
...     rng=1,
... )

plot_permutation_importance(pred_fun, X, y, features=None, scoring_function=SquaredError(), weights=None, n_repeats=5, n_max=10000, scoring_orientation='smaller_is_better', rng=None, max_display=15, which='difference', confidence_level=0.95, ax=None)

Plot permutation importance as a bar plot with confidence intervals.

Parameters:

Name Type Description Default
pred_fun callable

A callable to get predictions, i.e. pred_fun(X).

required
X array-like of shape (n_obs, n_features)

The dataframe or array of features to be passed to the model predict function.

required
y ArrayLike

1D array of shape (n_observations,) containing the target values.

required
features Optional[Union[list, tuple, set, dict]]

Iterable of feature names/indices of features in X. The default None will use all features in X. Can also be a dictionary with lists of feature names/indices as values. The keys of the dictionary are used as feature group names. Example: {"x1": ["x1"], "x2": ["x2"], "size": ["x1", "x2"]}. Passing a dictionary is also useful if you want to represent feature indices of a numpy array as strings. Example: {"area": 0, "age": 1}.

None
scoring_function callable

A scoring function with signature roughly fun(y_obs, y_pred, weights) -> float.

SquaredError()
weights array-like of shape (n_obs) or None

Case weights passed to the scoring_function.

None
n_repeats int

Number of times to repeat the permutation for each feature group.

5
n_max int or None

Maximum number of observations used. If the number of observations is greater than n_max, a random subset of size n_max will be drawn from X, y, (and weights). Pass None for no subsampling.

10_000
scoring_orientation str

Direction of scoring function. Use "smaller_is_better" if smaller values are better (e.g., average losses), or "greater_is_better" if greater values are better (e.g., R-squared).

"smaller_is_better"
rng (Generator, int or None)

The random number generator used for shuffling values and for subsampling n_max rows. The input is internally wrapped by np.random.default_rng(rng).

None
max_display int or None

Maximum number of features to display, by default 15. If None, all features are displayed.

15
which str

Should difference or ratio scores be shown? Either "difference" or "ratio".

"difference"
confidence_level float

Confidence level for error bars. If 0, no error bars are plotted. The value must satisfy 0 <= confidence_level < 1. Set to 0.683 to show standard errors.

0.95
ax matplotlib.axes.Axes or plotly Figure

Axes or Figure object to draw the plot onto; otherwise, the current Axes is used.

None

Returns:

Name Type Description
ax

Either the matplotlib axes or the plotly figure.
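One way the error bars can be derived from the returned standard errors is a two-sided normal-approximation interval; whether the library uses exactly this construction is an assumption, and the helper name below is illustrative.

```python
from statistics import NormalDist

def ci_half_width(stderr, confidence_level=0.95):
    """Half-width of a two-sided normal-approximation confidence interval:
    z * stderr, where z is the (0.5 + confidence_level / 2) normal quantile."""
    z = NormalDist().inv_cdf(0.5 + confidence_level / 2)
    return z * stderr
```

With confidence_level=0.683 the quantile z is close to 1, so the bars span roughly one standard error on each side, matching the hint in the confidence_level parameter description.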