xai
Tools to explain aspects of a model.
compute_partial_dependence(pred_fun, X, features, grid=10, weights=None, n_max=1000, rng=None)
Compute partial dependence.
This is a fast, brute-force method to compute partial dependence values for the given grid.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pred_fun | callable | Prediction function, such that `pred_fun(X)` returns the model predictions. | required |
| X | array-like of shape (n_obs, n_features) | The dataframe or array of features to be passed to the model predict function. | required |
| features | int or str | Column index or column name of the feature in `X`. | required |
| grid | Series or int | Values of the feature specified by `features` at which to evaluate the partial dependence, or the number of grid points to use. | 10 |
| weights | array-like of shape (n_obs,) or None | Case weights. If given, the partial dependence is calculated as a weighted average of the predictions with these weights. | None |
| n_max | int or None | The number of rows to subsample from `X`. This speeds up computation, in particular for slow predict functions. | 1000 |
| rng | Generator, int or None | The random number generator. If not already a Generator, `np.random.default_rng(rng)` is used. | None |
Returns:
| Type | Description |
|---|---|
| np.ndarray of shape (n_grid,) | Partial dependence values for the grid. |
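A minimal usage sketch (the data, model, and grid size of 5 are made up for illustration; per the return description, the result contains one partial dependence value per grid point):
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> rng = np.random.default_rng(0)
>>> X = rng.uniform(0, 1, size=(200, 2))
>>> y = 3 * X[:, 0] + rng.normal(0, 0.1, 200)
>>> model = LinearRegression()
>>> _ = model.fit(X, y)
>>> pd_values = compute_partial_dependence(
...     pred_fun=model.predict,
...     X=X,
...     features=0,
...     grid=5,
...     rng=0,
... )
For this linear model, the values should increase roughly linearly across an ascending grid of the first feature.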
compute_permutation_importance(pred_fun, X, y, features=None, scoring_function=SquaredError(), weights=None, n_repeats=5, n_max=10000, scoring_orientation='smaller_is_better', rng=None)
Compute permutation feature importance.
This function calculates permutation feature importance for features and/or
feature groups according to the idea in [Breiman] and [Fisher].
For each feature (group), permutation importance measures how much the model
performance worsens when shuffling the values of that feature (group) before
calculating predictions. The idea is that if a feature is important,
then shuffling its values will lead to a large drop in model performance.
Shuffling is done n_repeats times, and mean differences and mean ratios are
returned along with their standard errors.
Note that the model is never retrained during this process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pred_fun | callable | A callable to get predictions, i.e. `pred_fun(X)` returns the model predictions. | required |
| X | array-like of shape (n_obs, n_features) | The dataframe or array of features to be passed to the model predict function. | required |
| y | ArrayLike | 1D array of shape (n_observations,) containing the target values. | required |
| features | Optional[Union[list, tuple, set, dict]] | Iterable of feature names/indices of features in `X`, or a dict mapping group names to one or more features in order to score feature groups. If None, all features are used. | None |
| scoring_function | callable | A scoring function with signature roughly `scoring_function(y_obs, y_pred, weights)` returning a single score. | SquaredError() |
| weights | array-like of shape (n_obs,) or None | Case weights passed to the scoring_function. | None |
| n_repeats | int | Number of times to repeat the permutation for each feature group. | 5 |
| n_max | int or None | Maximum number of observations used. If the number of observations is greater than `n_max`, a random subsample of size `n_max` is drawn. | 10_000 |
| scoring_orientation | str | Direction of the scoring function. Use "smaller_is_better" if smaller values are better (e.g., average losses), or "greater_is_better" if greater values are better (e.g., R-squared). | "smaller_is_better" |
| rng | Generator, int or None | The random number generator used for shuffling values and for subsampling (see `n_max`). | None |
Returns:
| Name | Type | Description |
|---|---|---|
| df | DataFrame | A DataFrame with one row per feature (group) and the following columns: `feature`, `difference_mean`, `difference_stderr`, `ratio_mean`, `ratio_stderr` (see the example output below). |
References
[Breiman] Breiman, L. (2001). "Random Forests". Machine Learning, 45(1), 5-32. https://doi.org/10.1023/A:1010933404324
[Fisher] Fisher, A., Rudin, C. and Dominici, F. (2019). "All Models Are Wrong, but Many Are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously". Journal of Machine Learning Research, 20(177), 1-81.
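To make the returned difference and ratio columns concrete, here is a minimal NumPy sketch of the shuffling idea for a single feature column (illustrative only, not the library's implementation; `mse` stands in for the default SquaredError scoring):
>>> import numpy as np
>>> def mse(y_obs, y_pred):
...     return np.mean((np.asarray(y_obs) - np.asarray(y_pred)) ** 2)
>>> def permute_one_feature(pred_fun, X, y, j, n_repeats=5, rng=None):
...     rng = np.random.default_rng(rng)
...     X = np.asarray(X, dtype=float)
...     base = mse(y, pred_fun(X))  # baseline score, no shuffling
...     diffs, ratios = [], []
...     for _ in range(n_repeats):
...         X_perm = X.copy()
...         # shuffle column j, breaking its relation to y
...         X_perm[:, j] = rng.permutation(X_perm[:, j])
...         shuffled = mse(y, pred_fun(X_perm))
...         diffs.append(shuffled - base)
...         ratios.append(shuffled / base)
...     return np.mean(diffs), np.mean(ratios)
The library additionally reports the standard errors of these means over the `n_repeats` shuffles, and the model is never retrained.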
Examples:
>>> import numpy as np
>>> import polars as pl
>>> from sklearn.linear_model import LinearRegression
>>> # Create a synthetic dataset
>>> rng = np.random.default_rng(1)
>>> n = 1000
>>> X = pl.DataFrame(
... {
... "rooms": rng.choice([2.5, 3.5, 4.5], n),
... "area": rng.uniform(30, 120, n),
... "age": rng.uniform(0, 100, n),
... }
... )
>>> y = X["area"] + 20 * X["rooms"] + rng.normal(0, 10, n)
>>> model = LinearRegression()
>>> _ = model.fit(X, y)
>>> perm_importance = compute_permutation_importance(
... pred_fun=model.predict,
... X=X,
... y=y,
... rng=1,
... )
>>> perm_importance
shape: (3, 5)
┌─────────┬─────────────────┬───────────────────┬────────────┬──────────────┐
│ feature ┆ difference_mean ┆ difference_stderr ┆ ratio_mean ┆ ratio_stderr │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 ┆ f64 │
╞═════════╪═════════════════╪═══════════════════╪════════════╪══════════════╡
│ rooms ┆ 524.213195 ┆ 8.813555 ┆ 6.263515 ┆ 0.088495 │
│ area ┆ 1328.885114 ┆ 15.924463 ┆ 14.343058 ┆ 0.159894 │
│ age ┆ 0.174047 ┆ 0.090023 ┆ 1.001748 ┆ 0.000904 │
└─────────┴─────────────────┴───────────────────┴────────────┴──────────────┘
Using feature subsets
>>> perm_importance = compute_permutation_importance(
... pred_fun=model.predict,
... X=X,
... y=y,
... features=["area", "age"],
... rng=1,
... )
Using feature groups
>>> perm_importance = compute_permutation_importance(
... pred_fun=model.predict,
... X=X,
... y=y,
... features={"size": ["area", "rooms"], "age": "age"},
... rng=1,
... )
plot_permutation_importance(pred_fun, X, y, features=None, scoring_function=SquaredError(), weights=None, n_repeats=5, n_max=10000, scoring_orientation='smaller_is_better', rng=None, max_display=15, which='difference', confidence_level=0.95, ax=None)
Plot permutation importance as a barplot with confidence intervals.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pred_fun | callable | A callable to get predictions, i.e. `pred_fun(X)` returns the model predictions. | required |
| X | array-like of shape (n_obs, n_features) | The dataframe or array of features to be passed to the model predict function. | required |
| y | ArrayLike | 1D array of shape (n_observations,) containing the target values. | required |
| features | Optional[Union[list, tuple, set, dict]] | Iterable of feature names/indices of features in `X`, or a dict mapping group names to one or more features in order to score feature groups. If None, all features are used. | None |
| scoring_function | callable | A scoring function with signature roughly `scoring_function(y_obs, y_pred, weights)` returning a single score. | SquaredError() |
| weights | array-like of shape (n_obs,) or None | Case weights passed to the scoring_function. | None |
| n_repeats | int | Number of times to repeat the permutation for each feature group. | 5 |
| n_max | int or None | Maximum number of observations used. If the number of observations is greater than `n_max`, a random subsample of size `n_max` is drawn. | 10_000 |
| scoring_orientation | str | Direction of the scoring function. Use "smaller_is_better" if smaller values are better (e.g., average losses), or "greater_is_better" if greater values are better (e.g., R-squared). | "smaller_is_better" |
| rng | Generator, int or None | The random number generator used for shuffling values and for subsampling (see `n_max`). | None |
| max_display | int or None | Maximum number of features to display, by default 15. If None, all features are displayed. | 15 |
| which | str | Should difference or ratio scores be shown? Either "difference" or "ratio". | "difference" |
| confidence_level | float | Confidence level for error bars. If 0, no error bars are plotted. Value must fulfil `0 <= confidence_level < 1`. | 0.95 |
| ax | matplotlib.axes.Axes or plotly Figure | Axes object to draw the plot onto, otherwise uses the current Axes. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| ax | matplotlib.axes.Axes or plotly Figure | Either the matplotlib axes or the plotly figure. |
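A minimal usage sketch, reusing `model`, `X`, and `y` from the compute_permutation_importance example above (assuming a matplotlib plotting backend, so the return value is a matplotlib Axes):
>>> import matplotlib.pyplot as plt
>>> ax = plot_permutation_importance(
...     pred_fun=model.predict,
...     X=X,
...     y=y,
...     which="difference",
...     confidence_level=0.95,
...     rng=1,
... )
>>> plt.show()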