scoring

`scoring` ¶

`ElementaryScore` ¶

Elementary scoring function.

The smaller the better.

The elementary scoring function is consistent for the specified functional for all values of eta and is the main ingredient for Murphy diagrams. See Notes for further details.

Parameters:

Name	Type	Description	Default
`eta`	`float`	Free parameter.	required
`functional`	`str`	The functional that is induced by the identification function `V`. Options are: `"mean"`. Argument `level` is neglected. `"median"`. Argument `level` is neglected. `"expectile"` `"quantile"`	`'mean'`
`level`	`float`	The level of the expectile of quantile. (Often called \(\alpha\).) It must be `0 < level < 1`. `level=0.5` and `functional="expectile"` gives the mean. `level=0.5` and `functional="quantile"` gives the median.	`0.5`

Notes

The elementary scoring or loss function is given by

\[ S_\eta(y, z) = (\mathbf{1}\{\eta \le z\} - \mathbf{1}\{\eta \le y\}) V(y, \eta) \]

with identification functions \(V\) for the given functional \(T\) . If allows for the mixture or Choquet representation

\[ S(y, z) = \int S_\eta(y, z) \,dH(\eta) \]

for some locally finite measure \(H\). It follows that the scoring function \(S\) is consistent for \(T\).

References

[Jordan2022]: A.I. Jordan, A. Mühlemann, J.F. Ziegel. "Characterizing the optimal solutions to the isotonic regression problem for identifiable functionals". (2022) doi:10.1007/s10463-021-00808-0
[GneitingResin2022]: T. Gneiting, J. Resin. "Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination". arxiv:2108.03210

Examples:

>>> el_score = ElementaryScore(eta=2, functional="mean")
>>> el_score(y_obs=[1, 2, 2, 1], y_pred=[4, 1, 2, 3])
0.5

Source code in src/model_diagnostics/scoring/scoring.py

class ElementaryScore(_BaseScoringFunction):
    r"""Elementary scoring function.

    The smaller the better.

    The elementary scoring function is consistent for the specified `functional` for
    all values of `eta` and is the main ingredient for Murphy diagrams.
    See [Notes](#notes) for further details.

    Parameters
    ----------
    eta : float
        Free parameter.
    functional : str
        The functional that is induced by the identification function `V`. Options are:

        - `"mean"`. Argument `level` is neglected.
        - `"median"`. Argument `level` is neglected.
        - `"expectile"`
        - `"quantile"`
    level : float
        The level of the expectile of quantile. (Often called \(\alpha\).)
        It must be `0 < level < 1`.
        `level=0.5` and `functional="expectile"` gives the mean.
        `level=0.5` and `functional="quantile"` gives the median.

    Notes
    -----
    The elementary scoring or loss function is given by

    \[
    S_\eta(y, z) = (\mathbf{1}\{\eta \le z\} - \mathbf{1}\{\eta \le y\})
    V(y, \eta)
    \]

    with [identification functions]
    [model_diagnostics.calibration.identification_function]
    \(V\) for the given `functional` \(T\) . If allows for the mixture or Choquet
    representation

    \[
    S(y, z) = \int S_\eta(y, z) \,dH(\eta)
    \]

    for some locally finite measure \(H\). It follows that the scoring function \(S\)
    is consistent for \(T\).

    References
    ----------
    `[Jordan2022]`

    :   A.I. Jordan, A. Mühlemann, J.F. Ziegel.
        "Characterizing the optimal solutions to the isotonic regression problem for
        identifiable functionals". (2022)
        [doi:10.1007/s10463-021-00808-0](https://doi.org/10.1007/s10463-021-00808-0)

    `[GneitingResin2022]`

    :   T. Gneiting, J. Resin.
        "Regression Diagnostics meets Forecast Evaluation: Conditional Calibration,
        Reliability Diagrams, and Coefficient of Determination".
        [arxiv:2108.03210](https://arxiv.org/abs/2108.03210)

    Examples
    --------
    >>> el_score = ElementaryScore(eta=2, functional="mean")
    >>> el_score(y_obs=[1, 2, 2, 1], y_pred=[4, 1, 2, 3])
    0.5
    """

    def __init__(
        self, eta: float, functional: str = "mean", level: float = 0.5
    ) -> None:
        self.eta = eta
        self._functional = functional
        if level <= 0 or level >= 1:
            msg = f"Argument level must fulfil 0 < level < 1, got {level}."
            raise ValueError(msg)
        self.level = level

    @property
    def functional(self):
        return self._functional

    def score_per_obs(
        self,
        y_obs: npt.ArrayLike,
        y_pred: npt.ArrayLike,
    ) -> np.ndarray:
        """Score per observation.

        Parameters
        ----------
        y_obs : array-like of shape (n_obs)
            Observed values of the response variable.
        y_pred : array-like of shape (n_obs)
            Predicted values of the `functional` of interest, e.g. the conditional
            expectation of the response, `E(Y|X)`.

        Returns
        -------
        score_per_obs : ndarray
            Values of the scoring function for each observation.
        """
        y: np.ndarray
        z: np.ndarray
        y, z = validate_2_arrays(y_obs, y_pred)

        eta = self.eta
        eta_term = np.less_equal(eta, z).astype(float) - np.less_equal(eta, y)
        return eta_term * identification_function(
            y_obs=y_obs,
            y_pred=np.full(y.shape, fill_value=float(eta)),
            functional=self.functional,
            level=self.level,
        )

`score_per_obs(y_obs, y_pred)` ¶

Score per observation.

Parameters:

Name	Type	Description	Default
`y_obs`	`array-like of shape (n_obs)`	Observed values of the response variable.	required
`y_pred`	`array-like of shape (n_obs)`	Predicted values of the `functional` of interest, e.g. the conditional expectation of the response, `E(Y\|X)`.	required

Returns:

Name	Type	Description
`score_per_obs`	`ndarray`	Values of the scoring function for each observation.

Source code in src/model_diagnostics/scoring/scoring.py

def score_per_obs(
    self,
    y_obs: npt.ArrayLike,
    y_pred: npt.ArrayLike,
) -> np.ndarray:
    """Score per observation.

    Parameters
    ----------
    y_obs : array-like of shape (n_obs)
        Observed values of the response variable.
    y_pred : array-like of shape (n_obs)
        Predicted values of the `functional` of interest, e.g. the conditional
        expectation of the response, `E(Y|X)`.

    Returns
    -------
    score_per_obs : ndarray
        Values of the scoring function for each observation.
    """
    y: np.ndarray
    z: np.ndarray
    y, z = validate_2_arrays(y_obs, y_pred)

    eta = self.eta
    eta_term = np.less_equal(eta, z).astype(float) - np.less_equal(eta, y)
    return eta_term * identification_function(
        y_obs=y_obs,
        y_pred=np.full(y.shape, fill_value=float(eta)),
        functional=self.functional,
        level=self.level,
    )

`GammaDeviance` ¶

Gamma deviance.

The smaller the better, minimum is zero.

The Gamma deviance is strictly consistent for the mean. It has a degree of homogeneity of 0 and is therefore insensitive to a change of units or multiplication of y_obs and y_pred by the same positive constant.

Attributes:

Name	Type	Description
`functional`	`str`	"mean"

Notes

\(S(y, z) = 2(\frac{y}{z} -\log\frac{y}{z} - 1)\)

Examples:

>>> gd = GammaDeviance()
>>> gd(y_obs=[3, 2, 1, 1], y_pred=[2, 1, 1 , 2])
0.2972674459459178

Source code in src/model_diagnostics/scoring/scoring.py

class GammaDeviance(HomogeneousExpectileScore):
    r"""Gamma deviance.

    The smaller the better, minimum is zero.

    The Gamma deviance is strictly consistent for the mean.
    It has a degree of homogeneity of 0 and is therefore insensitive to a change of
    units or multiplication of `y_obs` and `y_pred` by the same positive constant.

    Attributes
    ----------
    functional: str
        "mean"

    Notes
    -----
    \(S(y, z) = 2(\frac{y}{z} -\log\frac{y}{z} - 1)\)

    Examples
    --------
    >>> gd = GammaDeviance()
    >>> gd(y_obs=[3, 2, 1, 1], y_pred=[2, 1, 1 , 2])
    0.2972674459459178
    """

    def __init__(self) -> None:
        super().__init__(degree=0, level=0.5)

`HomogeneousExpectileScore` ¶

Homogeneous scoring function of degree h for expectiles.

The smaller the better, minimum is zero.

Up to a multiplicative constant, these are the only scoring functions that are strictly consistent for expectiles at level alpha and homogeneous functions. The possible additive constant is chosen such that the minimal function value equals zero.

Note that the ½-expectile (level alpha=0.5) equals the mean.

Parameters:

Name	Type	Description	Default
`degree`	`float`	Degree of homogeneity.	`2`
`level`	`float`	The level of the expectile. (Often called \(\alpha\).) It must be `0 < level < 1`. `level=0.5` gives the mean.	`0.5`

Attributes:

Name	Type	Description
`functional`	`str`	"mean" if `level=0.5`, else "expectile"

Notes

The homogeneous score of degree \(h\) is given by

\[ S_\alpha^h(y, z) = 2 |\mathbf{1}\{z \ge y\} - \alpha| \frac{2}{h(h-1)} \left(|y|^h - |z|^h - h \operatorname{sign}(z) |z|^{h-1} (y-z)\right) \]

Note that the first term, \(2 |\mathbf{1}\{z \ge y\} - \alpha|\) equals 1 for \(\alpha=0.5\). There are important domain restrictions and limits:

\(h>1\): All real numbers \(y\) and \(z\) are allowed.

Special case \(h=2, \alpha=\frac{1}{2}\) equals the squared error, aka Normal deviance \(S(y, z) = (y - z)^2\).
\(0 < h \leq 1\): Only \(y \geq 0\), \(z>0\) are allowed.

Special case \(h=1, \alpha=\frac{1}{2}\) (by taking the limit) equals the Poisson deviance \(S(y, z) = 2(y\log\frac{y}{z} - y + z)\).
\(h \leq 0\): Only \(y>0\), \(z>0\) are allowed.

Special case \(h=0, \alpha=\frac{1}{2}\) (by taking the limit) equals the Gamma deviance \(S(y, z) = 2(\frac{y}{z} -\log\frac{y}{z} - 1)\).

For the common domains, \(S_{\frac{1}{2}}^h\) equals the Tweedie deviance with the following relation between the degree of homogeneity \(h\) and the Tweedie power \(p\): \(h = 2-p\).

References

[Gneiting2011]: T. Gneiting. "Making and Evaluating Point Forecasts". (2011) doi:10.1198/jasa.2011.r10138 arxiv:0912.0902

Examples:

>>> hes = HomogeneousExpectileScore(degree=2, level=0.1)
>>> hes(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
0.95

Source code in src/model_diagnostics/scoring/scoring.py

class HomogeneousExpectileScore(_BaseScoringFunction):
    r"""Homogeneous scoring function of degree h for expectiles.

    The smaller the better, minimum is zero.

    Up to a multiplicative constant, these are the only scoring functions that are
    strictly consistent for expectiles at level alpha and homogeneous functions.
    The possible additive constant is chosen such that the minimal function value
    equals zero.

    Note that the 1/2-expectile (level alpha=0.5) equals the mean.

    Parameters
    ----------
    degree : float
        Degree of homogeneity.
    level : float
        The level of the expectile. (Often called \(\alpha\).)
        It must be `0 < level < 1`.
        `level=0.5` gives the mean.

    Attributes
    ----------
    functional: str
        "mean" if `level=0.5`, else "expectile"

    Notes
    -----
    The homogeneous score of degree \(h\) is given by

    \[
    S_\alpha^h(y, z) = 2 |\mathbf{1}\{z \ge y\} - \alpha| \frac{2}{h(h-1)}
    \left(|y|^h - |z|^h - h \operatorname{sign}(z) |z|^{h-1} (y-z)\right)
    \]

    Note that the first term, \(2 |\mathbf{1}\{z \ge y\} - \alpha|\) equals 1 for
    \(\alpha=0.5\).
    There are important domain restrictions and limits:

    - \(h>1\): All real numbers \(y\) and \(z\) are allowed.

        Special case \(h=2, \alpha=\frac{1}{2}\) equals the squared error, aka Normal
        deviance \(S(y, z) = (y - z)^2\).

    - \(0 < h \leq 1\): Only \(y \geq 0\), \(z>0\) are allowed.

        Special case \(h=1, \alpha=\frac{1}{2}\) (by taking the limit) equals the
        Poisson deviance \(S(y, z) = 2(y\log\frac{y}{z} - y + z)\).

    - \(h \leq 0\): Only \(y>0\), \(z>0\) are allowed.

        Special case \(h=0, \alpha=\frac{1}{2}\) (by taking the limit) equals the Gamma
        deviance \(S(y, z) = 2(\frac{y}{z} -\log\frac{y}{z} - 1)\).

    For the common domains, \(S_{\frac{1}{2}}^h\) equals the
    [Tweedie deviance](https://en.wikipedia.org/wiki/Tweedie_distribution) with the
    following relation between the degree of homogeneity \(h\) and the Tweedie
    power \(p\): \(h = 2-p\).

    References
    ----------
    `[Gneiting2011]`

    :   T. Gneiting.
        "Making and Evaluating Point Forecasts". (2011)
        [doi:10.1198/jasa.2011.r10138](https://doi.org/10.1198/jasa.2011.r10138)
        [arxiv:0912.0902](https://arxiv.org/abs/0912.0902)

    Examples
    --------
    >>> hes = HomogeneousExpectileScore(degree=2, level=0.1)
    >>> hes(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
    0.95
    """

    def __init__(self, degree: float = 2, level: float = 0.5) -> None:
        self.degree = degree
        if level <= 0 or level >= 1:
            msg = f"Argument level must fulfil 0 < level < 1, got {level}."
            raise ValueError(msg)
        self.level = level

    @property
    def functional(self):
        if self.level == 0.5:
            return "mean"
        else:
            return "expectile"

    def score_per_obs(
        self,
        y_obs: npt.ArrayLike,
        y_pred: npt.ArrayLike,
    ) -> np.ndarray:
        """Score per observation.

        Parameters
        ----------
        y_obs : array-like of shape (n_obs)
            Observed values of the response variable.
        y_pred : array-like of shape (n_obs)
            Predicted values of the `functional` of interest, e.g. the conditional
            expectation of the response, `E(Y|X)`.

        Returns
        -------
        score_per_obs : ndarray
            Values of the scoring function for each observation.
        """
        y: np.ndarray
        z: np.ndarray
        y, z = validate_2_arrays(y_obs, y_pred)

        if self.degree == 2:
            # Fast path
            score = np.square(z - y)
        elif self.degree > 1:
            z_abs = np.abs(z)
            score = 2 * (
                (np.power(np.abs(y), self.degree) - np.power(z_abs, self.degree))
                / (self.degree * (self.degree - 1))
                - np.sign(z)
                / (self.degree - 1)
                * np.power(z_abs, self.degree - 1)
                * (y - z)
            )
        elif self.degree == 1:
            # Domain: y >= 0 and z > 0
            if not np.all((y >= 0) & (z > 0)):
                msg = (
                    f"Valid domain for degree={self.degree} is y_obs >= 0 and "
                    "y_pred > 0."
                )
                raise ValueError(msg)
            score = 2 * (special.xlogy(y, y / z) - y + z)
        elif self.degree == 0:
            # Domain: y > 0 and z > 0.
            if not np.all((y > 0) & (z > 0)):
                msg = (
                    f"Valid domain for degree={self.degree} is "
                    "y_obs > 0 and y_pred > 0."
                )
                raise ValueError(msg)
            y_z = y / z
            score = 2 * (y_z - np.log(y_z) - 1)
        else:  # self.degree < 1
            # Domain: y >= 0 and z > 0 for 0 < self.degree < 1
            # Domain: y > 0  and z > 0 else
            if self.degree > 0:
                if not np.all((y >= 0) & (z > 0)):
                    msg = (
                        f"Valid domain for degree={self.degree} is "
                        "y_obs >= 0 and y_pred > 0."
                    )
                    raise ValueError(msg)
            elif not np.all((y > 0) & (z > 0)):
                msg = (
                    f"Valid domain for degree={self.degree} is "
                    "y_obs > 0 and y_pred > 0."
                )
                raise ValueError(msg)
            # Note: We add 0.0 to be sure we have floating points. Integers are not
            # allowerd to be raised to a negative power.
            score = 2 * (
                (np.power(y + 0.0, self.degree) - np.power(z + 0.0, self.degree))
                / (self.degree * (self.degree - 1))
                - 1 / (self.degree - 1) * np.power(z + 0.0, self.degree - 1) * (y - z)
            )

        if self.level == 0.5:
            return score
        else:
            return 2 * np.abs(np.greater_equal(z, y) - self.level) * score

`score_per_obs(y_obs, y_pred)` ¶

Score per observation.

Parameters:

Name	Type	Description	Default
`y_obs`	`array-like of shape (n_obs)`	Observed values of the response variable.	required
`y_pred`	`array-like of shape (n_obs)`	Predicted values of the `functional` of interest, e.g. the conditional expectation of the response, `E(Y\|X)`.	required

Returns:

Name	Type	Description
`score_per_obs`	`ndarray`	Values of the scoring function for each observation.

Source code in src/model_diagnostics/scoring/scoring.py

def score_per_obs(
    self,
    y_obs: npt.ArrayLike,
    y_pred: npt.ArrayLike,
) -> np.ndarray:
    """Score per observation.

    Parameters
    ----------
    y_obs : array-like of shape (n_obs)
        Observed values of the response variable.
    y_pred : array-like of shape (n_obs)
        Predicted values of the `functional` of interest, e.g. the conditional
        expectation of the response, `E(Y|X)`.

    Returns
    -------
    score_per_obs : ndarray
        Values of the scoring function for each observation.
    """
    y: np.ndarray
    z: np.ndarray
    y, z = validate_2_arrays(y_obs, y_pred)

    if self.degree == 2:
        # Fast path
        score = np.square(z - y)
    elif self.degree > 1:
        z_abs = np.abs(z)
        score = 2 * (
            (np.power(np.abs(y), self.degree) - np.power(z_abs, self.degree))
            / (self.degree * (self.degree - 1))
            - np.sign(z)
            / (self.degree - 1)
            * np.power(z_abs, self.degree - 1)
            * (y - z)
        )
    elif self.degree == 1:
        # Domain: y >= 0 and z > 0
        if not np.all((y >= 0) & (z > 0)):
            msg = (
                f"Valid domain for degree={self.degree} is y_obs >= 0 and "
                "y_pred > 0."
            )
            raise ValueError(msg)
        score = 2 * (special.xlogy(y, y / z) - y + z)
    elif self.degree == 0:
        # Domain: y > 0 and z > 0.
        if not np.all((y > 0) & (z > 0)):
            msg = (
                f"Valid domain for degree={self.degree} is "
                "y_obs > 0 and y_pred > 0."
            )
            raise ValueError(msg)
        y_z = y / z
        score = 2 * (y_z - np.log(y_z) - 1)
    else:  # self.degree < 1
        # Domain: y >= 0 and z > 0 for 0 < self.degree < 1
        # Domain: y > 0  and z > 0 else
        if self.degree > 0:
            if not np.all((y >= 0) & (z > 0)):
                msg = (
                    f"Valid domain for degree={self.degree} is "
                    "y_obs >= 0 and y_pred > 0."
                )
                raise ValueError(msg)
        elif not np.all((y > 0) & (z > 0)):
            msg = (
                f"Valid domain for degree={self.degree} is "
                "y_obs > 0 and y_pred > 0."
            )
            raise ValueError(msg)
        # Note: We add 0.0 to be sure we have floating points. Integers are not
        # allowerd to be raised to a negative power.
        score = 2 * (
            (np.power(y + 0.0, self.degree) - np.power(z + 0.0, self.degree))
            / (self.degree * (self.degree - 1))
            - 1 / (self.degree - 1) * np.power(z + 0.0, self.degree - 1) * (y - z)
        )

    if self.level == 0.5:
        return score
    else:
        return 2 * np.abs(np.greater_equal(z, y) - self.level) * score

`HomogeneousQuantileScore` ¶

Homogeneous scoring function of degree h for quantiles.

The smaller the better, minimum is zero.

Up to a multiplicative constant, these are the only scoring funtions that are strictly consistent for quantiles at level alpha and homogeneous functions. The possible additive constant is chosen such that the minimal function value equals zero.

Note that the ½-quantile (level alpha=0.5) equals the median.

Parameters:

Name	Type	Description	Default
`degree`	`float`	Degree of homogeneity.	`2`
`level`	`float`	The level of the quantile. (Often called \(\alpha\).) It must be `0 < level < 1`. `level=0.5` gives the median.	`0.5`

Attributes:

Name	Type	Description
`functional`	`str`	"quantile"

Notes

The homogeneous score of degree \(h\) is given by

\[ S_\alpha^h(y, z) = (\mathbf{1}\{z \ge y\} - \alpha) \frac{z^h - y^h}{h} \]

There are important domain restrictions and limits:

\(h\) positive odd integer: All real numbers \(y\) and \(z\) are allowed.
- Special case \(h=1\) equals the pinball loss, \(S(y, z) = (\mathbf{1}\{z \ge y\} - \alpha) (z - y)\).
- Special case \(h=1, \alpha=\frac{1}{2}\) equals half the absolute error \(S(y, z) = \frac{1}{2}|z - y|\).
\(h\) real valued: Only \(y>0\), \(z>0\) are allowed.

Special case \(h=0\) (by taking the limit) equals \(S(y, z) = |\mathbf{1}\{z \ge y\} - \alpha| \log\frac{z}{y}\).

References

[Gneiting2011]: T. Gneiting. "Making and Evaluating Point Forecasts". (2011) doi:10.1198/jasa.2011.r10138 arxiv:0912.0902

Examples:

>>> hqs = HomogeneousQuantileScore(degree=3, level=0.1)
>>> hqs(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
0.6083333333333334

Source code in src/model_diagnostics/scoring/scoring.py

class HomogeneousQuantileScore(_BaseScoringFunction):
    r"""Homogeneous scoring function of degree h for quantiles.

    The smaller the better, minimum is zero.

    Up to a multiplicative constant, these are the only scoring funtions that are
    strictly consistent for quantiles at level alpha and homogeneous functions.
    The possible additive constant is chosen such that the minimal function value
    equals zero.

    Note that the 1/2-quantile (level alpha=0.5) equals the median.

    Parameters
    ----------
    degree : float
        Degree of homogeneity.
    level : float
        The level of the quantile. (Often called \(\alpha\).)
        It must be `0 < level < 1`.
        `level=0.5` gives the median.

    Attributes
    ----------
    functional: str
        "quantile"

    Notes
    -----
    The homogeneous score of degree \(h\) is given by

    \[
    S_\alpha^h(y, z) = (\mathbf{1}\{z \ge y\} - \alpha) \frac{z^h - y^h}{h}
    \]

    There are important domain restrictions and limits:

    - \(h\) positive odd integer: All real numbers \(y\) and \(z\) are allowed.

        - Special case \(h=1\) equals the pinball loss,
          \(S(y, z) = (\mathbf{1}\{z \ge y\} - \alpha) (z - y)\).
        - Special case \(h=1, \alpha=\frac{1}{2}\) equals half the absolute error
          \(S(y, z) = \frac{1}{2}|z - y|\).

    - \(h\) real valued: Only \(y>0\), \(z>0\) are allowed.

        Special case \(h=0\) (by taking the limit) equals
        \(S(y, z) = |\mathbf{1}\{z \ge y\} - \alpha| \log\frac{z}{y}\).

    References
    ----------
    `[Gneiting2011]`

    :   T. Gneiting.
        "Making and Evaluating Point Forecasts". (2011)
        [doi:10.1198/jasa.2011.r10138](https://doi.org/10.1198/jasa.2011.r10138)
        [arxiv:0912.0902](https://arxiv.org/abs/0912.0902)

    Examples
    --------
    >>> hqs = HomogeneousQuantileScore(degree=3, level=0.1)
    >>> hqs(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
    0.6083333333333334
    """

    def __init__(self, degree: float = 2, level: float = 0.5) -> None:
        self.degree = degree
        if level <= 0 or level >= 1:
            msg = f"Argument level must fulfil 0 < level < 1, got {level}."
            raise ValueError(msg)
        self.level = level

    @property
    def functional(self):
        return "quantile"

    def score_per_obs(
        self,
        y_obs: npt.ArrayLike,
        y_pred: npt.ArrayLike,
    ) -> np.ndarray:
        """Score per observation.

        Parameters
        ----------
        y_obs : array-like of shape (n_obs)
            Observed values of the response variable.
        y_pred : array-like of shape (n_obs)
            Predicted values of the `functional` of interest, e.g. the conditional
            expectation of the response, `E(Y|X)`.

        Returns
        -------
        score_per_obs : ndarray
            Values of the scoring function for each observation.
        """
        y: np.ndarray
        z: np.ndarray
        y, z = validate_2_arrays(y_obs, y_pred)

        if self.degree == 1:
            # Fast path
            score = z - y
        elif self.degree > 1 and self.degree % 2 == 1:
            # Odd positive degree
            score = (np.power(z, self.degree) - np.power(y, self.degree)) / self.degree
        elif self.degree == 0:
            # Domain: y > 0 and z > 0.
            if not np.all((y > 0) & (z > 0)):
                msg = (
                    f"Valid domain for degree={self.degree} is "
                    "y_obs > 0 and y_pred > 0."
                )
                raise ValueError(msg)
            score = np.log(z / y)
        else:
            # Domain: y > 0 and z > 0.
            if not np.all((y > 0) & (z > 0)):
                msg = (
                    f"Valid domain for degree={self.degree} is "
                    "y_obs > 0 and y_pred > 0."
                )
                raise ValueError(msg)
            score = (np.power(z, self.degree) - np.power(y, self.degree)) / self.degree

        if self.level == 0.5:
            return 0.5 * np.abs(score)
        else:
            return (np.greater_equal(z, y) - self.level) * score

`score_per_obs(y_obs, y_pred)` ¶

Score per observation.

Parameters:

Name	Type	Description	Default
`y_obs`	`array-like of shape (n_obs)`	Observed values of the response variable.	required
`y_pred`	`array-like of shape (n_obs)`	Predicted values of the `functional` of interest, e.g. the conditional expectation of the response, `E(Y\|X)`.	required

Returns:

Name	Type	Description
`score_per_obs`	`ndarray`	Values of the scoring function for each observation.

Source code in src/model_diagnostics/scoring/scoring.py

def score_per_obs(
    self,
    y_obs: npt.ArrayLike,
    y_pred: npt.ArrayLike,
) -> np.ndarray:
    """Score per observation.

    Parameters
    ----------
    y_obs : array-like of shape (n_obs)
        Observed values of the response variable.
    y_pred : array-like of shape (n_obs)
        Predicted values of the `functional` of interest, e.g. the conditional
        expectation of the response, `E(Y|X)`.

    Returns
    -------
    score_per_obs : ndarray
        Values of the scoring function for each observation.
    """
    y: np.ndarray
    z: np.ndarray
    y, z = validate_2_arrays(y_obs, y_pred)

    if self.degree == 1:
        # Fast path
        score = z - y
    elif self.degree > 1 and self.degree % 2 == 1:
        # Odd positive degree
        score = (np.power(z, self.degree) - np.power(y, self.degree)) / self.degree
    elif self.degree == 0:
        # Domain: y > 0 and z > 0.
        if not np.all((y > 0) & (z > 0)):
            msg = (
                f"Valid domain for degree={self.degree} is "
                "y_obs > 0 and y_pred > 0."
            )
            raise ValueError(msg)
        score = np.log(z / y)
    else:
        # Domain: y > 0 and z > 0.
        if not np.all((y > 0) & (z > 0)):
            msg = (
                f"Valid domain for degree={self.degree} is "
                "y_obs > 0 and y_pred > 0."
            )
            raise ValueError(msg)
        score = (np.power(z, self.degree) - np.power(y, self.degree)) / self.degree

    if self.level == 0.5:
        return 0.5 * np.abs(score)
    else:
        return (np.greater_equal(z, y) - self.level) * score

`LogLoss` ¶

Log loss.

The smaller the better, minimum is zero.

The log loss is a strictly consistent scoring function for the mean for observations and predictions in the range 0 to 1. It is also referred to as (half the) Bernoulli deviance, (half the) Binomial log-likelihood, logistic loss and binary cross-entropy. Its minimal function value is zero.

Attributes:

Name	Type	Description
`functional`	`str`	"mean"

Notes

The log loss for \(y,z \in [0,1]\) is given by

\[ S(y, z) = - y \log\frac{z}{y} - (1 - y) \log\frac{1-z}{1-y} \]

If one restricts to \(y\in \{0, 1\}\), this simplifies to

\[ S(y, z) = - y \log(z) - (1 - y) \log(1-z) \]

Examples:

>>> ll = LogLoss()
>>> ll(y_obs=[0, 0.5, 1, 1], y_pred=[0.1, 0.2, 0.8 , 0.9], weights=[1, 2, 1, 1])
0.17603033705165635

Source code in src/model_diagnostics/scoring/scoring.py

class LogLoss(_BaseScoringFunction):
    r"""Log loss.

    The smaller the better, minimum is zero.

    The log loss is a strictly consistent scoring function for the mean for
    observations and predictions in the range 0 to 1.
    It is also referred to as (half the) Bernoulli deviance,
    (half the) Binomial log-likelihood, logistic loss and binary cross-entropy.
    Its minimal function value is zero.

    Attributes
    ----------
    functional: str
        "mean"

    Notes
    -----
    The log loss for \(y,z \in [0,1]\) is given by

    \[
    S(y, z) = - y \log\frac{z}{y} - (1 - y) \log\frac{1-z}{1-y}
    \]

    If one restricts to \(y\in \{0, 1\}\), this simplifies to

    \[
    S(y, z) = - y \log(z) - (1 - y) \log(1-z)
    \]

    Examples
    --------
    >>> ll = LogLoss()
    >>> ll(y_obs=[0, 0.5, 1, 1], y_pred=[0.1, 0.2, 0.8 , 0.9], weights=[1, 2, 1, 1])
    0.17603033705165635
    """

    @property
    def functional(self):
        return "mean"

    def score_per_obs(
        self,
        y_obs: npt.ArrayLike,
        y_pred: npt.ArrayLike,
    ) -> np.ndarray:
        """Score per observation.

        Parameters
        ----------
        y_obs : array-like of shape (n_obs)
            Observed values of the response variable.
        y_pred : array-like of shape (n_obs)
            Predicted values of the `functional` of interest, e.g. the conditional
            expectation of the response, `E(Y|X)`.

        Returns
        -------
        score_per_obs : ndarray
            Values of the scoring function for each observation.
        """
        y: np.ndarray
        z: np.ndarray
        y, z = validate_2_arrays(y_obs, y_pred)

        score = -special.xlogy(y, z) - special.xlogy(1 - y, 1 - z)
        if np.any((y > 0) & (y < 1)):
            score += special.xlogy(y, y) + special.xlogy(1 - y, 1 - y)
        return score

`score_per_obs(y_obs, y_pred)` ¶

Score per observation.

Parameters:

Name	Type	Description	Default
`y_obs`	`array-like of shape (n_obs)`	Observed values of the response variable.	required
`y_pred`	`array-like of shape (n_obs)`	Predicted values of the `functional` of interest, e.g. the conditional expectation of the response, `E(Y\|X)`.	required

Returns:

Name	Type	Description
`score_per_obs`	`ndarray`	Values of the scoring function for each observation.

Source code in src/model_diagnostics/scoring/scoring.py

def score_per_obs(
    self,
    y_obs: npt.ArrayLike,
    y_pred: npt.ArrayLike,
) -> np.ndarray:
    """Score per observation.

    Parameters
    ----------
    y_obs : array-like of shape (n_obs)
        Observed values of the response variable.
    y_pred : array-like of shape (n_obs)
        Predicted values of the `functional` of interest, e.g. the conditional
        expectation of the response, `E(Y|X)`.

    Returns
    -------
    score_per_obs : ndarray
        Values of the scoring function for each observation.
    """
    y: np.ndarray
    z: np.ndarray
    y, z = validate_2_arrays(y_obs, y_pred)

    score = -special.xlogy(y, z) - special.xlogy(1 - y, 1 - z)
    if np.any((y > 0) & (y < 1)):
        score += special.xlogy(y, y) + special.xlogy(1 - y, 1 - y)
    return score

`PinballLoss` ¶

Pinball loss.

The smaller the better, minimum is zero.

The pinball loss is strictly consistent for quantiles.

Parameters:

Name	Type	Description	Default
`level`	`float`	The level of the quantile. (Often called \(\alpha\).) It must be `0 < level < 1`. `level=0.5` gives the median.	`0.5`

Attributes:

Name	Type	Description
`functional`	`str`	"quantile"

Notes

The pinball loss has degree of homogeneity 1 and is given by

\[ S_\alpha(y, z) = (\mathbf{1}\{z \ge y\} - \alpha) (z - y) \]

The authors do not know where and when the term pinball loss was coined. It is most famously used in quantile regression.

Examples:

>>> pl = PinballLoss(level=0.9)
>>> pl(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
0.275

Source code in src/model_diagnostics/scoring/scoring.py

class PinballLoss(HomogeneousQuantileScore):
    r"""Pinball loss.

    The smaller the better, minimum is zero.

    The pinball loss is strictly consistent for quantiles.

    Parameters
    ----------
    level : float
        The level of the quantile. (Often called \(\alpha\).)
        It must be `0 < level < 1`.
        `level=0.5` gives the median.

    Attributes
    ----------
    functional: str
        "quantile"

    Notes
    -----
    The pinball loss has degree of homogeneity 1 and is given by

    \[
    S_\alpha(y, z) = (\mathbf{1}\{z \ge y\} - \alpha) (z - y)
    \]

    The authors do not know where and when the term *pinball loss* was coined. It is
    most famously used in quantile regression.

    Examples
    --------
    >>> pl = PinballLoss(level=0.9)
    >>> pl(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
    0.275
    """

    def __init__(self, level: float = 0.5) -> None:
        super().__init__(degree=1, level=level)

`PoissonDeviance` ¶

Poisson deviance.

The smaller the better, minimum is zero.

The Poisson deviance is strictly consistent for the mean. It has a degree of homogeneity of 1.

Attributes:

Name	Type	Description
`functional`	`str`	"mean"

Notes

\(S(y, z) = 2(y\log\frac{y}{z} - y + z)\)

Examples:

>>> pd = PoissonDeviance()
>>> pd(y_obs=[0, 0, 1, 1], y_pred=[2, 1, 1 , 2])
1.6534264097200273

Source code in src/model_diagnostics/scoring/scoring.py

class PoissonDeviance(HomogeneousExpectileScore):
    r"""Poisson deviance.

    The smaller the better, minimum is zero.

    The Poisson deviance is strictly consistent for the mean.
    It has a degree of homogeneity of 1.

    Attributes
    ----------
    functional: str
        "mean"

    Notes
    -----
    \(S(y, z) = 2(y\log\frac{y}{z} - y + z)\)

    Examples
    --------
    >>> pd = PoissonDeviance()
    >>> pd(y_obs=[0, 0, 1, 1], y_pred=[2, 1, 1 , 2])
    1.6534264097200273
    """

    def __init__(self) -> None:
        super().__init__(degree=1, level=0.5)

`SquaredError` ¶

Squared error.

The smaller the better, minimum is zero.

The squared error is strictly consistent for the mean. It has a degree of homogeneity of 2. In the context of probabilistic classification, it is also known as Brier score.

Attributes:

Name	Type	Description
`functional`	`str`	"mean"

Notes

\(S(y, z) = (y - z)^2\)

Examples:

>>> se = SquaredError()
>>> se(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
0.75

Source code in src/model_diagnostics/scoring/scoring.py

class SquaredError(HomogeneousExpectileScore):
    r"""Squared error.

    The smaller the better, minimum is zero.

    The squared error is strictly consistent for the mean.
    It has a degree of homogeneity of 2.
    In the context of probabilistic classification, it is also known as Brier score.


    Attributes
    ----------
    functional: str
        "mean"

    Notes
    -----
    \(S(y, z) = (y - z)^2\)

    Examples
    --------
    >>> se = SquaredError()
    >>> se(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1 , 2])
    0.75
    """

    def __init__(self) -> None:
        super().__init__(degree=2, level=0.5)

`decompose(y_obs, y_pred, weights=None, *, scoring_function, functional=None, level=None)` ¶

Additive decomposition of scores.

The score is decomposed as score = miscalibration - discrimination + uncertainty.

Parameters:

Name	Type	Description	Default
`y_obs`	`array-like of shape (n_obs)`	Observed values of the response variable.	required
`y_pred`	`array-like of shape (n_obs) or (n_obs, n_models)`	Predicted values of the `functional` of interest, e.g. the conditional expectation of the response, `E(Y\|X)`.	required
`weights`	`array-like of shape (n_obs) or None`	Case weights.	`None`
`scoring_function`	`callable`	A scoring function with signature roughly `fun(y_obs, y_pred, weights) -> float`.	required
`functional`	`str or None`	The target functional which `y_pred` aims to predict. If `None`, then it will be inferred from `scoring_function.functional`. Options are: `"mean"`. Argument `level` is neglected. `"median"`. Argument `level` is neglected. `"expectile"` `"quantile"`	`None`
`level`	`float or None`	Functionals like expectiles and quantiles have a level (often called alpha). If `None`, then it will be inferred from `scoring_function.level`.	`None`

Returns:

Name	Type	Description
`decomposition`	`DataFrame`	The resulting score decomposition as a dataframe with columns: `miscalibration` `discrimination` `uncertainty` `score`: the average score
	If `y_pred` contains several predictions, i.e. it is 2-dimension with shape
	`(n_obs, n_pred)` and `n_pred >1`, then there is the additional column:	`model`

Notes

To be precise, this function returns the decomposition of the score in terms of auto-miscalibration, auto-discrimination (or resolution) and uncertainy (or entropy), see [FLM2022] and references therein. The key element is to estimate the recalibrated predictions, i.e. \(T(Y|m(X))\) for the target functional \(T\) and model predictions \(m(X)\). This is accomplished by isotonic regression, [Dimitriadis2021] and [Gneiting2021].

References

[FLM2022]: T. Fissler, C. Lorentzen, and M. Mayer. "Model Comparison and Calibration Assessment". (2022) arxiv:2202.12780.
[Dimitriadis2021]: T. Dimitriadis, T. Gneiting, and A. I. Jordan. "Stable reliability diagrams for probabilistic classifiers". (2021) doi:10.1073/pnas.2016191118
[Gneiting2021]: T. Gneiting and J. Resin. "Regression Diagnostics meets Forecast Evaluation: Conditional Calibration, Reliability Diagrams, and Coefficient of Determination". (2021). arXiv:2108.03210.

Examples:

>>> decompose(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1, 2],
... scoring_function=SquaredError())
shape: (1, 4)
┌────────────────┬────────────────┬─────────────┬───────┐
│ miscalibration ┆ discrimination ┆ uncertainty ┆ score │
│ ---            ┆ ---            ┆ ---         ┆ ---   │
│ f64            ┆ f64            ┆ f64         ┆ f64   │
╞════════════════╪════════════════╪═════════════╪═══════╡
│ 0.625          ┆ 0.125          ┆ 0.25        ┆ 0.75  │
└────────────────┴────────────────┴─────────────┴───────┘

Source code in src/model_diagnostics/scoring/scoring.py

def decompose(
    y_obs: npt.ArrayLike,
    y_pred: npt.ArrayLike,
    weights: Optional[npt.ArrayLike] = None,
    *,
    scoring_function: Callable[..., Any],  # TODO: make type hint stricter
    functional: Optional[str] = None,
    level: Optional[float] = None,
) -> pl.DataFrame:
    r"""Additive decomposition of scores.

    The score is decomposed as
    `score = miscalibration - discrimination + uncertainty`.

    Parameters
    ----------
    y_obs : array-like of shape (n_obs)
        Observed values of the response variable.
    y_pred : array-like of shape (n_obs) or (n_obs, n_models)
        Predicted values of the `functional` of interest, e.g. the conditional
        expectation of the response, `E(Y|X)`.
    weights : array-like of shape (n_obs) or None
        Case weights.
    scoring_function : callable
        A scoring function with signature roughly
        `fun(y_obs, y_pred, weights) -> float`.
    functional : str or None
        The target functional which `y_pred` aims to predict.
        If `None`, then it will be inferred from `scoring_function.functional`.
        Options are:

        - `"mean"`. Argument `level` is neglected.
        - `"median"`. Argument `level` is neglected.
        - `"expectile"`
        - `"quantile"`
    level : float or None
        Functionals like expectiles and quantiles have a level (often called alpha).
        If `None`, then it will be inferred from `scoring_function.level`.

    Returns
    -------
    decomposition : polars.DataFrame
        The resulting score decomposition as a dataframe with columns:

        - `miscalibration`
        - `discrimination`
        - `uncertainty`
        - `score`: the average score

    If `y_pred` contains several predictions, i.e. it is 2-dimension with shape
    `(n_obs, n_pred)` and `n_pred >1`, then there is the additional column:

        - `model`

    Notes
    -----
    To be precise, this function returns the decomposition of the score in terms of
    auto-miscalibration, auto-discrimination (or resolution) and uncertainy (or
    entropy), see `[FLM2022]` and references therein.
    The key element is to estimate the recalibrated predictions, i.e. \(T(Y|m(X))\) for
    the target functional \(T\) and model predictions \(m(X)\).
    This is accomplished by isotonic regression, `[Dimitriadis2021]` and
    `[Gneiting2021]`.

    References
    ----------
    `[FLM2022]`

    :   T. Fissler, C. Lorentzen, and M. Mayer.
        "Model Comparison and Calibration Assessment". (2022)
        [arxiv:2202.12780](https://arxiv.org/abs/2202.12780).

    `[Dimitriadis2021]`

    :   T. Dimitriadis, T. Gneiting, and A. I. Jordan.
        "Stable reliability diagrams for probabilistic classifiers". (2021)
        [doi:10.1073/pnas.2016191118](https://doi.org/10.1073/pnas.2016191118)

    `[Gneiting2021]`

    :   T. Gneiting and J. Resin.
        "Regression Diagnostics meets Forecast Evaluation: Conditional Calibration,
        Reliability Diagrams, and Coefficient of Determination". (2021).
        [arXiv:2108.03210](https://arxiv.org/abs/2108.03210).

    Examples
    --------
    >>> decompose(y_obs=[0, 0, 1, 1], y_pred=[-1, 1, 1, 2],
    ... scoring_function=SquaredError())
    shape: (1, 4)
    ┌────────────────┬────────────────┬─────────────┬───────┐
    │ miscalibration ┆ discrimination ┆ uncertainty ┆ score │
    │ ---            ┆ ---            ┆ ---         ┆ ---   │
    │ f64            ┆ f64            ┆ f64         ┆ f64   │
    ╞════════════════╪════════════════╪═════════════╪═══════╡
    │ 0.625          ┆ 0.125          ┆ 0.25        ┆ 0.75  │
    └────────────────┴────────────────┴─────────────┴───────┘
    """
    if functional is None:
        if hasattr(scoring_function, "functional"):
            functional = scoring_function.functional
        else:
            msg = (
                "You set functional=None, but scoring_function has no attribute "
                "functional."
            )
            raise ValueError(msg)
    if level is None:
        level = 0.5
        if functional in ("expectile", "quantile"):
            if hasattr(scoring_function, "level"):
                level = float(scoring_function.level)
            else:
                msg = (
                    "You set level=None, but scoring_function has no attribute "
                    "level."
                )
                raise ValueError(msg)

    allowed_functionals = ("mean", "median", "expectile", "quantile")
    if functional not in allowed_functionals:
        msg = (
            f"The functional must be one of {allowed_functionals}, got "
            f"{functional}."
        )
        raise ValueError(msg)
    if functional in ("expectile", "quantile") and (level <= 0 or level >= 1):
        msg = f"The level must fulfil 0 < level < 1, got {level}."
        raise ValueError(msg)

    validate_same_first_dimension(y_obs, y_pred)
    n_pred = length_of_second_dimension(y_pred)
    pred_names, _ = get_sorted_array_names(y_pred)
    y_o = np.asarray(y_obs)

    if weights is None:
        w = None
    else:
        validate_same_first_dimension(weights, y_o)
        w = np.asarray(weights)  # needed to satisfy mypy
        if w.ndim > 1:
            msg = f"The array weights must be 1-dimensional, got weights.ndim={w.ndim}."
            raise ValueError(msg)

    if functional == "mean":
        iso = IsotonicRegression_skl(y_min=None, y_max=None)
        marginal = np.average(y_o, weights=w)
    else:
        iso = IsotonicRegression(functional=functional, level=level)
        if functional == "expectile":
            marginal = expectile(y_o, alpha=level, weights=w)
        elif functional == "quantile":
            marginal = 0.5 * (
                quantile_lower(y_o, level=level) + quantile_upper(y_o, level=level)
            )

    if y_o[0] == marginal == y_o[-1]:
        # y_o is constant. We need to check if y_o is allowed as argument to y_pred.
        # For instance for the poisson deviance, y_o = 0 is allowed. But 0 is forbidden
        # as a prediction.
        try:
            scoring_function(y_o[0], marginal)
        except ValueError as exc:
            msg = (
                "Your y_obs is constant and lies outside the allowed range of y_pred "
                "of your scoring function. Therefore, the score decomposition cannot "
                "be applied."
            )
            raise ValueError(msg) from exc

    # The recalibrated versions, further down, could contain min(y_obs) and that could
    # be outside of the valid domain, e.g. y_pred = 0 for the Poisson deviance where
    # y_obs=0 is allowed. We detect that here:
    y_min = np.amin(y_o)
    y_min_allowed = True
    try:
        scoring_function(y_o[:1], np.array([y_min]), None if w is None else w[:1])
    except ValueError:
        y_min_allowed = False

    marginal = np.full_like(y_o, fill_value=marginal, dtype=float)
    score_marginal = scoring_function(y_o, marginal, w)

    df_list = []
    for i in range(len(pred_names)):
        # Loop over columns of y_pred.
        x = y_pred if n_pred == 0 else get_second_dimension(y_pred, i)
        iso.fit(x, y_o, sample_weight=w)
        recalibrated = np.squeeze(iso.predict(x))
        if not y_min_allowed and recalibrated[0] <= y_min:
            # Oh dear, this needs quite some extra work:
            # First index of value greater than y_min
            idx1 = np.argmax(recalibrated > y_min)
            val1 = recalibrated[idx1]
            # First index of value greater than the value at idx1.
            idx2 = np.argmax(recalibrated > val1)
            # Note that val1 may already be the largest value of the array => idx2 = 0.
            if idx2 == 0:
                idx2 = recalibrated.shape[0]
            # We merge the first 2 blocks of the isotonic regression as it violates
            # our domain requirements.
            re2 = recalibrated[:idx2]
            w2 = None if w is None else w[:idx2]
            if functional == "mean":
                recalibrated[:idx2] = np.average(re2, weights=w2)
            elif functional == "expectile":
                recalibrated[:idx2] = expectile(re2, alpha=level, weights=w2)
            elif functional == "quantile":
                # Note, no scoring function known that could end up here.
                lower = quantile_lower(re2, level=level)
                upper = quantile_upper(re2, level=level)
                recalibrated[:idx2] = 0.5 * (lower + upper)

        score = scoring_function(y_o, x, w)
        try:
            score_recalibrated = scoring_function(y_o, recalibrated, w)
        except ValueError as exc:
            msg = (
                "The recalibrated predictions obtained from isotonic regression are "
                "very likely outside the allowed range of y_pred of your scoring "
                "function. Therefore, the score decomposition cannot be applied."
            )
            raise ValueError(msg) from exc

        df = pl.DataFrame(
            {
                "model": pred_names[i],
                "miscalibration": score - score_recalibrated,
                "discrimination": score_marginal - score_recalibrated,
                "uncertainty": score_marginal,
                "score": score,
            }
        )
        df_list.append(df)

    df = pl.concat(df_list)

    # Remove column "model" for a single model.
    if n_pred <= 1:
        df = df.drop("model")

    return df

`plot_murphy_diagram(y_obs, y_pred, weights=None, *, etas=100, functional='mean', level=0.5, ax=None)` ¶

Plot a Murphy diagram.

A Murphy diagram plots the scores of elementary scoring functions ElementaryScore over a range of their free parameter eta. This shows, if a model dominates all others over a wide class of scoring functions or if the ranking is very much dependent on the choice of scoring function. See Notes for further details.

Parameters:

Name	Type	Description	Default
`y_obs`	`array-like of shape (n_obs)`	Observed values of the response variable. For binary classification, y_obs is expected to be in the interval [0, 1].	required
`y_pred`	`array-like of shape (n_obs) or (n_obs, n_models)`	Predicted values of the conditional expectation of Y, `E(Y\|X)`.	required
`weights`	`array-like of shape (n_obs) or None`	Case weights.	`None`
`etas`	`int or array - like`	If an integer is given, equidistant points between min and max y values are generater. If an array-like is given, those points are used.	`100`
`functional`	`str`	The functional that is induced by the identification function `V`. Options are: `"mean"`. Argument `level` is neglected. `"median"`. Argument `level` is neglected. `"expectile"` `"quantile"`	`'mean'`
`level`	`float`	The level of the expectile of quantile. (Often called \(\alpha\).) It must be `0 < level < 1`. `level=0.5` and `functional="expectile"` gives the mean. `level=0.5` and `functional="quantile"` gives the median.	`0.5`
`ax`	`Axes`	Axes object to draw the plot onto, otherwise uses the current Axes.	`None`

Returns:

Name	Type	Description
`ax`		Either the matplotlib axes or the plotly figure. This is configurable by setting the `plot_backend` via `model_diagnostics.set_config` or `model_diagnostics.config_context`.

Notes

For details, refer to [Ehm2015].

References

[Ehm2015]: W. Ehm, T. Gneiting, A. Jordan, F. Krüger. "Of Quantiles and Expectiles: Consistent Scoring Functions, Choquet Representations, and Forecast Rankings". arxiv:1503.08195.

Source code in src/model_diagnostics/scoring/plots.py

def plot_murphy_diagram(
    y_obs: npt.ArrayLike,
    y_pred: npt.ArrayLike,
    weights: Optional[npt.ArrayLike] = None,
    *,
    etas: Union[int, npt.ArrayLike] = 100,
    functional: str = "mean",
    level: float = 0.5,
    ax: Optional[mpl.axes.Axes] = None,
):
    r"""Plot a Murphy diagram.

    A Murphy diagram plots the scores of elementary scoring functions `ElementaryScore`
    over a range of their free parameter `eta`. This shows, if a model dominates all
    others over a wide class of scoring functions or if the ranking is very much
    dependent on the choice of scoring function.
    See [Notes](#notes) for further details.

    Parameters
    ----------
    y_obs : array-like of shape (n_obs)
        Observed values of the response variable.
        For binary classification, y_obs is expected to be in the interval [0, 1].
    y_pred : array-like of shape (n_obs) or (n_obs, n_models)
        Predicted values of the conditional expectation of Y, `E(Y|X)`.
    weights : array-like of shape (n_obs) or None
        Case weights.
    etas : int or array-like
        If an integer is given, equidistant points between min and max y values are
        generater. If an array-like is given, those points are used.
    functional : str
        The functional that is induced by the identification function `V`. Options are:

        - `"mean"`. Argument `level` is neglected.
        - `"median"`. Argument `level` is neglected.
        - `"expectile"`
        - `"quantile"`
    level : float
        The level of the expectile of quantile. (Often called \(\alpha\).)
        It must be `0 < level < 1`.
        `level=0.5` and `functional="expectile"` gives the mean.
        `level=0.5` and `functional="quantile"` gives the median.
    ax : matplotlib.axes.Axes
        Axes object to draw the plot onto, otherwise uses the current Axes.

    Returns
    -------
    ax :
        Either the matplotlib axes or the plotly figure. This is configurable by
        setting the `plot_backend` via
        [`model_diagnostics.set_config`][model_diagnostics.set_config] or
        [`model_diagnostics.config_context`][model_diagnostics.config_context].

    Notes
    -----
    For details, refer to `[Ehm2015]`.

    References
    ----------
    `[Ehm2015]`

    :   W. Ehm, T. Gneiting, A. Jordan, F. Krüger.
        "Of Quantiles and Expectiles: Consistent Scoring Functions, Choquet
        Representations, and Forecast Rankings".
        [arxiv:1503.08195](https://arxiv.org/abs/1503.08195).
    """
    if ax is None:
        plot_backend = get_config()["plot_backend"]
        if plot_backend == "matplotlib":
            ax = plt.gca()
        else:
            import plotly.graph_objects as go

            fig = ax = go.Figure()
    elif isinstance(ax, mpl.axes.Axes):
        plot_backend = "matplotlib"
    elif is_plotly_figure(ax):
        import plotly.graph_objects as go

        plot_backend = "plotly"
        fig = ax
    else:
        msg = (
            "The ax argument must be None, a matplotlib Axes or a plotly Figure, "
            f"got {type(ax)}."
        )
        raise ValueError(msg)

    if (n_cols := length_of_second_dimension(y_obs)) > 0:
        if n_cols == 1:
            y_obs = get_second_dimension(y_obs, 0)
        else:
            msg = (
                f"Array-like y_obs has more than 2 dimensions, y_obs.shape[1]={n_cols}"
            )
            raise ValueError(msg)

    y_pred_min, y_pred_max = get_array_min_max(y_pred)
    y_obs_min, y_obs_max = get_array_min_max(y_obs)
    y_min, y_max = min(y_pred_min, y_obs_min), max(y_pred_max, y_obs_max)

    if y_min == y_max:
        msg = "All values y_obs and y_pred are one single and same value."
        raise ValueError(msg)
    elif isinstance(etas, numbers.Integral):
        etas = np.linspace(y_min, y_max, num=etas, endpoint=True)
    else:
        etas = np.asarray(etas).astype(float)
        if etas.ndim > 1:
            etas = etas.reshape(max(etas.shape))

    def elementary_score(y_obs, y_pred, weights, eta):
        sf = ElementaryScore(eta, functional=functional, level=level)
        return sf(y_obs=y_obs, y_pred=y_pred, weights=weights)

    n_pred = length_of_second_dimension(y_pred)
    pred_names, _ = get_sorted_array_names(y_pred)

    for i in range(len(pred_names)):
        y_pred_i = y_pred if n_pred == 0 else get_second_dimension(y_pred, i)

        y_plot = [
            elementary_score(y_obs=y_obs, y_pred=y_pred_i, weights=weights, eta=eta)
            for eta in etas
        ]
        label = pred_names[i] if n_pred >= 2 else None
        if plot_backend == "matplotlib":
            ax.plot(etas, y_plot, label=label)
        else:
            fig.add_scatter(
                x=etas,
                y=y_plot,
                mode="lines",
                line={"color": get_plotly_color(i)},
                name=label,
            )

    xlabel = "eta"
    ylabel = "score"
    title = "Murphy Diagram"
    if n_pred <= 1 and len(pred_names[0]) > 0:
        title = title + " " + pred_names[0]

    if plot_backend == "matplotlib":
        if n_pred >= 2:
            ax.legend()
        ax.set_title(title)
        ax.set(xlabel=xlabel, ylabel=ylabel)
    else:
        if n_pred <= 1:
            fig.update_layout(showlegend=False)
        fig.update_layout(xaxis_title=xlabel, yaxis_title=ylabel, title=title)

    return ax

scoring

scoring ¶

ElementaryScore ¶

score_per_obs(y_obs, y_pred) ¶

GammaDeviance ¶

HomogeneousExpectileScore ¶

score_per_obs(y_obs, y_pred) ¶

HomogeneousQuantileScore ¶

score_per_obs(y_obs, y_pred) ¶

LogLoss ¶

score_per_obs(y_obs, y_pred) ¶

PinballLoss ¶

PoissonDeviance ¶

SquaredError ¶

decompose(y_obs, y_pred, weights=None, *, scoring_function, functional=None, level=None) ¶

plot_murphy_diagram(y_obs, y_pred, weights=None, *, etas=100, functional='mean', level=0.5, ax=None) ¶

`scoring` ¶

`ElementaryScore` ¶

`score_per_obs(y_obs, y_pred)` ¶

`GammaDeviance` ¶

`HomogeneousExpectileScore` ¶

`score_per_obs(y_obs, y_pred)` ¶

`HomogeneousQuantileScore` ¶

`score_per_obs(y_obs, y_pred)` ¶

`LogLoss` ¶

`score_per_obs(y_obs, y_pred)` ¶

`PinballLoss` ¶

`PoissonDeviance` ¶

`SquaredError` ¶

`decompose(y_obs, y_pred, weights=None, *, scoring_function, functional=None, level=None)` ¶

`plot_murphy_diagram(y_obs, y_pred, weights=None, *, etas=100, functional='mean', level=0.5, ax=None)` ¶