lactationcurve.characteristics.ISLC

Interpolation using Standard Lactation Curves (ISLC).

Purpose

The Interpolation using Standard Lactation Curves (ISLC) method estimates lactation yield from intermittent test-day records by predicting deviations from an expected lactation curve. The method uses standard lactation curves that represent the expected course of lactation for specific cow subgroups, such as breed or parity.

By incorporating these standard curves, ISLC accounts for the typical lactation pattern in which milk yield increases after calving, reaches a peak, and subsequently declines. Daily yields are estimated at fixed lactation days (e.g., 0, 10, 30, 50) and used to calculate accumulated lactation production.

This module implements the ISLC family of methods following the CRV/ICAR approach described in ICAR Procedure 2, Section 2 (Computing Accumulated Lactation Yield). The implementation is based on Wilmink-type standard lactation curves combined with correlation-based deviation corrections.

Method Summary

ISLC methods estimate total lactation yield by first interpolating test-day yields onto a predefined DIM grid, then integrating the resulting days with a test-interval style summation.

Key Entry Points

Column Flexibility

The functions accept several case-insensitive column name aliases and can create a default TestId if one is missing. Recognized aliases:

  • Days in Milk: ["daysinmilk", "dim", "testday"]
  • Milk Yield: ["milkingyield", "testdaymilkyield", "milkyield", "yield"]
  • Test Id: ["animalid", "testid", "id"]

It is also possible to provide your own column names so the function can be applied to dataframes with different column naming conventions.

Returns a DataFrame with columns: ["TestId", "LactationMilkYield"].

Defaults

  • STANDARD_CURVE: Baseline expected lactation curve for days 1..305 (Wilmink lactation curve model).
  • CORR_MATRIX: Default day-to-day correlation structure used for projection.
  • STDs: Standard deviations for each grid day, empirically estimated.

Notes

  • Milk samples are often recorded on non-grid DIM values. Interpolation is therefore used to map measurements to grid days.
  • Three interpolation variants are implemented. interpolation_standard_lc is recommended because it is the most consistent with the original method.
  • interpolation_standard_lc requires a standard lactation curve. You can fit one with the lactationcurve package or use the default lactation curve.
  • The methods can be applied to lactations without any measurements, in which case the result will be the population mean from the standard curve.
  • Predicted milk yields have less variance than true milk yields. With TIM, estimated yields have more variance than true yields. The reason is that predicted yields are regressed toward the mean unless all 305 daily yields are observed.
  • The main ISLC method in this package is not the same as the original method described in the paper by Wilmink. The original method is implemented in ISLC_original_method and ISLC_original. The difference in the new method is that the grid is extended to include day 1 and day 305. ALso the measured test days are taken into account in the final summation to lactation yield, while in the original method only the interpolated values on the grid are used. The new method can also be applied to lactation extending beyond 305 days, while the original method is strictly for 305-day yield estimation.
  • The method currently assumes that the standard curves are the same for all lactations, but future updates may allow using different standard curves and covariance structures for different subgroups of lactations (e.g., by breed or parity).
  • ISLC's strenght is that its caclulation is broken up in multiple steps, which makes it more computer memory friendly than the best predict method, which requires the inversion of a large covariance matrix. Because of the integration of standard lactation curves it takes into account the shape of the lactation curve, which can lead to more accurate estimates of total yield, especially for lactations with few test days or irregular patterns.
  • ISLC's main disadvantage is that because of the many in between steps, it is relatively complex to implement and understand, making it less transparant. The best results are obtained when the standard curve and covariance matrix are from the same population as the data, which can be a barrier for users without access to a large reference dataset. This also causes inconsistencies in cumulative milk yield results depending on which standard curves are used. A cow with the exact same test-day records can have a different cumulative milk yield estimates depending on the standard curve used, which can be considered unfair.

References

Wilmink, J. B. M. (1987). Comparison of different methods of predicting 305-day milk yield using means calculated from within-herd lactation curves. Livestock Production Science, 17, 1-17.

Wilmink, J. B. M. (1987). Adjustment of test-day milk, fat and protein yield for age, season and stage of lactation. Livestock Production Science, 16(4), 335-348.

Handbook NRS © CR Delta chapter E1 and E2 02-98

Author: Meike van Leerdam Date: 20 October 2025 Last update: 21 May 2026

   1"""Interpolation using Standard Lactation Curves (ISLC).
   2
   3Purpose
   4-------
   5The Interpolation using Standard Lactation Curves (ISLC) method estimates
   6lactation yield from intermittent test-day records by predicting
   7deviations from an expected lactation curve.
   8The method uses standard lactation curves that represent the expected
   9course of lactation for specific cow subgroups, such
  10as breed or parity.
  11
  12By incorporating these standard curves, ISLC accounts for the typical
  13lactation pattern in which milk yield increases after calving, reaches a
  14peak, and subsequently declines. Daily yields are estimated at fixed
  15lactation days (e.g., 0, 10, 30, 50) and used to calculate accumulated
  16lactation production.
  17
  18This module implements the ISLC family of methods following the CRV/ICAR
  19approach described in ICAR Procedure 2, Section 2 (Computing Accumulated
  20Lactation Yield). The implementation is based on Wilmink-type standard
  21lactation curves combined with correlation-based deviation corrections.
  22
  23
  24Method Summary
  25--------------
  26ISLC methods estimate total lactation yield by first interpolating test-day
  27yields onto a predefined DIM grid, then integrating the resulting days with
  28a test-interval style summation.
  29
  30Key Entry Points
  31----------------
  32- ``ISLC_method``: compute a single lactation's estimated 305-d yield from
  33    interpolated grid measurements.
  34- ``ISLC``: apply ``ISLC_method`` per ``TestId`` in a pandas DataFrame.
  35- ``ISLC_original_method``: compute a single lactation's estimated
  36    305-d yield using the technique described in the original paper.
  37    Wilmink et al. (1987)
  38    "Comparison of different methods of predicting 305-day milk yield
  39    using means calculated from within-herd lactation curves"
  40- ``ISLC_original``: apply ``ISLC_original_method`` per ``TestId``
  41    in a pandas DataFrame.
  42- ``interpolation_standard_lc``: interpolate measured test-day yields onto
  43    the DIM grid using the standard lactation curve to guide slopes.
  44- ``linear_interpd_all_to_grid`` and ``linear_interpd_closest_to_grid``:
  45  Linear interpolation alternatives.
  46
  47Column Flexibility
  48------------------
  49The functions accept several case-insensitive column name aliases and can
  50create a default ``TestId`` if one is missing. Recognized aliases:
  51
  52- Days in Milk: `["daysinmilk", "dim", "testday"]`
  53- Milk Yield: `["milkingyield", "testdaymilkyield", "milkyield", "yield"]`
  54- Test Id: `["animalid", "testid", "id"]`
  55
  56It is also possible to provide your own column names so the function
  57can be applied to dataframes with different column naming conventions.
  58
  59Returns a DataFrame with columns: ``["TestId", "LactationMilkYield"]``.
  60
  61Defaults
  62--------
  63- ``STANDARD_CURVE``: Baseline expected lactation curve for days 1..305
  64    (Wilmink lactation curve model).
  65- ``CORR_MATRIX``: Default day-to-day correlation structure used for projection.
  66- ``STDs``: Standard deviations for each grid day, empirically estimated.
  67
  68Notes
  69-----
  70- Milk samples are often recorded on non-grid DIM values.
  71    Interpolation is therefore used to map measurements to grid days.
  72- Three interpolation variants are implemented.
  73    ``interpolation_standard_lc`` is recommended because it is the most
  74    consistent with the original method.
  75- ``interpolation_standard_lc`` requires a standard lactation curve.
  76    You can fit one with the ``lactationcurve`` package
  77    or use the default lactation curve.
  78- The methods can be applied to lactations without any measurements,
  79    in which case the result will be the population mean from the standard curve.
  80- Predicted milk yields have less variance than true milk yields. With TIM,
  81    estimated yields have more variance than true yields. The reason is that predicted yields are
  82    regressed toward the mean unless all 305 daily yields are observed.
  83- The main ISLC method in this package is not the same as the original method
  84    described in the paper by Wilmink.
  85    The original method is implemented in ``ISLC_original_method`` and ``ISLC_original``.
  86    The difference in the new method is that the grid is extended to include day 1 and day 305.
  87    ALso the measured test days are taken into account in the final summation to lactation yield,
  88    while in the original method only the interpolated values on the grid are used.
  89    The new method can also be applied to lactation extending beyond 305 days,
  90    while the original method is strictly for 305-day yield estimation.
  91- The method currently assumes that the standard curves are the same
  92    for all lactations,
  93    but future updates may allow using different standard curves and covariance structures for
  94    different subgroups of lactations (e.g., by breed or parity).
  95- ISLC's strenght is that its caclulation is broken up in multiple steps,
  96    which makes it more computer memory friendly than the best predict method,
  97    which requires the inversion of a large covariance matrix.
  98    Because of the integration of standard lactation curves
  99    it takes into account the shape of the lactation curve,
 100    which can lead to more accurate estimates of total yield,
 101    especially for lactations with few test days or irregular patterns.
 102- ISLC's main disadvantage is that because of the many in between steps,
 103    it is relatively complex to implement and understand, making it less transparant.
 104    The best results are obtained when the standard curve and covariance matrix
 105    are from the same population as the data,
 106    which can be a barrier for users without access to a large reference dataset.
 107    This also causes inconsistencies in cumulative milk yield results
 108    depending on which standard curves are used.
 109    A cow with the exact same test-day records
 110    can have a different cumulative milk yield estimates depending
 111    on the standard curve used, which can be considered unfair.
 112
 113References
 114---------
 115Wilmink, J. B. M. (1987).
 116Comparison of different methods of predicting 305-day milk yield using means
 117calculated from within-herd lactation curves. Livestock Production Science,
 11817, 1-17.
 119
 120Wilmink, J. B. M. (1987).
 121Adjustment of test-day milk, fat and protein yield for age,
 122season and stage of lactation. Livestock Production Science, 16(4), 335-348.
 123
 124Handbook NRS © CR Delta chapter E1 and E2 02-98
 125
 126
 127Author: Meike van Leerdam
 128Date: 20 October 2025
 129Last update: 21 May 2026
 130"""
 131
 132# Import packages
 133import inspect
 134from pathlib import Path
 135from typing import Any, Protocol, cast
 136
 137import numpy as np
 138import numpy.typing as npt
 139import pandas as pd
 140from scipy.interpolate import interp1d
 141
 142from lactationcurve.fitting import fit_lactation_curve
 143from lactationcurve.preprocessing import standardize_lactation_columns
 144
 145DATA_DIR = Path(__file__).resolve().parent / "data"
 146GRID_DAYS = [10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230, 250, 270, 290]
 147
 148# get the standard lactation curve ingredients back from the data storage:
 149CORR_MATRIX = cast(pd.DataFrame, pd.read_pickle(DATA_DIR / "corr_matrix.pkl"))
 150STDs = np.load(DATA_DIR / "std_per_grid_day.npy")
 151STANDARD_CURVE = np.load(DATA_DIR / "standard_lc_grid.npy")
 152
 153# add a day zero (that equals day 1) to the standard curve to prevent an index shift
 154STANDARD_CURVE = np.insert(STANDARD_CURVE, 0, STANDARD_CURVE[0])
 155
 156
 157class _DocDefault:
 158    """Lightweight repr-only wrapper for cleaner generated signatures."""
 159
 160    def __init__(self, label: str) -> None:
 161        self.label = label
 162
 163    def __repr__(self) -> str:
 164        return self.label
 165
 166
 167_DOC_STANDARD_CURVE = _DocDefault("STANDARD_CURVE")
 168_DOC_CORR_MATRIX = _DocDefault("CORR_MATRIX")
 169_DOC_STDS = _DocDefault("STDs")
 170
 171
 172def _curve_to_series(curve: pd.Series | npt.NDArray[np.float64]) -> pd.Series:
 173    """Normalize standard lactation curve input to a Series."""
 174    if isinstance(curve, pd.Series):
 175        return curve
 176    return pd.Series(curve)
 177
 178
 179def _curve_value(curve: pd.Series, day: int) -> float:
 180    """Return curve value for a DIM, supporting both label and positional indexing."""
 181    if day in curve.index:
 182        return float(curve.loc[day])
 183    return float(curve.iloc[day])
 184
 185
 186class InterpolationMethod(Protocol):
 187    """Protocol for milk-yield interpolation methods.
 188
 189    Defines the signature expected by functions that interpolate test-day
 190    yields onto a predefined DIM grid.
 191    """
 192
 193    def __call__(
 194        self,
 195        group: pd.DataFrame,
 196        days_in_milk_col: str,
 197        milking_yield_col: str,
 198        standard_lc: pd.Series | None = None,
 199        small_grid: bool = False,
 200    ) -> pd.DataFrame:
 201        """Interpolate yields for a single lactation onto the DIM grid.
 202
 203        Args:
 204            group: Test-day records for a single animal.
 205            days_in_milk_col: Column name containing days in milk (DIM).
 206            milking_yield_col: Column name containing milk yields.
 207            standard_lc: Optional standard lactation curve to guide
 208                interpolation (method-dependent).
 209            small_grid: If True, interpolate only on GRID_DAYS;
 210                otherwise use [0] + GRID_DAYS + [305].
 211
 212        Returns:
 213            A DataFrame with interpolated yields, typically including
 214            columns ``GridDay`` and ``MilkYieldInterp``.
 215        """
 216        ...
 217
 218
 219def ISLC(
 220    df: pd.DataFrame,
 221    days_in_milk_col: str | None = None,
 222    milking_yield_col: str | None = None,
 223    test_id_col: str | None = None,
 224    default_test_id: int = 0,
 225    standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE,
 226    correlation_matrix: pd.DataFrame = CORR_MATRIX,
 227    std_per_grid_day: npt.NDArray[np.float64] = STDs,
 228    max_dim: int | str = 305,
 229    fit_standard_lc_from_data: bool = False,
 230    reference_df: pd.DataFrame | None = None,
 231) -> pd.DataFrame:
 232    """Estimate cumulative lactation yield per TestId using ISLC.
 233
 234    This variant applies :func:`ISLC_method` to standardized lactation records.
 235    By default it uses the package-provided standard curve, correlation matrix,
 236    and per-grid day standard deviations. Optionally, these assets can be refit
 237    from a reference dataset by providing a pandas dataframe at 'reference_df ='
 238    when ``fit_standard_lc_from_data`` is True.
 239    Alternative for customization is to set standard_lc_305,
 240    correlation_matrix, and std_per_grid_day directly.
 241
 242    Args:
 243        df: Input test-day records.
 244        days_in_milk_col: Name of DIM column in ``df``.
 245        milking_yield_col: Name of milk-yield column in ``df``.
 246        test_id_col: Name of lactation identifier column in ``df``.
 247        default_test_id: Fallback TestId value when ``test_id_col`` is absent.
 248        standard_lc_305: Standard lactation curve values used by ISLC.
 249        correlation_matrix: Correlation matrix aligned with ISLC grid.
 250        std_per_grid_day: Standard deviations aligned with ISLC grid.
 251        max_dim: Maximum DIM to include
 252            (or ``"max"`` so highest test day DIM is considered the final lactation day).
 253        fit_standard_lc_from_data: If True, fit representation from ``reference_df``.
 254        reference_df: Reference records used when fitting curve and covariance.
 255
 256    Returns:
 257        DataFrame with columns ``TestId`` and ``LactationMilkYield``.
 258
 259    Raises:
 260        ValueError: If required columns are missing or fit mode has no reference data.
 261    """
 262    # Standardize columns and filter DIM <= 305
 263    df = standardize_lactation_columns(
 264        df,
 265        days_in_milk_col=days_in_milk_col,
 266        milking_yield_col=milking_yield_col,
 267        test_id_col=test_id_col,
 268        default_test_id=default_test_id,
 269        max_dim=max_dim,
 270    )
 271
 272    # Fit covariance/standard_lc if requested
 273    if fit_standard_lc_from_data is True:
 274        if reference_df is None:
 275            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
 276        reference_df = standardize_lactation_columns(
 277            reference_df,
 278            days_in_milk_col=days_in_milk_col,
 279            milking_yield_col=milking_yield_col,
 280            test_id_col=test_id_col,
 281            default_test_id=default_test_id,
 282            max_dim=305,
 283        )
 284        dim = reference_df["DaysInMilk"]
 285        milk_yield = reference_df["MilkingYield"]
 286        standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink")
 287        # convert to pandas series
 288        standard_full_lc_305 = pd.Series(standard_full_lc_305)
 289
 290        # fit matrix
 291        (
 292            correlation_matrix,
 293            std_per_grid_day,
 294            standard_curve_fitted,
 295        ) = create_standard_lc_representation(
 296            df=reference_df,
 297            standard_lactation_curve=pd.Series(standard_full_lc_305),
 298            # reference_df is standardized above, so fixed names are required here
 299            days_in_milk_col="DaysInMilk",
 300            milking_yield_col="MilkingYield",
 301            interpolation_method=interpolation_standard_lc,
 302            small_grid=False,
 303        )
 304
 305        # Align fitted curve with default internal convention (day 0 equals day 1).
 306        standard_lc_305 = np.insert(
 307            np.asarray(standard_curve_fitted, dtype=float),
 308            0,
 309            float(standard_curve_fitted.iloc[0]),
 310        )
 311
 312    rows: list[dict[str, Any]] = []
 313    for test_id, group in df.groupby("TestId", sort=False):
 314        if group.empty:
 315            rows.append({"TestId": test_id, "LactationMilkYield": np.nan})
 316            continue
 317
 318        try:
 319            cumulative_yield = ISLC_method(
 320                df=group,
 321                standard_lc=standard_lc_305,
 322                correlation_matrix=correlation_matrix,
 323                std_per_grid_day=std_per_grid_day,
 324                days_in_milk_col="DaysInMilk",
 325                milking_yield_col="MilkingYield",
 326            )
 327        except ValueError:
 328            cumulative_yield = np.nan
 329
 330        rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield})
 331
 332    return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"]))
 333
 334
 335def ISLC_method(
 336    df: pd.DataFrame,
 337    standard_lc: pd.Series | npt.NDArray[np.float64],
 338    correlation_matrix: pd.DataFrame,
 339    std_per_grid_day: npt.NDArray[np.float64],
 340    days_in_milk_col: str = "DaysInMilk",
 341    milking_yield_col: str = "MilkingYield",
 342) -> float:
 343    """Estimate total lactation yield by predicting missing grid days, then integrating.
 344
 345    The method fills missing grid-day yields with the ICAR-style relation:
 346
 347        yp = gp + b* (yi - gi)
 348
 349        where for each missing day p, the nearest measured day i is used and
 350        b* = corr(i,p) * std(p) / std(i). The resulting complete grid profile is
 351        integrated with the test-interval style weighting used elsewhere in this
 352        package.
 353
 354    Notes:
 355        The integration is performed over the days present after merging
 356        measured and predicted points. If the input contains DIM beyond 305,
 357        total yield can reflect that extended horizon.
 358    """
 359    # add zero and 305 to the grid
 360    grid = [0] + GRID_DAYS + [305]
 361    dim_col = days_in_milk_col
 362    yield_col = milking_yield_col
 363
 364    standard_lc_series = _curve_to_series(standard_lc)
 365
 366    # Extract the measured days
 367    measured_days = df[dim_col].to_numpy()
 368    if measured_days.size == 0:
 369        raise ValueError("Input df must contain at least one measured day.")
 370
 371    # interpolate the measured days to the grid using the standard lactation curve
 372    interpolated_df = interpolation_standard_lc(
 373        group=df,
 374        days_in_milk_col=dim_col,
 375        milking_yield_col=yield_col,
 376        standard_lc=standard_lc_series,
 377    )
 378
 379    # drop unnecessary columns
 380    interpolated_df = interpolated_df.drop(columns=[dim_col, yield_col])
 381
 382    # rename grid colum
 383    interpolated_df = interpolated_df.rename(
 384        columns={"GridDay": dim_col, "MilkYieldInterp": yield_col}
 385    )[[dim_col, yield_col]]
 386
 387    measured_subset = cast(pd.DataFrame, df.loc[:, [dim_col, yield_col]])
 388    interpolated_subset = cast(pd.DataFrame, interpolated_df.loc[:, [dim_col, yield_col]])
 389
 390    # join the interpolated values with the original measurements, keeping all measurements
 391    merged_df = cast(
 392        pd.DataFrame,
 393        pd.merge(
 394            measured_subset,
 395            interpolated_subset,
 396            on=dim_col,
 397            how="outer",
 398            suffixes=("_meas", "_interp"),
 399        ),
 400    )
 401    meas_series = cast(pd.Series, merged_df[f"{yield_col}_meas"])
 402    interp_series = cast(pd.Series, merged_df[f"{yield_col}_interp"])
 403    merged_df[yield_col] = cast(pd.Series, meas_series.combine_first(interp_series))
 404    df = cast(
 405        pd.DataFrame,
 406        merged_df.loc[:, [dim_col, yield_col]].sort_values(by=dim_col),
 407    )
 408
 409    # days missing from measurement → need prediction
 410    days_to_predict = [day for day in grid if day not in df[dim_col].values]
 411
 412    predicted_rows: list[dict[str, float | int]] = []
 413
 414    for day in days_to_predict:
 415        # identify closest measured day
 416        dim_series = cast(pd.Series, interpolated_df[dim_col])
 417        grid_vals = np.asarray(dim_series.to_numpy(dtype=float), dtype=float)
 418        closest_idx = int(np.argmin(np.abs(grid_vals - float(day))))
 419        closest_day = int(dim_series.iloc[closest_idx])
 420
 421        # extract mean milk yield from the standard lactation curve for the day to predict
 422        expected_yield_grid_day = _curve_value(standard_lc_series, day)
 423        expected_yield_measured_day = _curve_value(standard_lc_series, closest_day)
 424
 425        # calculate difference between measured and expected yield at closest measured day
 426        diff = (
 427            interpolated_df.loc[interpolated_df[dim_col] == closest_day, yield_col].iloc[0]
 428            - expected_yield_measured_day
 429        )
 430
 431        # calculate the correlation-based correction factor b*
 432        # b* = corr(i,p) * std(p) / std(i)
 433        pi = list(grid).index(closest_day)
 434        qi = list(grid).index(day)
 435        bpj = correlation_matrix.iloc[pi, qi]
 436        std_q = std_per_grid_day[qi]
 437        std_p = std_per_grid_day[pi]
 438        b_star = 0.0 if std_p == 0 else float(bpj * std_q / std_p)
 439
 440        # predict the yield for the missing day
 441        predicted_yield = expected_yield_grid_day + b_star * diff
 442        predicted_rows.append({dim_col: int(day), yield_col: float(predicted_yield)})
 443
 444    if predicted_rows:
 445        df = pd.concat([df, pd.DataFrame(predicted_rows)], ignore_index=True)
 446
 447    # apply the test interval method to calculate total milk yield over 305 days
 448    # sort the dataframe by days in milk and keep one value per day
 449    df = df.drop_duplicates(subset=[dim_col], keep="last")
 450    df = df.sort_values(by=dim_col)
 451    # Trapezoidal contributions
 452    df["width"] = df[dim_col].diff().shift(-1)
 453    df["avg_yield"] = (df[yield_col] + df[yield_col].shift(-1)) / 2
 454    df["trapezoid_area"] = df["width"] * df["avg_yield"]
 455
 456    trapezoid_values = np.asarray(pd.to_numeric(df["trapezoid_area"], errors="coerce"), dtype=float)
 457    total_yield = float(np.nansum(trapezoid_values))
 458
 459    return total_yield
 460
 461
 462def ISLC_original_method(
 463    df: pd.DataFrame,
 464    standard_lc: pd.Series | npt.NDArray[np.float64],
 465    correlation_matrix: pd.DataFrame,
 466    std_per_grid_day: npt.NDArray[np.float64],
 467) -> float:
 468    """Estimate 305-day milk yield for a single interpolated lactation
 469    from the original paper by Dr. Wilmink in 1987.
 470
 471    This function computes an estimated 305-day yield by combining three
 472    components: known production from interpolated measurements, the
 473    population mean for missing grid days (from ``standard_lc``), and a
 474    correlation-based deviation correction derived from ``correlation_matrix``.
 475
 476    Args:
 477        df: DataFrame containing at least the columns ``GridDay`` and
 478            ``MilkYieldInterp`` (values already interpolated onto the grid).
 479        standard_lc: Standard lactation curve values indexed by grid day.
 480        correlation_matrix: Correlation coefficients between grid-day yields
 481            (matrix-like, rows/columns ordered as the grid).
 482        std_per_grid_day: 1-D array of standard deviations for each grid day.
 483
 484    Returns:
 485        A float with the estimated 305-day milk yield.
 486
 487    Raises:
 488        ValueError: if any measured day in ``df`` is not present in the
 489            expected global ``grid``.
 490
 491    Notes:
 492        - The routine assumes fixed recording intervals of 20 days, except
 493          that the final interval (if day 290 is present) is 25 days.
 494        - The implementation expects a global ``grid`` variable to be
 495          defined and aligned with ``standard_lc`` and
 496          ``std_per_grid_day``.
 497        - The amount of estimated milk yields depends on the
 498            availability of measured milk yields.
 499        - Lactation yield is:
 500            sum(known measurements (p) + estimated measurements (q)).
 501        - In the implementation according to the CRV E2 document,
 502            there should also be a correction factor for the difference between
 503             the expected and actual complete lactation yield of the previous lactation.
 504            However, this correction factor is not included in the current implementation
 505            as it requires additional data (previous lactation yield) that is not always available.
 506    """
 507    grid = GRID_DAYS
 508    standard_lc_series = _curve_to_series(standard_lc)
 509
 510    # --- Validate that all measurements lie on the expected grid -------
 511    measured_days = df["GridDay"].values
 512    if any(day not in grid for day in measured_days):
 513        raise ValueError("At least one DIM in the input df is not part of the grid.")
 514
 515    # --- Prepare basic components -------------------------------------
 516    df = df.sort_values("GridDay")
 517
 518    # days missing from measurement → need prediction
 519    days_to_predict = [day for day in grid if day not in measured_days]
 520    pred_idx = [list(grid).index(d) for d in days_to_predict]
 521
 522    # mean expected milk yield from standard curve
 523    mean_lc = pd.Series([_curve_value(standard_lc_series, g) for g in grid], index=grid)
 524
 525    # --- Compute known production (sum of actual measurements) ---------
 526    if 290 in measured_days:
 527        # special last interval: 25 days
 528        known_prod = df["MilkYieldInterp"][:-1].sum() * 20 + df["MilkYieldInterp"].iloc[-1] * 25
 529    else:
 530        known_prod = df["MilkYieldInterp"].sum() * 20
 531
 532    # --- Compute population mean for missing days ----------------------
 533    pop_mean_missing = mean_lc.loc[days_to_predict]
 534
 535    if 290 in pop_mean_missing.index:
 536        population_mean = pop_mean_missing.iloc[:-1].sum() * 20 + pop_mean_missing.iloc[-1] * 25
 537    else:
 538        population_mean = pop_mean_missing.sum() * 20
 539
 540    # --- Compute correlation-based correction --------------------------
 541    days_to_predict_np = np.array(days_to_predict)
 542    measured_days_np = np.array(measured_days)
 543
 544    # for each missing day: find closest measured day
 545    closest_measured = [
 546        measured_days_np[np.argmin(np.abs(measured_days_np - g))] for g in days_to_predict_np
 547    ]
 548    closest_idx = [list(grid).index(d) for d in closest_measured]
 549
 550    b_star_list = []
 551    for pi, qi in zip(closest_idx, pred_idx):
 552        bpj = correlation_matrix.iloc[pi, qi]
 553        stdj = std_per_grid_day[qi]
 554        stdp = std_per_grid_day[pi]
 555        b_star_list.append(bpj * stdj / stdp)
 556
 557    # scale correction by interval lengths
 558    if 290 in measured_days:
 559        # last interval belongs to prediction group
 560        correction_weighted = sum(b_star_list[:-1]) * 20
 561    else:
 562        correction_weighted = sum(b_star_list[:-1]) * 20 + b_star_list[-1] * 25
 563
 564    last_day = df["GridDay"].iloc[-1]
 565    last_yield = df["MilkYieldInterp"].iloc[-1]
 566    mean_last = mean_lc.loc[last_day]
 567
 568    correction = correction_weighted * (last_yield - mean_last)
 569
 570    # --- Final sum -----------------------------------------------------
 571    final_milk_yield = known_prod + population_mean + correction
 572    return float(final_milk_yield)
 573
 574
 575def ISLC_original(
 576    df: pd.DataFrame,
 577    days_in_milk_col: str | None = None,
 578    milking_yield_col: str | None = None,
 579    test_id_col: str | None = None,
 580    default_test_id: int = 0,
 581    standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE,
 582    correlation_matrix: pd.DataFrame = CORR_MATRIX,
 583    std_per_grid_day: npt.NDArray[np.float64] = STDs,
 584    fit_standard_lc_from_data: bool = False,
 585    reference_df: pd.DataFrame | None = None,
 586) -> pd.DataFrame:
 587    """Compute estimated 305-day yields using the original grid-based ISLC variant
 588    from the original paper by Dr. Wilmink in 1987.
 589
 590    The function groups standardized input by ``TestId``, interpolates to the
 591    fixed GRID_DAYS grid, and applies :func:`ISLC_original_method` per lactation.
 592
 593    Args:
 594       df: Input records with DIM and milk yield columns.
 595       days_in_milk_col: Name of DIM column in ``df``.
 596       milking_yield_col: Name of milk-yield column in ``df``.
 597       test_id_col: Name of lactation identifier column in ``df``.
 598       default_test_id: Fallback TestId when ``test_id_col`` is absent.
 599       standard_lc_305: Standard lactation curve used for interpolation.
 600       correlation_matrix: Correlation matrix over GRID_DAYS.
 601       std_per_grid_day: Standard deviations per GRID_DAYS element.
 602       fit_standard_lc_from_data: If True, refit standard curve and covariance
 603           representation from ``reference_df`` before prediction.
 604       reference_df: Reference population used when
 605           ``fit_standard_lc_from_data=True``. The reference data is
 606           standardized to ``DaysInMilk``/``MilkingYield`` and fit using
 607           ``small_grid=True`` so correlation and standard-deviation assets are
 608           aligned to ``GRID_DAYS``.
 609
 610    Returns:
 611        DataFrame with columns ``LactationMilkYield`` and ``TestId``.
 612
 613    Raises:
 614        ValueError: If required columns (DaysInMilk or MilkingYield) cannot be
 615            found, or if ``fit_standard_lc_from_data=True`` and
 616            ``reference_df`` is not provided.
 617
 618    Notes:
 619        Records with DIM > 305 are dropped before interpolation and scoring.
 620    """
 621    # Standardize columns and filter DIM <= 305
 622    df = standardize_lactation_columns(
 623        df,
 624        days_in_milk_col=days_in_milk_col,
 625        milking_yield_col=milking_yield_col,
 626        test_id_col=test_id_col,
 627        default_test_id=default_test_id,
 628        max_dim=305,
 629    )
 630
 631    # Fit covariance/standard_lc if requested
 632    if fit_standard_lc_from_data is True:
 633        if reference_df is None:
 634            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
 635        reference_df = standardize_lactation_columns(
 636            reference_df,
 637            days_in_milk_col=days_in_milk_col,
 638            milking_yield_col=milking_yield_col,
 639            test_id_col=test_id_col,
 640            default_test_id=default_test_id,
 641            max_dim=305,
 642        )
 643        dim = reference_df["DaysInMilk"]
 644        milk_yield = reference_df["MilkingYield"]
 645        standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink")
 646        standard_full_lc_305 = pd.Series(standard_full_lc_305)
 647
 648        # fit matrix
 649        (
 650            correlation_matrix,
 651            std_per_grid_day,
 652            standard_curve_fitted,
 653        ) = create_standard_lc_representation(
 654            df=reference_df,
 655            standard_lactation_curve=pd.Series(standard_full_lc_305),
 656            # reference_df is standardized above, so fixed names are required here
 657            days_in_milk_col="DaysInMilk",
 658            milking_yield_col="MilkingYield",
 659            interpolation_method=interpolation_standard_lc,
 660            small_grid=True,
 661        )
 662
 663        # Align fitted curve with default internal convention (day 0 equals day 1).
 664        standard_lc_305 = np.insert(
 665            np.asarray(standard_curve_fitted, dtype=float),
 666            0,
 667            float(standard_curve_fitted.iloc[0]),
 668        )
 669    rows: list[dict[str, Any]] = []
 670    for test_id, group in df.groupby("TestId", sort=False):
 671        lactation = interpolation_standard_lc(
 672            group,
 673            days_in_milk_col="DaysInMilk",
 674            milking_yield_col="MilkingYield",
 675            standard_lc=standard_lc_305,
 676        )
 677        if lactation.empty:
 678            rows.append({"TestId": test_id, "305_milk_yield": np.nan})
 679            continue
 680
 681        cumulative_yield = ISLC_original_method(
 682            df=lactation,
 683            standard_lc=standard_lc_305,
 684            correlation_matrix=correlation_matrix,
 685            std_per_grid_day=std_per_grid_day,
 686        )
 687        rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield})
 688
 689    return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"]))
 690
 691
 692"""
 693Create your own standard lactation curve, correlation matrix,
 694and standard deviations per grid day.
 695
 696step 1 interpolate to the days on the grid
 697Based on the CRV E2 documents, interpolation is not linear and instead
 698depends on the standard lactation curve.
 699"""
 700
 701
 702def interpolation_standard_lc(
 703    group: pd.DataFrame,
 704    days_in_milk_col: str,
 705    milking_yield_col: str,
 706    standard_lc: pd.Series | npt.NDArray[np.float64] | None = None,
 707    small_grid: bool = False,
 708) -> pd.DataFrame:
 709    """Interpolate a single lactation onto the DIM grid guided by a standard curve.
 710
 711    The function interpolates measured yields for one animal onto the
 712    predefined grid of DIM values. When a grid day coincides with a measured
 713    DIM the observed yield is used; otherwise the interpolation adjusts the
 714    linear slope between neighboring measurements by the difference in the
 715    standard curve, inspired by the CRV E2 approach.
 716
 717    The interpolation formula applied for a grid day ``gday`` between two
 718    measured points (x1,y1) and (x2,y2) is::
 719
 720        slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1)
 721        yi = gi + (slope * (gday - x1) + (y1 - g1))
 722
 723    Args:
 724        group: DataFrame with test-day records for a single animal. Must
 725            contain columns named by ``days_in_milk_col`` and
 726            ``milking_yield_col``.
 727        days_in_milk_col: Name of the DIM column in ``group``.
 728        milking_yield_col: Name of the milk-yield column in ``group``.
 729        standard_lc: A Series indexed by DIM providing the expected yield on
 730            each grid day; used to guide the interpolation shape.
 731        small_grid: If True, interpolate on GRID_DAYS only; otherwise include
 732            day 0 and day 305 in the interpolation target grid.
 733
 734    Returns:
 735        A DataFrame with one row per interpolated grid day containing the
 736        original identifying columns (from the first row of ``group``), plus
 737        the columns ``GridDay`` and ``MilkYieldInterp``.
 738
 739    Notes:
 740        - No interpolation is performed outside the range bounded by the
 741          first and last measured DIM for the lactation (those grid days are
 742          skipped).
 743                - No interpolation is performed for grid days where the right
 744                    neighboring measured DIM is >= 305, because this lies outside the
 745                    intended standard-curve support.
 746        - The implementation defines a local default ``grid`` of
 747          [10, 30, ..., 290].
 748
 749
 750    """
 751    if small_grid:
 752        grid = GRID_DAYS
 753    else:
 754        # add zero and 305 to the grid
 755        grid = [0] + GRID_DAYS + [305]
 756
 757    if standard_lc is None:
 758        raise ValueError("A standard lactation curve is required for interpolation_standard_lc.")
 759
 760    standard_lc_series = _curve_to_series(standard_lc)
 761
 762    # Sort and ensure unique DIMs
 763    group = group.sort_values(days_in_milk_col).drop_duplicates(days_in_milk_col)
 764    dims = group[days_in_milk_col].tolist()
 765
 766    rows = []
 767
 768    # loop over the grid days
 769    for gday in grid:
 770        # --- CASE 1: Exact measured day ---------------------------
 771        if gday in dims:
 772            y_val = group.loc[group[days_in_milk_col] == gday, milking_yield_col].iloc[0]
 773
 774            row = group.iloc[0].to_dict()
 775            row["GridDay"] = gday
 776            row["MilkYieldInterp"] = float(y_val)
 777            rows.append(row)
 778            continue
 779
 780        # --- CASE 2: interpolate between measured days -------------
 781        before_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] < gday])
 782        after_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] > gday])
 783        before = before_df.tail(1)
 784        after = after_df.head(1)
 785
 786        # CASE 3 No interpolation if grid day is outside the range of measured DIM
 787        if before.empty or after.empty:
 788            continue
 789
 790        # CASE 4: No interpolation if the day after is day 305 or higher,
 791        # as this is outside the range of the standard curve
 792        if float(after.iloc[0][days_in_milk_col]) >= 305:
 793            continue
 794
 795        x1 = float(before.iloc[0][days_in_milk_col])
 796        y1 = float(before.iloc[0][milking_yield_col])
 797        x2 = float(after.iloc[0][days_in_milk_col])
 798        y2 = float(after.iloc[0][milking_yield_col])
 799
 800        # expected yields from standard lactation curve
 801        g1 = _curve_value(standard_lc_series, int(x1))
 802        g2 = _curve_value(standard_lc_series, int(x2))
 803        gi = _curve_value(standard_lc_series, int(gday))
 804
 805        # Wilmink-based interpolation formula
 806        slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1)
 807        yi = gi + (slope * (gday - x1) + (y1 - g1))
 808
 809        row = group.iloc[0].to_dict()
 810        row["GridDay"] = gday
 811        row["MilkYieldInterp"] = float(yi)
 812        rows.append(row)
 813
 814    if not rows:
 815        return pd.DataFrame(
 816            columns=pd.Index([*group.columns.to_list(), "GridDay", "MilkYieldInterp"])
 817        )
 818
 819    return pd.DataFrame(rows)
 820
 821
 822def linear_interpd_all_to_grid(
 823    group: pd.DataFrame,
 824    days_in_milk_col: str,
 825    milking_yield_col: str,
 826    standard_lc: pd.Series | None = None,
 827    small_grid: bool = False,
 828) -> pd.DataFrame:
 829    """Linearly interpolate all grid days for a lactation.
 830
 831    This helper uses linear interpolation (with extrapolation) to produce
 832    milk-yield values for every grid day regardless of whether the grid day
 833    lies between measured observations.
 834
 835    Args:
 836        group: DataFrame containing measured DIM and yield values for one
 837            lactation.
 838        days_in_milk_col: Name of the DIM column.
 839        milking_yield_col: Name of the milk-yield column.
 840        small_grid: If True, interpolate on GRID_DAYS only; otherwise include
 841            day 0 and day 305.
 842
 843    Returns:
 844        A DataFrame with identifying columns copied from ``group``'s first
 845        row and columns ``GridDay`` and ``MilkYieldInterp`` containing the
 846        interpolated (or extrapolated) yields for every grid day.
 847    """
 848    if small_grid:
 849        grid = GRID_DAYS
 850    else:
 851        # add zero and 305 to the grid
 852        grid = [0] + GRID_DAYS + [305]
 853
 854    group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col)
 855    f = interp1d(
 856        group[days_in_milk_col],
 857        group[milking_yield_col],
 858        kind="linear",
 859        fill_value=cast(Any, "extrapolate"),
 860    )
 861
 862    base = {
 863        col: group[col].iloc[0]
 864        for col in group.columns
 865        if col not in [days_in_milk_col, milking_yield_col]
 866    }
 867    base["GridDay"] = grid
 868    base["MilkYieldInterp"] = f(grid)
 869
 870    return pd.DataFrame(base)
 871
 872
 873# Create an interpolation function that only outputs grid days between two
 874# milk measurements using linear interpolation.
 875def linear_interpd_closest_to_grid(
 876    group: pd.DataFrame,
 877    days_in_milk_col: str,
 878    milking_yield_col: str,
 879    standard_lc: pd.Series | None = None,
 880    small_grid: bool = False,
 881) -> pd.DataFrame | None:
 882    """Linearly interpolate grid days between measured observations.
 883
 884    This helper returns interpolated yields only for grid days that lie
 885    between the first and last measured DIM for the lactation. If no grid
 886    days fall within the measured range the function returns ``None``.
 887
 888    Args:
 889        group: DataFrame for a single lactation.
 890        days_in_milk_col: Name of the DIM column.
 891        milking_yield_col: Name of the milk-yield column.
 892        small_grid: If True, use GRID_DAYS only; otherwise include
 893            day 0 and day 305.
 894
 895    Returns:
 896        A DataFrame with identifying columns plus ``GridDay`` and
 897        ``MilkYieldInterp``, or ``None`` if there are no grid days within
 898        the measured range.
 899    """
 900    if small_grid:
 901        grid = GRID_DAYS
 902    else:
 903        # add zero and 305 to the grid
 904        grid = [0] + GRID_DAYS + [305]
 905
 906    group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col)
 907
 908    # Find the range of interpolatable DIM
 909    min_dim = group[days_in_milk_col].min()
 910    max_dim = group[days_in_milk_col].max()
 911    grid_in_range = [g for g in grid if min_dim <= g <= max_dim]
 912
 913    # apply linear interpolation
 914    if not grid_in_range:
 915        return None
 916    f = interp1d(
 917        group[days_in_milk_col],
 918        group[milking_yield_col],
 919        kind="linear",
 920        fill_value=cast(Any, np.nan),
 921    )
 922
 923    # create a new dataframe with the newly created columns
 924    base = {
 925        col: group[col].iloc[0]
 926        for col in group.columns
 927        if col not in [days_in_milk_col, milking_yield_col]
 928    }
 929    base["GridDay"] = grid_in_range
 930    base["MilkYieldInterp"] = f(grid_in_range)
 931
 932    return pd.DataFrame(base)
 933
 934
 935def create_standard_lc_representation(
 936    df: pd.DataFrame,
 937    standard_lactation_curve: pd.Series,
 938    days_in_milk_col: str,
 939    milking_yield_col: str,
 940    interpolation_method: InterpolationMethod = interpolation_standard_lc,
 941    lc_model: str = "Wilmink",
 942    small_grid: bool = False,
 943) -> tuple[pd.DataFrame, npt.NDArray[np.float64], pd.Series]:
 944    """Create a standard lactation curve with correlation matrix from data.
 945
 946    This function estimates the correlation structure and standard deviations
 947    of milk yields across grid days, and refits a Wilmink lactation curve model to
 948    the interpolated data. The Wilmink model is hereby the default,
 949    but any model from the LC package can be used.
 950    The outputs can be used as inputs to the
 951    ``ISLC_method`` or ``ISLC_original`` functions for predicting 305-d yields.
 952
 953    The process standardizes yields per animal and computes row-wise
 954    (per-animal) statistics, then derives relationships across grid days.
 955
 956    Args:
 957        df: Input DataFrame containing test-day records. Must include a
 958            ``TestId`` column to identify unique lactations. Also requires
 959            columns specified by ``days_in_milk_col`` and ``milking_yield_col``.
 960        standard_lactation_curve: A Series providing a reference lactation
 961            curve (e.g., the fitted Wilmink curve on which to base
 962            interpolation).
 963        days_in_milk_col: Name of the column in ``df`` containing days in milk (DIM).
 964        milking_yield_col: Name of the column in ``df`` containing measured
 965            milk yields.
 966        interpolation_method: An interpolation method conforming to
 967            ``InterpolationMethod`` protocol (default: ``interpolation_standard_lc``).
 968            Must be callable with signature
 969            (group, days_in_milk_col, milking_yield_col, standard_lc).
 970        lc_model: The lactation curve model to fit to the interpolated data. Default is "Wilmink".
 971        small_grid: If True, build representation on GRID_DAYS only.
 972
 973    Returns:
 974        A tuple of three elements:
 975        - corr (pd.DataFrame): Correlation matrix between grid-day yields.
 976        - std_per_grid_day (np.ndarray): Standard deviations of standardized
 977          yields per grid day.
 978        - standard_lactation_curve_grid (pd.Series): Refitted Wilmink model
 979          indexed by DIM (1–305).
 980
 981    Notes:
 982        - Expects ``df`` to have a ``TestId`` column to group lactations.
 983        - Performs standardization per animal (row-wise), so output
 984          correlations and standard deviations reflect between-day variation
 985          within standardized animal profiles.
 986        - Uses ``fit_lactation_curve`` from the ``lactationcurve`` package
 987          with model='Wilmink' and fitting='frequentist'.
 988          However other models can be used by specifying the ``lc_model`` argument.
 989    """
 990
 991    if "TestId" not in df.columns:
 992        raise ValueError("DataFrame must contain a 'TestId' column.")
 993    if days_in_milk_col not in df.columns or milking_yield_col not in df.columns:
 994        raise ValueError(f"Columns '{days_in_milk_col}' and '{milking_yield_col}' must be in df.")
 995
 996    interpolated_groups: list[pd.DataFrame] = []
 997    for test_id, group in df.groupby("TestId", sort=False):
 998        try:
 999            interp_group = interpolation_method(
1000                group,
1001                days_in_milk_col=days_in_milk_col,
1002                milking_yield_col=milking_yield_col,
1003                standard_lc=standard_lactation_curve,
1004                small_grid=small_grid,
1005            )
1006        except TypeError:
1007            # Backward compatibility for custom interpolation callables
1008            # that do not yet accept small_grid.
1009            interp_group = interpolation_method(
1010                group,
1011                days_in_milk_col=days_in_milk_col,
1012                milking_yield_col=milking_yield_col,
1013                standard_lc=standard_lactation_curve,
1014            )
1015        if interp_group.empty:
1016            continue
1017        if "TestId" not in interp_group.columns:
1018            interp_group = interp_group.copy()
1019            interp_group["TestId"] = test_id
1020        interpolated_groups.append(interp_group)
1021
1022    if not interpolated_groups:
1023        raise ValueError("No interpolated rows were produced; cannot build representation.")
1024
1025    df_grid = pd.concat(interpolated_groups, ignore_index=True)
1026
1027    # define Znp
1028    # pivot df to make GridDay the columns
1029    # Z should be the unscaled pivot table (TestId × GridDay)
1030    Z = df_grid.pivot(index="TestId", columns="GridDay", values="MilkYieldInterp")
1031
1032    # Standardize records per cow (so each row has mean=0, std=1)
1033    Z_std = (Z.sub(Z.mean(axis=1), axis=0)).div(Z.std(axis=1), axis=0)
1034
1035    # Convert to NumPy array
1036    Znp = Z_std.to_numpy()
1037
1038    # calculate correlation matrix
1039    corr = pd.DataFrame(Znp, columns=Z.columns).corr()
1040
1041    # calculate standard deviations for each grid day from the Znp matrix
1042    std_per_grid_day = np.nanstd(Znp, axis=0)  # ignores NaN
1043
1044    # fit Wilmink model to get mean values and std for each cow
1045    standard_lactation_curve_grid = pd.Series(
1046        fit_lactation_curve(
1047            df_grid["GridDay"].values,
1048            df_grid["MilkYieldInterp"].values,
1049            model=lc_model,
1050            fitting="frequentist",
1051        ),
1052        index=range(1, 306),
1053    )
1054    return corr, std_per_grid_day, standard_lactation_curve_grid
1055
1056
1057def _set_doc_signatures() -> None:
1058    """Override displayed defaults in docs without changing runtime behavior."""
1059    doc_defaults = {
1060        "standard_lc_305": _DOC_STANDARD_CURVE,
1061        "correlation_matrix": _DOC_CORR_MATRIX,
1062        "std_per_grid_day": _DOC_STDS,
1063    }
1064
1065    for func in (ISLC, ISLC_original):
1066        signature = inspect.signature(func)
1067        params = [
1068            param.replace(default=doc_defaults[param.name]) if param.name in doc_defaults else param
1069            for param in signature.parameters.values()
1070        ]
1071        func.__signature__ = signature.replace(parameters=params)
1072
1073
1074_set_doc_signatures()
DATA_DIR = PosixPath('/home/runner/work/bovi/bovi/packages/models/lactationcurve/src/lactationcurve/characteristics/data')
GRID_DAYS = [10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230, 250, 270, 290]
CORR_MATRIX = GridDay 0.0 10.0 30.0 ... 270.0 290.0 305.0 GridDay ... 0.0 1.000000 0.965949 -0.033732 ... -0.025202 0.000000 0.000000 10.0 0.965949 1.000000 0.638007 ... -0.375711 -0.274761 0.259384 30.0 -0.033732 0.638007 1.000000 ... -0.138547 -0.046769 0.187952 50.0 -0.624817 0.299238 0.592408 ... -0.021251 0.050508 -0.041876 70.0 -0.569148 0.077973 0.104493 ... 0.016113 0.072450 -0.168672 90.0 -0.395047 -0.065146 -0.146547 ... -0.074053 -0.016841 -0.085893 110.0 -0.180308 -0.176841 -0.249618 ... -0.148370 -0.153977 -0.069782 130.0 -0.302078 -0.286235 -0.287885 ... -0.169098 -0.197214 0.074491 150.0 -0.802694 -0.349035 -0.308913 ... -0.201477 -0.273330 -0.185234 170.0 -0.637522 -0.425677 -0.312774 ... -0.210678 -0.328788 -0.371272 190.0 -0.476235 -0.477123 -0.320066 ... -0.163438 -0.288798 -0.384518 210.0 -0.295459 -0.519669 -0.328472 ... -0.095577 -0.293554 -0.352892 230.0 -0.326996 -0.528388 -0.282649 ... 0.106181 -0.164447 -0.099359 250.0 -0.192704 -0.452242 -0.214573 ... 0.616778 0.130349 -0.161357 270.0 -0.025202 -0.375711 -0.138547 ... 1.000000 0.567471 -0.097275 290.0 0.000000 -0.274761 -0.046769 ... 0.567471 1.000000 0.725560 305.0 0.000000 0.259384 0.187952 ... -0.097275 0.725560 1.000000 [17 rows x 17 columns]
STDs = array([1.32074343, 1.03174213, 0.75046839, 0.55671493, 0.46854951, 0.4755677 , 0.48899326, 0.49154118, 0.45175834, 0.43489799, 0.4336659 , 0.4205729 , 0.42673442, 0.4169191 , 0.46659971, 0.50185677, 0.4592163 ])
STANDARD_CURVE = array([26.20693316, 26.20693316, 26.95125343, 27.65573222, 28.32231262, 28.95284294, 29.54908138, 30.11270037, 30.64529078, 31.14836589, 31.62336517, 32.07165791, 32.4945466 , 32.8932702 , 33.26900726, 33.62287885, 33.95595136, 34.26923918, 34.56370721, 34.8402733 , 35.09981055, 35.34314945, 35.57108002, 35.78435373, 35.9836854 , 36.16975498, 36.34320929, 36.50466357, 36.65470307, 36.79388451, 36.92273742, 37.04176555, 37.15144804, 37.2522407 , 37.34457708, 37.42886961, 37.50551058, 37.57487317, 37.63731235, 37.69316577, 37.74275463, 37.78638445, 37.82434587, 37.85691532, 37.88435578, 37.9069174 , 37.92483812, 37.93834427, 37.94765116, 37.95296359, 37.95447637, 37.95237482, 37.94683519, 37.93802518, 37.92610428, 37.91122421, 37.89352929, 37.87315681, 37.85023734, 37.82489511, 37.79724827, 37.76740923, 37.7354849 , 37.70157697, 37.6657822 , 37.62819259, 37.5888957 , 37.54797478, 37.50550904, 37.46157381, 37.41624078, 37.3695781 , 37.32165063, 37.27252005, 37.22224504, 37.17088141, 37.11848225, 37.06509808, 37.01077692, 36.95556448, 36.89950422, 36.84263749, 36.78500362, 36.72664004, 36.66758232, 36.60786432, 36.54751824, 36.48657472, 36.42506289, 36.36301047, 36.30044382, 36.23738802, 36.17386693, 36.10990324, 36.04551854, 35.98073336, 35.91556723, 35.85003874, 35.78416554, 35.71796446, 35.65145149, 35.58464183, 35.51754996, 35.45018964, 35.38257396, 35.31471538, 35.24662574, 35.17831631, 35.10979782, 35.04108045, 34.97217391, 34.90308742, 34.83382976, 34.76440928, 34.69483391, 34.62511122, 34.55524838, 34.48525224, 34.41512928, 34.34488571, 34.2745274 , 34.20405994, 34.13348866, 34.06281863, 33.99205465, 33.92120132, 33.85026298, 33.77924379, 33.70814769, 33.63697842, 33.56573957, 33.49443451, 33.42306648, 33.35163856, 33.28015365, 33.20861455, 33.13702389, 33.06538419, 32.99369784, 32.92196712, 32.85019418, 32.7783811 , 32.70652982, 32.63464221, 32.56272004, 32.490765 , 32.41877868, 32.34676263, 32.27471827, 32.20264701, 32.13055014, 32.05842892, 31.98628453, 31.91411811, 31.84193072, 31.7697234 , 31.69749711, 31.62525278, 31.55299129, 31.48071348, 31.40842013, 31.33611202, 31.26378985, 31.19145432, 31.11910607, 31.04674574, 30.97437389, 30.90199111, 30.82959792, 30.75719482, 30.68478231, 30.61236084, 30.53993084, 30.46749274, 30.39504693, 30.32259378, 30.25013366, 30.1776669 , 30.10519382, 30.03271474, 29.96022995, 29.88773972, 29.81524433, 29.74274402, 29.67023903, 29.59772959, 29.52521592, 29.45269822, 29.3801767 , 29.30765153, 29.2351229 , 29.16259097, 29.09005591, 29.01751786, 28.94497698, 28.8724334 , 28.79988725, 28.72733866, 28.65478775, 28.58223463, 28.50967941, 28.43712219, 28.36456306, 28.29200213, 28.21943948, 28.14687519, 28.07430934, 28.00174202, 27.92917328, 27.85660321, 27.78403186, 27.71145929, 27.63888558, 27.56631076, 27.49373491, 27.42115806, 27.34858026, 27.27600157, 27.20342203, 27.13084167, 27.05826054, 26.98567867, 26.9130961 , 26.84051287, 26.767929 , 26.69534454, 26.6227595 , 26.55017391, 26.47758781, 26.40500121, 26.33241415, 26.25982664, 26.1872387 , 26.11465036, 26.04206164, 25.96947255, 25.89688311, 25.82429335, 25.75170327, 25.67911289, 25.60652222, 25.53393129, 25.4613401 , 25.38874866, 25.31615699, 25.2435651 , 25.170973 , 25.0983807 , 25.02578821, 24.95319553, 24.88060269, 24.80800968, 24.73541651, 24.66282319, 24.59022974, 24.51763615, 24.44504243, 24.37244859, 24.29985463, 24.22726057, 24.1546664 , 24.08207213, 24.00947776, 23.93688331, 23.86428877, 23.79169415, 23.71909945, 23.64650468, 23.57390983, 23.50131492, 23.42871995, 23.35612492, 23.28352983, 23.21093468, 23.13833948, 23.06574424, 22.99314894, 22.9205536 , 22.84795822, 22.7753628 , 22.70276734, 22.63017184, 22.55757631, 22.48498075, 22.41238515, 22.33978952, 22.26719387, 22.19459819, 22.12200248, 22.04940675, 21.97681099, 21.90421521, 21.83161942, 21.7590236 , 21.68642776, 21.6138319 , 21.54123603, 21.46864014, 21.39604423, 21.32344831, 21.25085238, 21.17825643, 21.10566047, 21.0330645 , 20.96046851, 20.88787252])
class InterpolationMethod(typing.Protocol):
187class InterpolationMethod(Protocol):
188    """Protocol for milk-yield interpolation methods.
189
190    Defines the signature expected by functions that interpolate test-day
191    yields onto a predefined DIM grid.
192    """
193
194    def __call__(
195        self,
196        group: pd.DataFrame,
197        days_in_milk_col: str,
198        milking_yield_col: str,
199        standard_lc: pd.Series | None = None,
200        small_grid: bool = False,
201    ) -> pd.DataFrame:
202        """Interpolate yields for a single lactation onto the DIM grid.
203
204        Args:
205            group: Test-day records for a single animal.
206            days_in_milk_col: Column name containing days in milk (DIM).
207            milking_yield_col: Column name containing milk yields.
208            standard_lc: Optional standard lactation curve to guide
209                interpolation (method-dependent).
210            small_grid: If True, interpolate only on GRID_DAYS;
211                otherwise use [0] + GRID_DAYS + [305].
212
213        Returns:
214            A DataFrame with interpolated yields, typically including
215            columns ``GridDay`` and ``MilkYieldInterp``.
216        """
217        ...

Protocol for milk-yield interpolation methods.

Defines the signature expected by functions that interpolate test-day yields onto a predefined DIM grid.

InterpolationMethod(*args, **kwargs)
1771def _no_init_or_replace_init(self, *args, **kwargs):
1772    cls = type(self)
1773
1774    if cls._is_protocol:
1775        raise TypeError('Protocols cannot be instantiated')
1776
1777    # Already using a custom `__init__`. No need to calculate correct
1778    # `__init__` to call. This can lead to RecursionError. See bpo-45121.
1779    if cls.__init__ is not _no_init_or_replace_init:
1780        return
1781
1782    # Initially, `__init__` of a protocol subclass is set to `_no_init_or_replace_init`.
1783    # The first instantiation of the subclass will call `_no_init_or_replace_init` which
1784    # searches for a proper new `__init__` in the MRO. The new `__init__`
1785    # replaces the subclass' old `__init__` (ie `_no_init_or_replace_init`). Subsequent
1786    # instantiation of the protocol subclass will thus use the new
1787    # `__init__` and no longer call `_no_init_or_replace_init`.
1788    for base in cls.__mro__:
1789        init = base.__dict__.get('__init__', _no_init_or_replace_init)
1790        if init is not _no_init_or_replace_init:
1791            cls.__init__ = init
1792            break
1793    else:
1794        # should not happen
1795        cls.__init__ = object.__init__
1796
1797    cls.__init__(self, *args, **kwargs)
def ISLC( df: pandas.core.frame.DataFrame, days_in_milk_col: str | None = None, milking_yield_col: str | None = None, test_id_col: str | None = None, default_test_id: int = 0, standard_lc_305: pandas.core.series.Series | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]] = STANDARD_CURVE, correlation_matrix: pandas.core.frame.DataFrame = CORR_MATRIX, std_per_grid_day: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]] = STDs, max_dim: int | str = 305, fit_standard_lc_from_data: bool = False, reference_df: pandas.core.frame.DataFrame | None = None) -> pandas.core.frame.DataFrame:
220def ISLC(
221    df: pd.DataFrame,
222    days_in_milk_col: str | None = None,
223    milking_yield_col: str | None = None,
224    test_id_col: str | None = None,
225    default_test_id: int = 0,
226    standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE,
227    correlation_matrix: pd.DataFrame = CORR_MATRIX,
228    std_per_grid_day: npt.NDArray[np.float64] = STDs,
229    max_dim: int | str = 305,
230    fit_standard_lc_from_data: bool = False,
231    reference_df: pd.DataFrame | None = None,
232) -> pd.DataFrame:
233    """Estimate cumulative lactation yield per TestId using ISLC.
234
235    This variant applies :func:`ISLC_method` to standardized lactation records.
236    By default it uses the package-provided standard curve, correlation matrix,
237    and per-grid day standard deviations. Optionally, these assets can be refit
238    from a reference dataset by providing a pandas dataframe at 'reference_df ='
239    when ``fit_standard_lc_from_data`` is True.
240    Alternative for customization is to set standard_lc_305,
241    correlation_matrix, and std_per_grid_day directly.
242
243    Args:
244        df: Input test-day records.
245        days_in_milk_col: Name of DIM column in ``df``.
246        milking_yield_col: Name of milk-yield column in ``df``.
247        test_id_col: Name of lactation identifier column in ``df``.
248        default_test_id: Fallback TestId value when ``test_id_col`` is absent.
249        standard_lc_305: Standard lactation curve values used by ISLC.
250        correlation_matrix: Correlation matrix aligned with ISLC grid.
251        std_per_grid_day: Standard deviations aligned with ISLC grid.
252        max_dim: Maximum DIM to include
253            (or ``"max"`` so highest test day DIM is considered the final lactation day).
254        fit_standard_lc_from_data: If True, fit representation from ``reference_df``.
255        reference_df: Reference records used when fitting curve and covariance.
256
257    Returns:
258        DataFrame with columns ``TestId`` and ``LactationMilkYield``.
259
260    Raises:
261        ValueError: If required columns are missing or fit mode has no reference data.
262    """
263    # Standardize columns and filter DIM <= 305
264    df = standardize_lactation_columns(
265        df,
266        days_in_milk_col=days_in_milk_col,
267        milking_yield_col=milking_yield_col,
268        test_id_col=test_id_col,
269        default_test_id=default_test_id,
270        max_dim=max_dim,
271    )
272
273    # Fit covariance/standard_lc if requested
274    if fit_standard_lc_from_data is True:
275        if reference_df is None:
276            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
277        reference_df = standardize_lactation_columns(
278            reference_df,
279            days_in_milk_col=days_in_milk_col,
280            milking_yield_col=milking_yield_col,
281            test_id_col=test_id_col,
282            default_test_id=default_test_id,
283            max_dim=305,
284        )
285        dim = reference_df["DaysInMilk"]
286        milk_yield = reference_df["MilkingYield"]
287        standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink")
288        # convert to pandas series
289        standard_full_lc_305 = pd.Series(standard_full_lc_305)
290
291        # fit matrix
292        (
293            correlation_matrix,
294            std_per_grid_day,
295            standard_curve_fitted,
296        ) = create_standard_lc_representation(
297            df=reference_df,
298            standard_lactation_curve=pd.Series(standard_full_lc_305),
299            # reference_df is standardized above, so fixed names are required here
300            days_in_milk_col="DaysInMilk",
301            milking_yield_col="MilkingYield",
302            interpolation_method=interpolation_standard_lc,
303            small_grid=False,
304        )
305
306        # Align fitted curve with default internal convention (day 0 equals day 1).
307        standard_lc_305 = np.insert(
308            np.asarray(standard_curve_fitted, dtype=float),
309            0,
310            float(standard_curve_fitted.iloc[0]),
311        )
312
313    rows: list[dict[str, Any]] = []
314    for test_id, group in df.groupby("TestId", sort=False):
315        if group.empty:
316            rows.append({"TestId": test_id, "LactationMilkYield": np.nan})
317            continue
318
319        try:
320            cumulative_yield = ISLC_method(
321                df=group,
322                standard_lc=standard_lc_305,
323                correlation_matrix=correlation_matrix,
324                std_per_grid_day=std_per_grid_day,
325                days_in_milk_col="DaysInMilk",
326                milking_yield_col="MilkingYield",
327            )
328        except ValueError:
329            cumulative_yield = np.nan
330
331        rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield})
332
333    return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"]))

Estimate cumulative lactation yield per TestId using ISLC.

This variant applies ISLC_method() to standardized lactation records. By default it uses the package-provided standard curve, correlation matrix, and per-grid day standard deviations. Optionally, these assets can be refit from a reference dataset by providing a pandas dataframe at 'reference_df =' when fit_standard_lc_from_data is True. Alternative for customization is to set standard_lc_305, correlation_matrix, and std_per_grid_day directly.

Arguments:
  • df: Input test-day records.
  • days_in_milk_col: Name of DIM column in df.
  • milking_yield_col: Name of milk-yield column in df.
  • test_id_col: Name of lactation identifier column in df.
  • default_test_id: Fallback TestId value when test_id_col is absent.
  • standard_lc_305: Standard lactation curve values used by ISLC.
  • correlation_matrix: Correlation matrix aligned with ISLC grid.
  • std_per_grid_day: Standard deviations aligned with ISLC grid.
  • max_dim: Maximum DIM to include (or "max" so highest test day DIM is considered the final lactation day).
  • fit_standard_lc_from_data: If True, fit representation from reference_df.
  • reference_df: Reference records used when fitting curve and covariance.
Returns:

DataFrame with columns TestId and LactationMilkYield.

Raises:
  • ValueError: If required columns are missing or fit mode has no reference data.
def ISLC_method( df: pandas.core.frame.DataFrame, standard_lc: pandas.core.series.Series | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], correlation_matrix: pandas.core.frame.DataFrame, std_per_grid_day: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], days_in_milk_col: str = 'DaysInMilk', milking_yield_col: str = 'MilkingYield') -> float:
336def ISLC_method(
337    df: pd.DataFrame,
338    standard_lc: pd.Series | npt.NDArray[np.float64],
339    correlation_matrix: pd.DataFrame,
340    std_per_grid_day: npt.NDArray[np.float64],
341    days_in_milk_col: str = "DaysInMilk",
342    milking_yield_col: str = "MilkingYield",
343) -> float:
344    """Estimate total lactation yield by predicting missing grid days, then integrating.
345
346    The method fills missing grid-day yields with the ICAR-style relation:
347
348        yp = gp + b* (yi - gi)
349
350        where for each missing day p, the nearest measured day i is used and
351        b* = corr(i,p) * std(p) / std(i). The resulting complete grid profile is
352        integrated with the test-interval style weighting used elsewhere in this
353        package.
354
355    Notes:
356        The integration is performed over the days present after merging
357        measured and predicted points. If the input contains DIM beyond 305,
358        total yield can reflect that extended horizon.
359    """
360    # add zero and 305 to the grid
361    grid = [0] + GRID_DAYS + [305]
362    dim_col = days_in_milk_col
363    yield_col = milking_yield_col
364
365    standard_lc_series = _curve_to_series(standard_lc)
366
367    # Extract the measured days
368    measured_days = df[dim_col].to_numpy()
369    if measured_days.size == 0:
370        raise ValueError("Input df must contain at least one measured day.")
371
372    # interpolate the measured days to the grid using the standard lactation curve
373    interpolated_df = interpolation_standard_lc(
374        group=df,
375        days_in_milk_col=dim_col,
376        milking_yield_col=yield_col,
377        standard_lc=standard_lc_series,
378    )
379
380    # drop unnecessary columns
381    interpolated_df = interpolated_df.drop(columns=[dim_col, yield_col])
382
383    # rename grid colum
384    interpolated_df = interpolated_df.rename(
385        columns={"GridDay": dim_col, "MilkYieldInterp": yield_col}
386    )[[dim_col, yield_col]]
387
388    measured_subset = cast(pd.DataFrame, df.loc[:, [dim_col, yield_col]])
389    interpolated_subset = cast(pd.DataFrame, interpolated_df.loc[:, [dim_col, yield_col]])
390
391    # join the interpolated values with the original measurements, keeping all measurements
392    merged_df = cast(
393        pd.DataFrame,
394        pd.merge(
395            measured_subset,
396            interpolated_subset,
397            on=dim_col,
398            how="outer",
399            suffixes=("_meas", "_interp"),
400        ),
401    )
402    meas_series = cast(pd.Series, merged_df[f"{yield_col}_meas"])
403    interp_series = cast(pd.Series, merged_df[f"{yield_col}_interp"])
404    merged_df[yield_col] = cast(pd.Series, meas_series.combine_first(interp_series))
405    df = cast(
406        pd.DataFrame,
407        merged_df.loc[:, [dim_col, yield_col]].sort_values(by=dim_col),
408    )
409
410    # days missing from measurement → need prediction
411    days_to_predict = [day for day in grid if day not in df[dim_col].values]
412
413    predicted_rows: list[dict[str, float | int]] = []
414
415    for day in days_to_predict:
416        # identify closest measured day
417        dim_series = cast(pd.Series, interpolated_df[dim_col])
418        grid_vals = np.asarray(dim_series.to_numpy(dtype=float), dtype=float)
419        closest_idx = int(np.argmin(np.abs(grid_vals - float(day))))
420        closest_day = int(dim_series.iloc[closest_idx])
421
422        # extract mean milk yield from the standard lactation curve for the day to predict
423        expected_yield_grid_day = _curve_value(standard_lc_series, day)
424        expected_yield_measured_day = _curve_value(standard_lc_series, closest_day)
425
426        # calculate difference between measured and expected yield at closest measured day
427        diff = (
428            interpolated_df.loc[interpolated_df[dim_col] == closest_day, yield_col].iloc[0]
429            - expected_yield_measured_day
430        )
431
432        # calculate the correlation-based correction factor b*
433        # b* = corr(i,p) * std(p) / std(i)
434        pi = list(grid).index(closest_day)
435        qi = list(grid).index(day)
436        bpj = correlation_matrix.iloc[pi, qi]
437        std_q = std_per_grid_day[qi]
438        std_p = std_per_grid_day[pi]
439        b_star = 0.0 if std_p == 0 else float(bpj * std_q / std_p)
440
441        # predict the yield for the missing day
442        predicted_yield = expected_yield_grid_day + b_star * diff
443        predicted_rows.append({dim_col: int(day), yield_col: float(predicted_yield)})
444
445    if predicted_rows:
446        df = pd.concat([df, pd.DataFrame(predicted_rows)], ignore_index=True)
447
448    # apply the test interval method to calculate total milk yield over 305 days
449    # sort the dataframe by days in milk and keep one value per day
450    df = df.drop_duplicates(subset=[dim_col], keep="last")
451    df = df.sort_values(by=dim_col)
452    # Trapezoidal contributions
453    df["width"] = df[dim_col].diff().shift(-1)
454    df["avg_yield"] = (df[yield_col] + df[yield_col].shift(-1)) / 2
455    df["trapezoid_area"] = df["width"] * df["avg_yield"]
456
457    trapezoid_values = np.asarray(pd.to_numeric(df["trapezoid_area"], errors="coerce"), dtype=float)
458    total_yield = float(np.nansum(trapezoid_values))
459
460    return total_yield

Estimate total lactation yield by predicting missing grid days, then integrating.

The method fills missing grid-day yields with the ICAR-style relation:

yp = gp + b* (yi - gi)

where for each missing day p, the nearest measured day i is used and
b* = corr(i,p) * std(p) / std(i). The resulting complete grid profile is
integrated with the test-interval style weighting used elsewhere in this
package.
Notes:

The integration is performed over the days present after merging measured and predicted points. If the input contains DIM beyond 305, total yield can reflect that extended horizon.

def ISLC_original_method( df: pandas.core.frame.DataFrame, standard_lc: pandas.core.series.Series | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], correlation_matrix: pandas.core.frame.DataFrame, std_per_grid_day: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]]) -> float:
463def ISLC_original_method(
464    df: pd.DataFrame,
465    standard_lc: pd.Series | npt.NDArray[np.float64],
466    correlation_matrix: pd.DataFrame,
467    std_per_grid_day: npt.NDArray[np.float64],
468) -> float:
469    """Estimate 305-day milk yield for a single interpolated lactation
470    from the original paper by Dr. Wilmink in 1987.
471
472    This function computes an estimated 305-day yield by combining three
473    components: known production from interpolated measurements, the
474    population mean for missing grid days (from ``standard_lc``), and a
475    correlation-based deviation correction derived from ``correlation_matrix``.
476
477    Args:
478        df: DataFrame containing at least the columns ``GridDay`` and
479            ``MilkYieldInterp`` (values already interpolated onto the grid).
480        standard_lc: Standard lactation curve values indexed by grid day.
481        correlation_matrix: Correlation coefficients between grid-day yields
482            (matrix-like, rows/columns ordered as the grid).
483        std_per_grid_day: 1-D array of standard deviations for each grid day.
484
485    Returns:
486        A float with the estimated 305-day milk yield.
487
488    Raises:
489        ValueError: if any measured day in ``df`` is not present in the
490            expected global ``grid``.
491
492    Notes:
493        - The routine assumes fixed recording intervals of 20 days, except
494          that the final interval (if day 290 is present) is 25 days.
495        - The implementation expects a global ``grid`` variable to be
496          defined and aligned with ``standard_lc`` and
497          ``std_per_grid_day``.
498        - The amount of estimated milk yields depends on the
499            availability of measured milk yields.
500        - Lactation yield is:
501            sum(known measurements (p) + estimated measurements (q)).
502        - In the implementation according to the CRV E2 document,
503            there should also be a correction factor for the difference between
504             the expected and actual complete lactation yield of the previous lactation.
505            However, this correction factor is not included in the current implementation
506            as it requires additional data (previous lactation yield) that is not always available.
507    """
508    grid = GRID_DAYS
509    standard_lc_series = _curve_to_series(standard_lc)
510
511    # --- Validate that all measurements lie on the expected grid -------
512    measured_days = df["GridDay"].values
513    if any(day not in grid for day in measured_days):
514        raise ValueError("At least one DIM in the input df is not part of the grid.")
515
516    # --- Prepare basic components -------------------------------------
517    df = df.sort_values("GridDay")
518
519    # days missing from measurement → need prediction
520    days_to_predict = [day for day in grid if day not in measured_days]
521    pred_idx = [list(grid).index(d) for d in days_to_predict]
522
523    # mean expected milk yield from standard curve
524    mean_lc = pd.Series([_curve_value(standard_lc_series, g) for g in grid], index=grid)
525
526    # --- Compute known production (sum of actual measurements) ---------
527    if 290 in measured_days:
528        # special last interval: 25 days
529        known_prod = df["MilkYieldInterp"][:-1].sum() * 20 + df["MilkYieldInterp"].iloc[-1] * 25
530    else:
531        known_prod = df["MilkYieldInterp"].sum() * 20
532
533    # --- Compute population mean for missing days ----------------------
534    pop_mean_missing = mean_lc.loc[days_to_predict]
535
536    if 290 in pop_mean_missing.index:
537        population_mean = pop_mean_missing.iloc[:-1].sum() * 20 + pop_mean_missing.iloc[-1] * 25
538    else:
539        population_mean = pop_mean_missing.sum() * 20
540
541    # --- Compute correlation-based correction --------------------------
542    days_to_predict_np = np.array(days_to_predict)
543    measured_days_np = np.array(measured_days)
544
545    # for each missing day: find closest measured day
546    closest_measured = [
547        measured_days_np[np.argmin(np.abs(measured_days_np - g))] for g in days_to_predict_np
548    ]
549    closest_idx = [list(grid).index(d) for d in closest_measured]
550
551    b_star_list = []
552    for pi, qi in zip(closest_idx, pred_idx):
553        bpj = correlation_matrix.iloc[pi, qi]
554        stdj = std_per_grid_day[qi]
555        stdp = std_per_grid_day[pi]
556        b_star_list.append(bpj * stdj / stdp)
557
558    # scale correction by interval lengths
559    if 290 in measured_days:
560        # last interval belongs to prediction group
561        correction_weighted = sum(b_star_list[:-1]) * 20
562    else:
563        correction_weighted = sum(b_star_list[:-1]) * 20 + b_star_list[-1] * 25
564
565    last_day = df["GridDay"].iloc[-1]
566    last_yield = df["MilkYieldInterp"].iloc[-1]
567    mean_last = mean_lc.loc[last_day]
568
569    correction = correction_weighted * (last_yield - mean_last)
570
571    # --- Final sum -----------------------------------------------------
572    final_milk_yield = known_prod + population_mean + correction
573    return float(final_milk_yield)

Estimate 305-day milk yield for a single interpolated lactation from the original paper by Dr. Wilmink in 1987.

This function computes an estimated 305-day yield by combining three components: known production from interpolated measurements, the population mean for missing grid days (from standard_lc), and a correlation-based deviation correction derived from correlation_matrix.

Arguments:
  • df: DataFrame containing at least the columns GridDay and MilkYieldInterp (values already interpolated onto the grid).
  • standard_lc: Standard lactation curve values indexed by grid day.
  • correlation_matrix: Correlation coefficients between grid-day yields (matrix-like, rows/columns ordered as the grid).
  • std_per_grid_day: 1-D array of standard deviations for each grid day.
Returns:

A float with the estimated 305-day milk yield.

Raises:
  • ValueError: if any measured day in df is not present in the expected global grid.
Notes:
  • The routine assumes fixed recording intervals of 20 days, except that the final interval (if day 290 is present) is 25 days.
  • The implementation expects a global grid variable to be defined and aligned with standard_lc and std_per_grid_day.
  • The amount of estimated milk yields depends on the availability of measured milk yields.
  • Lactation yield is: sum(known measurements (p) + estimated measurements (q)).
  • In the implementation according to the CRV E2 document, there should also be a correction factor for the difference between the expected and actual complete lactation yield of the previous lactation. However, this correction factor is not included in the current implementation as it requires additional data (previous lactation yield) that is not always available.
def ISLC_original( df: pandas.core.frame.DataFrame, days_in_milk_col: str | None = None, milking_yield_col: str | None = None, test_id_col: str | None = None, default_test_id: int = 0, standard_lc_305: pandas.core.series.Series | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]] = STANDARD_CURVE, correlation_matrix: pandas.core.frame.DataFrame = CORR_MATRIX, std_per_grid_day: numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]] = STDs, fit_standard_lc_from_data: bool = False, reference_df: pandas.core.frame.DataFrame | None = None) -> pandas.core.frame.DataFrame:
576def ISLC_original(
577    df: pd.DataFrame,
578    days_in_milk_col: str | None = None,
579    milking_yield_col: str | None = None,
580    test_id_col: str | None = None,
581    default_test_id: int = 0,
582    standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE,
583    correlation_matrix: pd.DataFrame = CORR_MATRIX,
584    std_per_grid_day: npt.NDArray[np.float64] = STDs,
585    fit_standard_lc_from_data: bool = False,
586    reference_df: pd.DataFrame | None = None,
587) -> pd.DataFrame:
588    """Compute estimated 305-day yields using the original grid-based ISLC variant
589    from the original paper by Dr. Wilmink in 1987.
590
591    The function groups standardized input by ``TestId``, interpolates to the
592    fixed GRID_DAYS grid, and applies :func:`ISLC_original_method` per lactation.
593
594    Args:
595       df: Input records with DIM and milk yield columns.
596       days_in_milk_col: Name of DIM column in ``df``.
597       milking_yield_col: Name of milk-yield column in ``df``.
598       test_id_col: Name of lactation identifier column in ``df``.
599       default_test_id: Fallback TestId when ``test_id_col`` is absent.
600       standard_lc_305: Standard lactation curve used for interpolation.
601       correlation_matrix: Correlation matrix over GRID_DAYS.
602       std_per_grid_day: Standard deviations per GRID_DAYS element.
603       fit_standard_lc_from_data: If True, refit standard curve and covariance
604           representation from ``reference_df`` before prediction.
605       reference_df: Reference population used when
606           ``fit_standard_lc_from_data=True``. The reference data is
607           standardized to ``DaysInMilk``/``MilkingYield`` and fit using
608           ``small_grid=True`` so correlation and standard-deviation assets are
609           aligned to ``GRID_DAYS``.
610
611    Returns:
612        DataFrame with columns ``LactationMilkYield`` and ``TestId``.
613
614    Raises:
615        ValueError: If required columns (DaysInMilk or MilkingYield) cannot be
616            found, or if ``fit_standard_lc_from_data=True`` and
617            ``reference_df`` is not provided.
618
619    Notes:
620        Records with DIM > 305 are dropped before interpolation and scoring.
621    """
622    # Standardize columns and filter DIM <= 305
623    df = standardize_lactation_columns(
624        df,
625        days_in_milk_col=days_in_milk_col,
626        milking_yield_col=milking_yield_col,
627        test_id_col=test_id_col,
628        default_test_id=default_test_id,
629        max_dim=305,
630    )
631
632    # Fit covariance/standard_lc if requested
633    if fit_standard_lc_from_data is True:
634        if reference_df is None:
635            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
636        reference_df = standardize_lactation_columns(
637            reference_df,
638            days_in_milk_col=days_in_milk_col,
639            milking_yield_col=milking_yield_col,
640            test_id_col=test_id_col,
641            default_test_id=default_test_id,
642            max_dim=305,
643        )
644        dim = reference_df["DaysInMilk"]
645        milk_yield = reference_df["MilkingYield"]
646        standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink")
647        standard_full_lc_305 = pd.Series(standard_full_lc_305)
648
649        # fit matrix
650        (
651            correlation_matrix,
652            std_per_grid_day,
653            standard_curve_fitted,
654        ) = create_standard_lc_representation(
655            df=reference_df,
656            standard_lactation_curve=pd.Series(standard_full_lc_305),
657            # reference_df is standardized above, so fixed names are required here
658            days_in_milk_col="DaysInMilk",
659            milking_yield_col="MilkingYield",
660            interpolation_method=interpolation_standard_lc,
661            small_grid=True,
662        )
663
664        # Align fitted curve with default internal convention (day 0 equals day 1).
665        standard_lc_305 = np.insert(
666            np.asarray(standard_curve_fitted, dtype=float),
667            0,
668            float(standard_curve_fitted.iloc[0]),
669        )
670    rows: list[dict[str, Any]] = []
671    for test_id, group in df.groupby("TestId", sort=False):
672        lactation = interpolation_standard_lc(
673            group,
674            days_in_milk_col="DaysInMilk",
675            milking_yield_col="MilkingYield",
676            standard_lc=standard_lc_305,
677        )
678        if lactation.empty:
679            rows.append({"TestId": test_id, "305_milk_yield": np.nan})
680            continue
681
682        cumulative_yield = ISLC_original_method(
683            df=lactation,
684            standard_lc=standard_lc_305,
685            correlation_matrix=correlation_matrix,
686            std_per_grid_day=std_per_grid_day,
687        )
688        rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield})
689
690    return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"]))

Compute estimated 305-day yields using the original grid-based ISLC variant from the original paper by Dr. Wilmink in 1987.

The function groups standardized input by TestId, interpolates to the fixed GRID_DAYS grid, and applies ISLC_original_method() per lactation.

Arguments:
  • df: Input records with DIM and milk yield columns.
  • days_in_milk_col: Name of DIM column in df.
  • milking_yield_col: Name of milk-yield column in df.
  • test_id_col: Name of lactation identifier column in df.
  • default_test_id: Fallback TestId when test_id_col is absent.
  • standard_lc_305: Standard lactation curve used for interpolation.
  • correlation_matrix: Correlation matrix over GRID_DAYS.
  • std_per_grid_day: Standard deviations per GRID_DAYS element.
  • fit_standard_lc_from_data: If True, refit standard curve and covariance representation from reference_df before prediction.
  • reference_df: Reference population used when fit_standard_lc_from_data=True. The reference data is standardized to DaysInMilk/MilkingYield and fit using small_grid=True so correlation and standard-deviation assets are aligned to GRID_DAYS.
Returns:

DataFrame with columns LactationMilkYield and TestId.

Raises:
  • ValueError: If required columns (DaysInMilk or MilkingYield) cannot be found, or if fit_standard_lc_from_data=True and reference_df is not provided.
Notes:

Records with DIM > 305 are dropped before interpolation and scoring.

def interpolation_standard_lc( group: pandas.core.frame.DataFrame, days_in_milk_col: str, milking_yield_col: str, standard_lc: pandas.core.series.Series | numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]] | None = None, small_grid: bool = False) -> pandas.core.frame.DataFrame:
703def interpolation_standard_lc(
704    group: pd.DataFrame,
705    days_in_milk_col: str,
706    milking_yield_col: str,
707    standard_lc: pd.Series | npt.NDArray[np.float64] | None = None,
708    small_grid: bool = False,
709) -> pd.DataFrame:
710    """Interpolate a single lactation onto the DIM grid guided by a standard curve.
711
712    The function interpolates measured yields for one animal onto the
713    predefined grid of DIM values. When a grid day coincides with a measured
714    DIM the observed yield is used; otherwise the interpolation adjusts the
715    linear slope between neighboring measurements by the difference in the
716    standard curve, inspired by the CRV E2 approach.
717
718    The interpolation formula applied for a grid day ``gday`` between two
719    measured points (x1,y1) and (x2,y2) is::
720
721        slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1)
722        yi = gi + (slope * (gday - x1) + (y1 - g1))
723
724    Args:
725        group: DataFrame with test-day records for a single animal. Must
726            contain columns named by ``days_in_milk_col`` and
727            ``milking_yield_col``.
728        days_in_milk_col: Name of the DIM column in ``group``.
729        milking_yield_col: Name of the milk-yield column in ``group``.
730        standard_lc: A Series indexed by DIM providing the expected yield on
731            each grid day; used to guide the interpolation shape.
732        small_grid: If True, interpolate on GRID_DAYS only; otherwise include
733            day 0 and day 305 in the interpolation target grid.
734
735    Returns:
736        A DataFrame with one row per interpolated grid day containing the
737        original identifying columns (from the first row of ``group``), plus
738        the columns ``GridDay`` and ``MilkYieldInterp``.
739
740    Notes:
741        - No interpolation is performed outside the range bounded by the
742          first and last measured DIM for the lactation (those grid days are
743          skipped).
744                - No interpolation is performed for grid days where the right
745                    neighboring measured DIM is >= 305, because this lies outside the
746                    intended standard-curve support.
747        - The implementation defines a local default ``grid`` of
748          [10, 30, ..., 290].
749
750
751    """
752    if small_grid:
753        grid = GRID_DAYS
754    else:
755        # add zero and 305 to the grid
756        grid = [0] + GRID_DAYS + [305]
757
758    if standard_lc is None:
759        raise ValueError("A standard lactation curve is required for interpolation_standard_lc.")
760
761    standard_lc_series = _curve_to_series(standard_lc)
762
763    # Sort and ensure unique DIMs
764    group = group.sort_values(days_in_milk_col).drop_duplicates(days_in_milk_col)
765    dims = group[days_in_milk_col].tolist()
766
767    rows = []
768
769    # loop over the grid days
770    for gday in grid:
771        # --- CASE 1: Exact measured day ---------------------------
772        if gday in dims:
773            y_val = group.loc[group[days_in_milk_col] == gday, milking_yield_col].iloc[0]
774
775            row = group.iloc[0].to_dict()
776            row["GridDay"] = gday
777            row["MilkYieldInterp"] = float(y_val)
778            rows.append(row)
779            continue
780
781        # --- CASE 2: interpolate between measured days -------------
782        before_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] < gday])
783        after_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] > gday])
784        before = before_df.tail(1)
785        after = after_df.head(1)
786
787        # CASE 3 No interpolation if grid day is outside the range of measured DIM
788        if before.empty or after.empty:
789            continue
790
791        # CASE 4: No interpolation if the day after is day 305 or higher,
792        # as this is outside the range of the standard curve
793        if float(after.iloc[0][days_in_milk_col]) >= 305:
794            continue
795
796        x1 = float(before.iloc[0][days_in_milk_col])
797        y1 = float(before.iloc[0][milking_yield_col])
798        x2 = float(after.iloc[0][days_in_milk_col])
799        y2 = float(after.iloc[0][milking_yield_col])
800
801        # expected yields from standard lactation curve
802        g1 = _curve_value(standard_lc_series, int(x1))
803        g2 = _curve_value(standard_lc_series, int(x2))
804        gi = _curve_value(standard_lc_series, int(gday))
805
806        # Wilmink-based interpolation formula
807        slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1)
808        yi = gi + (slope * (gday - x1) + (y1 - g1))
809
810        row = group.iloc[0].to_dict()
811        row["GridDay"] = gday
812        row["MilkYieldInterp"] = float(yi)
813        rows.append(row)
814
815    if not rows:
816        return pd.DataFrame(
817            columns=pd.Index([*group.columns.to_list(), "GridDay", "MilkYieldInterp"])
818        )
819
820    return pd.DataFrame(rows)

Interpolate a single lactation onto the DIM grid guided by a standard curve.

The function interpolates measured yields for one animal onto the predefined grid of DIM values. When a grid day coincides with a measured DIM the observed yield is used; otherwise the interpolation adjusts the linear slope between neighboring measurements by the difference in the standard curve, inspired by the CRV E2 approach.

The interpolation formula applied for a grid day gday between two measured points (x1,y1) and (x2,y2) is::

slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1)
yi = gi + (slope * (gday - x1) + (y1 - g1))
Arguments:
  • group: DataFrame with test-day records for a single animal. Must contain columns named by days_in_milk_col and milking_yield_col.
  • days_in_milk_col: Name of the DIM column in group.
  • milking_yield_col: Name of the milk-yield column in group.
  • standard_lc: A Series indexed by DIM providing the expected yield on each grid day; used to guide the interpolation shape.
  • small_grid: If True, interpolate on GRID_DAYS only; otherwise include day 0 and day 305 in the interpolation target grid.
Returns:

A DataFrame with one row per interpolated grid day containing the original identifying columns (from the first row of group), plus the columns GridDay and MilkYieldInterp.

Notes:
  • No interpolation is performed outside the range bounded by the first and last measured DIM for the lactation (those grid days are skipped). - No interpolation is performed for grid days where the right neighboring measured DIM is >= 305, because this lies outside the intended standard-curve support.
  • The implementation defines a local default grid of [10, 30, ..., 290].
def linear_interpd_all_to_grid( group: pandas.core.frame.DataFrame, days_in_milk_col: str, milking_yield_col: str, standard_lc: pandas.core.series.Series | None = None, small_grid: bool = False) -> pandas.core.frame.DataFrame:
823def linear_interpd_all_to_grid(
824    group: pd.DataFrame,
825    days_in_milk_col: str,
826    milking_yield_col: str,
827    standard_lc: pd.Series | None = None,
828    small_grid: bool = False,
829) -> pd.DataFrame:
830    """Linearly interpolate all grid days for a lactation.
831
832    This helper uses linear interpolation (with extrapolation) to produce
833    milk-yield values for every grid day regardless of whether the grid day
834    lies between measured observations.
835
836    Args:
837        group: DataFrame containing measured DIM and yield values for one
838            lactation.
839        days_in_milk_col: Name of the DIM column.
840        milking_yield_col: Name of the milk-yield column.
841        small_grid: If True, interpolate on GRID_DAYS only; otherwise include
842            day 0 and day 305.
843
844    Returns:
845        A DataFrame with identifying columns copied from ``group``'s first
846        row and columns ``GridDay`` and ``MilkYieldInterp`` containing the
847        interpolated (or extrapolated) yields for every grid day.
848    """
849    if small_grid:
850        grid = GRID_DAYS
851    else:
852        # add zero and 305 to the grid
853        grid = [0] + GRID_DAYS + [305]
854
855    group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col)
856    f = interp1d(
857        group[days_in_milk_col],
858        group[milking_yield_col],
859        kind="linear",
860        fill_value=cast(Any, "extrapolate"),
861    )
862
863    base = {
864        col: group[col].iloc[0]
865        for col in group.columns
866        if col not in [days_in_milk_col, milking_yield_col]
867    }
868    base["GridDay"] = grid
869    base["MilkYieldInterp"] = f(grid)
870
871    return pd.DataFrame(base)

Linearly interpolate all grid days for a lactation.

This helper uses linear interpolation (with extrapolation) to produce milk-yield values for every grid day regardless of whether the grid day lies between measured observations.

Arguments:
  • group: DataFrame containing measured DIM and yield values for one lactation.
  • days_in_milk_col: Name of the DIM column.
  • milking_yield_col: Name of the milk-yield column.
  • small_grid: If True, interpolate on GRID_DAYS only; otherwise include day 0 and day 305.
Returns:

A DataFrame with identifying columns copied from group's first row and columns GridDay and MilkYieldInterp containing the interpolated (or extrapolated) yields for every grid day.

def linear_interpd_closest_to_grid( group: pandas.core.frame.DataFrame, days_in_milk_col: str, milking_yield_col: str, standard_lc: pandas.core.series.Series | None = None, small_grid: bool = False) -> pandas.core.frame.DataFrame | None:
876def linear_interpd_closest_to_grid(
877    group: pd.DataFrame,
878    days_in_milk_col: str,
879    milking_yield_col: str,
880    standard_lc: pd.Series | None = None,
881    small_grid: bool = False,
882) -> pd.DataFrame | None:
883    """Linearly interpolate grid days between measured observations.
884
885    This helper returns interpolated yields only for grid days that lie
886    between the first and last measured DIM for the lactation. If no grid
887    days fall within the measured range the function returns ``None``.
888
889    Args:
890        group: DataFrame for a single lactation.
891        days_in_milk_col: Name of the DIM column.
892        milking_yield_col: Name of the milk-yield column.
893        small_grid: If True, use GRID_DAYS only; otherwise include
894            day 0 and day 305.
895
896    Returns:
897        A DataFrame with identifying columns plus ``GridDay`` and
898        ``MilkYieldInterp``, or ``None`` if there are no grid days within
899        the measured range.
900    """
901    if small_grid:
902        grid = GRID_DAYS
903    else:
904        # add zero and 305 to the grid
905        grid = [0] + GRID_DAYS + [305]
906
907    group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col)
908
909    # Find the range of interpolatable DIM
910    min_dim = group[days_in_milk_col].min()
911    max_dim = group[days_in_milk_col].max()
912    grid_in_range = [g for g in grid if min_dim <= g <= max_dim]
913
914    # apply linear interpolation
915    if not grid_in_range:
916        return None
917    f = interp1d(
918        group[days_in_milk_col],
919        group[milking_yield_col],
920        kind="linear",
921        fill_value=cast(Any, np.nan),
922    )
923
924    # create a new dataframe with the newly created columns
925    base = {
926        col: group[col].iloc[0]
927        for col in group.columns
928        if col not in [days_in_milk_col, milking_yield_col]
929    }
930    base["GridDay"] = grid_in_range
931    base["MilkYieldInterp"] = f(grid_in_range)
932
933    return pd.DataFrame(base)

Linearly interpolate grid days between measured observations.

This helper returns interpolated yields only for grid days that lie between the first and last measured DIM for the lactation. If no grid days fall within the measured range the function returns None.

Arguments:
  • group: DataFrame for a single lactation.
  • days_in_milk_col: Name of the DIM column.
  • milking_yield_col: Name of the milk-yield column.
  • small_grid: If True, use GRID_DAYS only; otherwise include day 0 and day 305.
Returns:

A DataFrame with identifying columns plus GridDay and MilkYieldInterp, or None if there are no grid days within the measured range.

def create_standard_lc_representation( df: pandas.core.frame.DataFrame, standard_lactation_curve: pandas.core.series.Series, days_in_milk_col: str, milking_yield_col: str, interpolation_method: InterpolationMethod = <function interpolation_standard_lc>, lc_model: str = 'Wilmink', small_grid: bool = False) -> tuple[pandas.core.frame.DataFrame, numpy.ndarray[tuple[typing.Any, ...], numpy.dtype[numpy.float64]], pandas.core.series.Series]:
 936def create_standard_lc_representation(
 937    df: pd.DataFrame,
 938    standard_lactation_curve: pd.Series,
 939    days_in_milk_col: str,
 940    milking_yield_col: str,
 941    interpolation_method: InterpolationMethod = interpolation_standard_lc,
 942    lc_model: str = "Wilmink",
 943    small_grid: bool = False,
 944) -> tuple[pd.DataFrame, npt.NDArray[np.float64], pd.Series]:
 945    """Create a standard lactation curve with correlation matrix from data.
 946
 947    This function estimates the correlation structure and standard deviations
 948    of milk yields across grid days, and refits a Wilmink lactation curve model to
 949    the interpolated data. The Wilmink model is hereby the default,
 950    but any model from the LC package can be used.
 951    The outputs can be used as inputs to the
 952    ``ISLC_method`` or ``ISLC_original`` functions for predicting 305-d yields.
 953
 954    The process standardizes yields per animal and computes row-wise
 955    (per-animal) statistics, then derives relationships across grid days.
 956
 957    Args:
 958        df: Input DataFrame containing test-day records. Must include a
 959            ``TestId`` column to identify unique lactations. Also requires
 960            columns specified by ``days_in_milk_col`` and ``milking_yield_col``.
 961        standard_lactation_curve: A Series providing a reference lactation
 962            curve (e.g., the fitted Wilmink curve on which to base
 963            interpolation).
 964        days_in_milk_col: Name of the column in ``df`` containing days in milk (DIM).
 965        milking_yield_col: Name of the column in ``df`` containing measured
 966            milk yields.
 967        interpolation_method: An interpolation method conforming to
 968            ``InterpolationMethod`` protocol (default: ``interpolation_standard_lc``).
 969            Must be callable with signature
 970            (group, days_in_milk_col, milking_yield_col, standard_lc).
 971        lc_model: The lactation curve model to fit to the interpolated data. Default is "Wilmink".
 972        small_grid: If True, build representation on GRID_DAYS only.
 973
 974    Returns:
 975        A tuple of three elements:
 976        - corr (pd.DataFrame): Correlation matrix between grid-day yields.
 977        - std_per_grid_day (np.ndarray): Standard deviations of standardized
 978          yields per grid day.
 979        - standard_lactation_curve_grid (pd.Series): Refitted Wilmink model
 980          indexed by DIM (1–305).
 981
 982    Notes:
 983        - Expects ``df`` to have a ``TestId`` column to group lactations.
 984        - Performs standardization per animal (row-wise), so output
 985          correlations and standard deviations reflect between-day variation
 986          within standardized animal profiles.
 987        - Uses ``fit_lactation_curve`` from the ``lactationcurve`` package
 988          with model='Wilmink' and fitting='frequentist'.
 989          However other models can be used by specifying the ``lc_model`` argument.
 990    """
 991
 992    if "TestId" not in df.columns:
 993        raise ValueError("DataFrame must contain a 'TestId' column.")
 994    if days_in_milk_col not in df.columns or milking_yield_col not in df.columns:
 995        raise ValueError(f"Columns '{days_in_milk_col}' and '{milking_yield_col}' must be in df.")
 996
 997    interpolated_groups: list[pd.DataFrame] = []
 998    for test_id, group in df.groupby("TestId", sort=False):
 999        try:
1000            interp_group = interpolation_method(
1001                group,
1002                days_in_milk_col=days_in_milk_col,
1003                milking_yield_col=milking_yield_col,
1004                standard_lc=standard_lactation_curve,
1005                small_grid=small_grid,
1006            )
1007        except TypeError:
1008            # Backward compatibility for custom interpolation callables
1009            # that do not yet accept small_grid.
1010            interp_group = interpolation_method(
1011                group,
1012                days_in_milk_col=days_in_milk_col,
1013                milking_yield_col=milking_yield_col,
1014                standard_lc=standard_lactation_curve,
1015            )
1016        if interp_group.empty:
1017            continue
1018        if "TestId" not in interp_group.columns:
1019            interp_group = interp_group.copy()
1020            interp_group["TestId"] = test_id
1021        interpolated_groups.append(interp_group)
1022
1023    if not interpolated_groups:
1024        raise ValueError("No interpolated rows were produced; cannot build representation.")
1025
1026    df_grid = pd.concat(interpolated_groups, ignore_index=True)
1027
1028    # define Znp
1029    # pivot df to make GridDay the columns
1030    # Z should be the unscaled pivot table (TestId × GridDay)
1031    Z = df_grid.pivot(index="TestId", columns="GridDay", values="MilkYieldInterp")
1032
1033    # Standardize records per cow (so each row has mean=0, std=1)
1034    Z_std = (Z.sub(Z.mean(axis=1), axis=0)).div(Z.std(axis=1), axis=0)
1035
1036    # Convert to NumPy array
1037    Znp = Z_std.to_numpy()
1038
1039    # calculate correlation matrix
1040    corr = pd.DataFrame(Znp, columns=Z.columns).corr()
1041
1042    # calculate standard deviations for each grid day from the Znp matrix
1043    std_per_grid_day = np.nanstd(Znp, axis=0)  # ignores NaN
1044
1045    # fit Wilmink model to get mean values and std for each cow
1046    standard_lactation_curve_grid = pd.Series(
1047        fit_lactation_curve(
1048            df_grid["GridDay"].values,
1049            df_grid["MilkYieldInterp"].values,
1050            model=lc_model,
1051            fitting="frequentist",
1052        ),
1053        index=range(1, 306),
1054    )
1055    return corr, std_per_grid_day, standard_lactation_curve_grid

Create a standard lactation curve with correlation matrix from data.

This function estimates the correlation structure and standard deviations of milk yields across grid days, and refits a Wilmink lactation curve model to the interpolated data. The Wilmink model is hereby the default, but any model from the LC package can be used. The outputs can be used as inputs to the ISLC_method or ISLC_original functions for predicting 305-d yields.

The process standardizes yields per animal and computes row-wise (per-animal) statistics, then derives relationships across grid days.

Arguments:
  • df: Input DataFrame containing test-day records. Must include a TestId column to identify unique lactations. Also requires columns specified by days_in_milk_col and milking_yield_col.
  • standard_lactation_curve: A Series providing a reference lactation curve (e.g., the fitted Wilmink curve on which to base interpolation).
  • days_in_milk_col: Name of the column in df containing days in milk (DIM).
  • milking_yield_col: Name of the column in df containing measured milk yields.
  • interpolation_method: An interpolation method conforming to InterpolationMethod protocol (default: interpolation_standard_lc). Must be callable with signature (group, days_in_milk_col, milking_yield_col, standard_lc).
  • lc_model: The lactation curve model to fit to the interpolated data. Default is "Wilmink".
  • small_grid: If True, build representation on GRID_DAYS only.
Returns:

A tuple of three elements:

  • corr (pd.DataFrame): Correlation matrix between grid-day yields.
  • std_per_grid_day (np.ndarray): Standard deviations of standardized yields per grid day.
  • standard_lactation_curve_grid (pd.Series): Refitted Wilmink model indexed by DIM (1–305).
Notes:
  • Expects df to have a TestId column to group lactations.
  • Performs standardization per animal (row-wise), so output correlations and standard deviations reflect between-day variation within standardized animal profiles.
  • Uses fit_lactation_curve from the lactationcurve package with model='Wilmink' and fitting='frequentist'. However other models can be used by specifying the lc_model argument.