lactationcurve.characteristics.best_predict

Best prediction for cumulative 305-day milk yield.

Purpose

The Best Prediction (BP) method estimates 305-day lactation yield from intermittent test-day records by predicting deviations from an expected lactation curve at all unobserved days in milk. The method uses standard lactation curves that represent the expected course of lactation for specific cow subgroups, such as breed or parity.

By incorporating these standard curves, BP accounts for the typical lactation pattern in which milk yield increases after calving, reaches a peak, and subsequently declines. Missing daily yields are estimated from the covariance structure of test-day records and their deviations from the expected curve.

This module implements the best-prediction approach described by VanRaden (1997) for ICAR Procedure 2, Section 2 (Computing Accumulated Lactation Yield).

Method Summary

Adapted from the BESTPRED manual by Cole and VanRaden (2015).

Best prediction combines a population-level standard lactation curve with correlation-based corrections derived from observed test-day deviations. The method projects observed deviations onto the full 305-day curve using a covariance structure estimated from reference data.

Individual daily yield can be modeled as the expected value for a management group plus a deviation from that mean:

$$y_i = E(y_i) + t_i$$

where $y_i$ is an individual yield on test day $i$, $E(y_i)$ is the expected yield for an animal in the same management group (Wiggans, Misztal, and Van Vleck 1988) on the same test day, and $t_i$ is the deviation from the group mean on that test day. Let $\mu$ be the vector of expected values for each day of lactation for a single trait, $t$ the vector of 305 daily deviations for the trait, and $t_m$ the vector of only the measured deviations. The means and variances of $t$ and $t_m$ are assumed known, with $V(t) = V$ and $V(t_m) = V_m$. The covariance between $t$ and $t_m$, $C$, is also assumed known.

Lactation yield: A cow's true 305-d yield ($y$) is the sum of the expected values for each day ($\mathbf{1}'\mu$) plus the sum of her 305 deviations from expectations ($\mathbf{1}'t$), where $\mathbf{1}'$ is a row vector of ones of length 305. The cow's true yield and the best prediction of that yield ($\hat{y}$) are:

$$y = \mathbf{1}'\mu + \mathbf{1}'t$$

$$\hat{y} = \mathbf{1}'\mu + \mathbf{1}'C V_m^{-1} t_m$$
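The algebra above can be illustrated with a minimal sketch in plain Python. It is not the module's implementation (which uses NumPy and SciPy); the flat standard curve, the parameter values `b1`, `b2`, and `rho`, and the helper names `ar1_plus_noise`, `solve_spd`, and `best_prediction` are assumptions chosen for illustration only.

```python
import math


def ar1_plus_noise(n_days, b1, b2, rho):
    """Toy covariance: V[i][j] = b1 * (i == j) + b2 * rho ** |i - j|."""
    return [
        [b2 * rho ** abs(i - j) + (b1 if i == j else 0.0) for j in range(n_days)]
        for i in range(n_days)
    ]


def solve_spd(a, b):
    """Solve a @ x = b for a symmetric positive-definite matrix via Cholesky."""
    n = len(b)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    z = [0.0] * n  # forward substitution: low @ z = b
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n  # back substitution: low.T @ x = z
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def best_prediction(mu, v, dims, yields):
    """Return y_hat = 1'mu + 1' C V_m^{-1} t_m for one lactation."""
    obs = [d - 1 for d in dims]                      # 0-based day indices
    if not obs:
        return sum(mu)                               # no records: population mean
    t_m = [y - mu[i] for y, i in zip(yields, obs)]   # measured deviations
    v_m = [[v[i][j] for j in obs] for i in obs]      # V_m
    alpha = solve_spd(v_m, t_m)                      # V_m^{-1} t_m
    # 1' C alpha, where C holds the columns of V for the observed days
    deviation = sum(v[day][i] * a for day in range(len(mu)) for i, a in zip(obs, alpha))
    return sum(mu) + deviation


mu = [30.0] * 305                                    # flat toy standard curve
V = ar1_plus_noise(305, b1=8.0, b2=67.0, rho=0.995)  # illustrative parameters
on_curve = best_prediction(mu, V, [30, 120, 210], [30.0, 30.0, 30.0])
above_curve = best_prediction(mu, V, [30, 120, 210], [33.0, 33.0, 33.0])
```

Records that sit exactly on the standard curve leave the prediction at the cumulative curve. Records 3 kg above it raise the prediction, but by less than 3 kg on every one of the 305 days, because unobserved days are only partly correlated with the test days.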

Key Entry Points

best_predict_method: Apply best prediction per TestId.
best_predict_method_single_lac: Predict one lactation.
fit_autocorrelation_matrix: Fit covariance structure from reference data.

Column Flexibility

The functions accept several case-insensitive column name aliases and can create a default TestId if one is missing. Recognized aliases:

Days in Milk: ["daysinmilk", "dim", "testday"]
Milk Yield: ["milkingyield", "testdaymilkyield", "milkyield", "yield"]
Test Id: ["animalid", "testid", "id"]

It is also possible to provide your own column names so the function can be applied to dataframes with different column naming conventions.
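A case-insensitive alias lookup of this kind could be sketched as follows. The helper `resolve_columns` and its `overrides` argument are hypothetical and only illustrate the idea; the package's own `standardize_lactation_columns` may work differently.

```python
ALIASES = {
    "DaysInMilk": ["daysinmilk", "dim", "testday"],
    "MilkingYield": ["milkingyield", "testdaymilkyield", "milkyield", "yield"],
    "TestId": ["animalid", "testid", "id"],
}


def resolve_columns(columns, overrides=None):
    """Map input column names to canonical names (hypothetical helper)."""
    overrides = overrides or {}
    lowered = {name.lower(): name for name in columns}
    mapping = {}
    for canonical, aliases in ALIASES.items():
        explicit = overrides.get(canonical)
        if explicit is not None:
            # A user-supplied column name takes precedence over the aliases.
            if explicit not in columns:
                raise KeyError(f"Column {explicit!r} not found")
            mapping[explicit] = canonical
            continue
        # Otherwise take the first alias that matches, ignoring case.
        match = next((lowered[a] for a in aliases if a in lowered), None)
        if match is not None:
            mapping[match] = canonical
    return mapping


mapping = resolve_columns(["DIM", "MilkYield", "AnimalID"])
custom = resolve_columns(
    ["days", "kg", "cow"],
    overrides={"DaysInMilk": "days", "MilkingYield": "kg", "TestId": "cow"},
)
no_id = resolve_columns(["dim", "yield"])
```

In the last call no identifier column is found, which is the situation in which a default `TestId` would be created.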

Defaults

STANDARD_CURVE: Baseline expected lactation curve for days 1..305 (Wood).
COV_MATRIX: Default day-to-day covariance structure used for projection.
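To give a feel for what a Wood-type standard curve on the day 1..305 grid looks like, here is a small sketch using the usual form of the Wood model, y(t) = a * t^b * exp(-c * t). The parameter values are illustrative assumptions and are not the ones behind `STANDARD_CURVE`.

```python
import math


def wood_curve(a, b, c, n_days=305):
    """Daily yields from the Wood model y(t) = a * t**b * exp(-c * t), t = 1..n_days."""
    return [a * t**b * math.exp(-c * t) for t in range(1, n_days + 1)]


curve = wood_curve(a=20.0, b=0.2, c=0.004)                        # illustrative parameters
peak_day = max(range(len(curve)), key=curve.__getitem__) + 1      # 1-based day of peak yield
cumulative_305 = sum(curve)                                       # prediction for a lactation without records
```

For the Wood model the peak lies at t = b / c, which is day 50 for these parameters. The sum of the curve is the value best prediction falls back to when a lactation has no test-day records.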

Notes

The default assets are loaded from the package data directory.
Users can fit the standard curve and covariance matrix from their own reference population.
The method can be applied to lactations without any measurements, in which case the result will be the population mean from the standard curve.
Predicted milk yields have less variance than true milk yields, because predictions are regressed toward the mean unless all 305 daily yields are observed. With the test interval method (TIM), by contrast, estimated yields have more variance than true yields.
Lactation windows other than 305 days are not yet supported; this is on the roadmap for future updates.
The method currently assumes that the standard curve and covariance structure are the same for all lactations. Future updates may allow different standard curves and covariance structures for different subgroups of lactations (e.g., by breed or parity).
The default standard lactation curve is a Wood curve; other lactation curve models can be used as well.
Strengths of Best Prediction include its ability to use the full covariance structure of the lactation curve, which can give more accurate predictions, especially for lactations with few test days, because the inherent shape of the lactation curve is taken into account when observed deviations are projected onto unobserved days. It also has fewer intermediate steps than interpolation using standard lactation curves (ISLC) and is therefore easier to use.
Disadvantages of Best Prediction include its computational cost, especially when fitting the covariance structure from data. The method is also harder to understand than simpler methods such as the test interval method, which makes it less transparent to users. The best results are obtained when the standard curve and covariance matrix come from the same population as the data, which can be a barrier for users without access to a large reference dataset. It also means that cumulative milk yield results depend on which standard curve is used: a cow with exactly the same test-day records can receive different cumulative milk yield estimates under different standard curves, which can be considered unfair.
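Two of the notes above, regression toward the mean and dependence on the standard curve, can be made concrete with a single test-day record, for which the projection has a closed form. This is a sketch under assumed values: the AR(1)-plus-noise parameters, the flat curves, and the helper `single_record_prediction` are illustrative and not part of the package.

```python
def single_record_prediction(mu, dim, observed_yield, b1=8.0, b2=67.0, rho=0.995):
    """Best prediction from one test-day record under an AR(1)-plus-noise covariance."""
    d = dim - 1                                   # 0-based index of the test day
    deviation = observed_yield - mu[d]            # t_m (a single value)
    var_obs = b1 + b2                             # V_m is 1 x 1 here
    # Projected deviation on day i is Cov(day i, test day) / V_m * t_m.
    projected = [
        (b2 * rho ** abs(i - d) + (b1 if i == d else 0.0)) / var_obs * deviation
        for i in range(len(mu))
    ]
    return sum(mu) + sum(projected), projected


flat_30 = [30.0] * 305                            # two toy standard curves
flat_32 = [32.0] * 305
total_30, projected_30 = single_record_prediction(flat_30, dim=100, observed_yield=35.0)
total_32, _ = single_record_prediction(flat_32, dim=100, observed_yield=35.0)
```

The 5 kg deviation observed on day 100 is reproduced on that day but shrinks on every other day, so the total adjustment is well below 5 kg times 305. The same record of 35 kg also leads to different totals under the two standard curves.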

Upcoming functionality

The following is not yet implemented in this module. Milk, fat, and protein yields can be processed separately using single-trait best prediction or jointly using multi-trait best prediction. Replacing the measured milk yields with fat or protein yields gives the single-trait predictions for fat or protein. Multi-trait predictions require larger vectors and matrices but similar algebra.

References

VanRaden, P. M. (1997). Lactation yields and accuracies computed from test day yields and (co)variances by best prediction. Journal of Dairy Science, 80(11), 3015-3022.

Cole, J. B., and VanRaden, P. M. (2015). A Manual for Use of BESTPRED: A Program for Estimation of Lactation Yield and Persistency Using Best Prediction. Release 2.0 rc 7, August 12, 2009, revised April 27, 2015. Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Room 306 Bldg 005 BARC-West, 10300 Baltimore Avenue, Beltsville, MD 20705-2350.

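The original BESTPRED code can be found on GitHub: https://github.com/wintermind/bestpred

The notebooks directory, packages/models/lactationcurve/notebooks (https://github.com/Bovi-analytics/bovi/tree/main/packages/models/lactationcurve/notebooks), contains an example notebook that shows how to fit your own standard curve and covariance matrix from a reference dataset and how to apply the best prediction method to a test dataset.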

Author: Meike van Leerdam, Date: 24-04-2026 Last update: 21-May-2026

View Source

import inspect
from pathlib import Path
from typing import cast

import numpy as np
import pandas as pd
from scipy.linalg import LinAlgError, cho_factor, cho_solve
from scipy.optimize import minimize

from lactationcurve.fitting import fit_lactation_curve
from lactationcurve.preprocessing import standardize_lactation_columns

# Load the default standard lactation curve and covariance matrix shipped with the package.
DATA_DIR = Path(__file__).resolve().parent / "data"
COV_MATRIX = np.load(DATA_DIR / "covariance_matrix_best_predict.npy")
STANDARD_CURVE = np.load(DATA_DIR / "standard_lc_wood.npy")


# Lightweight repr-only wrapper for cleaner generated signatures in docs
class _DocDefault:
    def __init__(self, label: str) -> None:
        self.label = label

    def __repr__(self) -> str:  # pragma: no cover - docs-only
        return self.label


_DOC_STANDARD_CURVE = _DocDefault("STANDARD_CURVE")
_DOC_COV_MATRIX = _DocDefault("COV_MATRIX")

# Functions to fit your own standard curve and covariance matrix


def pivot_milk_recordings_to_matrix(df: pd.DataFrame) -> np.ndarray:
    """Convert long-format recordings to a fixed 305-day matrix.

    Rows represent lactations (``TestId``) and columns represent days in milk
    from 1 through 305. Missing observations are kept as ``NaN``.

    Args:
        df: Dataframe with ``TestId``, ``DaysInMilk``, and ``MilkingYield``.

    Returns:
        A NumPy array of shape ``(n_lactations, 305)``.
    """
    # ensure sorting
    df = df.sort_values(["TestId", "DaysInMilk"])

    # pivot to wide format (duplicate records for the same day are averaged,
    # which is the pivot_table default)
    milk_recordings_pivot = df.pivot_table(
        index="TestId", columns="DaysInMilk", values="MilkingYield"
    )

    # enforce fixed 305-day grid alignment used by best prediction
    milk_recordings_pivot = milk_recordings_pivot.reindex(columns=range(1, 306))

    # convert to a NumPy matrix
    return milk_recordings_pivot.to_numpy()


def fit_standard_lc(df: pd.DataFrame, lc_model: str = "Wood") -> np.ndarray:
    """Fit a population-level standard lactation curve.

    The curve is fit with the package's frequentist fitting routine, using the
    model selected by ``lc_model``, and returned on the fixed day-1..305 grid.

    Args:
        df: Reference dataframe containing ``DaysInMilk`` and ``MilkingYield``.
        lc_model: The lactation curve model to fit. Default is the ``"Wood"``
            lactation curve model.

    Returns:
        A NumPy array of expected daily milk yield for days 1..305.

    Notes:
        This mean curve acts as the baseline in best prediction. Individual
        lactations are represented as deviations around this population profile.
    """
    standard_lc = pd.Series(
        fit_lactation_curve(
            df["DaysInMilk"].values,
            df["MilkingYield"].values,
            model=lc_model,
            fitting="frequentist",
        ),
        index=range(1, 306),
    )

    return standard_lc.to_numpy(dtype=float)


def center_lactation_data(
    milk_matrix: np.ndarray,
    standard_lc: np.ndarray,
    day_mean_method: str = "standard_lc",
) -> np.ndarray:
    """Center lactation yields before covariance estimation.

    Args:
        milk_matrix: Yield matrix with lactations in rows and days in columns.
        standard_lc: Expected day-wise milk yield profile.
        day_mean_method: Mean-centering strategy. Supported values are
            ``"standard_lc"`` (default) and ``"data"``.

    Returns:
        A centered matrix with the same shape as ``milk_matrix``.

    Raises:
        ValueError: If ``day_mean_method`` is not supported.
    """
    if day_mean_method == "standard_lc":
        day_mean = standard_lc
    elif day_mean_method == "data":
        day_mean = np.nanmean(milk_matrix, axis=0)
    else:
        raise ValueError("day_mean_method must be 'standard_lc' or 'data'.")

    return milk_matrix - day_mean


def build_covariance_matrix(rho: float, size: int) -> np.ndarray:
    """Construct an AR(1) correlation matrix.

    Cole et al. (2007) estimated correlations among test-day yields using a
    simplified model with an identity matrix (I) for daily measurement error
    and an autoregressive matrix (E) for biological change. This function
    builds ``E``, whose element ``(i, j)`` is ``rho ** abs(i - j)``, where
    ``i`` and ``j`` are test-day DIM and ``0 < rho < 1``.

    Despite the function name, the result is a correlation matrix with a unit
    diagonal; ``fit_autocorrelation_matrix`` scales it into a covariance
    matrix.

    Args:
        rho: AR(1) correlation parameter.
        size: Matrix dimension.

    Returns:
        A ``(size, size)`` AR(1) correlation matrix.
    """
    idx = np.arange(size)
    lag = np.abs(idx[:, None] - idx[None, :])  # |i - j| for every pair of days
    return rho**lag


def fit_autocorrelation_matrix(
    df: pd.DataFrame, standard_lc: np.ndarray
) -> dict[str, np.ndarray | float]:
    """Estimate covariance parameters for best prediction.

    The model is ``B = b1 * I + b2 * E`` where ``E`` is an AR(1) correlation
    matrix. Parameters are optimized in transformed space and mapped back to
    enforce ``b1 > 0``, ``b2 > 0``, and ``0 < rho < 1``.

    Args:
        df: Reference milk-recording dataframe.
        standard_lc: Population mean curve used for centering.

    Returns:
        Dictionary with:

        - ``"B_hat"``: fitted covariance matrix.
        - ``"R_hat"``: correlation matrix derived from ``B_hat``.
        - ``"b1"``, ``"b2"``, ``"rho"``: fitted scalar parameters.
    """
    milk_matrix = pivot_milk_recordings_to_matrix(df)
    centered_matrix = center_lactation_data(milk_matrix, standard_lc)
    n_lactations, n_days = centered_matrix.shape
    observed_indices = [np.where(~np.isnan(centered_matrix[i]))[0] for i in range(n_lactations)]

    def negative_log_likelihood(params: np.ndarray) -> float:
        # Map unconstrained parameters to b1 > 0, b2 > 0 and 0 < rho < 1.
        p_b1, p_b2, p_rho = params
        b1 = float(np.exp(p_b1))
        b2 = float(np.exp(p_b2))
        rho = float(1 / (1 + np.exp(-p_rho)))
        correlation_matrix = build_covariance_matrix(rho, n_days)

        total = 0.0
        for lactation_idx, day_indices in enumerate(observed_indices):
            observation_count = len(day_indices)
            if observation_count == 0:
                continue

            observations = centered_matrix[lactation_idx, day_indices]
            correlation_subset = correlation_matrix[np.ix_(day_indices, day_indices)]
            sigma = b1 * np.eye(observation_count) + b2 * correlation_subset

            # Numerical safeguard: try Cholesky and penalize non-PD parameters.
            try:
                cholesky_factor, lower = cho_factor(sigma, check_finite=False)
                solution = cho_solve((cholesky_factor, lower), observations, check_finite=False)
            except LinAlgError:
                # penalty for a covariance matrix that is not positive definite
                return float(1e12 + np.sum(np.abs(params)))

            quadratic_form = float(observations @ solution)
            log_determinant = 2.0 * np.sum(np.log(np.diag(cholesky_factor)))
            total += 0.5 * (
                log_determinant + quadratic_form + observation_count * np.log(2 * np.pi)
            )

        # total negative log-likelihood over all lactations
        return float(total)

    # Initial guesses: the variance is split 50/50 between b1 and b2. The third
    # value is on the logit scale, so 0.5 corresponds to rho of about 0.62.
    initial_variance = max(float(np.nanvar(centered_matrix)), 1e-6)
    initial_params = [
        np.log(0.5 * initial_variance),
        np.log(0.5 * initial_variance),
        0.5,
    ]

    result = minimize(
        negative_log_likelihood,
        x0=initial_params,
        method="L-BFGS-B",
        options={"maxiter": 2000, "ftol": 1e-8},
    )

    if not result.success:
        print(f"Optimization warning: {result.message}")

    # Map the optimum back to the constrained parameters.
    log_b1_hat, log_b2_hat, logit_rho_hat = result.x
    b1_hat = float(np.exp(log_b1_hat))
    b2_hat = float(np.exp(log_b2_hat))
    rho_hat = float(1 / (1 + np.exp(-logit_rho_hat)))
    ar1_matrix = build_covariance_matrix(rho_hat, n_days)
    covariance_matrix = b1_hat * np.eye(n_days) + b2_hat * ar1_matrix

    # convert the fitted covariance matrix to a correlation matrix
    std = np.sqrt(np.diag(covariance_matrix))
    correlation_matrix = covariance_matrix / np.outer(std, std)

    return {
        "B_hat": covariance_matrix,
        "R_hat": correlation_matrix,
        "b1": b1_hat,
        "b2": b2_hat,
        "rho": rho_hat,
    }


# Functions for best prediction; these also work with the default standard curve
# and covariance matrix.


def preprocess_measured_data(lactation: pd.DataFrame, standard_lc: np.ndarray) -> pd.Series:
    """Build a 305-day deviation vector for a single lactation.

    For observed days, this computes ``MilkingYield - standard_lc[day]``.
    The result is reindexed to days 1..305 with unobserved days filled as zero.

    Args:
        lactation: Single-lactation dataframe with ``DaysInMilk`` and
            ``MilkingYield``. Days must be unique and within 1..305.
        standard_lc: Expected daily milk yield profile.

    Returns:
        A Series indexed by day 1..305 containing milk-yield deviations.
    """
    # Expected (population mean) yield on each measured day; days are 1-based.
    days = lactation["DaysInMilk"].to_numpy(dtype=int)
    expected = np.asarray(standard_lc, dtype=float)[days - 1]

    # Deviation of the measured yield from the expected yield.
    deviations = lactation["MilkingYield"].to_numpy(dtype=float) - expected

    # Build the Series directly so the caller's dataframe is not modified.
    milk_difference = pd.Series(deviations, index=days, name="MilkDifference")

    # Series of length 305 with unobserved days set to 0
    return milk_difference.reindex(range(1, 306), fill_value=0.0)


def best_predict_method_single_lac(
    lactation: pd.DataFrame,
    standard_lc: np.ndarray = STANDARD_CURVE,
    covariance_matrix: np.ndarray = COV_MATRIX,
) -> float:
    """Predict 305-day cumulative yield for one lactation.

    Observed test-day deviations are projected over all 305 days using the
    covariance structure and then added to the baseline cumulative standard
    curve.

    By default this function uses the package-provided standard curve and
    covariance matrix. You can also provide your own, for example when you
    have fitted them from your own reference population.

    Args:
        lactation: Observed records for one lactation.
        standard_lc: Population mean daily yield profile.
        covariance_matrix: Day-to-day covariance matrix on the 305-day grid.

    Returns:
        Predicted cumulative 305-day milk yield.

    Notes:
        Records without a milk yield are dropped, and duplicate day records
        are resolved with ``keep="last"`` before prediction. If no valid
        observations remain in days 1..305, the method returns the cumulative
        standard curve.
    """
    filtered_lactation = lactation.loc[
        (lactation["DaysInMilk"] >= 1) & (lactation["DaysInMilk"] <= 305)
    ].copy()
    # Rows without a recorded yield carry no information and would propagate NaN.
    filtered_lactation = filtered_lactation.dropna(subset=["MilkingYield"])
    filtered_lactation = filtered_lactation.drop_duplicates(subset=["DaysInMilk"], keep="last")
    filtered_lactation = filtered_lactation.sort_values("DaysInMilk")

    # Without observations the best prediction is the cumulative standard curve.
    if filtered_lactation.empty:
        return float(np.sum(standard_lc))

    corrected_series = preprocess_measured_data(filtered_lactation, standard_lc=standard_lc)

    obs_idx_1based = filtered_lactation["DaysInMilk"].to_numpy(dtype=int)  # DaysInMilk: 1-305
    obs_idx_0based = obs_idx_1based - 1  # 0-based matrix indices: 0-304
    # corrected_series is indexed by DaysInMilk (1-305)
    y_obs = corrected_series.loc[obs_idx_1based].to_numpy(dtype=float)

    # Extract covariance blocks (0-based indices for the matrix)
    B_oo = covariance_matrix[np.ix_(obs_idx_0based, obs_idx_0based)]  # observed x observed
    B_mo = covariance_matrix[:, obs_idx_0based]  # all days x observed

    # Solve B_oo @ alpha = y_obs via Cholesky
    c, lower = cho_factor(B_oo)
    alpha = cho_solve((c, lower), y_obs)

    # Predict the full deviation curve
    y_estimate = B_mo @ alpha

    # Total milk = cumulative baseline + cumulative predicted deviation
    return float(np.sum(standard_lc) + np.sum(y_estimate))


def best_predict_method(
    df: pd.DataFrame,
    standard_lc: np.ndarray = STANDARD_CURVE,
    days_in_milk_col: str | None = None,
    milking_yield_col: str | None = None,
    test_id_col: str | None = None,
    default_test_id: int = 0,
    covariance_matrix: np.ndarray | None = COV_MATRIX,
    fit_standard_lc_from_data: bool = False,
    reference_df: pd.DataFrame | None = None,
) -> pd.DataFrame:
    """Apply best prediction to one or more lactations.

    By default this function uses the package-provided standard curve and
    covariance matrix. They can be customized in two ways:

    - Pass your own ``standard_lc`` and ``covariance_matrix`` directly.
    - Set ``fit_standard_lc_from_data=True`` and pass ``reference_df`` to fit
      the covariance matrix from a reference dataset. The reference data are
      centered on ``standard_lc``; to use a curve fitted on the same reference
      data, fit it first with ``fit_standard_lc`` and pass the result as
      ``standard_lc``.

    Args:
        df: Input observations. If ``TestId`` is missing, all rows are treated
            as one lactation.
        standard_lc: Expected daily milk yield lactation curve on days 1..305.
            If not provided, the package's default curve is used.
        days_in_milk_col: Optional input column name for days in milk. If
            provided, it is mapped to ``DaysInMilk``.
        milking_yield_col: Optional input column name for milk yield. If
            provided, it is mapped to ``MilkingYield``.
        test_id_col: Optional input column name for lactation/test identifier.
            If provided, it is mapped to ``TestId``.
        default_test_id: Fallback test id used when no test-id column is
            available.
        covariance_matrix: Optional prefit covariance matrix. If omitted,
            the default matrix is used. It is replaced by a fitted matrix
            when ``fit_standard_lc_from_data`` is True.
        fit_standard_lc_from_data: Whether to fit covariance information from
            ``reference_df`` instead of using a provided covariance matrix.
        reference_df: Reference dataframe used to fit the covariance matrix
            when ``fit_standard_lc_from_data`` is True. Its columns are
            standardized with the same column-name arguments as ``df``.

    Returns:
        Dataframe with columns ``TestId`` and ``LactationMilkYield``.

    Raises:
        ValueError: If ``fit_standard_lc_from_data`` is True but
            ``reference_df`` is missing, or if no covariance matrix is
            available.
    """
    # Standardize columns and filter DIM <= 305
    df = standardize_lactation_columns(
        df,
        days_in_milk_col=days_in_milk_col,
        milking_yield_col=milking_yield_col,
        test_id_col=test_id_col,
        default_test_id=default_test_id,
        max_dim=305,
    )

    # Fit the covariance matrix from reference data when requested
    if fit_standard_lc_from_data:
        if reference_df is None:
            raise ValueError("Provide reference_df to fit your own covariance matrix.")
        reference_df = standardize_lactation_columns(
            reference_df,
            days_in_milk_col=days_in_milk_col,
            milking_yield_col=milking_yield_col,
            test_id_col=test_id_col,
            default_test_id=default_test_id,
            max_dim=305,
        )
        covariance_matrix = cast(
            np.ndarray, fit_autocorrelation_matrix(reference_df, standard_lc)["B_hat"]
        )
    elif covariance_matrix is None:
        raise ValueError(
            "Provide covariance_matrix, or reference_df with fit_standard_lc_from_data=True."
        )

    covariance_matrix_array = cast(np.ndarray, covariance_matrix)

    df = df.copy()

    results = []

    # One prediction per lactation
    for test_id, lactation in df.groupby("TestId"):
        pred = best_predict_method_single_lac(
            lactation,
            standard_lc,
            covariance_matrix_array,
        )
        results.append({"TestId": test_id, "LactationMilkYield": pred})

    return pd.DataFrame(results)


# Demo function to check that the module runs as expected


def demo() -> None:
    """Run a minimal example of best prediction with mock data."""

    # --- Two lactations with five test days each ---
    test_df = pd.DataFrame(
        {
            "TestId": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
            "DaysInMilk": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55],
            "MilkingYield": [30, 35, 40, 38, 36, 28, 33, 37, 39, 34],
        }
    )

    result_cov = best_predict_method(
        test_df, standard_lc=STANDARD_CURVE, covariance_matrix=COV_MATRIX
    )

    print("Predictions with provided covariance matrix:")
    print(result_cov)


def _set_doc_signatures() -> None:
    """Override displayed defaults in docs without changing runtime behavior."""
    doc_defaults = {
        "standard_lc": _DOC_STANDARD_CURVE,
        "covariance_matrix": _DOC_COV_MATRIX,
    }

    for func in (best_predict_method_single_lac, best_predict_method):
        signature = inspect.signature(func)
        params = [
            param.replace(default=doc_defaults[param.name]) if param.name in doc_defaults else param
            for param in signature.parameters.values()
        ]
        func.__signature__ = signature.replace(parameters=params)


_set_doc_signatures()

if __name__ == "__main__":
    demo()

DATA_DIR = PosixPath('/home/runner/work/bovi/bovi/packages/models/lactationcurve/src/lactationcurve/characteristics/data')

COV_MATRIX = array([[75.16786 , 67.28662 , 67.05099 , ..., 23.40674 , 23.32477 , 23.243088], [67.28662 , 75.16786 , 67.28662 , ..., 23.488997, 23.40674 , 23.32477 ], [67.05099 , 67.28662 , 75.16786 , ..., 23.571543, 23.488997, 23.40674 ], ..., [23.40674 , 23.488997, 23.571543, ..., 75.16786 , 67.28662 , 67.05099 ], [23.32477 , 23.40674 , 23.488997, ..., 67.28662 , 75.16786 , 67.28662 ], [23.243088, 23.32477 , 23.40674 , ..., 67.05099 , 67.28662 , 75.16786 ]], shape=(305, 305), dtype=float32)

STANDARD_CURVE = array([20.88852108, 24.28756033, 26.48529607, 28.13333695, 29.45700725, 30.56351807, 31.51304545, 32.34304569, 33.0785599 , 33.7372317 , 34.33200401, 34.8726797 , 35.36687887, 35.82065388, 36.23889995, 36.62563841, 36.98421725, 37.31745679, 37.62775749, 37.91718122, 38.18751364, 38.44031259, 38.67694623, 38.89862343, 39.10641824, 39.3012897 , 39.4840982 , 39.6556189 , 39.81655295, 39.9675369 , 40.10915064, 40.24192411, 40.36634304, 40.48285392, 40.59186822, 40.69376607, 40.78889947, 40.87759509, 40.96015672, 41.03686741, 41.1079914 , 41.17377578, 41.23445203, 41.29023728, 41.34133561, 41.387939 , 41.4302284 , 41.46837449, 41.50253857, 41.53287316, 41.55952271, 41.58262416, 41.60230747, 41.61869609, 41.63190741, 41.64205315, 41.64923978, 41.65356877, 41.65513699, 41.65403694, 41.65035703, 41.64418185, 41.63559233, 41.62466604, 41.61147731, 41.59609742, 41.57859483, 41.55903524, 41.53748181, 41.51399526, 41.48863401, 41.46145429, 41.43251026, 41.40185408, 41.36953608, 41.33560475, 41.30010691, 41.26308776, 41.22459095, 41.18465864, 41.1433316 , 41.10064924, 41.0566497 , 41.01136988, 40.9648455 , 40.91711116, 40.86820038, 40.81814563, 40.76697841, 40.71472924, 40.66142775, 40.60710269, 40.55178196, 40.49549263, 40.43826102, 40.38011269, 40.32107247, 40.2611645 , 40.20041226, 40.13883855, 40.0764656 , 40.01331499, 39.94940774, 39.88476433, 39.81940467, 39.75334817, 39.68661372, 39.61921973, 39.55118415, 39.48252446, 39.41325771, 39.34340051, 39.27296908, 39.20197922, 39.13044635, 39.05838551, 38.9858114 , 38.91273835, 38.83918033, 38.76515102, 38.69066375, 38.61573155, 38.54036713, 38.46458294, 38.38839112, 38.31180353, 38.23483177, 38.15748719, 38.07978086, 38.00172363, 37.92332608, 37.84459857, 37.76555126, 37.68619404, 37.60653661, 37.52658847, 37.4463589 , 37.36585698, 37.28509161, 37.20407148, 37.12280513, 37.04130088, 36.95956691, 36.8776112 , 36.7954416 , 36.71306576, 36.6304912 , 36.54772528, 36.46477519, 36.381648 , 36.29835062, 36.21488984, 
36.13127228, 36.04750445, 35.96359274, 35.87954339, 35.79536253, 35.71105615, 35.62663015, 35.54209029, 35.45744224, 35.37269153, 35.2878436 , 35.20290379, 35.11787732, 35.03276932, 34.94758481, 34.86232874, 34.77700593, 34.69162112, 34.60617898, 34.52068406, 34.43514083, 34.3495537 , 34.26392697, 34.17826487, 34.09257154, 34.00685105, 33.9211074 , 33.83534449, 33.74956618, 33.66377623, 33.57797834, 33.49217615, 33.40637321, 33.32057303, 33.23477903, 33.14899458, 33.06322299, 32.9774675 , 32.89173129, 32.80601748, 32.72032915, 32.6346693 , 32.54904087, 32.46344678, 32.37788987, 32.29237292, 32.20689867, 32.12146983, 32.03608901, 31.95075883, 31.86548181, 31.78026045, 31.69509721, 31.60999449, 31.52495464, 31.43997998, 31.35507278, 31.27023528, 31.18546966, 31.10077807, 31.01616261, 30.93162537, 30.84716836, 30.76279358, 30.67850298, 30.59429848, 30.51018197, 30.42615529, 30.34222025, 30.25837863, 30.17463218, 30.09098261, 30.00743159, 29.92398077, 29.84063177, 29.75738617, 29.67424554, 29.59121138, 29.50828521, 29.42546848, 29.34276263, 29.26016908, 29.17768921, 29.09532437, 29.0130759 , 28.93094509, 28.84893323, 28.76704157, 28.68527134, 28.60362374, 28.52209996, 28.44070114, 28.35942842, 28.27828291, 28.1972657 , 28.11637786, 28.03562042, 27.95499442, 27.87450085, 27.79414069, 27.7139149 , 27.63382443, 27.55387019, 27.47405308, 27.39437399, 27.31483378, 27.23543329, 27.15617335, 27.07705477, 26.99807833, 26.91924481, 26.84055497, 26.76200955, 26.68360926, 26.60535481, 26.5272469 , 26.44928619, 26.37147335, 26.29380901, 26.21629381, 26.13892836, 26.06171325, 25.98464907, 25.90773639, 25.83097576, 25.75436773, 25.67791282, 25.60161154, 25.52546441, 25.4494719 , 25.37363449, 25.29795265, 25.22242682, 25.14705745, 25.07184496, 24.99678975, 24.92189225, 24.84715283, 24.77257187, 24.69814976, 24.62388683, 24.54978344, 24.47583993, 24.40205662, 24.32843382, 24.25497184, 24.18167098, 24.10853151, 24.03555372, 23.96273787, 23.89008421, 23.817593 , 23.74526447])

def pivot_milk_recordings_to_matrix(df: pandas.core.frame.DataFrame) -> numpy.ndarray:

def pivot_milk_recordings_to_matrix(df: pd.DataFrame) -> np.ndarray:
    """Convert long-format recordings to a fixed 305-day matrix.

    Rows represent lactations (``TestId``) and columns represent days in milk
    from 1 through 305. Missing observations are kept as ``NaN``.

    Args:
        df: Dataframe with ``TestId``, ``DaysInMilk``, and ``MilkingYield``.

    Returns:
        A NumPy array of shape ``(n_lactations, 305)``.
    """
    # ensure sorting
    df = df.sort_values(["TestId", "DaysInMilk"])

    # pivot to wide format
    milk_recordings_pivot = df.pivot_table(
        index="TestId", columns="DaysInMilk", values="MilkingYield"
    )

    # enforce fixed 305-day grid alignment used by best-prediction
    milk_recordings_pivot = milk_recordings_pivot.reindex(columns=range(1, 306))

    # convert to numpy matrix
    Y = milk_recordings_pivot.to_numpy()
    return Y

Convert long-format recordings to a fixed 305-day matrix.

Rows represent lactations (TestId) and columns represent days in milk from 1 through 305. Missing observations are kept as NaN.

Arguments:

df: Dataframe with TestId, DaysInMilk, and MilkingYield.

Returns:

A NumPy array of shape (n_lactations, 305).
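
The pivot-and-reindex step can be illustrated on a toy dataset. This is a minimal sketch that repeats the same pandas calls on hypothetical records; the column names follow the documented input, while the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two hypothetical lactations in long format (values are illustrative only).
records = pd.DataFrame(
    {
        "TestId": [1, 1, 2],
        "DaysInMilk": [1, 3, 2],
        "MilkingYield": [20.0, 25.0, 22.0],
    }
)

# Pivot to wide format: one row per lactation, one column per recorded day.
wide = records.pivot_table(index="TestId", columns="DaysInMilk", values="MilkingYield")

# Align the columns to the fixed day 1..305 grid; unrecorded days become NaN.
wide = wide.reindex(columns=range(1, 306))
milk_matrix = wide.to_numpy()

print(milk_matrix.shape)
print(milk_matrix[:, :3])
```

Note that `pivot_table` aggregates with the mean by default, so two records sharing the same `TestId` and `DaysInMilk` would be averaged rather than rejected.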

def fit_standard_lc(df: pandas.core.frame.DataFrame, lc_model: str = 'Wood') -> numpy.ndarray:

def fit_standard_lc(df: pd.DataFrame, lc_model: str = "Wood") -> np.ndarray:
    """Fit a population-level standard lactation curve.

    The curve is fit with the package's frequentist fitting routine using
    ``lc_model`` (the Wood model by default) and returned on the fixed
    day-1..305 grid.

    Args:
        df: Reference dataframe containing ``DaysInMilk`` and ``MilkingYield``.
        lc_model: The lactation curve model to fit. Default is the "Wood"
            lactation curve model.

    Returns:
        A NumPy array of expected daily milk yield for days 1..305.

    Notes:
        This mean curve acts as the baseline in best prediction. Individual
        lactations are represented as deviations around this population profile.
    """
    standard_lc = pd.Series(
        fit_lactation_curve(
            df["DaysInMilk"].values,
            df["MilkingYield"].values,
            model=lc_model,
            fitting="frequentist",
        ),
        index=range(1, 306),
    )

    return standard_lc.to_numpy(dtype=float)

Fit a population-level standard lactation curve.

The curve is fit with the package's frequentist fitting routine using lc_model (the Wood model by default) and returned on the fixed day-1..305 grid.

Arguments:

df: Reference dataframe containing DaysInMilk and MilkingYield.
lc_model: The lactation curve model to fit. Default is the "Wood" lactation curve model.

Returns:

A NumPy array of expected daily milk yield for days 1..305.

Notes:

This mean curve acts as the baseline in best prediction. Individual lactations are represented as deviations around this population profile.
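
To make the idea of a standard curve concrete, the sketch below evaluates Wood's model, y(t) = a * t^b * exp(-c * t), on the day 1..305 grid. It does not call the package's `fit_lactation_curve`; as a stand-in it assumes a log-linear least-squares fit, which is only one possible way to estimate the three parameters. The data and parameter values are hypothetical.

```python
import numpy as np


def wood(t, a, b, c):
    """Wood's lactation curve: a * t**b * exp(-c * t)."""
    return a * t**b * np.exp(-c * t)


# Hypothetical noise-free test-day records generated from known parameters.
days = np.arange(5, 306, 10, dtype=float)
yields = wood(days, 20.0, 0.2, 0.004)

# Taking logs makes the model linear: log(y) = log(a) + b*log(t) - c*t.
design = np.column_stack([np.ones_like(days), np.log(days), -days])
coef, *_ = np.linalg.lstsq(design, np.log(yields), rcond=None)
a_hat, b_hat, c_hat = np.exp(coef[0]), coef[1], coef[2]

# Evaluate the fitted curve on the fixed day 1..305 grid.
grid = np.arange(1, 306, dtype=float)
standard_lc = wood(grid, a_hat, b_hat, c_hat)

# For Wood's model the peak lies at t = b / c.
peak_day = int(grid[np.argmax(standard_lc)])
print(standard_lc.shape, peak_day)
```

The resulting array has the same shape as the one `fit_standard_lc` is documented to return, so it could play the same role as a baseline in the later steps.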

def center_lactation_data(milk_matrix: numpy.ndarray, standard_lc: numpy.ndarray, day_mean_method: str = 'standard_lc') -> numpy.ndarray:

def center_lactation_data(
    milk_matrix: np.ndarray,
    standard_lc: np.ndarray,
    day_mean_method: str = "standard_lc",
) -> np.ndarray:
    """Center lactation yields before covariance estimation.

    Args:
        milk_matrix: Yield matrix with lactations in rows and days in columns.
        standard_lc: Expected day-wise milk yield profile.
        day_mean_method: Mean-centering strategy. Supported values are
            ``"standard_lc"`` (default) and ``"data"``.

    Returns:
        A centered matrix with the same shape as ``milk_matrix``.

    Raises:
        ValueError: If ``day_mean_method`` is not supported.
    """
    if day_mean_method == "standard_lc":
        day_mean = standard_lc
    elif day_mean_method == "data":
        day_mean = np.nanmean(milk_matrix, axis=0)
    else:
        raise ValueError("day_mean_method must be 'standard_lc' or 'data'.")

    return milk_matrix - day_mean

Center lactation yields before covariance estimation.

Arguments:

milk_matrix: Yield matrix with lactations in rows and days in columns.
standard_lc: Expected day-wise milk yield profile.
day_mean_method: Mean-centering strategy. Supported values are "standard_lc" (default) and "data".

Returns:

A centered matrix with the same shape as milk_matrix.

Raises:

ValueError: If day_mean_method is not supported.
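
The two centering strategies differ only in which day-wise mean is subtracted. The following sketch compares them on a hypothetical three-day matrix (the real grid has 305 columns); the numbers are made up for illustration.

```python
import numpy as np

# Two hypothetical lactations on a 3-day grid; NaN marks unrecorded days.
milk_matrix = np.array(
    [
        [30.0, np.nan, 34.0],
        [28.0, 31.0, np.nan],
    ]
)
standard_lc = np.array([29.0, 30.0, 33.0])  # hypothetical expected yields

# "standard_lc": subtract the population curve (broadcast over rows).
centered_by_curve = milk_matrix - standard_lc

# "data": subtract the per-day mean of the observed values, ignoring NaN.
centered_by_data = milk_matrix - np.nanmean(milk_matrix, axis=0)

print(centered_by_curve)
print(centered_by_data)
```

In both cases missing observations stay `NaN`, which is what allows the later covariance fit to pick out the observed days per lactation.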

def build_covariance_matrix(rho: float, size: int) -> numpy.ndarray:

def build_covariance_matrix(rho: float, size: int) -> np.ndarray:
    """Construct an AR(1) correlation matrix.

    Cole et al. (2007) estimated correlations among test-day yields using a
    simplified model with an identity matrix (I) for daily measurement error
    and an autoregressive matrix (E) for biological change. E is defined as
    ``Eij = rho ** |i-j|`` where ``i`` and ``j`` are test-day DIM and
    ``0 < rho < 1``.

    Despite the function name, the result is a correlation matrix with a
    unit diagonal: element ``(i, j)`` is ``rho ** abs(i - j)``.

    Args:
        rho: AR(1) correlation parameter.
        size: Matrix dimension.

    Returns:
        A ``(size, size)`` AR(1) correlation matrix.
    """
    idx = np.arange(size)
    M = np.abs(idx[:, None] - idx[None, :])
    return rho**M

Construct an AR(1) correlation matrix.

Cole et al. (2007) estimated correlations among test-day yields using a simplified model with an identity matrix (I) for daily measurement error and an autoregressive matrix (E) for biological change. E is defined as Eij = rho ** |i-j| where i and j are test-day DIM and 0 < rho < 1.

Despite the function name, the result is a correlation matrix with a unit diagonal: element (i, j) is rho ** abs(i - j).

Arguments:

rho: AR(1) correlation parameter.
size: Matrix dimension.

Returns:

A (size, size) AR(1) correlation matrix.
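
A small instance shows the structure of the AR(1) matrix. This sketch uses the same broadcasting construction on a 4-by-4 example with an illustrative rho of 0.5.

```python
import numpy as np

rho, size = 0.5, 4  # illustrative values

# |i - j| for every pair of days, built by broadcasting a column against a row.
idx = np.arange(size)
lags = np.abs(idx[:, None] - idx[None, :])

# Correlation decays geometrically with the distance between days.
E = rho**lags
print(E)
```

The diagonal is 1 and every step away from it multiplies the correlation by rho, so days far apart in the lactation are only weakly correlated.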

def fit_autocorrelation_matrix(df: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray) -> dict[str, numpy.ndarray | float]:

def fit_autocorrelation_matrix(
    df: pd.DataFrame, standard_lc: np.ndarray
) -> dict[str, np.ndarray | float]:
    """Estimate covariance parameters for best prediction.

    The model is ``B = b1 * I + b2 * E`` where ``E`` is an AR(1) correlation
    matrix. Parameters are optimized in transformed space and mapped back to
    enforce ``b1 > 0``, ``b2 > 0``, and ``0 < rho < 1``.

    Args:
        df: Reference milk-recording dataframe.
        standard_lc: Population mean curve used for centering.

    Returns:
        Dictionary with:
        - ``"B_hat"``: fitted covariance matrix.
        - ``"R_hat"``: correlation matrix derived from ``B_hat``.
        - ``"b1"``, ``"b2"``, ``"rho"``: fitted scalar parameters.
    """
    milk_matrix = pivot_milk_recordings_to_matrix(df)
    centered_matrix = center_lactation_data(milk_matrix, standard_lc)
    n_lactations, n_days = centered_matrix.shape
    observed_indices = [np.where(~np.isnan(centered_matrix[i]))[0] for i in range(n_lactations)]

    def negative_log_likelihood(params: np.ndarray) -> float:
        p_b1, p_b2, p_rho = params
        b1 = float(np.exp(p_b1))
        b2 = float(np.exp(p_b2))
        rho = float(1 / (1 + np.exp(-p_rho)))  # logistic transform keeps rho in (0, 1)
        correlation_matrix = build_covariance_matrix(rho, n_days)

        total = 0.0
        for lactation_idx, day_indices in enumerate(observed_indices):
            observation_count = len(day_indices)
            if observation_count == 0:
                continue

            observations = centered_matrix[lactation_idx, day_indices]
            correlation_subset = correlation_matrix[np.ix_(day_indices, day_indices)]
            sigma = b1 * np.eye(observation_count) + b2 * correlation_subset

            # Numerical safeguards: try Cholesky and penalize non-PD parameters.
            try:
                cholesky_factor, lower = cho_factor(sigma, check_finite=False)
                solution = cho_solve((cholesky_factor, lower), observations, check_finite=False)
            except LinAlgError:
                # penalty for non-PD
                return float(1e12 + np.sum(np.abs(params)))

            quadratic_form = float(observations @ solution)
            log_determinant = 2.0 * np.sum(np.log(np.diag(cholesky_factor)))
            total += 0.5 * (
                log_determinant + quadratic_form + observation_count * np.log(2 * np.pi)
            )

        # return total negative log-likelihood
        return float(total)

    # Initial guesses and optimization. The starting point splits the variance
    # 50/50 between b1 and b2; the third value is on the logit scale, so 0.5
    # corresponds to a starting rho of about 0.62.
    initial_variance = max(float(np.nanvar(centered_matrix)), 1e-6)
    initial_params = [
        np.log(0.5 * initial_variance),
        np.log(0.5 * initial_variance),
        0.5,
    ]

    result = minimize(
        negative_log_likelihood,
        x0=initial_params,
        method="L-BFGS-B",
        options={"maxiter": 2000, "ftol": 1e-8},
    )

    if not result.success:
        print(f"Optimization warning: {result.message}")

    log_b1_hat, log_b2_hat, logit_rho_hat = result.x
    b1_hat = float(np.exp(log_b1_hat))
    b2_hat = float(np.exp(log_b2_hat))
    rho_hat = float(1 / (1 + np.exp(-logit_rho_hat)))
    correlation_matrix = build_covariance_matrix(rho_hat, n_days)
    covariance_matrix = b1_hat * np.eye(n_days) + b2_hat * correlation_matrix

    # convert to correlation matrix
    std = np.sqrt(np.diag(covariance_matrix))
    correlation_matrix = covariance_matrix / np.outer(std, std)

    return {
        "B_hat": covariance_matrix,
        "R_hat": correlation_matrix,
        "b1": b1_hat,
        "b2": b2_hat,
        "rho": rho_hat,
    }

Estimate covariance parameters for best prediction.

The model is B = b1 * I + b2 * E where E is an AR(1) correlation matrix. Parameters are optimized in transformed space and mapped back to enforce b1 > 0, b2 > 0, and 0 < rho < 1.
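
The parameter transformation and the per-lactation likelihood term can be sketched in isolation. The helper names below (`transform`, `covariance`, `lactation_nll`) and all numeric values are hypothetical and chosen for illustration; the sketch assumes zero-mean Gaussian deviations and cross-checks the Cholesky-based value against SciPy's multivariate normal density.

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.stats import multivariate_normal


def transform(params):
    """Map unconstrained parameters to b1 > 0, b2 > 0 and 0 < rho < 1."""
    log_b1, log_b2, logit_rho = params
    return (
        float(np.exp(log_b1)),
        float(np.exp(log_b2)),
        float(1.0 / (1.0 + np.exp(-logit_rho))),
    )


def covariance(days, b1, b2, rho):
    """B = b1 * I + b2 * E restricted to the observed days."""
    lags = np.abs(days[:, None] - days[None, :])
    return b1 * np.eye(len(days)) + b2 * rho**lags


def lactation_nll(deviations, sigma):
    """Zero-mean Gaussian negative log-likelihood via a Cholesky factor."""
    factor, lower = cho_factor(sigma)
    quadratic_form = deviations @ cho_solve((factor, lower), deviations)
    # log|sigma| is twice the sum of the log-diagonal of the Cholesky factor.
    log_determinant = 2.0 * np.sum(np.log(np.diag(factor)))
    return 0.5 * (log_determinant + quadratic_form + len(deviations) * np.log(2.0 * np.pi))


# Hypothetical test days and deviations from the standard curve.
days = np.array([10, 40, 70])
deviations = np.array([1.5, -0.5, 2.0])

b1, b2, rho = transform([np.log(4.0), np.log(9.0), 3.0])
sigma = covariance(days, b1, b2, rho)
nll = lactation_nll(deviations, sigma)

# Cross-check against SciPy's multivariate normal log-density.
reference = -multivariate_normal.logpdf(deviations, mean=np.zeros(3), cov=sigma)
print(np.isclose(nll, reference))
```

Optimizing over the log and logit scales lets an unconstrained optimizer search freely while the mapped-back parameters always satisfy the positivity and range constraints.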

Arguments:

df: Reference milk-recording dataframe.
standard_lc: Population mean curve used for centering.

Returns:

Dictionary with:

"B_hat": fitted covariance matrix.

"R_hat": correlation matrix derived from B_hat.

"b1", "b2", "rho": fitted scalar parameters.

def preprocess_measured_data(lactation: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray) -> pandas.core.series.Series:

def preprocess_measured_data(lactation: pd.DataFrame, standard_lc: np.ndarray) -> pd.Series:
    """Build a 305-day deviation vector for a single lactation.

    For observed days, this computes ``MilkingYield - standard_lc[day]``.
    The result is reindexed to days 1..305 with unobserved days filled as zero.

    Args:
        lactation: Single-lactation dataframe with ``DaysInMilk`` and
            ``MilkingYield``.
        standard_lc: Expected daily milk yield profile.

    Returns:
        A Series indexed by day 1..305 containing milk-yield deviations.
    """

    # Calculate the difference between the measured milk yield and the
    # expected (population mean) yield.

    # Extract the expected milk yields for the measured DaysInMilk in the df
    # (DaysInMilk is 1-based, the curve array is 0-based).
    day_idx = lactation["DaysInMilk"].to_numpy(dtype=int) - 1
    expected = np.asarray(standard_lc, dtype=float)[day_idx]

    # Subtract. Note: this adds a ``MilkDifference`` column to ``lactation`` in place.
    lactation["MilkDifference"] = lactation["MilkingYield"].to_numpy(dtype=float) - expected

    # Create a Series of length 305 with missing values = 0
    milk_difference = cast(pd.Series, lactation.set_index("DaysInMilk")["MilkDifference"])
    corrected_series = milk_difference.reindex(range(1, 306), fill_value=0)

    return corrected_series

Build a 305-day deviation vector for a single lactation.

For observed days, this computes MilkingYield - standard_lc[day]. The result is reindexed to days 1..305 with unobserved days filled as zero.

Arguments:

lactation: Single-lactation dataframe with DaysInMilk and MilkingYield.
standard_lc: Expected daily milk yield profile.

Returns:

A Series indexed by day 1..305 containing milk-yield deviations.
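
The deviation vector can be illustrated with two records against a made-up baseline. This sketch follows the documented steps (subtract the expected yield on observed days, then reindex to days 1..305 with zeros) but builds the Series directly instead of adding a column to the dataframe; the linear `standard_lc` is a hypothetical placeholder, not the package curve.

```python
import numpy as np
import pandas as pd

# Hypothetical baseline falling linearly from 30 to 20 over days 1..305.
standard_lc = np.linspace(30.0, 20.0, 305)

lactation = pd.DataFrame({"DaysInMilk": [1, 305], "MilkingYield": [32.0, 19.0]})

# DaysInMilk is 1-based, the curve array is 0-based.
days = lactation["DaysInMilk"].to_numpy(dtype=int)
deviation = lactation["MilkingYield"].to_numpy(dtype=float) - standard_lc[days - 1]

# Reindex to the full grid; unobserved days get a deviation of zero.
deviation_series = pd.Series(deviation, index=days).reindex(range(1, 306), fill_value=0.0)

print(deviation_series.loc[[1, 2, 305]])
```

The zeros on unobserved days are placeholders only; the prediction step reads the deviations back at the observed days and infers the rest from the covariance structure.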

def best_predict_method_single_lac(lactation: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray = STANDARD_CURVE, covariance_matrix: numpy.ndarray = COV_MATRIX) -> float:

def best_predict_method_single_lac(
    lactation: pd.DataFrame,
    standard_lc: np.ndarray = STANDARD_CURVE,
    covariance_matrix: np.ndarray = COV_MATRIX,
) -> float:
    """Predict 305-day cumulative yield for one lactation.

    Observed test-day deviations are projected over all 305 days using the
    covariance structure and then added to the baseline cumulative standard
    curve.

    By default this function uses the package-provided standard curve and
    covariance matrix. You can also provide your own, for example when you
    have fitted them from your own reference population.

    Args:
        lactation: Observed records for one lactation.
        standard_lc: Population mean daily yield profile.
        covariance_matrix: Day-to-day covariance matrix on the 305-day grid.

    Returns:
        Predicted cumulative 305-day milk yield.

    Notes:
        Duplicate day records are resolved with ``keep="last"`` before
        prediction. If no valid observations remain in days 1..305, the method
        returns the cumulative standard curve.
    """
    filtered_lactation = lactation.loc[
        (lactation["DaysInMilk"] >= 1) & (lactation["DaysInMilk"] <= 305)
    ].copy()
    filtered_lactation = filtered_lactation.drop_duplicates(subset=["DaysInMilk"], keep="last")
    filtered_lactation = filtered_lactation.sort_values("DaysInMilk")

    corrected_series = preprocess_measured_data(
        filtered_lactation,
        standard_lc=standard_lc,
    )

    if filtered_lactation.empty:
        return float(np.sum(standard_lc))

    obs_idx_1based = filtered_lactation["DaysInMilk"].to_numpy(dtype=int)  # DaysInMilk: 1-305
    obs_idx_0based = obs_idx_1based - 1  # Convert to 0-based matrix indices: 0-304
    y_obs = corrected_series.loc[obs_idx_1based].to_numpy(
        dtype=float
    )  # corrected_series is indexed by DaysInMilk (1-305)

    # Extract covariance blocks
    B_oo = covariance_matrix[
        np.ix_(obs_idx_0based, obs_idx_0based)
    ]  # observed-by-observed block (0-based indices)
    B_mo = covariance_matrix[:, obs_idx_0based]  # all-days-by-observed block

    # Solve B_oo @ alpha = y_obs via Cholesky factorization
    c, lower = cho_factor(B_oo)
    alpha = cho_solve((c, lower), y_obs)

    # Predict full deviation curve
    y_estimate = B_mo @ alpha

    # Total milk = baseline + deviation
    deviation = np.sum(y_estimate)

    total = np.sum(standard_lc) + deviation

    # Return a plain Python float, as declared in the signature
    return float(total)

Predict 305-day cumulative yield for one lactation.

Observed test-day deviations are projected over all 305 days using the covariance structure and then added to the baseline cumulative standard curve.

Arguments:

lactation: Observed records for one lactation.
standard_lc: Population mean daily yield profile.
covariance_matrix: Day-to-day covariance matrix on the 305-day grid.

Returns:

Predicted cumulative 305-day milk yield.

Notes:

Duplicate day records are resolved with keep="last" before prediction. If no valid observations remain in days 1..305, the method returns the cumulative standard curve.
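
The prediction step corresponds to the formula from the method summary, with the predicted yield equal to the summed standard curve plus 1'C Vm^-1 tm. The sketch below works through it with hypothetical ingredients: a flat `standard_lc` of 30 kg per day and a covariance of the form b1 * I + b2 * E with illustrative values b1 = 4, b2 = 9 and rho = 0.99. None of these are the package defaults.

```python
import numpy as np

n_days = 305
days = np.arange(1, n_days + 1)

# Hypothetical ingredients (not the package's STANDARD_CURVE or COV_MATRIX).
standard_lc = np.full(n_days, 30.0)
lags = np.abs(days[:, None] - days[None, :])
V = 4.0 * np.eye(n_days) + 9.0 * 0.99**lags

# Three hypothetical test-day records.
obs_days = np.array([30, 90, 150])
obs_yield = np.array([33.0, 32.0, 31.0])
obs_idx = obs_days - 1  # 0-based positions on the 305-day grid

# Measured deviations t_m from the expected curve.
t_m = obs_yield - standard_lc[obs_idx]

# V_m: covariance among measured days; C: covariance of all days with measured days.
V_m = V[np.ix_(obs_idx, obs_idx)]
C = V[:, obs_idx]

# Predicted deviation for every day, then the predicted 305-day yield.
t_hat = C @ np.linalg.solve(V_m, t_m)
y_hat = standard_lc.sum() + t_hat.sum()

print(round(float(y_hat), 1))
```

Because C is taken from the same matrix as V_m, the predicted deviation on each measured day equals the measured deviation, and it shrinks toward zero for days further from any record.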

def best_predict_method(df: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray = STANDARD_CURVE, days_in_milk_col: str | None = None, milking_yield_col: str | None = None, test_id_col: str | None = None, default_test_id: int = 0, covariance_matrix: numpy.ndarray | None = COV_MATRIX, fit_standard_lc_from_data: bool = False, reference_df: pandas.core.frame.DataFrame | None = None) -> pandas.core.frame.DataFrame:

def best_predict_method(
    df: pd.DataFrame,
    standard_lc: np.ndarray = STANDARD_CURVE,
    days_in_milk_col: str | None = None,
    milking_yield_col: str | None = None,
    test_id_col: str | None = None,
    default_test_id: int = 0,
    covariance_matrix: np.ndarray | None = COV_MATRIX,
    fit_standard_lc_from_data: bool = False,
    reference_df: pd.DataFrame | None = None,
) -> pd.DataFrame:
    """Apply best prediction to one or more lactations.

    By default this function uses the package-provided standard curve and
    covariance matrix. Both can be customized, for example to reflect your
    own reference population, in two ways. Either pass a pandas dataframe as
    ``reference_df`` and set ``fit_standard_lc_from_data`` to True, in which
    case the covariance matrix is fitted from that reference dataset
    (centered on the supplied ``standard_lc``), or pass ``standard_lc`` and
    ``covariance_matrix`` directly in the function call.

    Args:
        df: Input observations. If ``TestId`` is missing, all rows are treated
            as one lactation.
        standard_lc: Expected daily milk yield lactation curve on days 1..305.
            If not provided, the package's default curve is used. To use your
            own curve, pass one here, for example the output of
            ``fit_standard_lc`` on a reference dataset.
        days_in_milk_col: Optional input column name for days in milk. If
            provided, it is mapped to ``DaysInMilk``.
        milking_yield_col: Optional input column name for milk yield. If
            provided, it is mapped to ``MilkingYield``.
        test_id_col: Optional input column name for lactation/test identifier.
            If provided, it is mapped to ``TestId``.
        default_test_id: Fallback test id used when no test-id column is
            available.
        covariance_matrix: Optional prefit covariance matrix on the 305-day
            grid. Defaults to the package-provided matrix; it is replaced by
            a matrix fitted from ``reference_df`` when
            ``fit_standard_lc_from_data`` is True.
        fit_standard_lc_from_data: Whether to fit the covariance matrix from
            ``reference_df`` instead of using the provided covariance matrix.
        reference_df: Reference dataframe used to fit the covariance matrix.
            Required when ``fit_standard_lc_from_data`` is True.

    Returns:
        Dataframe with columns ``TestId`` and ``LactationMilkYield``.

    Raises:
        ValueError: If ``fit_standard_lc_from_data`` is True and
            ``reference_df`` is not provided.
    """
    # Standardize columns and filter DIM <= 305
    df = standardize_lactation_columns(
        df,
        days_in_milk_col=days_in_milk_col,
        milking_yield_col=milking_yield_col,
        test_id_col=test_id_col,
        default_test_id=default_test_id,
        max_dim=305,
    )

    # Fit the covariance matrix from reference data when requested
    if fit_standard_lc_from_data:
        if reference_df is None:
            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
        reference_df = standardize_lactation_columns(
            reference_df,
            days_in_milk_col=days_in_milk_col,
            milking_yield_col=milking_yield_col,
            test_id_col=test_id_col,
            default_test_id=default_test_id,
            max_dim=305,
        )
        covariance_matrix = cast(
            np.ndarray, fit_autocorrelation_matrix(reference_df, standard_lc)["B_hat"]
        )

    covariance_matrix_array = cast(np.ndarray, covariance_matrix)

    df = df.copy()

    results = []

    for test_id, lactation in df.groupby("TestId"):
        pred = best_predict_method_single_lac(
            lactation,
            standard_lc,
            covariance_matrix_array,
        )
        results.append({"TestId": test_id, "LactationMilkYield": pred})

    return pd.DataFrame(results)

Apply best prediction to one or more lactations.

By default this function uses the package-provided standard curve and covariance matrix. Both can be customized, for example to reflect your own reference population, in two ways. Either pass a pandas dataframe as reference_df and set fit_standard_lc_from_data to True, in which case the covariance matrix is fitted from that reference dataset (centered on the supplied standard_lc), or pass standard_lc and covariance_matrix directly in the function call.

Arguments:

df: Input observations. If TestId is missing, all rows are treated as one lactation.
standard_lc: Expected daily milk yield lactation curve on days 1..305. If not provided, the package's default curve is used. To use your own curve, pass one here, for example the output of fit_standard_lc on a reference dataset.
days_in_milk_col: Optional input column name for days in milk. If provided, it is mapped to DaysInMilk.
milking_yield_col: Optional input column name for milk yield. If provided, it is mapped to MilkingYield.
test_id_col: Optional input column name for lactation/test identifier. If provided, it is mapped to TestId.
default_test_id: Fallback test id used when no test-id column is available.
covariance_matrix: Optional prefit covariance matrix on the 305-day grid. Defaults to the package-provided matrix; it is replaced by a matrix fitted from reference_df when fit_standard_lc_from_data is True.
fit_standard_lc_from_data: Whether to fit the covariance matrix from reference_df instead of using the provided covariance matrix.
reference_df: Reference dataframe used to fit the covariance matrix. Required when fit_standard_lc_from_data is True.

Returns:

Dataframe with columns TestId and LactationMilkYield.

Raises:

ValueError: If fit_standard_lc_from_data is True and reference_df is not provided.

def demo() -> None:

def demo() -> None:
    """Run a minimal example of best prediction with mock data."""

    # --- Single + multiple lactations example ---
    test_df = pd.DataFrame(
        {
            "TestId": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
            "DaysInMilk": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55],
            "MilkingYield": [30, 35, 40, 38, 36, 28, 33, 37, 39, 34],
        }
    )

    result_cov = best_predict_method(
        test_df, standard_lc=STANDARD_CURVE, covariance_matrix=COV_MATRIX
    )

    print("Predictions with provided covariance matrix:")
    print(result_cov)

Run a minimal example of best prediction with mock data.