lactationcurve.characteristics.best_predict

Best prediction for cumulative 305-day milk yield.

Purpose

The Best Prediction (BP) method estimates 305-day lactation yield from intermittent test-day records by predicting deviations from an expected lactation curve at all unobserved days in milk. The method uses standard lactation curves that represent the expected course of lactation for specific cow subgroups, such as breed or parity.

By incorporating these standard curves, BP accounts for the typical lactation pattern in which milk yield increases after calving, reaches a peak, and subsequently declines. Missing daily yields are estimated from the covariance structure of test-day records and their deviations from the expected curve.

This module implements the best-prediction approach described by VanRaden (1997) for ICAR Procedure 2, Section 2 (Computing Accumulated Lactation Yield).

Method Summary

Adapted from the Best Predict Manual by Cole and VanRaden (2015)

Best prediction combines a population-level standard lactation curve with correlation-based corrections derived from observed test-day deviations. The method projects observed deviations onto the full 305-day curve using a covariance structure estimated from reference data.

Individual daily yield can be modeled as the expected value of a management group plus a deviation from that mean: yi = E(yi) + ti where yi is an individual yield on test day i, E(yi) is the expected yield for an animal in the same management group (Wiggans, Misztal, and Van Vleck 1988) on the same test day, and ti is a deviation from the group mean on the same test day. Suppose that µ is a vector of expected values for each day of lactation for a single trait, t is a vector of 305 test day deviations for the trait, and tm is a vector of only the measured deviations. The means and variances of t and tm are assumed known with V (t) = V and V (tm) = Vm. The covariance between t and tm, C, is also assumed known.

Lactation yield: A cow’s true 305-d yield (y) is the sum of the expected values for each day (1′µ) plus the sum of her 305 deviations from expectations (1′t), where 1′ is a vector of 1s of length 305. The cow’s true yield, and the best prediction of that yield (yˆ), are: y = 1′t yˆ = 1′CVm^−1 * tm

Key Entry Points

Column Flexibility

The functions accept several case-insensitive column name aliases and can create a default TestId if one is missing. Recognized aliases:

  • Days in Milk: ["daysinmilk", "dim", "testday"]
  • Milk Yield: ["milkingyield", "testdaymilkyield", "milkyield", "yield"]
  • Test Id: ["animalid", "testid", "id"]

It is also possible to provide your own column names so the function can be applied to dataframes with different column naming conventions.

Defaults

  • STANDARD_CURVE: Baseline expected lactation curve for days 1..305 (Wood).
  • COV_MATRIX: Default day-to-day covariance structure used for projection.

Notes

  • The default assets are loaded from the package data directory.
  • Users can fit curve and covariance ingredients from their own reference population.
  • The method can be applied to lactations without any measurements, in which case the result will be the population mean from the standard curve.
  • Predicted milk yields have less variance than true milk yields. With TIM, estimated yields have more variance than true yields. The reason is that predicted yields are regressed toward the mean unless all 305 daily yields are observed.
  • Currently it is not yet possible to predict lactation yields for lactation windows other than 305 days, but this is on the roadmap for future updates.
  • The method currently assumes that the standard curves and covariance structure is the same for all lactations, but future updates may allow using different standard curves and covariance structures for different subgroups of lactations (e.g., by breed or parity).
  • For the used standard lactation curve currently the Wood LC model is used, it is possible to use other methods aswell.
  • Strengths of Best Prediction includes its ability to leverage the full covariance structure of the lactation curve and to therefore potentially provide more accurate predictions especially for lactations with few test days. This is because the inherent shape of the lactation curve is taken into account in the projection of observed deviations to unobserved days. It has fewer in between steps then ISLC and is therefore easier to use.
  • Disadvantages of Best Prediction include its computational intensity, especially when fitting the covariance structure from data. And the method is not as easy to understand as a simpler method such as the test interval method, which can make it less transparent to users. The best results are obtained when the standard curve and covariance matrix are from the same population as the data, which can be a barrier for users without access to a large reference dataset. This also causes inconsistencies in cumulative milk yield results depending on which standard curves are used. A cow with the exact same test-day records can have a different cumulative milk yield estimates depending on the standard curve used, which can be considered unfair.

Still coming functionality

  • Milk, fat, and protein yields can be processed separately using single-trait best prediction or jointly using multi-trait best prediction. Replacement of the measured milk yields by the fat or protein yields gives the single-trait predictions for fat or for protein. Multi-trait predictions require larger vectors and matrices but similar algebra.

References

VanRaden, P. M. (1997). Lactation yields and accuracies computed from test day yields and (co) variances by best prediction. Journal of dairy science, 80(11), 3015-3022.

A Manual for Use of BESTPRED: A Program for Estimation of Lactation Yield and Persistency Using Best Prediction Release 2.0 rc 7 J. B. Cole and P. M. VanRaden August 12, 2009 Revised April 27, 2015 Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Room 306 Bldg 005 BARC-West, 10300 Baltimore Avenue, Beltsville, MD 20705-2350

Original code for best predict can be found on GitHub

Author: Meike van Leerdam, Date: 24-04-2026 Last update: 21-May-2026

  1"""Best prediction for cumulative 305-day milk yield.
  2
  3Purpose
  4-------
  5The Best Prediction (BP) method estimates 305-day lactation yield from
  6intermittent test-day records by predicting deviations from an expected
  7lactation curve at **all** unobserved days in milk. The method uses standard
  8lactation curves that represent the expected course of lactation for
  9specific cow subgroups, such as breed or parity.
 10
 11By incorporating these standard curves, BP accounts for the typical
 12lactation pattern in which milk yield increases after calving, reaches a
 13peak, and subsequently declines. Missing daily yields are estimated from
 14the covariance structure of test-day records and their deviations from the
 15expected curve.
 16
 17This module implements the best-prediction approach described by
 18VanRaden (1997) for ICAR Procedure 2, Section 2 (Computing Accumulated
 19Lactation Yield).
 20
 21Method Summary
 22--------------
 23Adapted from the Best Predict Manual by Cole and VanRaden (2015)
 24
 25Best prediction combines a population-level standard lactation curve with
 26correlation-based corrections derived from observed test-day deviations.
 27The method projects observed deviations onto the full 305-day curve using a
 28covariance structure estimated from reference data.
 29
 30Individual daily yield can be modeled as the expected value of a management
 31group plus a deviation from that mean:
 32yi = E(yi) + ti
 33where yi
 34is an individual yield on test day i, E(yi) is the expected yield for an
 35animal in the same management group
 36(Wiggans, Misztal, and Van Vleck 1988) on the same test day, and ti
 37is a deviation from the group mean on the same
 38test day. Suppose that µ is a vector of expected values for each day of
 39lactation for a single trait, t is a vector of 305
 40test day deviations for the trait, and tm is a vector of only the measured
 41deviations. The means and variances of t
 42and tm are assumed known with V (t) = V and V (tm) = Vm. The covariance
 43between t and tm, C, is also assumed
 44known.
 45
 46**Lactation yield**: A cow’s true 305-d yield (y) is the sum of the expected
 47values for each day (1′µ) plus the sum of her
 48305 deviations from expectations (1′t), where 1′
 49is a vector of 1s of length 305. The cow’s true yield, and the best
 50prediction of that yield (yˆ), are:
 51y = 1′t
 52yˆ = 1′CVm^−1 * tm
 53
 54
 55Key Entry Points
 56----------------
 57- ``best_predict_method``: Apply best prediction per ``TestId``.
 58- ``best_predict_method_single_lac``: Predict one lactation.
 59- ``fit_autocorrelation_matrix``: Fit covariance structure from reference data.
 60
 61Column Flexibility
 62------------------
 63The functions accept several case-insensitive column name aliases and can
 64create a default ``TestId`` if one is missing. Recognized aliases:
 65
 66- Days in Milk: `["daysinmilk", "dim", "testday"]`
 67- Milk Yield: `["milkingyield", "testdaymilkyield", "milkyield", "yield"]`
 68- Test Id: `["animalid", "testid", "id"]`
 69
 70It is also possible to provide your own column names so the function
 71can be applied to dataframes with different column naming conventions.
 72
 73Defaults
 74--------
 75- ``STANDARD_CURVE``: Baseline expected lactation curve for days 1..305 (Wood).
 76- ``COV_MATRIX``: Default day-to-day covariance structure used for projection.
 77
 78Notes
 79-----
 80- The default assets are loaded from the package ``data`` directory.
 81- Users can fit curve and covariance ingredients from their own reference
 82    population.
 83- The method can be applied to lactations without any measurements,
 84    in which case the result will be the population mean from the standard curve.
 85- Predicted milk yields have less variance than true milk yields. With TIM,
 86estimated yields have more variance than true yields. The reason is that predicted yields are
 87regressed toward the mean unless all 305 daily yields are observed.
 88- Currently it is not yet possible to predict lactation yields for lactation windows
 89    other than 305 days, but this is on the roadmap for future updates.
 90- The method currently assumes that the standard curves and covariance structure is the same
 91    for all lactations,
 92    but future updates may allow using different standard curves and covariance structures for
 93    different subgroups of lactations (e.g., by breed or parity).
 94- For the used standard lactation curve currently the Wood LC model is used, it is possible to
 95    use other methods aswell.
 96- Strengths of Best Prediction includes its ability to leverage the full covariance structure
 97    of the lactation curve and to therefore potentially provide more accurate predictions
 98    especially for lactations with few test days. This is because the inherent shape of the
 99    lactation curve is taken into account in the projection of
100    observed deviations to unobserved days.
101    It has fewer in between steps then ISLC and is therefore easier to use.
102- Disadvantages of Best Prediction include its computational intensity,
103    especially when fitting the covariance structure from data.
104    And the method is not as easy to understand
105    as a simpler method such as the test interval method,
106    which can make it less transparent to users.
107    The best results are obtained when the standard curve and covariance matrix
108    are from the same population as the data,
109    which can be a barrier for users without access to a large reference dataset.
110    This also causes inconsistencies in cumulative milk yield results
111    depending on which standard curves are used.
112    A cow with the exact same test-day records
113    can have a different cumulative milk yield estimates depending
114    on the standard curve used, which can be considered unfair.
115
116Still coming functionality
117--------------------------
118- Milk, fat, and protein yields can be processed separately using single-trait best
119    prediction or jointly using multi-trait best prediction.
120    Replacement of the measured milk yields by the fat or protein yields gives the
121    single-trait predictions for fat or for protein.
122    Multi-trait predictions require larger vectors and matrices but similar algebra.
123
124
125
126References
127---------
128VanRaden, P. M. (1997). Lactation yields and accuracies computed from test
129day yields and (co) variances by best prediction.
130Journal of dairy science, 80(11), 3015-3022.
131
132A Manual for Use of BESTPRED: A Program for Estimation of Lactation Yield
133and Persistency Using Best Prediction
134Release 2.0 rc 7
135J. B. Cole and P. M. VanRaden
136August 12, 2009
137Revised April 27, 2015
138Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States
139Department of Agriculture, Room 306 Bldg 005 BARC-West, 10300 Baltimore Avenue,
140Beltsville, MD 20705-2350
141
142Original code for best predict can be found on [GitHub](https://github.com/wintermind/bestpred)
143
144Author: Meike van Leerdam,
145Date: 24-04-2026
146Last update: 21-May-2026
147"""
148
149import inspect
150from pathlib import Path
151from typing import cast
152
153import numpy as np
154import pandas as pd
155from scipy.linalg import LinAlgError, cho_factor, cho_solve
156from scipy.optimize import minimize
157
158from lactationcurve.fitting import fit_lactation_curve
159from lactationcurve.preprocessing import standardize_lactation_columns
160
161# get the standard lactation curve ingredients back from the data storage:
162DATA_DIR = Path(__file__).resolve().parent / "data"
163COV_MATRIX = np.load(DATA_DIR / "covariance_matrix_best_predict.npy")
164STANDARD_CURVE = np.load(DATA_DIR / "standard_lc_wood.npy")
165
166
167# Lightweight repr-only wrapper for cleaner generated signatures in docs
168class _DocDefault:
169    def __init__(self, label: str) -> None:
170        self.label = label
171
172    def __repr__(self) -> str:  # pragma: no cover - docs-only
173        return self.label
174
175
176_DOC_STANDARD_CURVE = _DocDefault("STANDARD_CURVE")
177_DOC_COV_MATRIX = _DocDefault("COV_MATRIX")
178
179# functions to fit you own standard curve and covariance matrix
180
181
182def pivot_milk_recordings_to_matrix(df: pd.DataFrame) -> np.ndarray:
183    """Convert long-format recordings to a fixed 305-day matrix.
184
185    Rows represent lactations (``TestId``) and columns represent days in milk
186    from 1 through 305. Missing observations are kept as ``NaN``.
187
188    Args:
189        df: Dataframe with ``TestId``, ``DaysInMilk``, and ``MilkingYield``.
190
191    Returns:
192        A NumPy array of shape ``(n_lactations, 305)``.
193    """
194    # ensure sorting
195    df = df.sort_values(["TestId", "DaysInMilk"])
196
197    # pivot to wide format
198    milk_recordings_pivot = df.pivot_table(
199        index="TestId", columns="DaysInMilk", values="MilkingYield"
200    )
201
202    # enforce fixed 305-day grid alignment used by best-prediction
203    milk_recordings_pivot = milk_recordings_pivot.reindex(columns=range(1, 306))
204
205    # convert to numpy matrix
206    Y = milk_recordings_pivot.to_numpy()
207    return Y
208
209
210def fit_standard_lc(df: pd.DataFrame, lc_model: str = "Wood") -> np.ndarray:
211    """Fit a population-level standard lactation curve.
212
213    The curve is fit with the package's frequentist Wood model and returned on
214    the fixed day-1..305 grid.
215
216    Args:
217        df: Reference dataframe containing ``DaysInMilk`` and ``MilkingYield``.
218        lc_model: The lactation curve model to fit.
219        Default is the "Wood" lactation curve model.
220
221    Returns:
222        A NumPy array of expected daily milk yield for days 1..305.
223
224    Notes:
225        This mean curve acts as the baseline in best prediction. Individual
226        lactations are represented as deviations around this population profile.
227    """
228    standard_lc = pd.Series(
229        fit_lactation_curve(
230            df["DaysInMilk"].values,
231            df["MilkingYield"].values,
232            model=lc_model,
233            fitting="frequentist",
234        ),
235        index=range(1, 306),
236    )
237
238    return standard_lc.to_numpy(dtype=float)
239
240
241def center_lactation_data(
242    milk_matrix: np.ndarray,
243    standard_lc: np.ndarray,
244    day_mean_method: str = "standard_lc",
245) -> np.ndarray:
246    """Center lactation yields before covariance estimation.
247
248    Args:
249        milk_matrix: Yield matrix with lactations in rows and days in columns.
250        standard_lc: Expected day-wise milk yield profile.
251        day_mean_method: Mean-centering strategy. Supported values are
252            ``"standard_lc"`` (default) and ``"data"``.
253
254    Returns:
255        A centered matrix with the same shape as ``milk_matrix``.
256
257    Raises:
258        ValueError: If ``day_mean_method`` is not supported.
259    """
260    if day_mean_method == "standard_lc":
261        day_mean = standard_lc
262    elif day_mean_method == "data":
263        day_mean = np.nanmean(milk_matrix, axis=0)
264    else:
265        raise ValueError("day_mean_method must be 'standard_lc' or 'data'.")
266
267    return milk_matrix - day_mean
268
269
270def build_covariance_matrix(rho: float, size: int) -> np.ndarray:
271    """Construct a covariance matrix.
272
273    Cole et al. (2007) estimated correlations among test-day yields using a
274    simplified model with an identity matrix (I) for daily measurement error
275    and an autoregressive matrix (E) for biological change. E is defined as
276    ``Eij = r ** |i-j|`` where ``i`` and ``j`` are test-day DIM and
277    ``0 < r < 1``.
278
279    Element ``(i, j)`` is ``rho ** abs(i - j)``.
280
281    Args:
282        rho: AR(1) correlation parameter.
283        size: Matrix dimension.
284
285    Returns:
286        A ``(size, size)`` AR(1) correlation matrix.
287    """
288    idx = np.arange(size)
289    M = np.abs(idx[:, None] - idx[None, :])
290    return rho**M
291
292
293def fit_autocorrelation_matrix(
294    df: pd.DataFrame, standard_lc: np.ndarray
295) -> dict[str, np.ndarray | float]:
296    """Estimate covariance parameters for best prediction.
297
298    The model is ``B = b1 * I + b2 * E`` where ``E`` is an AR(1) correlation
299    matrix. Parameters are optimized in transformed space and mapped back to
300    enforce ``b1 > 0``, ``b2 > 0``, and ``0 < rho < 1``.
301
302    Args:
303        df: Reference milk-recording dataframe.
304        standard_lc: Population mean curve used for centering.
305
306    Returns:
307        Dictionary with:
308        - ``"B_hat"``: fitted covariance matrix.
309        - ``"R_hat"``: correlation matrix derived from ``B_hat``.
310        - ``"b1"``, ``"b2"``, ``"rho"``: fitted scalar parameters.
311    """
312    milk_matrix = pivot_milk_recordings_to_matrix(df)
313    centered_matrix = center_lactation_data(milk_matrix, standard_lc)
314    n_lactations, n_days = centered_matrix.shape
315    observed_indices = [np.where(~np.isnan(centered_matrix[i]))[0] for i in range(n_lactations)]
316
317    def negative_log_likelihood(params: np.ndarray) -> float:
318        p_b1, p_b2, p_rho = params
319        b1 = float(np.exp(p_b1))
320        b2 = float(np.exp(p_b2))
321        rho = float(1 / (1 + np.exp(-p_rho)))  # now rho in (0,1)
322        correlation_matrix = build_covariance_matrix(rho, n_days)
323
324        total = 0.0
325        for lactation_idx, day_indices in enumerate(observed_indices):
326            observation_count = len(day_indices)
327            if observation_count == 0:
328                continue
329
330            observations = centered_matrix[lactation_idx, day_indices]
331            correlation_subset = correlation_matrix[np.ix_(day_indices, day_indices)]
332            sigma = b1 * np.eye(observation_count) + b2 * correlation_subset
333
334            # Numerical safeguards: try Cholesky and penalize non-PD parameters.
335            try:
336                cholesky_factor, lower = cho_factor(sigma, check_finite=False)
337                solution = cho_solve((cholesky_factor, lower), observations, check_finite=False)
338            except LinAlgError:
339                # penalty for non-PD
340                return float(1e12 + np.sum(np.abs(params)))
341
342            quadratic_form = float(observations @ solution)
343            log_determinant = 2.0 * np.sum(np.log(np.diag(cholesky_factor)))
344            total += 0.5 * (
345                log_determinant + quadratic_form + observation_count * np.log(2 * np.pi)
346            )
347
348        # return total negative log-likelihood
349        return float(total)
350
351    # initial guesses and optimization. A 50/50 split in variance is assumed as starting point
352    initial_variance = max(float(np.nanvar(centered_matrix)), 1e-6)
353    initial_params = [
354        np.log(0.5 * initial_variance),
355        np.log(0.5 * initial_variance),
356        0.5,
357    ]
358
359    result = minimize(
360        negative_log_likelihood,
361        x0=initial_params,
362        method="L-BFGS-B",
363        options={"maxiter": 2000, "ftol": 1e-8},
364    )
365
366    if not result.success:
367        print(f"Optimization warning: {result.message}")
368
369    log_b1_hat, log_b2_hat, logit_rho_hat = result.x
370    b1_hat = float(np.exp(log_b1_hat))
371    b2_hat = float(np.exp(log_b2_hat))
372    rho_hat = float(1 / (1 + np.exp(-logit_rho_hat)))
373    correlation_matrix = build_covariance_matrix(rho_hat, n_days)
374    covariance_matrix = b1_hat * np.eye(n_days) + b2_hat * correlation_matrix
375
376    # convert to correlation matrix
377    std = np.sqrt(np.diag(covariance_matrix))
378    correlation_matrix = covariance_matrix / np.outer(std, std)
379
380    return {
381        "B_hat": covariance_matrix,
382        "R_hat": correlation_matrix,
383        "b1": b1_hat,
384        "b2": b2_hat,
385        "rho": rho_hat,
386    }
387
388
389# Functions for best predict that also work with the provided standard curve and covariance matrix.
390
391
392def preprocess_measured_data(lactation: pd.DataFrame, standard_lc: np.ndarray) -> pd.Series:
393    """Build a 305-day deviation vector for a single lactation.
394
395    For observed days, this computes ``MilkingYield - standard_lc[day]``.
396    The result is reindexed to days 1..305 with unobserved days filled as zero.
397
398    Args:
399        lactation: Single-lactation dataframe with ``DaysInMilk`` and
400            ``MilkingYield``.
401        standard_lc: Expected daily milk yield profile.
402
403    Returns:
404        A Series indexed by day 1..305 containing milk-yield deviations.
405    """
406
407    # calculate the difference between the expected (population mean) and measured milk yield
408
409    # extract the expected milk yields for the measured DaysInMilk in the df
410    day_idx = lactation["DaysInMilk"].to_numpy(dtype=int) - 1
411    expected = np.asarray(standard_lc, dtype=float)[day_idx]
412
413    # Subtract
414    lactation["MilkDifference"] = lactation["MilkingYield"].to_numpy(dtype=float) - expected
415
416    # Create a Series of length 305 with missing values = 0
417    milk_difference = cast(pd.Series, lactation.set_index("DaysInMilk")["MilkDifference"])
418    corrected_series = milk_difference.reindex(range(1, 306), fill_value=0)
419
420    return corrected_series
421
422
423def best_predict_method_single_lac(
424    lactation: pd.DataFrame,
425    standard_lc: np.ndarray = STANDARD_CURVE,
426    covariance_matrix: np.ndarray = COV_MATRIX,
427) -> float:
428    """Predict 305-day cumulative yield for one lactation.
429
430    Observed test-day deviations are projected over all 305 days using the
431    covariance structure and then added to the baseline cumulative standard
432    curve.
433
434    By default this function uses the package-provided standard curve and covariance matrix.
435    But it is also possible to provide your own standard curve and covariance matrix,
436    for example when you want to fit these ingredients from your own reference population.
437
438
439    Args:
440        lactation: Observed records for one lactation.
441        standard_lc: Population mean daily yield profile.
442        covariance_matrix: Day-to-day covariance matrix on the 305-day grid.
443
444    Returns:
445        Predicted cumulative 305-day milk yield.
446
447    Notes:
448        Duplicate day records are resolved with ``keep="last"`` before
449        prediction. If no valid observations remain in days 1..305, the method
450        returns the cumulative standard curve.
451    """
452    filtered_lactation = lactation.loc[
453        (lactation["DaysInMilk"] >= 1) & (lactation["DaysInMilk"] <= 305)
454    ].copy()
455    filtered_lactation = filtered_lactation.drop_duplicates(subset=["DaysInMilk"], keep="last")
456    filtered_lactation = filtered_lactation.sort_values("DaysInMilk")
457
458    corrected_series = preprocess_measured_data(
459        filtered_lactation,
460        standard_lc=standard_lc,
461    )
462
463    if filtered_lactation.empty:
464        return float(np.sum(standard_lc))
465
466    obs_idx_1based = filtered_lactation["DaysInMilk"].to_numpy(dtype=int)  # DaysInMilk: 1-305
467    obs_idx_0based = obs_idx_1based - 1  # Convert to 0-based matrix indices: 0-304
468    y_obs = corrected_series.loc[obs_idx_1based].to_numpy(
469        dtype=float
470    )  # corrected_series is indexed by DaysInMilk (1-305)
471
472    # Extract covariance blocks
473    B_oo = covariance_matrix[
474        np.ix_(obs_idx_0based, obs_idx_0based)
475    ]  # Use 0-based indices for matrix
476    B_mo = covariance_matrix[:, obs_idx_0based]  # Use 0-based indices for matrix
477
478    # solve
479    c, lower = cho_factor(B_oo)
480    alpha = cho_solve((c, lower), y_obs)
481
482    # Predict full deviation curve
483    y_estimate = B_mo @ alpha
484
485    # Total milk = baseline + deviation
486    deviation = np.sum(y_estimate)
487
488    total = np.sum(standard_lc) + deviation
489
490    return total
491
492
493def best_predict_method(
494    df: pd.DataFrame,
495    standard_lc: np.ndarray = STANDARD_CURVE,
496    days_in_milk_col: str | None = None,
497    milking_yield_col: str | None = None,
498    test_id_col: str | None = None,
499    default_test_id: int = 0,
500    covariance_matrix: np.ndarray | None = COV_MATRIX,
501    fit_standard_lc_from_data: bool = False,
502    reference_df: pd.DataFrame | None = None,
503) -> pd.DataFrame:
504    """Apply best prediction to one or more lactations.
505
506    By default this function uses the package-provided standard curve and covariance matrix.
507    But it is also possible to provide your own standard curve and covariance matrix,
508    for example when you want to fit these ingredients from your own reference population.
509    This can be done in two ways: either by fitting the covariance matrix and standard curve
510    directly from a reference dataset by providing a pandas dataframe at 'reference_df ='
511    when ``fit_standard_lc_from_data`` is True.
512    Alternative for customization is to set standard_lc_305 and covariance_matrix
513    directly in the function call.
514
515    Args:
516        df: Input observations. If ``TestId`` is missing, all rows are treated
517            as one lactation.
518        standard_lc: Expected daily milk yield lactation curve on days 1..305.
519            If not provided, the package's default curve is used.
520            Or fit your own standard curve from a reference dataset by
521            providing a pandas dataframe at 'reference_df ='
522            when ``fit_standard_lc_from_data`` is True.
523        days_in_milk_col: Optional input column name for days in milk. If
524            provided, it is mapped to ``DaysInMilk``.
525        milking_yield_col: Optional input column name for milk yield. If
526            provided, it is mapped to ``MilkingYield``.
527        test_id_col: Optional input column name for lactation/test identifier.
528            If provided, it is mapped to ``TestId``.
529        default_test_id: Fallback test id used when no test-id column is
530            available.
531        covariance_matrix: Optional prefit covariance matrix. If omitted,
532            the default matrix is used or
533             ``reference_df`` can be used to fit one for your own data.
534        fit_standard_lc_from_data: Whether to fit covariance information from
535            ``reference_df`` instead of using a provided covariance matrix.
536        reference_df: Reference dataframe used when ``covariance_matrix`` and
537            ``standard_lc`` are not provided and ``fit_standard_lc_from_data``
538            is True.
539
540    Returns:
541        Dataframe with columns ``TestId`` and ``LactationMilkYield``.
542
543    Raises:
544        ValueError: If neither ``covariance_matrix`` nor ``reference_df`` is
545            provided.
546    """
547    # Standardize columns and filter DIM <= 305
548    df = standardize_lactation_columns(
549        df,
550        days_in_milk_col=days_in_milk_col,
551        milking_yield_col=milking_yield_col,
552        test_id_col=test_id_col,
553        default_test_id=default_test_id,
554        max_dim=305,
555    )
556
557    # Fit covariance if not provided
558    if fit_standard_lc_from_data:
559        if reference_df is None:
560            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
561        reference_df = standardize_lactation_columns(
562            reference_df,
563            days_in_milk_col=days_in_milk_col,
564            milking_yield_col=milking_yield_col,
565            test_id_col=test_id_col,
566            default_test_id=default_test_id,
567            max_dim=305,
568        )
569        covariance_matrix = cast(
570            np.ndarray, fit_autocorrelation_matrix(reference_df, standard_lc)["B_hat"]
571        )
572
573    covariance_matrix_array = cast(np.ndarray, covariance_matrix)
574
575    df = df.copy()
576
577    results = []
578
579    for test_id, lactation in df.groupby("TestId"):
580        pred = best_predict_method_single_lac(
581            lactation,
582            standard_lc,
583            covariance_matrix_array,
584        )
585        results.append({"TestId": test_id, "LactationMilkYield": pred})
586
587    return pd.DataFrame(results)
588
589
590# demo function so I can see if this script runs as expected
591
592
593def demo() -> None:
594    """Run a minimal example of best prediction with mock data."""
595
596    # --- Single + multiple lactations example ---
597    test_df = pd.DataFrame(
598        {
599            "TestId": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
600            "DaysInMilk": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55],
601            "MilkingYield": [30, 35, 40, 38, 36, 28, 33, 37, 39, 34],
602        }
603    )
604
605    result_cov = best_predict_method(
606        test_df, standard_lc=STANDARD_CURVE, covariance_matrix=COV_MATRIX
607    )
608
609    print("Predictions with provided covariance matrix:")
610    print(result_cov)
611
612
613def _set_doc_signatures() -> None:
614    """Override displayed defaults in docs without changing runtime behavior."""
615    doc_defaults = {
616        "standard_lc": _DOC_STANDARD_CURVE,
617        "covariance_matrix": _DOC_COV_MATRIX,
618    }
619
620    for func in (best_predict_method_single_lac, best_predict_method):
621        signature = inspect.signature(func)
622        params = [
623            param.replace(default=doc_defaults[param.name]) if param.name in doc_defaults else param
624            for param in signature.parameters.values()
625        ]
626        func.__signature__ = signature.replace(parameters=params)
627
628
629_set_doc_signatures()
630
631if __name__ == "__main__":
632    demo()
DATA_DIR = PosixPath('/home/runner/work/bovi/bovi/packages/models/lactationcurve/src/lactationcurve/characteristics/data')
COV_MATRIX = array([[75.16786 , 67.28662 , 67.05099 , ..., 23.40674 , 23.32477 , 23.243088], [67.28662 , 75.16786 , 67.28662 , ..., 23.488997, 23.40674 , 23.32477 ], [67.05099 , 67.28662 , 75.16786 , ..., 23.571543, 23.488997, 23.40674 ], ..., [23.40674 , 23.488997, 23.571543, ..., 75.16786 , 67.28662 , 67.05099 ], [23.32477 , 23.40674 , 23.488997, ..., 67.28662 , 75.16786 , 67.28662 ], [23.243088, 23.32477 , 23.40674 , ..., 67.05099 , 67.28662 , 75.16786 ]], shape=(305, 305), dtype=float32)
STANDARD_CURVE = array([20.88852108, 24.28756033, 26.48529607, 28.13333695, 29.45700725, 30.56351807, 31.51304545, 32.34304569, 33.0785599 , 33.7372317 , 34.33200401, 34.8726797 , 35.36687887, 35.82065388, 36.23889995, 36.62563841, 36.98421725, 37.31745679, 37.62775749, 37.91718122, 38.18751364, 38.44031259, 38.67694623, 38.89862343, 39.10641824, 39.3012897 , 39.4840982 , 39.6556189 , 39.81655295, 39.9675369 , 40.10915064, 40.24192411, 40.36634304, 40.48285392, 40.59186822, 40.69376607, 40.78889947, 40.87759509, 40.96015672, 41.03686741, 41.1079914 , 41.17377578, 41.23445203, 41.29023728, 41.34133561, 41.387939 , 41.4302284 , 41.46837449, 41.50253857, 41.53287316, 41.55952271, 41.58262416, 41.60230747, 41.61869609, 41.63190741, 41.64205315, 41.64923978, 41.65356877, 41.65513699, 41.65403694, 41.65035703, 41.64418185, 41.63559233, 41.62466604, 41.61147731, 41.59609742, 41.57859483, 41.55903524, 41.53748181, 41.51399526, 41.48863401, 41.46145429, 41.43251026, 41.40185408, 41.36953608, 41.33560475, 41.30010691, 41.26308776, 41.22459095, 41.18465864, 41.1433316 , 41.10064924, 41.0566497 , 41.01136988, 40.9648455 , 40.91711116, 40.86820038, 40.81814563, 40.76697841, 40.71472924, 40.66142775, 40.60710269, 40.55178196, 40.49549263, 40.43826102, 40.38011269, 40.32107247, 40.2611645 , 40.20041226, 40.13883855, 40.0764656 , 40.01331499, 39.94940774, 39.88476433, 39.81940467, 39.75334817, 39.68661372, 39.61921973, 39.55118415, 39.48252446, 39.41325771, 39.34340051, 39.27296908, 39.20197922, 39.13044635, 39.05838551, 38.9858114 , 38.91273835, 38.83918033, 38.76515102, 38.69066375, 38.61573155, 38.54036713, 38.46458294, 38.38839112, 38.31180353, 38.23483177, 38.15748719, 38.07978086, 38.00172363, 37.92332608, 37.84459857, 37.76555126, 37.68619404, 37.60653661, 37.52658847, 37.4463589 , 37.36585698, 37.28509161, 37.20407148, 37.12280513, 37.04130088, 36.95956691, 36.8776112 , 36.7954416 , 36.71306576, 36.6304912 , 36.54772528, 36.46477519, 36.381648 , 36.29835062, 36.21488984, 36.13127228, 36.04750445, 35.96359274, 35.87954339, 35.79536253, 35.71105615, 35.62663015, 35.54209029, 35.45744224, 35.37269153, 35.2878436 , 35.20290379, 35.11787732, 35.03276932, 34.94758481, 34.86232874, 34.77700593, 34.69162112, 34.60617898, 34.52068406, 34.43514083, 34.3495537 , 34.26392697, 34.17826487, 34.09257154, 34.00685105, 33.9211074 , 33.83534449, 33.74956618, 33.66377623, 33.57797834, 33.49217615, 33.40637321, 33.32057303, 33.23477903, 33.14899458, 33.06322299, 32.9774675 , 32.89173129, 32.80601748, 32.72032915, 32.6346693 , 32.54904087, 32.46344678, 32.37788987, 32.29237292, 32.20689867, 32.12146983, 32.03608901, 31.95075883, 31.86548181, 31.78026045, 31.69509721, 31.60999449, 31.52495464, 31.43997998, 31.35507278, 31.27023528, 31.18546966, 31.10077807, 31.01616261, 30.93162537, 30.84716836, 30.76279358, 30.67850298, 30.59429848, 30.51018197, 30.42615529, 30.34222025, 30.25837863, 30.17463218, 30.09098261, 30.00743159, 29.92398077, 29.84063177, 29.75738617, 29.67424554, 29.59121138, 29.50828521, 29.42546848, 29.34276263, 29.26016908, 29.17768921, 29.09532437, 29.0130759 , 28.93094509, 28.84893323, 28.76704157, 28.68527134, 28.60362374, 28.52209996, 28.44070114, 28.35942842, 28.27828291, 28.1972657 , 28.11637786, 28.03562042, 27.95499442, 27.87450085, 27.79414069, 27.7139149 , 27.63382443, 27.55387019, 27.47405308, 27.39437399, 27.31483378, 27.23543329, 27.15617335, 27.07705477, 26.99807833, 26.91924481, 26.84055497, 26.76200955, 26.68360926, 26.60535481, 26.5272469 , 26.44928619, 26.37147335, 26.29380901, 26.21629381, 26.13892836, 26.06171325, 25.98464907, 25.90773639, 25.83097576, 25.75436773, 25.67791282, 25.60161154, 25.52546441, 25.4494719 , 25.37363449, 25.29795265, 25.22242682, 25.14705745, 25.07184496, 24.99678975, 24.92189225, 24.84715283, 24.77257187, 24.69814976, 24.62388683, 24.54978344, 24.47583993, 24.40205662, 24.32843382, 24.25497184, 24.18167098, 24.10853151, 24.03555372, 23.96273787, 23.89008421, 23.817593 , 23.74526447])
def pivot_milk_recordings_to_matrix(df: pandas.core.frame.DataFrame) -> numpy.ndarray:
183def pivot_milk_recordings_to_matrix(df: pd.DataFrame) -> np.ndarray:
184    """Convert long-format recordings to a fixed 305-day matrix.
185
186    Rows represent lactations (``TestId``) and columns represent days in milk
187    from 1 through 305. Missing observations are kept as ``NaN``.
188
189    Args:
190        df: Dataframe with ``TestId``, ``DaysInMilk``, and ``MilkingYield``.
191
192    Returns:
193        A NumPy array of shape ``(n_lactations, 305)``.
194    """
195    # ensure sorting
196    df = df.sort_values(["TestId", "DaysInMilk"])
197
198    # pivot to wide format
199    milk_recordings_pivot = df.pivot_table(
200        index="TestId", columns="DaysInMilk", values="MilkingYield"
201    )
202
203    # enforce fixed 305-day grid alignment used by best-prediction
204    milk_recordings_pivot = milk_recordings_pivot.reindex(columns=range(1, 306))
205
206    # convert to numpy matrix
207    Y = milk_recordings_pivot.to_numpy()
208    return Y

Convert long-format recordings to a fixed 305-day matrix.

Rows represent lactations (TestId) and columns represent days in milk from 1 through 305. Missing observations are kept as NaN.

Arguments:
  • df: Dataframe with TestId, DaysInMilk, and MilkingYield.
Returns:

A NumPy array of shape (n_lactations, 305).

def fit_standard_lc(df: pandas.core.frame.DataFrame, lc_model: str = 'Wood') -> numpy.ndarray:
211def fit_standard_lc(df: pd.DataFrame, lc_model: str = "Wood") -> np.ndarray:
212    """Fit a population-level standard lactation curve.
213
214    The curve is fit with the package's frequentist Wood model and returned on
215    the fixed day-1..305 grid.
216
217    Args:
218        df: Reference dataframe containing ``DaysInMilk`` and ``MilkingYield``.
219        lc_model: The lactation curve model to fit.
220        Default is the "Wood" lactation curve model.
221
222    Returns:
223        A NumPy array of expected daily milk yield for days 1..305.
224
225    Notes:
226        This mean curve acts as the baseline in best prediction. Individual
227        lactations are represented as deviations around this population profile.
228    """
229    standard_lc = pd.Series(
230        fit_lactation_curve(
231            df["DaysInMilk"].values,
232            df["MilkingYield"].values,
233            model=lc_model,
234            fitting="frequentist",
235        ),
236        index=range(1, 306),
237    )
238
239    return standard_lc.to_numpy(dtype=float)

Fit a population-level standard lactation curve.

The curve is fit with the package's frequentist Wood model and returned on the fixed day-1..305 grid.

Arguments:
  • df: Reference dataframe containing DaysInMilk and MilkingYield.
  • lc_model: The lactation curve model to fit.
  • Default is the "Wood" lactation curve model.
Returns:

A NumPy array of expected daily milk yield for days 1..305.

Notes:

This mean curve acts as the baseline in best prediction. Individual lactations are represented as deviations around this population profile.

def center_lactation_data( milk_matrix: numpy.ndarray, standard_lc: numpy.ndarray, day_mean_method: str = 'standard_lc') -> numpy.ndarray:
242def center_lactation_data(
243    milk_matrix: np.ndarray,
244    standard_lc: np.ndarray,
245    day_mean_method: str = "standard_lc",
246) -> np.ndarray:
247    """Center lactation yields before covariance estimation.
248
249    Args:
250        milk_matrix: Yield matrix with lactations in rows and days in columns.
251        standard_lc: Expected day-wise milk yield profile.
252        day_mean_method: Mean-centering strategy. Supported values are
253            ``"standard_lc"`` (default) and ``"data"``.
254
255    Returns:
256        A centered matrix with the same shape as ``milk_matrix``.
257
258    Raises:
259        ValueError: If ``day_mean_method`` is not supported.
260    """
261    if day_mean_method == "standard_lc":
262        day_mean = standard_lc
263    elif day_mean_method == "data":
264        day_mean = np.nanmean(milk_matrix, axis=0)
265    else:
266        raise ValueError("day_mean_method must be 'standard_lc' or 'data'.")
267
268    return milk_matrix - day_mean

Center lactation yields before covariance estimation.

Arguments:
  • milk_matrix: Yield matrix with lactations in rows and days in columns.
  • standard_lc: Expected day-wise milk yield profile.
  • day_mean_method: Mean-centering strategy. Supported values are "standard_lc" (default) and "data".
Returns:

A centered matrix with the same shape as milk_matrix.

Raises:
  • ValueError: If day_mean_method is not supported.
def build_covariance_matrix(rho: float, size: int) -> numpy.ndarray:
271def build_covariance_matrix(rho: float, size: int) -> np.ndarray:
272    """Construct a covariance matrix.
273
274    Cole et al. (2007) estimated correlations among test-day yields using a
275    simplified model with an identity matrix (I) for daily measurement error
276    and an autoregressive matrix (E) for biological change. E is defined as
277    ``Eij = r ** |i-j|`` where ``i`` and ``j`` are test-day DIM and
278    ``0 < r < 1``.
279
280    Element ``(i, j)`` is ``rho ** abs(i - j)``.
281
282    Args:
283        rho: AR(1) correlation parameter.
284        size: Matrix dimension.
285
286    Returns:
287        A ``(size, size)`` AR(1) correlation matrix.
288    """
289    idx = np.arange(size)
290    M = np.abs(idx[:, None] - idx[None, :])
291    return rho**M

Construct a covariance matrix.

Cole et al. (2007) estimated correlations among test-day yields using a simplified model with an identity matrix (I) for daily measurement error and an autoregressive matrix (E) for biological change. E is defined as Eij = r ** |i-j| where i and j are test-day DIM and 0 < r < 1.

Element (i, j) is rho ** abs(i - j).

Arguments:
  • rho: AR(1) correlation parameter.
  • size: Matrix dimension.
Returns:

A (size, size) AR(1) correlation matrix.

def fit_autocorrelation_matrix( df: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray) -> dict[str, numpy.ndarray | float]:
294def fit_autocorrelation_matrix(
295    df: pd.DataFrame, standard_lc: np.ndarray
296) -> dict[str, np.ndarray | float]:
297    """Estimate covariance parameters for best prediction.
298
299    The model is ``B = b1 * I + b2 * E`` where ``E`` is an AR(1) correlation
300    matrix. Parameters are optimized in transformed space and mapped back to
301    enforce ``b1 > 0``, ``b2 > 0``, and ``0 < rho < 1``.
302
303    Args:
304        df: Reference milk-recording dataframe.
305        standard_lc: Population mean curve used for centering.
306
307    Returns:
308        Dictionary with:
309        - ``"B_hat"``: fitted covariance matrix.
310        - ``"R_hat"``: correlation matrix derived from ``B_hat``.
311        - ``"b1"``, ``"b2"``, ``"rho"``: fitted scalar parameters.
312    """
313    milk_matrix = pivot_milk_recordings_to_matrix(df)
314    centered_matrix = center_lactation_data(milk_matrix, standard_lc)
315    n_lactations, n_days = centered_matrix.shape
316    observed_indices = [np.where(~np.isnan(centered_matrix[i]))[0] for i in range(n_lactations)]
317
318    def negative_log_likelihood(params: np.ndarray) -> float:
319        p_b1, p_b2, p_rho = params
320        b1 = float(np.exp(p_b1))
321        b2 = float(np.exp(p_b2))
322        rho = float(1 / (1 + np.exp(-p_rho)))  # now rho in (0,1)
323        correlation_matrix = build_covariance_matrix(rho, n_days)
324
325        total = 0.0
326        for lactation_idx, day_indices in enumerate(observed_indices):
327            observation_count = len(day_indices)
328            if observation_count == 0:
329                continue
330
331            observations = centered_matrix[lactation_idx, day_indices]
332            correlation_subset = correlation_matrix[np.ix_(day_indices, day_indices)]
333            sigma = b1 * np.eye(observation_count) + b2 * correlation_subset
334
335            # Numerical safeguards: try Cholesky and penalize non-PD parameters.
336            try:
337                cholesky_factor, lower = cho_factor(sigma, check_finite=False)
338                solution = cho_solve((cholesky_factor, lower), observations, check_finite=False)
339            except LinAlgError:
340                # penalty for non-PD
341                return float(1e12 + np.sum(np.abs(params)))
342
343            quadratic_form = float(observations @ solution)
344            log_determinant = 2.0 * np.sum(np.log(np.diag(cholesky_factor)))
345            total += 0.5 * (
346                log_determinant + quadratic_form + observation_count * np.log(2 * np.pi)
347            )
348
349        # return total negative log-likelihood
350        return float(total)
351
352    # initial guesses and optimization. A 50/50 split in variance is assumed as starting point
353    initial_variance = max(float(np.nanvar(centered_matrix)), 1e-6)
354    initial_params = [
355        np.log(0.5 * initial_variance),
356        np.log(0.5 * initial_variance),
357        0.5,
358    ]
359
360    result = minimize(
361        negative_log_likelihood,
362        x0=initial_params,
363        method="L-BFGS-B",
364        options={"maxiter": 2000, "ftol": 1e-8},
365    )
366
367    if not result.success:
368        print(f"Optimization warning: {result.message}")
369
370    log_b1_hat, log_b2_hat, logit_rho_hat = result.x
371    b1_hat = float(np.exp(log_b1_hat))
372    b2_hat = float(np.exp(log_b2_hat))
373    rho_hat = float(1 / (1 + np.exp(-logit_rho_hat)))
374    correlation_matrix = build_covariance_matrix(rho_hat, n_days)
375    covariance_matrix = b1_hat * np.eye(n_days) + b2_hat * correlation_matrix
376
377    # convert to correlation matrix
378    std = np.sqrt(np.diag(covariance_matrix))
379    correlation_matrix = covariance_matrix / np.outer(std, std)
380
381    return {
382        "B_hat": covariance_matrix,
383        "R_hat": correlation_matrix,
384        "b1": b1_hat,
385        "b2": b2_hat,
386        "rho": rho_hat,
387    }

Estimate covariance parameters for best prediction.

The model is B = b1 * I + b2 * E where E is an AR(1) correlation matrix. Parameters are optimized in transformed space and mapped back to enforce b1 > 0, b2 > 0, and 0 < rho < 1.

Arguments:
  • df: Reference milk-recording dataframe.
  • standard_lc: Population mean curve used for centering.
Returns:

Dictionary with:

  • "B_hat": fitted covariance matrix.
  • "R_hat": correlation matrix derived from B_hat.
  • "b1", "b2", "rho": fitted scalar parameters.
def preprocess_measured_data( lactation: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray) -> pandas.core.series.Series:
393def preprocess_measured_data(lactation: pd.DataFrame, standard_lc: np.ndarray) -> pd.Series:
394    """Build a 305-day deviation vector for a single lactation.
395
396    For observed days, this computes ``MilkingYield - standard_lc[day]``.
397    The result is reindexed to days 1..305 with unobserved days filled as zero.
398
399    Args:
400        lactation: Single-lactation dataframe with ``DaysInMilk`` and
401            ``MilkingYield``.
402        standard_lc: Expected daily milk yield profile.
403
404    Returns:
405        A Series indexed by day 1..305 containing milk-yield deviations.
406    """
407
408    # calculate the difference between the expected (population mean) and measured milk yield
409
410    # extract the expected milk yields for the measured DaysInMilk in the df
411    day_idx = lactation["DaysInMilk"].to_numpy(dtype=int) - 1
412    expected = np.asarray(standard_lc, dtype=float)[day_idx]
413
414    # Subtract
415    lactation["MilkDifference"] = lactation["MilkingYield"].to_numpy(dtype=float) - expected
416
417    # Create a Series of length 305 with missing values = 0
418    milk_difference = cast(pd.Series, lactation.set_index("DaysInMilk")["MilkDifference"])
419    corrected_series = milk_difference.reindex(range(1, 306), fill_value=0)
420
421    return corrected_series

Build a 305-day deviation vector for a single lactation.

For observed days, this computes MilkingYield - standard_lc[day]. The result is reindexed to days 1..305 with unobserved days filled as zero.

Arguments:
  • lactation: Single-lactation dataframe with DaysInMilk and MilkingYield.
  • standard_lc: Expected daily milk yield profile.
Returns:

A Series indexed by day 1..305 containing milk-yield deviations.

def best_predict_method_single_lac( lactation: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray = STANDARD_CURVE, covariance_matrix: numpy.ndarray = COV_MATRIX) -> float:
424def best_predict_method_single_lac(
425    lactation: pd.DataFrame,
426    standard_lc: np.ndarray = STANDARD_CURVE,
427    covariance_matrix: np.ndarray = COV_MATRIX,
428) -> float:
429    """Predict 305-day cumulative yield for one lactation.
430
431    Observed test-day deviations are projected over all 305 days using the
432    covariance structure and then added to the baseline cumulative standard
433    curve.
434
435    By default this function uses the package-provided standard curve and covariance matrix.
436    But it is also possible to provide your own standard curve and covariance matrix,
437    for example when you want to fit these ingredients from your own reference population.
438
439
440    Args:
441        lactation: Observed records for one lactation.
442        standard_lc: Population mean daily yield profile.
443        covariance_matrix: Day-to-day covariance matrix on the 305-day grid.
444
445    Returns:
446        Predicted cumulative 305-day milk yield.
447
448    Notes:
449        Duplicate day records are resolved with ``keep="last"`` before
450        prediction. If no valid observations remain in days 1..305, the method
451        returns the cumulative standard curve.
452    """
453    filtered_lactation = lactation.loc[
454        (lactation["DaysInMilk"] >= 1) & (lactation["DaysInMilk"] <= 305)
455    ].copy()
456    filtered_lactation = filtered_lactation.drop_duplicates(subset=["DaysInMilk"], keep="last")
457    filtered_lactation = filtered_lactation.sort_values("DaysInMilk")
458
459    corrected_series = preprocess_measured_data(
460        filtered_lactation,
461        standard_lc=standard_lc,
462    )
463
464    if filtered_lactation.empty:
465        return float(np.sum(standard_lc))
466
467    obs_idx_1based = filtered_lactation["DaysInMilk"].to_numpy(dtype=int)  # DaysInMilk: 1-305
468    obs_idx_0based = obs_idx_1based - 1  # Convert to 0-based matrix indices: 0-304
469    y_obs = corrected_series.loc[obs_idx_1based].to_numpy(
470        dtype=float
471    )  # corrected_series is indexed by DaysInMilk (1-305)
472
473    # Extract covariance blocks
474    B_oo = covariance_matrix[
475        np.ix_(obs_idx_0based, obs_idx_0based)
476    ]  # Use 0-based indices for matrix
477    B_mo = covariance_matrix[:, obs_idx_0based]  # Use 0-based indices for matrix
478
479    # solve
480    c, lower = cho_factor(B_oo)
481    alpha = cho_solve((c, lower), y_obs)
482
483    # Predict full deviation curve
484    y_estimate = B_mo @ alpha
485
486    # Total milk = baseline + deviation
487    deviation = np.sum(y_estimate)
488
489    total = np.sum(standard_lc) + deviation
490
491    return total

Predict 305-day cumulative yield for one lactation.

Observed test-day deviations are projected over all 305 days using the covariance structure and then added to the baseline cumulative standard curve.

By default this function uses the package-provided standard curve and covariance matrix. But it is also possible to provide your own standard curve and covariance matrix, for example when you want to fit these ingredients from your own reference population.

Arguments:
  • lactation: Observed records for one lactation.
  • standard_lc: Population mean daily yield profile.
  • covariance_matrix: Day-to-day covariance matrix on the 305-day grid.
Returns:

Predicted cumulative 305-day milk yield.

Notes:

Duplicate day records are resolved with keep="last" before prediction. If no valid observations remain in days 1..305, the method returns the cumulative standard curve.

def best_predict_method( df: pandas.core.frame.DataFrame, standard_lc: numpy.ndarray = STANDARD_CURVE, days_in_milk_col: str | None = None, milking_yield_col: str | None = None, test_id_col: str | None = None, default_test_id: int = 0, covariance_matrix: numpy.ndarray | None = COV_MATRIX, fit_standard_lc_from_data: bool = False, reference_df: pandas.core.frame.DataFrame | None = None) -> pandas.core.frame.DataFrame:
494def best_predict_method(
495    df: pd.DataFrame,
496    standard_lc: np.ndarray = STANDARD_CURVE,
497    days_in_milk_col: str | None = None,
498    milking_yield_col: str | None = None,
499    test_id_col: str | None = None,
500    default_test_id: int = 0,
501    covariance_matrix: np.ndarray | None = COV_MATRIX,
502    fit_standard_lc_from_data: bool = False,
503    reference_df: pd.DataFrame | None = None,
504) -> pd.DataFrame:
505    """Apply best prediction to one or more lactations.
506
507    By default this function uses the package-provided standard curve and covariance matrix.
508    But it is also possible to provide your own standard curve and covariance matrix,
509    for example when you want to fit these ingredients from your own reference population.
510    This can be done in two ways: either by fitting the covariance matrix and standard curve
511    directly from a reference dataset by providing a pandas dataframe at 'reference_df ='
512    when ``fit_standard_lc_from_data`` is True.
513    Alternative for customization is to set standard_lc_305 and covariance_matrix
514    directly in the function call.
515
516    Args:
517        df: Input observations. If ``TestId`` is missing, all rows are treated
518            as one lactation.
519        standard_lc: Expected daily milk yield lactation curve on days 1..305.
520            If not provided, the package's default curve is used.
521            Or fit your own standard curve from a reference dataset by
522            providing a pandas dataframe at 'reference_df ='
523            when ``fit_standard_lc_from_data`` is True.
524        days_in_milk_col: Optional input column name for days in milk. If
525            provided, it is mapped to ``DaysInMilk``.
526        milking_yield_col: Optional input column name for milk yield. If
527            provided, it is mapped to ``MilkingYield``.
528        test_id_col: Optional input column name for lactation/test identifier.
529            If provided, it is mapped to ``TestId``.
530        default_test_id: Fallback test id used when no test-id column is
531            available.
532        covariance_matrix: Optional prefit covariance matrix. If omitted,
533            the default matrix is used or
534             ``reference_df`` can be used to fit one for your own data.
535        fit_standard_lc_from_data: Whether to fit covariance information from
536            ``reference_df`` instead of using a provided covariance matrix.
537        reference_df: Reference dataframe used when ``covariance_matrix`` and
538            ``standard_lc`` are not provided and ``fit_standard_lc_from_data``
539            is True.
540
541    Returns:
542        Dataframe with columns ``TestId`` and ``LactationMilkYield``.
543
544    Raises:
545        ValueError: If neither ``covariance_matrix`` nor ``reference_df`` is
546            provided.
547    """
548    # Standardize columns and filter DIM <= 305
549    df = standardize_lactation_columns(
550        df,
551        days_in_milk_col=days_in_milk_col,
552        milking_yield_col=milking_yield_col,
553        test_id_col=test_id_col,
554        default_test_id=default_test_id,
555        max_dim=305,
556    )
557
558    # Fit covariance if not provided
559    if fit_standard_lc_from_data:
560        if reference_df is None:
561            raise ValueError("Provide reference_df to fit your own standard lactation curve.")
562        reference_df = standardize_lactation_columns(
563            reference_df,
564            days_in_milk_col=days_in_milk_col,
565            milking_yield_col=milking_yield_col,
566            test_id_col=test_id_col,
567            default_test_id=default_test_id,
568            max_dim=305,
569        )
570        covariance_matrix = cast(
571            np.ndarray, fit_autocorrelation_matrix(reference_df, standard_lc)["B_hat"]
572        )
573
574    covariance_matrix_array = cast(np.ndarray, covariance_matrix)
575
576    df = df.copy()
577
578    results = []
579
580    for test_id, lactation in df.groupby("TestId"):
581        pred = best_predict_method_single_lac(
582            lactation,
583            standard_lc,
584            covariance_matrix_array,
585        )
586        results.append({"TestId": test_id, "LactationMilkYield": pred})
587
588    return pd.DataFrame(results)

Apply best prediction to one or more lactations.

By default this function uses the package-provided standard curve and covariance matrix. But it is also possible to provide your own standard curve and covariance matrix, for example when you want to fit these ingredients from your own reference population. This can be done in two ways: either by fitting the covariance matrix and standard curve directly from a reference dataset by providing a pandas dataframe at 'reference_df =' when fit_standard_lc_from_data is True. Alternative for customization is to set standard_lc_305 and covariance_matrix directly in the function call.

Arguments:
  • df: Input observations. If TestId is missing, all rows are treated as one lactation.
  • standard_lc: Expected daily milk yield lactation curve on days 1..305. If not provided, the package's default curve is used. Or fit your own standard curve from a reference dataset by providing a pandas dataframe at 'reference_df =' when fit_standard_lc_from_data is True.
  • days_in_milk_col: Optional input column name for days in milk. If provided, it is mapped to DaysInMilk.
  • milking_yield_col: Optional input column name for milk yield. If provided, it is mapped to MilkingYield.
  • test_id_col: Optional input column name for lactation/test identifier. If provided, it is mapped to TestId.
  • default_test_id: Fallback test id used when no test-id column is available.
  • covariance_matrix: Optional prefit covariance matrix. If omitted, the default matrix is used or reference_df can be used to fit one for your own data.
  • fit_standard_lc_from_data: Whether to fit covariance information from reference_df instead of using a provided covariance matrix.
  • reference_df: Reference dataframe used when covariance_matrix and standard_lc are not provided and fit_standard_lc_from_data is True.
Returns:

Dataframe with columns TestId and LactationMilkYield.

Raises:
  • ValueError: If neither covariance_matrix nor reference_df is provided.
def demo() -> None:
594def demo() -> None:
595    """Run a minimal example of best prediction with mock data."""
596
597    # --- Single + multiple lactations example ---
598    test_df = pd.DataFrame(
599        {
600            "TestId": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
601            "DaysInMilk": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55],
602            "MilkingYield": [30, 35, 40, 38, 36, 28, 33, 37, 39, 34],
603        }
604    )
605
606    result_cov = best_predict_method(
607        test_df, standard_lc=STANDARD_CURVE, covariance_matrix=COV_MATRIX
608    )
609
610    print("Predictions with provided covariance matrix:")
611    print(result_cov)

Run a minimal example of best prediction with mock data.