lactationcurve.characteristics.best_predict
Best prediction for cumulative 305-day milk yield.
Purpose
The Best Prediction (BP) method estimates 305-day lactation yield from intermittent test-day records by predicting deviations from an expected lactation curve at all unobserved days in milk. The method uses standard lactation curves that represent the expected course of lactation for specific cow subgroups, such as breed or parity.
By incorporating these standard curves, BP accounts for the typical lactation pattern in which milk yield increases after calving, reaches a peak, and subsequently declines. Missing daily yields are estimated from the covariance structure of test-day records and their deviations from the expected curve.
This module implements the best-prediction approach described by VanRaden (1997) for ICAR Procedure 2, Section 2 (Computing Accumulated Lactation Yield).
Method Summary
Adapted from the Best Predict Manual by Cole and VanRaden (2015)
Best prediction combines a population-level standard lactation curve with correlation-based corrections derived from observed test-day deviations. The method projects observed deviations onto the full 305-day curve using a covariance structure estimated from reference data.
Individual daily yield can be modeled as the expected value of a management group plus a deviation from that mean: yi = E(yi) + ti where yi is an individual yield on test day i, E(yi) is the expected yield for an animal in the same management group (Wiggans, Misztal, and Van Vleck 1988) on the same test day, and ti is a deviation from the group mean on the same test day. Suppose that µ is a vector of expected values for each day of lactation for a single trait, t is a vector of 305 test day deviations for the trait, and tm is a vector of only the measured deviations. The means and variances of t and tm are assumed known with V (t) = V and V (tm) = Vm. The covariance between t and tm, C, is also assumed known.
Lactation yield: A cow’s true 305-d yield (y) is the sum of the expected values for each day (1′µ) plus the sum of her 305 deviations from expectations (1′t), where 1′ is a vector of 1s of length 305. The cow’s true yield, and the best prediction of that yield (yˆ), are: y = 1′t yˆ = 1′CVm^−1 * tm
Key Entry Points
best_predict_method: Apply best prediction perTestId.best_predict_method_single_lac: Predict one lactation.fit_autocorrelation_matrix: Fit covariance structure from reference data.
Column Flexibility
The functions accept several case-insensitive column name aliases and can
create a default TestId if one is missing. Recognized aliases:
- Days in Milk:
["daysinmilk", "dim", "testday"] - Milk Yield:
["milkingyield", "testdaymilkyield", "milkyield", "yield"] - Test Id:
["animalid", "testid", "id"]
It is also possible to provide your own column names so the function can be applied to dataframes with different column naming conventions.
Defaults
STANDARD_CURVE: Baseline expected lactation curve for days 1..305 (Wood).COV_MATRIX: Default day-to-day covariance structure used for projection.
Notes
- The default assets are loaded from the package
datadirectory. - Users can fit curve and covariance ingredients from their own reference population.
- The method can be applied to lactations without any measurements, in which case the result will be the population mean from the standard curve.
- Predicted milk yields have less variance than true milk yields. With TIM, estimated yields have more variance than true yields. The reason is that predicted yields are regressed toward the mean unless all 305 daily yields are observed.
- Currently it is not yet possible to predict lactation yields for lactation windows other than 305 days, but this is on the roadmap for future updates.
- The method currently assumes that the standard curves and covariance structure is the same for all lactations, but future updates may allow using different standard curves and covariance structures for different subgroups of lactations (e.g., by breed or parity).
- For the used standard lactation curve currently the Wood LC model is used, it is possible to use other methods aswell.
- Strengths of Best Prediction includes its ability to leverage the full covariance structure of the lactation curve and to therefore potentially provide more accurate predictions especially for lactations with few test days. This is because the inherent shape of the lactation curve is taken into account in the projection of observed deviations to unobserved days. It has fewer in between steps then ISLC and is therefore easier to use.
- Disadvantages of Best Prediction include its computational intensity, especially when fitting the covariance structure from data. And the method is not as easy to understand as a simpler method such as the test interval method, which can make it less transparent to users. The best results are obtained when the standard curve and covariance matrix are from the same population as the data, which can be a barrier for users without access to a large reference dataset. This also causes inconsistencies in cumulative milk yield results depending on which standard curves are used. A cow with the exact same test-day records can have a different cumulative milk yield estimates depending on the standard curve used, which can be considered unfair.
Still coming functionality
- Milk, fat, and protein yields can be processed separately using single-trait best prediction or jointly using multi-trait best prediction. Replacement of the measured milk yields by the fat or protein yields gives the single-trait predictions for fat or for protein. Multi-trait predictions require larger vectors and matrices but similar algebra.
References
VanRaden, P. M. (1997). Lactation yields and accuracies computed from test day yields and (co) variances by best prediction. Journal of dairy science, 80(11), 3015-3022.
A Manual for Use of BESTPRED: A Program for Estimation of Lactation Yield and Persistency Using Best Prediction Release 2.0 rc 7 J. B. Cole and P. M. VanRaden August 12, 2009 Revised April 27, 2015 Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States Department of Agriculture, Room 306 Bldg 005 BARC-West, 10300 Baltimore Avenue, Beltsville, MD 20705-2350
Original code for best predict can be found on GitHub
Author: Meike van Leerdam, Date: 24-04-2026 Last update: 21-May-2026
1"""Best prediction for cumulative 305-day milk yield. 2 3Purpose 4------- 5The Best Prediction (BP) method estimates 305-day lactation yield from 6intermittent test-day records by predicting deviations from an expected 7lactation curve at **all** unobserved days in milk. The method uses standard 8lactation curves that represent the expected course of lactation for 9specific cow subgroups, such as breed or parity. 10 11By incorporating these standard curves, BP accounts for the typical 12lactation pattern in which milk yield increases after calving, reaches a 13peak, and subsequently declines. Missing daily yields are estimated from 14the covariance structure of test-day records and their deviations from the 15expected curve. 16 17This module implements the best-prediction approach described by 18VanRaden (1997) for ICAR Procedure 2, Section 2 (Computing Accumulated 19Lactation Yield). 20 21Method Summary 22-------------- 23Adapted from the Best Predict Manual by Cole and VanRaden (2015) 24 25Best prediction combines a population-level standard lactation curve with 26correlation-based corrections derived from observed test-day deviations. 27The method projects observed deviations onto the full 305-day curve using a 28covariance structure estimated from reference data. 29 30Individual daily yield can be modeled as the expected value of a management 31group plus a deviation from that mean: 32yi = E(yi) + ti 33where yi 34is an individual yield on test day i, E(yi) is the expected yield for an 35animal in the same management group 36(Wiggans, Misztal, and Van Vleck 1988) on the same test day, and ti 37is a deviation from the group mean on the same 38test day. Suppose that µ is a vector of expected values for each day of 39lactation for a single trait, t is a vector of 305 40test day deviations for the trait, and tm is a vector of only the measured 41deviations. The means and variances of t 42and tm are assumed known with V (t) = V and V (tm) = Vm. The covariance 43between t and tm, C, is also assumed 44known. 45 46**Lactation yield**: A cow’s true 305-d yield (y) is the sum of the expected 47values for each day (1′µ) plus the sum of her 48305 deviations from expectations (1′t), where 1′ 49is a vector of 1s of length 305. The cow’s true yield, and the best 50prediction of that yield (yˆ), are: 51y = 1′t 52yˆ = 1′CVm^−1 * tm 53 54 55Key Entry Points 56---------------- 57- ``best_predict_method``: Apply best prediction per ``TestId``. 58- ``best_predict_method_single_lac``: Predict one lactation. 59- ``fit_autocorrelation_matrix``: Fit covariance structure from reference data. 60 61Column Flexibility 62------------------ 63The functions accept several case-insensitive column name aliases and can 64create a default ``TestId`` if one is missing. Recognized aliases: 65 66- Days in Milk: `["daysinmilk", "dim", "testday"]` 67- Milk Yield: `["milkingyield", "testdaymilkyield", "milkyield", "yield"]` 68- Test Id: `["animalid", "testid", "id"]` 69 70It is also possible to provide your own column names so the function 71can be applied to dataframes with different column naming conventions. 72 73Defaults 74-------- 75- ``STANDARD_CURVE``: Baseline expected lactation curve for days 1..305 (Wood). 76- ``COV_MATRIX``: Default day-to-day covariance structure used for projection. 77 78Notes 79----- 80- The default assets are loaded from the package ``data`` directory. 81- Users can fit curve and covariance ingredients from their own reference 82 population. 83- The method can be applied to lactations without any measurements, 84 in which case the result will be the population mean from the standard curve. 85- Predicted milk yields have less variance than true milk yields. With TIM, 86estimated yields have more variance than true yields. The reason is that predicted yields are 87regressed toward the mean unless all 305 daily yields are observed. 88- Currently it is not yet possible to predict lactation yields for lactation windows 89 other than 305 days, but this is on the roadmap for future updates. 90- The method currently assumes that the standard curves and covariance structure is the same 91 for all lactations, 92 but future updates may allow using different standard curves and covariance structures for 93 different subgroups of lactations (e.g., by breed or parity). 94- For the used standard lactation curve currently the Wood LC model is used, it is possible to 95 use other methods aswell. 96- Strengths of Best Prediction includes its ability to leverage the full covariance structure 97 of the lactation curve and to therefore potentially provide more accurate predictions 98 especially for lactations with few test days. This is because the inherent shape of the 99 lactation curve is taken into account in the projection of 100 observed deviations to unobserved days. 101 It has fewer in between steps then ISLC and is therefore easier to use. 102- Disadvantages of Best Prediction include its computational intensity, 103 especially when fitting the covariance structure from data. 104 And the method is not as easy to understand 105 as a simpler method such as the test interval method, 106 which can make it less transparent to users. 107 The best results are obtained when the standard curve and covariance matrix 108 are from the same population as the data, 109 which can be a barrier for users without access to a large reference dataset. 110 This also causes inconsistencies in cumulative milk yield results 111 depending on which standard curves are used. 112 A cow with the exact same test-day records 113 can have a different cumulative milk yield estimates depending 114 on the standard curve used, which can be considered unfair. 115 116Still coming functionality 117-------------------------- 118- Milk, fat, and protein yields can be processed separately using single-trait best 119 prediction or jointly using multi-trait best prediction. 120 Replacement of the measured milk yields by the fat or protein yields gives the 121 single-trait predictions for fat or for protein. 122 Multi-trait predictions require larger vectors and matrices but similar algebra. 123 124 125 126References 127--------- 128VanRaden, P. M. (1997). Lactation yields and accuracies computed from test 129day yields and (co) variances by best prediction. 130Journal of dairy science, 80(11), 3015-3022. 131 132A Manual for Use of BESTPRED: A Program for Estimation of Lactation Yield 133and Persistency Using Best Prediction 134Release 2.0 rc 7 135J. B. Cole and P. M. VanRaden 136August 12, 2009 137Revised April 27, 2015 138Animal Genomics and Improvement Laboratory, Agricultural Research Service, United States 139Department of Agriculture, Room 306 Bldg 005 BARC-West, 10300 Baltimore Avenue, 140Beltsville, MD 20705-2350 141 142Original code for best predict can be found on [GitHub](https://github.com/wintermind/bestpred) 143 144Author: Meike van Leerdam, 145Date: 24-04-2026 146Last update: 21-May-2026 147""" 148 149import inspect 150from pathlib import Path 151from typing import cast 152 153import numpy as np 154import pandas as pd 155from scipy.linalg import LinAlgError, cho_factor, cho_solve 156from scipy.optimize import minimize 157 158from lactationcurve.fitting import fit_lactation_curve 159from lactationcurve.preprocessing import standardize_lactation_columns 160 161# get the standard lactation curve ingredients back from the data storage: 162DATA_DIR = Path(__file__).resolve().parent / "data" 163COV_MATRIX = np.load(DATA_DIR / "covariance_matrix_best_predict.npy") 164STANDARD_CURVE = np.load(DATA_DIR / "standard_lc_wood.npy") 165 166 167# Lightweight repr-only wrapper for cleaner generated signatures in docs 168class _DocDefault: 169 def __init__(self, label: str) -> None: 170 self.label = label 171 172 def __repr__(self) -> str: # pragma: no cover - docs-only 173 return self.label 174 175 176_DOC_STANDARD_CURVE = _DocDefault("STANDARD_CURVE") 177_DOC_COV_MATRIX = _DocDefault("COV_MATRIX") 178 179# functions to fit you own standard curve and covariance matrix 180 181 182def pivot_milk_recordings_to_matrix(df: pd.DataFrame) -> np.ndarray: 183 """Convert long-format recordings to a fixed 305-day matrix. 184 185 Rows represent lactations (``TestId``) and columns represent days in milk 186 from 1 through 305. Missing observations are kept as ``NaN``. 187 188 Args: 189 df: Dataframe with ``TestId``, ``DaysInMilk``, and ``MilkingYield``. 190 191 Returns: 192 A NumPy array of shape ``(n_lactations, 305)``. 193 """ 194 # ensure sorting 195 df = df.sort_values(["TestId", "DaysInMilk"]) 196 197 # pivot to wide format 198 milk_recordings_pivot = df.pivot_table( 199 index="TestId", columns="DaysInMilk", values="MilkingYield" 200 ) 201 202 # enforce fixed 305-day grid alignment used by best-prediction 203 milk_recordings_pivot = milk_recordings_pivot.reindex(columns=range(1, 306)) 204 205 # convert to numpy matrix 206 Y = milk_recordings_pivot.to_numpy() 207 return Y 208 209 210def fit_standard_lc(df: pd.DataFrame, lc_model: str = "Wood") -> np.ndarray: 211 """Fit a population-level standard lactation curve. 212 213 The curve is fit with the package's frequentist Wood model and returned on 214 the fixed day-1..305 grid. 215 216 Args: 217 df: Reference dataframe containing ``DaysInMilk`` and ``MilkingYield``. 218 lc_model: The lactation curve model to fit. 219 Default is the "Wood" lactation curve model. 220 221 Returns: 222 A NumPy array of expected daily milk yield for days 1..305. 223 224 Notes: 225 This mean curve acts as the baseline in best prediction. Individual 226 lactations are represented as deviations around this population profile. 227 """ 228 standard_lc = pd.Series( 229 fit_lactation_curve( 230 df["DaysInMilk"].values, 231 df["MilkingYield"].values, 232 model=lc_model, 233 fitting="frequentist", 234 ), 235 index=range(1, 306), 236 ) 237 238 return standard_lc.to_numpy(dtype=float) 239 240 241def center_lactation_data( 242 milk_matrix: np.ndarray, 243 standard_lc: np.ndarray, 244 day_mean_method: str = "standard_lc", 245) -> np.ndarray: 246 """Center lactation yields before covariance estimation. 247 248 Args: 249 milk_matrix: Yield matrix with lactations in rows and days in columns. 250 standard_lc: Expected day-wise milk yield profile. 251 day_mean_method: Mean-centering strategy. Supported values are 252 ``"standard_lc"`` (default) and ``"data"``. 253 254 Returns: 255 A centered matrix with the same shape as ``milk_matrix``. 256 257 Raises: 258 ValueError: If ``day_mean_method`` is not supported. 259 """ 260 if day_mean_method == "standard_lc": 261 day_mean = standard_lc 262 elif day_mean_method == "data": 263 day_mean = np.nanmean(milk_matrix, axis=0) 264 else: 265 raise ValueError("day_mean_method must be 'standard_lc' or 'data'.") 266 267 return milk_matrix - day_mean 268 269 270def build_covariance_matrix(rho: float, size: int) -> np.ndarray: 271 """Construct a covariance matrix. 272 273 Cole et al. (2007) estimated correlations among test-day yields using a 274 simplified model with an identity matrix (I) for daily measurement error 275 and an autoregressive matrix (E) for biological change. E is defined as 276 ``Eij = r ** |i-j|`` where ``i`` and ``j`` are test-day DIM and 277 ``0 < r < 1``. 278 279 Element ``(i, j)`` is ``rho ** abs(i - j)``. 280 281 Args: 282 rho: AR(1) correlation parameter. 283 size: Matrix dimension. 284 285 Returns: 286 A ``(size, size)`` AR(1) correlation matrix. 287 """ 288 idx = np.arange(size) 289 M = np.abs(idx[:, None] - idx[None, :]) 290 return rho**M 291 292 293def fit_autocorrelation_matrix( 294 df: pd.DataFrame, standard_lc: np.ndarray 295) -> dict[str, np.ndarray | float]: 296 """Estimate covariance parameters for best prediction. 297 298 The model is ``B = b1 * I + b2 * E`` where ``E`` is an AR(1) correlation 299 matrix. Parameters are optimized in transformed space and mapped back to 300 enforce ``b1 > 0``, ``b2 > 0``, and ``0 < rho < 1``. 301 302 Args: 303 df: Reference milk-recording dataframe. 304 standard_lc: Population mean curve used for centering. 305 306 Returns: 307 Dictionary with: 308 - ``"B_hat"``: fitted covariance matrix. 309 - ``"R_hat"``: correlation matrix derived from ``B_hat``. 310 - ``"b1"``, ``"b2"``, ``"rho"``: fitted scalar parameters. 311 """ 312 milk_matrix = pivot_milk_recordings_to_matrix(df) 313 centered_matrix = center_lactation_data(milk_matrix, standard_lc) 314 n_lactations, n_days = centered_matrix.shape 315 observed_indices = [np.where(~np.isnan(centered_matrix[i]))[0] for i in range(n_lactations)] 316 317 def negative_log_likelihood(params: np.ndarray) -> float: 318 p_b1, p_b2, p_rho = params 319 b1 = float(np.exp(p_b1)) 320 b2 = float(np.exp(p_b2)) 321 rho = float(1 / (1 + np.exp(-p_rho))) # now rho in (0,1) 322 correlation_matrix = build_covariance_matrix(rho, n_days) 323 324 total = 0.0 325 for lactation_idx, day_indices in enumerate(observed_indices): 326 observation_count = len(day_indices) 327 if observation_count == 0: 328 continue 329 330 observations = centered_matrix[lactation_idx, day_indices] 331 correlation_subset = correlation_matrix[np.ix_(day_indices, day_indices)] 332 sigma = b1 * np.eye(observation_count) + b2 * correlation_subset 333 334 # Numerical safeguards: try Cholesky and penalize non-PD parameters. 335 try: 336 cholesky_factor, lower = cho_factor(sigma, check_finite=False) 337 solution = cho_solve((cholesky_factor, lower), observations, check_finite=False) 338 except LinAlgError: 339 # penalty for non-PD 340 return float(1e12 + np.sum(np.abs(params))) 341 342 quadratic_form = float(observations @ solution) 343 log_determinant = 2.0 * np.sum(np.log(np.diag(cholesky_factor))) 344 total += 0.5 * ( 345 log_determinant + quadratic_form + observation_count * np.log(2 * np.pi) 346 ) 347 348 # return total negative log-likelihood 349 return float(total) 350 351 # initial guesses and optimization. A 50/50 split in variance is assumed as starting point 352 initial_variance = max(float(np.nanvar(centered_matrix)), 1e-6) 353 initial_params = [ 354 np.log(0.5 * initial_variance), 355 np.log(0.5 * initial_variance), 356 0.5, 357 ] 358 359 result = minimize( 360 negative_log_likelihood, 361 x0=initial_params, 362 method="L-BFGS-B", 363 options={"maxiter": 2000, "ftol": 1e-8}, 364 ) 365 366 if not result.success: 367 print(f"Optimization warning: {result.message}") 368 369 log_b1_hat, log_b2_hat, logit_rho_hat = result.x 370 b1_hat = float(np.exp(log_b1_hat)) 371 b2_hat = float(np.exp(log_b2_hat)) 372 rho_hat = float(1 / (1 + np.exp(-logit_rho_hat))) 373 correlation_matrix = build_covariance_matrix(rho_hat, n_days) 374 covariance_matrix = b1_hat * np.eye(n_days) + b2_hat * correlation_matrix 375 376 # convert to correlation matrix 377 std = np.sqrt(np.diag(covariance_matrix)) 378 correlation_matrix = covariance_matrix / np.outer(std, std) 379 380 return { 381 "B_hat": covariance_matrix, 382 "R_hat": correlation_matrix, 383 "b1": b1_hat, 384 "b2": b2_hat, 385 "rho": rho_hat, 386 } 387 388 389# Functions for best predict that also work with the provided standard curve and covariance matrix. 390 391 392def preprocess_measured_data(lactation: pd.DataFrame, standard_lc: np.ndarray) -> pd.Series: 393 """Build a 305-day deviation vector for a single lactation. 394 395 For observed days, this computes ``MilkingYield - standard_lc[day]``. 396 The result is reindexed to days 1..305 with unobserved days filled as zero. 397 398 Args: 399 lactation: Single-lactation dataframe with ``DaysInMilk`` and 400 ``MilkingYield``. 401 standard_lc: Expected daily milk yield profile. 402 403 Returns: 404 A Series indexed by day 1..305 containing milk-yield deviations. 405 """ 406 407 # calculate the difference between the expected (population mean) and measured milk yield 408 409 # extract the expected milk yields for the measured DaysInMilk in the df 410 day_idx = lactation["DaysInMilk"].to_numpy(dtype=int) - 1 411 expected = np.asarray(standard_lc, dtype=float)[day_idx] 412 413 # Subtract 414 lactation["MilkDifference"] = lactation["MilkingYield"].to_numpy(dtype=float) - expected 415 416 # Create a Series of length 305 with missing values = 0 417 milk_difference = cast(pd.Series, lactation.set_index("DaysInMilk")["MilkDifference"]) 418 corrected_series = milk_difference.reindex(range(1, 306), fill_value=0) 419 420 return corrected_series 421 422 423def best_predict_method_single_lac( 424 lactation: pd.DataFrame, 425 standard_lc: np.ndarray = STANDARD_CURVE, 426 covariance_matrix: np.ndarray = COV_MATRIX, 427) -> float: 428 """Predict 305-day cumulative yield for one lactation. 429 430 Observed test-day deviations are projected over all 305 days using the 431 covariance structure and then added to the baseline cumulative standard 432 curve. 433 434 By default this function uses the package-provided standard curve and covariance matrix. 435 But it is also possible to provide your own standard curve and covariance matrix, 436 for example when you want to fit these ingredients from your own reference population. 437 438 439 Args: 440 lactation: Observed records for one lactation. 441 standard_lc: Population mean daily yield profile. 442 covariance_matrix: Day-to-day covariance matrix on the 305-day grid. 443 444 Returns: 445 Predicted cumulative 305-day milk yield. 446 447 Notes: 448 Duplicate day records are resolved with ``keep="last"`` before 449 prediction. If no valid observations remain in days 1..305, the method 450 returns the cumulative standard curve. 451 """ 452 filtered_lactation = lactation.loc[ 453 (lactation["DaysInMilk"] >= 1) & (lactation["DaysInMilk"] <= 305) 454 ].copy() 455 filtered_lactation = filtered_lactation.drop_duplicates(subset=["DaysInMilk"], keep="last") 456 filtered_lactation = filtered_lactation.sort_values("DaysInMilk") 457 458 corrected_series = preprocess_measured_data( 459 filtered_lactation, 460 standard_lc=standard_lc, 461 ) 462 463 if filtered_lactation.empty: 464 return float(np.sum(standard_lc)) 465 466 obs_idx_1based = filtered_lactation["DaysInMilk"].to_numpy(dtype=int) # DaysInMilk: 1-305 467 obs_idx_0based = obs_idx_1based - 1 # Convert to 0-based matrix indices: 0-304 468 y_obs = corrected_series.loc[obs_idx_1based].to_numpy( 469 dtype=float 470 ) # corrected_series is indexed by DaysInMilk (1-305) 471 472 # Extract covariance blocks 473 B_oo = covariance_matrix[ 474 np.ix_(obs_idx_0based, obs_idx_0based) 475 ] # Use 0-based indices for matrix 476 B_mo = covariance_matrix[:, obs_idx_0based] # Use 0-based indices for matrix 477 478 # solve 479 c, lower = cho_factor(B_oo) 480 alpha = cho_solve((c, lower), y_obs) 481 482 # Predict full deviation curve 483 y_estimate = B_mo @ alpha 484 485 # Total milk = baseline + deviation 486 deviation = np.sum(y_estimate) 487 488 total = np.sum(standard_lc) + deviation 489 490 return total 491 492 493def best_predict_method( 494 df: pd.DataFrame, 495 standard_lc: np.ndarray = STANDARD_CURVE, 496 days_in_milk_col: str | None = None, 497 milking_yield_col: str | None = None, 498 test_id_col: str | None = None, 499 default_test_id: int = 0, 500 covariance_matrix: np.ndarray | None = COV_MATRIX, 501 fit_standard_lc_from_data: bool = False, 502 reference_df: pd.DataFrame | None = None, 503) -> pd.DataFrame: 504 """Apply best prediction to one or more lactations. 505 506 By default this function uses the package-provided standard curve and covariance matrix. 507 But it is also possible to provide your own standard curve and covariance matrix, 508 for example when you want to fit these ingredients from your own reference population. 509 This can be done in two ways: either by fitting the covariance matrix and standard curve 510 directly from a reference dataset by providing a pandas dataframe at 'reference_df =' 511 when ``fit_standard_lc_from_data`` is True. 512 Alternative for customization is to set standard_lc_305 and covariance_matrix 513 directly in the function call. 514 515 Args: 516 df: Input observations. If ``TestId`` is missing, all rows are treated 517 as one lactation. 518 standard_lc: Expected daily milk yield lactation curve on days 1..305. 519 If not provided, the package's default curve is used. 520 Or fit your own standard curve from a reference dataset by 521 providing a pandas dataframe at 'reference_df =' 522 when ``fit_standard_lc_from_data`` is True. 523 days_in_milk_col: Optional input column name for days in milk. If 524 provided, it is mapped to ``DaysInMilk``. 525 milking_yield_col: Optional input column name for milk yield. If 526 provided, it is mapped to ``MilkingYield``. 527 test_id_col: Optional input column name for lactation/test identifier. 528 If provided, it is mapped to ``TestId``. 529 default_test_id: Fallback test id used when no test-id column is 530 available. 531 covariance_matrix: Optional prefit covariance matrix. If omitted, 532 the default matrix is used or 533 ``reference_df`` can be used to fit one for your own data. 534 fit_standard_lc_from_data: Whether to fit covariance information from 535 ``reference_df`` instead of using a provided covariance matrix. 536 reference_df: Reference dataframe used when ``covariance_matrix`` and 537 ``standard_lc`` are not provided and ``fit_standard_lc_from_data`` 538 is True. 539 540 Returns: 541 Dataframe with columns ``TestId`` and ``LactationMilkYield``. 542 543 Raises: 544 ValueError: If neither ``covariance_matrix`` nor ``reference_df`` is 545 provided. 546 """ 547 # Standardize columns and filter DIM <= 305 548 df = standardize_lactation_columns( 549 df, 550 days_in_milk_col=days_in_milk_col, 551 milking_yield_col=milking_yield_col, 552 test_id_col=test_id_col, 553 default_test_id=default_test_id, 554 max_dim=305, 555 ) 556 557 # Fit covariance if not provided 558 if fit_standard_lc_from_data: 559 if reference_df is None: 560 raise ValueError("Provide reference_df to fit your own standard lactation curve.") 561 reference_df = standardize_lactation_columns( 562 reference_df, 563 days_in_milk_col=days_in_milk_col, 564 milking_yield_col=milking_yield_col, 565 test_id_col=test_id_col, 566 default_test_id=default_test_id, 567 max_dim=305, 568 ) 569 covariance_matrix = cast( 570 np.ndarray, fit_autocorrelation_matrix(reference_df, standard_lc)["B_hat"] 571 ) 572 573 covariance_matrix_array = cast(np.ndarray, covariance_matrix) 574 575 df = df.copy() 576 577 results = [] 578 579 for test_id, lactation in df.groupby("TestId"): 580 pred = best_predict_method_single_lac( 581 lactation, 582 standard_lc, 583 covariance_matrix_array, 584 ) 585 results.append({"TestId": test_id, "LactationMilkYield": pred}) 586 587 return pd.DataFrame(results) 588 589 590# demo function so I can see if this script runs as expected 591 592 593def demo() -> None: 594 """Run a minimal example of best prediction with mock data.""" 595 596 # --- Single + multiple lactations example --- 597 test_df = pd.DataFrame( 598 { 599 "TestId": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], 600 "DaysInMilk": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55], 601 "MilkingYield": [30, 35, 40, 38, 36, 28, 33, 37, 39, 34], 602 } 603 ) 604 605 result_cov = best_predict_method( 606 test_df, standard_lc=STANDARD_CURVE, covariance_matrix=COV_MATRIX 607 ) 608 609 print("Predictions with provided covariance matrix:") 610 print(result_cov) 611 612 613def _set_doc_signatures() -> None: 614 """Override displayed defaults in docs without changing runtime behavior.""" 615 doc_defaults = { 616 "standard_lc": _DOC_STANDARD_CURVE, 617 "covariance_matrix": _DOC_COV_MATRIX, 618 } 619 620 for func in (best_predict_method_single_lac, best_predict_method): 621 signature = inspect.signature(func) 622 params = [ 623 param.replace(default=doc_defaults[param.name]) if param.name in doc_defaults else param 624 for param in signature.parameters.values() 625 ] 626 func.__signature__ = signature.replace(parameters=params) 627 628 629_set_doc_signatures() 630 631if __name__ == "__main__": 632 demo()
183def pivot_milk_recordings_to_matrix(df: pd.DataFrame) -> np.ndarray: 184 """Convert long-format recordings to a fixed 305-day matrix. 185 186 Rows represent lactations (``TestId``) and columns represent days in milk 187 from 1 through 305. Missing observations are kept as ``NaN``. 188 189 Args: 190 df: Dataframe with ``TestId``, ``DaysInMilk``, and ``MilkingYield``. 191 192 Returns: 193 A NumPy array of shape ``(n_lactations, 305)``. 194 """ 195 # ensure sorting 196 df = df.sort_values(["TestId", "DaysInMilk"]) 197 198 # pivot to wide format 199 milk_recordings_pivot = df.pivot_table( 200 index="TestId", columns="DaysInMilk", values="MilkingYield" 201 ) 202 203 # enforce fixed 305-day grid alignment used by best-prediction 204 milk_recordings_pivot = milk_recordings_pivot.reindex(columns=range(1, 306)) 205 206 # convert to numpy matrix 207 Y = milk_recordings_pivot.to_numpy() 208 return Y
Convert long-format recordings to a fixed 305-day matrix.
Rows represent lactations (TestId) and columns represent days in milk
from 1 through 305. Missing observations are kept as NaN.
Arguments:
- df: Dataframe with
TestId,DaysInMilk, andMilkingYield.
Returns:
A NumPy array of shape
(n_lactations, 305).
211def fit_standard_lc(df: pd.DataFrame, lc_model: str = "Wood") -> np.ndarray: 212 """Fit a population-level standard lactation curve. 213 214 The curve is fit with the package's frequentist Wood model and returned on 215 the fixed day-1..305 grid. 216 217 Args: 218 df: Reference dataframe containing ``DaysInMilk`` and ``MilkingYield``. 219 lc_model: The lactation curve model to fit. 220 Default is the "Wood" lactation curve model. 221 222 Returns: 223 A NumPy array of expected daily milk yield for days 1..305. 224 225 Notes: 226 This mean curve acts as the baseline in best prediction. Individual 227 lactations are represented as deviations around this population profile. 228 """ 229 standard_lc = pd.Series( 230 fit_lactation_curve( 231 df["DaysInMilk"].values, 232 df["MilkingYield"].values, 233 model=lc_model, 234 fitting="frequentist", 235 ), 236 index=range(1, 306), 237 ) 238 239 return standard_lc.to_numpy(dtype=float)
Fit a population-level standard lactation curve.
The curve is fit with the package's frequentist Wood model and returned on the fixed day-1..305 grid.
Arguments:
- df: Reference dataframe containing
DaysInMilkandMilkingYield. - lc_model: The lactation curve model to fit.
- Default is the "Wood" lactation curve model.
Returns:
A NumPy array of expected daily milk yield for days 1..305.
Notes:
This mean curve acts as the baseline in best prediction. Individual lactations are represented as deviations around this population profile.
242def center_lactation_data( 243 milk_matrix: np.ndarray, 244 standard_lc: np.ndarray, 245 day_mean_method: str = "standard_lc", 246) -> np.ndarray: 247 """Center lactation yields before covariance estimation. 248 249 Args: 250 milk_matrix: Yield matrix with lactations in rows and days in columns. 251 standard_lc: Expected day-wise milk yield profile. 252 day_mean_method: Mean-centering strategy. Supported values are 253 ``"standard_lc"`` (default) and ``"data"``. 254 255 Returns: 256 A centered matrix with the same shape as ``milk_matrix``. 257 258 Raises: 259 ValueError: If ``day_mean_method`` is not supported. 260 """ 261 if day_mean_method == "standard_lc": 262 day_mean = standard_lc 263 elif day_mean_method == "data": 264 day_mean = np.nanmean(milk_matrix, axis=0) 265 else: 266 raise ValueError("day_mean_method must be 'standard_lc' or 'data'.") 267 268 return milk_matrix - day_mean
Center lactation yields before covariance estimation.
Arguments:
- milk_matrix: Yield matrix with lactations in rows and days in columns.
- standard_lc: Expected day-wise milk yield profile.
- day_mean_method: Mean-centering strategy. Supported values are
"standard_lc"(default) and"data".
Returns:
A centered matrix with the same shape as
milk_matrix.
Raises:
- ValueError: If
day_mean_methodis not supported.
271def build_covariance_matrix(rho: float, size: int) -> np.ndarray: 272 """Construct a covariance matrix. 273 274 Cole et al. (2007) estimated correlations among test-day yields using a 275 simplified model with an identity matrix (I) for daily measurement error 276 and an autoregressive matrix (E) for biological change. E is defined as 277 ``Eij = r ** |i-j|`` where ``i`` and ``j`` are test-day DIM and 278 ``0 < r < 1``. 279 280 Element ``(i, j)`` is ``rho ** abs(i - j)``. 281 282 Args: 283 rho: AR(1) correlation parameter. 284 size: Matrix dimension. 285 286 Returns: 287 A ``(size, size)`` AR(1) correlation matrix. 288 """ 289 idx = np.arange(size) 290 M = np.abs(idx[:, None] - idx[None, :]) 291 return rho**M
Construct a covariance matrix.
Cole et al. (2007) estimated correlations among test-day yields using a
simplified model with an identity matrix (I) for daily measurement error
and an autoregressive matrix (E) for biological change. E is defined as
Eij = r ** |i-j| where i and j are test-day DIM and
0 < r < 1.
Element (i, j) is rho ** abs(i - j).
Arguments:
- rho: AR(1) correlation parameter.
- size: Matrix dimension.
Returns:
A
(size, size)AR(1) correlation matrix.
294def fit_autocorrelation_matrix( 295 df: pd.DataFrame, standard_lc: np.ndarray 296) -> dict[str, np.ndarray | float]: 297 """Estimate covariance parameters for best prediction. 298 299 The model is ``B = b1 * I + b2 * E`` where ``E`` is an AR(1) correlation 300 matrix. Parameters are optimized in transformed space and mapped back to 301 enforce ``b1 > 0``, ``b2 > 0``, and ``0 < rho < 1``. 302 303 Args: 304 df: Reference milk-recording dataframe. 305 standard_lc: Population mean curve used for centering. 306 307 Returns: 308 Dictionary with: 309 - ``"B_hat"``: fitted covariance matrix. 310 - ``"R_hat"``: correlation matrix derived from ``B_hat``. 311 - ``"b1"``, ``"b2"``, ``"rho"``: fitted scalar parameters. 312 """ 313 milk_matrix = pivot_milk_recordings_to_matrix(df) 314 centered_matrix = center_lactation_data(milk_matrix, standard_lc) 315 n_lactations, n_days = centered_matrix.shape 316 observed_indices = [np.where(~np.isnan(centered_matrix[i]))[0] for i in range(n_lactations)] 317 318 def negative_log_likelihood(params: np.ndarray) -> float: 319 p_b1, p_b2, p_rho = params 320 b1 = float(np.exp(p_b1)) 321 b2 = float(np.exp(p_b2)) 322 rho = float(1 / (1 + np.exp(-p_rho))) # now rho in (0,1) 323 correlation_matrix = build_covariance_matrix(rho, n_days) 324 325 total = 0.0 326 for lactation_idx, day_indices in enumerate(observed_indices): 327 observation_count = len(day_indices) 328 if observation_count == 0: 329 continue 330 331 observations = centered_matrix[lactation_idx, day_indices] 332 correlation_subset = correlation_matrix[np.ix_(day_indices, day_indices)] 333 sigma = b1 * np.eye(observation_count) + b2 * correlation_subset 334 335 # Numerical safeguards: try Cholesky and penalize non-PD parameters. 336 try: 337 cholesky_factor, lower = cho_factor(sigma, check_finite=False) 338 solution = cho_solve((cholesky_factor, lower), observations, check_finite=False) 339 except LinAlgError: 340 # penalty for non-PD 341 return float(1e12 + np.sum(np.abs(params))) 342 343 quadratic_form = float(observations @ solution) 344 log_determinant = 2.0 * np.sum(np.log(np.diag(cholesky_factor))) 345 total += 0.5 * ( 346 log_determinant + quadratic_form + observation_count * np.log(2 * np.pi) 347 ) 348 349 # return total negative log-likelihood 350 return float(total) 351 352 # initial guesses and optimization. A 50/50 split in variance is assumed as starting point 353 initial_variance = max(float(np.nanvar(centered_matrix)), 1e-6) 354 initial_params = [ 355 np.log(0.5 * initial_variance), 356 np.log(0.5 * initial_variance), 357 0.5, 358 ] 359 360 result = minimize( 361 negative_log_likelihood, 362 x0=initial_params, 363 method="L-BFGS-B", 364 options={"maxiter": 2000, "ftol": 1e-8}, 365 ) 366 367 if not result.success: 368 print(f"Optimization warning: {result.message}") 369 370 log_b1_hat, log_b2_hat, logit_rho_hat = result.x 371 b1_hat = float(np.exp(log_b1_hat)) 372 b2_hat = float(np.exp(log_b2_hat)) 373 rho_hat = float(1 / (1 + np.exp(-logit_rho_hat))) 374 correlation_matrix = build_covariance_matrix(rho_hat, n_days) 375 covariance_matrix = b1_hat * np.eye(n_days) + b2_hat * correlation_matrix 376 377 # convert to correlation matrix 378 std = np.sqrt(np.diag(covariance_matrix)) 379 correlation_matrix = covariance_matrix / np.outer(std, std) 380 381 return { 382 "B_hat": covariance_matrix, 383 "R_hat": correlation_matrix, 384 "b1": b1_hat, 385 "b2": b2_hat, 386 "rho": rho_hat, 387 }
Estimate covariance parameters for best prediction.
The model is B = b1 * I + b2 * E where E is an AR(1) correlation
matrix. Parameters are optimized in transformed space and mapped back to
enforce b1 > 0, b2 > 0, and 0 < rho < 1.
Arguments:
- df: Reference milk-recording dataframe.
- standard_lc: Population mean curve used for centering.
Returns:
Dictionary with:
"B_hat": fitted covariance matrix."R_hat": correlation matrix derived fromB_hat."b1","b2","rho": fitted scalar parameters.
393def preprocess_measured_data(lactation: pd.DataFrame, standard_lc: np.ndarray) -> pd.Series: 394 """Build a 305-day deviation vector for a single lactation. 395 396 For observed days, this computes ``MilkingYield - standard_lc[day]``. 397 The result is reindexed to days 1..305 with unobserved days filled as zero. 398 399 Args: 400 lactation: Single-lactation dataframe with ``DaysInMilk`` and 401 ``MilkingYield``. 402 standard_lc: Expected daily milk yield profile. 403 404 Returns: 405 A Series indexed by day 1..305 containing milk-yield deviations. 406 """ 407 408 # calculate the difference between the expected (population mean) and measured milk yield 409 410 # extract the expected milk yields for the measured DaysInMilk in the df 411 day_idx = lactation["DaysInMilk"].to_numpy(dtype=int) - 1 412 expected = np.asarray(standard_lc, dtype=float)[day_idx] 413 414 # Subtract 415 lactation["MilkDifference"] = lactation["MilkingYield"].to_numpy(dtype=float) - expected 416 417 # Create a Series of length 305 with missing values = 0 418 milk_difference = cast(pd.Series, lactation.set_index("DaysInMilk")["MilkDifference"]) 419 corrected_series = milk_difference.reindex(range(1, 306), fill_value=0) 420 421 return corrected_series
Build a 305-day deviation vector for a single lactation.
For observed days, this computes MilkingYield - standard_lc[day].
The result is reindexed to days 1..305 with unobserved days filled as zero.
Arguments:
- lactation: Single-lactation dataframe with
DaysInMilkandMilkingYield. - standard_lc: Expected daily milk yield profile.
Returns:
A Series indexed by day 1..305 containing milk-yield deviations.
424def best_predict_method_single_lac( 425 lactation: pd.DataFrame, 426 standard_lc: np.ndarray = STANDARD_CURVE, 427 covariance_matrix: np.ndarray = COV_MATRIX, 428) -> float: 429 """Predict 305-day cumulative yield for one lactation. 430 431 Observed test-day deviations are projected over all 305 days using the 432 covariance structure and then added to the baseline cumulative standard 433 curve. 434 435 By default this function uses the package-provided standard curve and covariance matrix. 436 But it is also possible to provide your own standard curve and covariance matrix, 437 for example when you want to fit these ingredients from your own reference population. 438 439 440 Args: 441 lactation: Observed records for one lactation. 442 standard_lc: Population mean daily yield profile. 443 covariance_matrix: Day-to-day covariance matrix on the 305-day grid. 444 445 Returns: 446 Predicted cumulative 305-day milk yield. 447 448 Notes: 449 Duplicate day records are resolved with ``keep="last"`` before 450 prediction. If no valid observations remain in days 1..305, the method 451 returns the cumulative standard curve. 452 """ 453 filtered_lactation = lactation.loc[ 454 (lactation["DaysInMilk"] >= 1) & (lactation["DaysInMilk"] <= 305) 455 ].copy() 456 filtered_lactation = filtered_lactation.drop_duplicates(subset=["DaysInMilk"], keep="last") 457 filtered_lactation = filtered_lactation.sort_values("DaysInMilk") 458 459 corrected_series = preprocess_measured_data( 460 filtered_lactation, 461 standard_lc=standard_lc, 462 ) 463 464 if filtered_lactation.empty: 465 return float(np.sum(standard_lc)) 466 467 obs_idx_1based = filtered_lactation["DaysInMilk"].to_numpy(dtype=int) # DaysInMilk: 1-305 468 obs_idx_0based = obs_idx_1based - 1 # Convert to 0-based matrix indices: 0-304 469 y_obs = corrected_series.loc[obs_idx_1based].to_numpy( 470 dtype=float 471 ) # corrected_series is indexed by DaysInMilk (1-305) 472 473 # Extract covariance blocks 474 B_oo = covariance_matrix[ 475 np.ix_(obs_idx_0based, obs_idx_0based) 476 ] # Use 0-based indices for matrix 477 B_mo = covariance_matrix[:, obs_idx_0based] # Use 0-based indices for matrix 478 479 # solve 480 c, lower = cho_factor(B_oo) 481 alpha = cho_solve((c, lower), y_obs) 482 483 # Predict full deviation curve 484 y_estimate = B_mo @ alpha 485 486 # Total milk = baseline + deviation 487 deviation = np.sum(y_estimate) 488 489 total = np.sum(standard_lc) + deviation 490 491 return total
Predict 305-day cumulative yield for one lactation.
Observed test-day deviations are projected over all 305 days using the covariance structure and then added to the baseline cumulative standard curve.
By default this function uses the package-provided standard curve and covariance matrix. But it is also possible to provide your own standard curve and covariance matrix, for example when you want to fit these ingredients from your own reference population.
Arguments:
- lactation: Observed records for one lactation.
- standard_lc: Population mean daily yield profile.
- covariance_matrix: Day-to-day covariance matrix on the 305-day grid.
Returns:
Predicted cumulative 305-day milk yield.
Notes:
Duplicate day records are resolved with
keep="last"before prediction. If no valid observations remain in days 1..305, the method returns the cumulative standard curve.
494def best_predict_method( 495 df: pd.DataFrame, 496 standard_lc: np.ndarray = STANDARD_CURVE, 497 days_in_milk_col: str | None = None, 498 milking_yield_col: str | None = None, 499 test_id_col: str | None = None, 500 default_test_id: int = 0, 501 covariance_matrix: np.ndarray | None = COV_MATRIX, 502 fit_standard_lc_from_data: bool = False, 503 reference_df: pd.DataFrame | None = None, 504) -> pd.DataFrame: 505 """Apply best prediction to one or more lactations. 506 507 By default this function uses the package-provided standard curve and covariance matrix. 508 But it is also possible to provide your own standard curve and covariance matrix, 509 for example when you want to fit these ingredients from your own reference population. 510 This can be done in two ways: either by fitting the covariance matrix and standard curve 511 directly from a reference dataset by providing a pandas dataframe at 'reference_df =' 512 when ``fit_standard_lc_from_data`` is True. 513 Alternative for customization is to set standard_lc_305 and covariance_matrix 514 directly in the function call. 515 516 Args: 517 df: Input observations. If ``TestId`` is missing, all rows are treated 518 as one lactation. 519 standard_lc: Expected daily milk yield lactation curve on days 1..305. 520 If not provided, the package's default curve is used. 521 Or fit your own standard curve from a reference dataset by 522 providing a pandas dataframe at 'reference_df =' 523 when ``fit_standard_lc_from_data`` is True. 524 days_in_milk_col: Optional input column name for days in milk. If 525 provided, it is mapped to ``DaysInMilk``. 526 milking_yield_col: Optional input column name for milk yield. If 527 provided, it is mapped to ``MilkingYield``. 528 test_id_col: Optional input column name for lactation/test identifier. 529 If provided, it is mapped to ``TestId``. 530 default_test_id: Fallback test id used when no test-id column is 531 available. 532 covariance_matrix: Optional prefit covariance matrix. If omitted, 533 the default matrix is used or 534 ``reference_df`` can be used to fit one for your own data. 535 fit_standard_lc_from_data: Whether to fit covariance information from 536 ``reference_df`` instead of using a provided covariance matrix. 537 reference_df: Reference dataframe used when ``covariance_matrix`` and 538 ``standard_lc`` are not provided and ``fit_standard_lc_from_data`` 539 is True. 540 541 Returns: 542 Dataframe with columns ``TestId`` and ``LactationMilkYield``. 543 544 Raises: 545 ValueError: If neither ``covariance_matrix`` nor ``reference_df`` is 546 provided. 547 """ 548 # Standardize columns and filter DIM <= 305 549 df = standardize_lactation_columns( 550 df, 551 days_in_milk_col=days_in_milk_col, 552 milking_yield_col=milking_yield_col, 553 test_id_col=test_id_col, 554 default_test_id=default_test_id, 555 max_dim=305, 556 ) 557 558 # Fit covariance if not provided 559 if fit_standard_lc_from_data: 560 if reference_df is None: 561 raise ValueError("Provide reference_df to fit your own standard lactation curve.") 562 reference_df = standardize_lactation_columns( 563 reference_df, 564 days_in_milk_col=days_in_milk_col, 565 milking_yield_col=milking_yield_col, 566 test_id_col=test_id_col, 567 default_test_id=default_test_id, 568 max_dim=305, 569 ) 570 covariance_matrix = cast( 571 np.ndarray, fit_autocorrelation_matrix(reference_df, standard_lc)["B_hat"] 572 ) 573 574 covariance_matrix_array = cast(np.ndarray, covariance_matrix) 575 576 df = df.copy() 577 578 results = [] 579 580 for test_id, lactation in df.groupby("TestId"): 581 pred = best_predict_method_single_lac( 582 lactation, 583 standard_lc, 584 covariance_matrix_array, 585 ) 586 results.append({"TestId": test_id, "LactationMilkYield": pred}) 587 588 return pd.DataFrame(results)
Apply best prediction to one or more lactations.
By default this function uses the package-provided standard curve and covariance matrix.
But it is also possible to provide your own standard curve and covariance matrix,
for example when you want to fit these ingredients from your own reference population.
This can be done in two ways: either by fitting the covariance matrix and standard curve
directly from a reference dataset by providing a pandas dataframe at 'reference_df ='
when fit_standard_lc_from_data is True.
Alternative for customization is to set standard_lc_305 and covariance_matrix
directly in the function call.
Arguments:
- df: Input observations. If
TestIdis missing, all rows are treated as one lactation. - standard_lc: Expected daily milk yield lactation curve on days 1..305.
If not provided, the package's default curve is used.
Or fit your own standard curve from a reference dataset by
providing a pandas dataframe at 'reference_df ='
when
fit_standard_lc_from_datais True. - days_in_milk_col: Optional input column name for days in milk. If
provided, it is mapped to
DaysInMilk. - milking_yield_col: Optional input column name for milk yield. If
provided, it is mapped to
MilkingYield. - test_id_col: Optional input column name for lactation/test identifier.
If provided, it is mapped to
TestId. - default_test_id: Fallback test id used when no test-id column is available.
- covariance_matrix: Optional prefit covariance matrix. If omitted,
the default matrix is used or
reference_dfcan be used to fit one for your own data. - fit_standard_lc_from_data: Whether to fit covariance information from
reference_dfinstead of using a provided covariance matrix. - reference_df: Reference dataframe used when
covariance_matrixandstandard_lcare not provided andfit_standard_lc_from_datais True.
Returns:
Dataframe with columns
TestIdandLactationMilkYield.
Raises:
- ValueError: If neither
covariance_matrixnorreference_dfis provided.
594def demo() -> None: 595 """Run a minimal example of best prediction with mock data.""" 596 597 # --- Single + multiple lactations example --- 598 test_df = pd.DataFrame( 599 { 600 "TestId": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2], 601 "DaysInMilk": [10, 20, 30, 40, 50, 15, 25, 35, 45, 55], 602 "MilkingYield": [30, 35, 40, 38, 36, 28, 33, 37, 39, 34], 603 } 604 ) 605 606 result_cov = best_predict_method( 607 test_df, standard_lc=STANDARD_CURVE, covariance_matrix=COV_MATRIX 608 ) 609 610 print("Predictions with provided covariance matrix:") 611 print(result_cov)
Run a minimal example of best prediction with mock data.