lactationcurve.characteristics.ISLC
Interpolation using Standard Lactation Curves (ISLC).
Purpose
The Interpolation using Standard Lactation Curves (ISLC) method estimates lactation yield from intermittent test-day records by predicting deviations from an expected lactation curve. The method uses standard lactation curves that represent the expected course of lactation for specific cow subgroups, such as breed or parity.
By incorporating these standard curves, ISLC accounts for the typical lactation pattern in which milk yield increases after calving, reaches a peak, and subsequently declines. Daily yields are estimated at fixed lactation days (e.g., 0, 10, 30, 50) and used to calculate accumulated lactation production.
This module implements the ISLC family of methods following the CRV/ICAR approach described in ICAR Procedure 2, Section 2 (Computing Accumulated Lactation Yield). The implementation is based on Wilmink-type standard lactation curves combined with correlation-based deviation corrections.
Method Summary
ISLC methods estimate total lactation yield by first interpolating test-day yields onto a predefined DIM grid, then integrating the resulting days with a test-interval style summation.
Key Entry Points
ISLC_method: compute a single lactation's estimated 305-d yield from interpolated grid measurements.ISLC: applyISLC_methodperTestIdin a pandas DataFrame.ISLC_original_method: compute a single lactation's estimated 305-d yield using the technique described in the original paper. Wilmink et al. (1987) "Comparison of different methods of predicting 305-day milk yield using means calculated from within-herd lactation curves"ISLC_original: applyISLC_original_methodperTestIdin a pandas DataFrame.interpolation_standard_lc: interpolate measured test-day yields onto the DIM grid using the standard lactation curve to guide slopes.linear_interpd_all_to_gridandlinear_interpd_closest_to_grid: Linear interpolation alternatives.
Column Flexibility
The functions accept several case-insensitive column name aliases and can
create a default TestId if one is missing. Recognized aliases:
- Days in Milk:
["daysinmilk", "dim", "testday"] - Milk Yield:
["milkingyield", "testdaymilkyield", "milkyield", "yield"] - Test Id:
["animalid", "testid", "id"]
It is also possible to provide your own column names so the function can be applied to dataframes with different column naming conventions.
Returns a DataFrame with columns: ["TestId", "LactationMilkYield"].
Defaults
STANDARD_CURVE: Baseline expected lactation curve for days 1..305 (Wilmink lactation curve model).CORR_MATRIX: Default day-to-day correlation structure used for projection.STDs: Standard deviations for each grid day, empirically estimated.
Notes
- Milk samples are often recorded on non-grid DIM values. Interpolation is therefore used to map measurements to grid days.
- Three interpolation variants are implemented.
interpolation_standard_lcis recommended because it is the most consistent with the original method. interpolation_standard_lcrequires a standard lactation curve. You can fit one with thelactationcurvepackage or use the default lactation curve.- The methods can be applied to lactations without any measurements, in which case the result will be the population mean from the standard curve.
- Predicted milk yields have less variance than true milk yields. With TIM, estimated yields have more variance than true yields. The reason is that predicted yields are regressed toward the mean unless all 305 daily yields are observed.
- The main ISLC method in this package is not the same as the original method
described in the paper by Wilmink.
The original method is implemented in
ISLC_original_methodandISLC_original. The difference in the new method is that the grid is extended to include day 1 and day 305. ALso the measured test days are taken into account in the final summation to lactation yield, while in the original method only the interpolated values on the grid are used. The new method can also be applied to lactation extending beyond 305 days, while the original method is strictly for 305-day yield estimation. - The method currently assumes that the standard curves are the same for all lactations, but future updates may allow using different standard curves and covariance structures for different subgroups of lactations (e.g., by breed or parity).
- ISLC's strenght is that its caclulation is broken up in multiple steps, which makes it more computer memory friendly than the best predict method, which requires the inversion of a large covariance matrix. Because of the integration of standard lactation curves it takes into account the shape of the lactation curve, which can lead to more accurate estimates of total yield, especially for lactations with few test days or irregular patterns.
- ISLC's main disadvantage is that because of the many in between steps, it is relatively complex to implement and understand, making it less transparant. The best results are obtained when the standard curve and covariance matrix are from the same population as the data, which can be a barrier for users without access to a large reference dataset. This also causes inconsistencies in cumulative milk yield results depending on which standard curves are used. A cow with the exact same test-day records can have a different cumulative milk yield estimates depending on the standard curve used, which can be considered unfair.
References
Wilmink, J. B. M. (1987). Comparison of different methods of predicting 305-day milk yield using means calculated from within-herd lactation curves. Livestock Production Science, 17, 1-17.
Wilmink, J. B. M. (1987). Adjustment of test-day milk, fat and protein yield for age, season and stage of lactation. Livestock Production Science, 16(4), 335-348.
Handbook NRS © CR Delta chapter E1 and E2 02-98
Author: Meike van Leerdam Date: 20 October 2025 Last update: 21 May 2026
1"""Interpolation using Standard Lactation Curves (ISLC). 2 3Purpose 4------- 5The Interpolation using Standard Lactation Curves (ISLC) method estimates 6lactation yield from intermittent test-day records by predicting 7deviations from an expected lactation curve. 8The method uses standard lactation curves that represent the expected 9course of lactation for specific cow subgroups, such 10as breed or parity. 11 12By incorporating these standard curves, ISLC accounts for the typical 13lactation pattern in which milk yield increases after calving, reaches a 14peak, and subsequently declines. Daily yields are estimated at fixed 15lactation days (e.g., 0, 10, 30, 50) and used to calculate accumulated 16lactation production. 17 18This module implements the ISLC family of methods following the CRV/ICAR 19approach described in ICAR Procedure 2, Section 2 (Computing Accumulated 20Lactation Yield). The implementation is based on Wilmink-type standard 21lactation curves combined with correlation-based deviation corrections. 22 23 24Method Summary 25-------------- 26ISLC methods estimate total lactation yield by first interpolating test-day 27yields onto a predefined DIM grid, then integrating the resulting days with 28a test-interval style summation. 29 30Key Entry Points 31---------------- 32- ``ISLC_method``: compute a single lactation's estimated 305-d yield from 33 interpolated grid measurements. 34- ``ISLC``: apply ``ISLC_method`` per ``TestId`` in a pandas DataFrame. 35- ``ISLC_original_method``: compute a single lactation's estimated 36 305-d yield using the technique described in the original paper. 37 Wilmink et al. (1987) 38 "Comparison of different methods of predicting 305-day milk yield 39 using means calculated from within-herd lactation curves" 40- ``ISLC_original``: apply ``ISLC_original_method`` per ``TestId`` 41 in a pandas DataFrame. 42- ``interpolation_standard_lc``: interpolate measured test-day yields onto 43 the DIM grid using the standard lactation curve to guide slopes. 44- ``linear_interpd_all_to_grid`` and ``linear_interpd_closest_to_grid``: 45 Linear interpolation alternatives. 46 47Column Flexibility 48------------------ 49The functions accept several case-insensitive column name aliases and can 50create a default ``TestId`` if one is missing. Recognized aliases: 51 52- Days in Milk: `["daysinmilk", "dim", "testday"]` 53- Milk Yield: `["milkingyield", "testdaymilkyield", "milkyield", "yield"]` 54- Test Id: `["animalid", "testid", "id"]` 55 56It is also possible to provide your own column names so the function 57can be applied to dataframes with different column naming conventions. 58 59Returns a DataFrame with columns: ``["TestId", "LactationMilkYield"]``. 60 61Defaults 62-------- 63- ``STANDARD_CURVE``: Baseline expected lactation curve for days 1..305 64 (Wilmink lactation curve model). 65- ``CORR_MATRIX``: Default day-to-day correlation structure used for projection. 66- ``STDs``: Standard deviations for each grid day, empirically estimated. 67 68Notes 69----- 70- Milk samples are often recorded on non-grid DIM values. 71 Interpolation is therefore used to map measurements to grid days. 72- Three interpolation variants are implemented. 73 ``interpolation_standard_lc`` is recommended because it is the most 74 consistent with the original method. 75- ``interpolation_standard_lc`` requires a standard lactation curve. 76 You can fit one with the ``lactationcurve`` package 77 or use the default lactation curve. 78- The methods can be applied to lactations without any measurements, 79 in which case the result will be the population mean from the standard curve. 80- Predicted milk yields have less variance than true milk yields. With TIM, 81 estimated yields have more variance than true yields. The reason is that predicted yields are 82 regressed toward the mean unless all 305 daily yields are observed. 83- The main ISLC method in this package is not the same as the original method 84 described in the paper by Wilmink. 85 The original method is implemented in ``ISLC_original_method`` and ``ISLC_original``. 86 The difference in the new method is that the grid is extended to include day 1 and day 305. 87 ALso the measured test days are taken into account in the final summation to lactation yield, 88 while in the original method only the interpolated values on the grid are used. 89 The new method can also be applied to lactation extending beyond 305 days, 90 while the original method is strictly for 305-day yield estimation. 91- The method currently assumes that the standard curves are the same 92 for all lactations, 93 but future updates may allow using different standard curves and covariance structures for 94 different subgroups of lactations (e.g., by breed or parity). 95- ISLC's strenght is that its caclulation is broken up in multiple steps, 96 which makes it more computer memory friendly than the best predict method, 97 which requires the inversion of a large covariance matrix. 98 Because of the integration of standard lactation curves 99 it takes into account the shape of the lactation curve, 100 which can lead to more accurate estimates of total yield, 101 especially for lactations with few test days or irregular patterns. 102- ISLC's main disadvantage is that because of the many in between steps, 103 it is relatively complex to implement and understand, making it less transparant. 104 The best results are obtained when the standard curve and covariance matrix 105 are from the same population as the data, 106 which can be a barrier for users without access to a large reference dataset. 107 This also causes inconsistencies in cumulative milk yield results 108 depending on which standard curves are used. 109 A cow with the exact same test-day records 110 can have a different cumulative milk yield estimates depending 111 on the standard curve used, which can be considered unfair. 112 113References 114--------- 115Wilmink, J. B. M. (1987). 116Comparison of different methods of predicting 305-day milk yield using means 117calculated from within-herd lactation curves. Livestock Production Science, 11817, 1-17. 119 120Wilmink, J. B. M. (1987). 121Adjustment of test-day milk, fat and protein yield for age, 122season and stage of lactation. Livestock Production Science, 16(4), 335-348. 123 124Handbook NRS © CR Delta chapter E1 and E2 02-98 125 126 127Author: Meike van Leerdam 128Date: 20 October 2025 129Last update: 21 May 2026 130""" 131 132# Import packages 133import inspect 134from pathlib import Path 135from typing import Any, Protocol, cast 136 137import numpy as np 138import numpy.typing as npt 139import pandas as pd 140from scipy.interpolate import interp1d 141 142from lactationcurve.fitting import fit_lactation_curve 143from lactationcurve.preprocessing import standardize_lactation_columns 144 145DATA_DIR = Path(__file__).resolve().parent / "data" 146GRID_DAYS = [10, 30, 50, 70, 90, 110, 130, 150, 170, 190, 210, 230, 250, 270, 290] 147 148# get the standard lactation curve ingredients back from the data storage: 149CORR_MATRIX = cast(pd.DataFrame, pd.read_pickle(DATA_DIR / "corr_matrix.pkl")) 150STDs = np.load(DATA_DIR / "std_per_grid_day.npy") 151STANDARD_CURVE = np.load(DATA_DIR / "standard_lc_grid.npy") 152 153# add a day zero (that equals day 1) to the standard curve to prevent an index shift 154STANDARD_CURVE = np.insert(STANDARD_CURVE, 0, STANDARD_CURVE[0]) 155 156 157class _DocDefault: 158 """Lightweight repr-only wrapper for cleaner generated signatures.""" 159 160 def __init__(self, label: str) -> None: 161 self.label = label 162 163 def __repr__(self) -> str: 164 return self.label 165 166 167_DOC_STANDARD_CURVE = _DocDefault("STANDARD_CURVE") 168_DOC_CORR_MATRIX = _DocDefault("CORR_MATRIX") 169_DOC_STDS = _DocDefault("STDs") 170 171 172def _curve_to_series(curve: pd.Series | npt.NDArray[np.float64]) -> pd.Series: 173 """Normalize standard lactation curve input to a Series.""" 174 if isinstance(curve, pd.Series): 175 return curve 176 return pd.Series(curve) 177 178 179def _curve_value(curve: pd.Series, day: int) -> float: 180 """Return curve value for a DIM, supporting both label and positional indexing.""" 181 if day in curve.index: 182 return float(curve.loc[day]) 183 return float(curve.iloc[day]) 184 185 186class InterpolationMethod(Protocol): 187 """Protocol for milk-yield interpolation methods. 188 189 Defines the signature expected by functions that interpolate test-day 190 yields onto a predefined DIM grid. 191 """ 192 193 def __call__( 194 self, 195 group: pd.DataFrame, 196 days_in_milk_col: str, 197 milking_yield_col: str, 198 standard_lc: pd.Series | None = None, 199 small_grid: bool = False, 200 ) -> pd.DataFrame: 201 """Interpolate yields for a single lactation onto the DIM grid. 202 203 Args: 204 group: Test-day records for a single animal. 205 days_in_milk_col: Column name containing days in milk (DIM). 206 milking_yield_col: Column name containing milk yields. 207 standard_lc: Optional standard lactation curve to guide 208 interpolation (method-dependent). 209 small_grid: If True, interpolate only on GRID_DAYS; 210 otherwise use [0] + GRID_DAYS + [305]. 211 212 Returns: 213 A DataFrame with interpolated yields, typically including 214 columns ``GridDay`` and ``MilkYieldInterp``. 215 """ 216 ... 217 218 219def ISLC( 220 df: pd.DataFrame, 221 days_in_milk_col: str | None = None, 222 milking_yield_col: str | None = None, 223 test_id_col: str | None = None, 224 default_test_id: int = 0, 225 standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE, 226 correlation_matrix: pd.DataFrame = CORR_MATRIX, 227 std_per_grid_day: npt.NDArray[np.float64] = STDs, 228 max_dim: int | str = 305, 229 fit_standard_lc_from_data: bool = False, 230 reference_df: pd.DataFrame | None = None, 231) -> pd.DataFrame: 232 """Estimate cumulative lactation yield per TestId using ISLC. 233 234 This variant applies :func:`ISLC_method` to standardized lactation records. 235 By default it uses the package-provided standard curve, correlation matrix, 236 and per-grid day standard deviations. Optionally, these assets can be refit 237 from a reference dataset by providing a pandas dataframe at 'reference_df =' 238 when ``fit_standard_lc_from_data`` is True. 239 Alternative for customization is to set standard_lc_305, 240 correlation_matrix, and std_per_grid_day directly. 241 242 Args: 243 df: Input test-day records. 244 days_in_milk_col: Name of DIM column in ``df``. 245 milking_yield_col: Name of milk-yield column in ``df``. 246 test_id_col: Name of lactation identifier column in ``df``. 247 default_test_id: Fallback TestId value when ``test_id_col`` is absent. 248 standard_lc_305: Standard lactation curve values used by ISLC. 249 correlation_matrix: Correlation matrix aligned with ISLC grid. 250 std_per_grid_day: Standard deviations aligned with ISLC grid. 251 max_dim: Maximum DIM to include 252 (or ``"max"`` so highest test day DIM is considered the final lactation day). 253 fit_standard_lc_from_data: If True, fit representation from ``reference_df``. 254 reference_df: Reference records used when fitting curve and covariance. 255 256 Returns: 257 DataFrame with columns ``TestId`` and ``LactationMilkYield``. 258 259 Raises: 260 ValueError: If required columns are missing or fit mode has no reference data. 261 """ 262 # Standardize columns and filter DIM <= 305 263 df = standardize_lactation_columns( 264 df, 265 days_in_milk_col=days_in_milk_col, 266 milking_yield_col=milking_yield_col, 267 test_id_col=test_id_col, 268 default_test_id=default_test_id, 269 max_dim=max_dim, 270 ) 271 272 # Fit covariance/standard_lc if requested 273 if fit_standard_lc_from_data is True: 274 if reference_df is None: 275 raise ValueError("Provide reference_df to fit your own standard lactation curve.") 276 reference_df = standardize_lactation_columns( 277 reference_df, 278 days_in_milk_col=days_in_milk_col, 279 milking_yield_col=milking_yield_col, 280 test_id_col=test_id_col, 281 default_test_id=default_test_id, 282 max_dim=305, 283 ) 284 dim = reference_df["DaysInMilk"] 285 milk_yield = reference_df["MilkingYield"] 286 standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink") 287 # convert to pandas series 288 standard_full_lc_305 = pd.Series(standard_full_lc_305) 289 290 # fit matrix 291 ( 292 correlation_matrix, 293 std_per_grid_day, 294 standard_curve_fitted, 295 ) = create_standard_lc_representation( 296 df=reference_df, 297 standard_lactation_curve=pd.Series(standard_full_lc_305), 298 # reference_df is standardized above, so fixed names are required here 299 days_in_milk_col="DaysInMilk", 300 milking_yield_col="MilkingYield", 301 interpolation_method=interpolation_standard_lc, 302 small_grid=False, 303 ) 304 305 # Align fitted curve with default internal convention (day 0 equals day 1). 306 standard_lc_305 = np.insert( 307 np.asarray(standard_curve_fitted, dtype=float), 308 0, 309 float(standard_curve_fitted.iloc[0]), 310 ) 311 312 rows: list[dict[str, Any]] = [] 313 for test_id, group in df.groupby("TestId", sort=False): 314 if group.empty: 315 rows.append({"TestId": test_id, "LactationMilkYield": np.nan}) 316 continue 317 318 try: 319 cumulative_yield = ISLC_method( 320 df=group, 321 standard_lc=standard_lc_305, 322 correlation_matrix=correlation_matrix, 323 std_per_grid_day=std_per_grid_day, 324 days_in_milk_col="DaysInMilk", 325 milking_yield_col="MilkingYield", 326 ) 327 except ValueError: 328 cumulative_yield = np.nan 329 330 rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield}) 331 332 return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"])) 333 334 335def ISLC_method( 336 df: pd.DataFrame, 337 standard_lc: pd.Series | npt.NDArray[np.float64], 338 correlation_matrix: pd.DataFrame, 339 std_per_grid_day: npt.NDArray[np.float64], 340 days_in_milk_col: str = "DaysInMilk", 341 milking_yield_col: str = "MilkingYield", 342) -> float: 343 """Estimate total lactation yield by predicting missing grid days, then integrating. 344 345 The method fills missing grid-day yields with the ICAR-style relation: 346 347 yp = gp + b* (yi - gi) 348 349 where for each missing day p, the nearest measured day i is used and 350 b* = corr(i,p) * std(p) / std(i). The resulting complete grid profile is 351 integrated with the test-interval style weighting used elsewhere in this 352 package. 353 354 Notes: 355 The integration is performed over the days present after merging 356 measured and predicted points. If the input contains DIM beyond 305, 357 total yield can reflect that extended horizon. 358 """ 359 # add zero and 305 to the grid 360 grid = [0] + GRID_DAYS + [305] 361 dim_col = days_in_milk_col 362 yield_col = milking_yield_col 363 364 standard_lc_series = _curve_to_series(standard_lc) 365 366 # Extract the measured days 367 measured_days = df[dim_col].to_numpy() 368 if measured_days.size == 0: 369 raise ValueError("Input df must contain at least one measured day.") 370 371 # interpolate the measured days to the grid using the standard lactation curve 372 interpolated_df = interpolation_standard_lc( 373 group=df, 374 days_in_milk_col=dim_col, 375 milking_yield_col=yield_col, 376 standard_lc=standard_lc_series, 377 ) 378 379 # drop unnecessary columns 380 interpolated_df = interpolated_df.drop(columns=[dim_col, yield_col]) 381 382 # rename grid colum 383 interpolated_df = interpolated_df.rename( 384 columns={"GridDay": dim_col, "MilkYieldInterp": yield_col} 385 )[[dim_col, yield_col]] 386 387 measured_subset = cast(pd.DataFrame, df.loc[:, [dim_col, yield_col]]) 388 interpolated_subset = cast(pd.DataFrame, interpolated_df.loc[:, [dim_col, yield_col]]) 389 390 # join the interpolated values with the original measurements, keeping all measurements 391 merged_df = cast( 392 pd.DataFrame, 393 pd.merge( 394 measured_subset, 395 interpolated_subset, 396 on=dim_col, 397 how="outer", 398 suffixes=("_meas", "_interp"), 399 ), 400 ) 401 meas_series = cast(pd.Series, merged_df[f"{yield_col}_meas"]) 402 interp_series = cast(pd.Series, merged_df[f"{yield_col}_interp"]) 403 merged_df[yield_col] = cast(pd.Series, meas_series.combine_first(interp_series)) 404 df = cast( 405 pd.DataFrame, 406 merged_df.loc[:, [dim_col, yield_col]].sort_values(by=dim_col), 407 ) 408 409 # days missing from measurement → need prediction 410 days_to_predict = [day for day in grid if day not in df[dim_col].values] 411 412 predicted_rows: list[dict[str, float | int]] = [] 413 414 for day in days_to_predict: 415 # identify closest measured day 416 dim_series = cast(pd.Series, interpolated_df[dim_col]) 417 grid_vals = np.asarray(dim_series.to_numpy(dtype=float), dtype=float) 418 closest_idx = int(np.argmin(np.abs(grid_vals - float(day)))) 419 closest_day = int(dim_series.iloc[closest_idx]) 420 421 # extract mean milk yield from the standard lactation curve for the day to predict 422 expected_yield_grid_day = _curve_value(standard_lc_series, day) 423 expected_yield_measured_day = _curve_value(standard_lc_series, closest_day) 424 425 # calculate difference between measured and expected yield at closest measured day 426 diff = ( 427 interpolated_df.loc[interpolated_df[dim_col] == closest_day, yield_col].iloc[0] 428 - expected_yield_measured_day 429 ) 430 431 # calculate the correlation-based correction factor b* 432 # b* = corr(i,p) * std(p) / std(i) 433 pi = list(grid).index(closest_day) 434 qi = list(grid).index(day) 435 bpj = correlation_matrix.iloc[pi, qi] 436 std_q = std_per_grid_day[qi] 437 std_p = std_per_grid_day[pi] 438 b_star = 0.0 if std_p == 0 else float(bpj * std_q / std_p) 439 440 # predict the yield for the missing day 441 predicted_yield = expected_yield_grid_day + b_star * diff 442 predicted_rows.append({dim_col: int(day), yield_col: float(predicted_yield)}) 443 444 if predicted_rows: 445 df = pd.concat([df, pd.DataFrame(predicted_rows)], ignore_index=True) 446 447 # apply the test interval method to calculate total milk yield over 305 days 448 # sort the dataframe by days in milk and keep one value per day 449 df = df.drop_duplicates(subset=[dim_col], keep="last") 450 df = df.sort_values(by=dim_col) 451 # Trapezoidal contributions 452 df["width"] = df[dim_col].diff().shift(-1) 453 df["avg_yield"] = (df[yield_col] + df[yield_col].shift(-1)) / 2 454 df["trapezoid_area"] = df["width"] * df["avg_yield"] 455 456 trapezoid_values = np.asarray(pd.to_numeric(df["trapezoid_area"], errors="coerce"), dtype=float) 457 total_yield = float(np.nansum(trapezoid_values)) 458 459 return total_yield 460 461 462def ISLC_original_method( 463 df: pd.DataFrame, 464 standard_lc: pd.Series | npt.NDArray[np.float64], 465 correlation_matrix: pd.DataFrame, 466 std_per_grid_day: npt.NDArray[np.float64], 467) -> float: 468 """Estimate 305-day milk yield for a single interpolated lactation 469 from the original paper by Dr. Wilmink in 1987. 470 471 This function computes an estimated 305-day yield by combining three 472 components: known production from interpolated measurements, the 473 population mean for missing grid days (from ``standard_lc``), and a 474 correlation-based deviation correction derived from ``correlation_matrix``. 475 476 Args: 477 df: DataFrame containing at least the columns ``GridDay`` and 478 ``MilkYieldInterp`` (values already interpolated onto the grid). 479 standard_lc: Standard lactation curve values indexed by grid day. 480 correlation_matrix: Correlation coefficients between grid-day yields 481 (matrix-like, rows/columns ordered as the grid). 482 std_per_grid_day: 1-D array of standard deviations for each grid day. 483 484 Returns: 485 A float with the estimated 305-day milk yield. 486 487 Raises: 488 ValueError: if any measured day in ``df`` is not present in the 489 expected global ``grid``. 490 491 Notes: 492 - The routine assumes fixed recording intervals of 20 days, except 493 that the final interval (if day 290 is present) is 25 days. 494 - The implementation expects a global ``grid`` variable to be 495 defined and aligned with ``standard_lc`` and 496 ``std_per_grid_day``. 497 - The amount of estimated milk yields depends on the 498 availability of measured milk yields. 499 - Lactation yield is: 500 sum(known measurements (p) + estimated measurements (q)). 501 - In the implementation according to the CRV E2 document, 502 there should also be a correction factor for the difference between 503 the expected and actual complete lactation yield of the previous lactation. 504 However, this correction factor is not included in the current implementation 505 as it requires additional data (previous lactation yield) that is not always available. 506 """ 507 grid = GRID_DAYS 508 standard_lc_series = _curve_to_series(standard_lc) 509 510 # --- Validate that all measurements lie on the expected grid ------- 511 measured_days = df["GridDay"].values 512 if any(day not in grid for day in measured_days): 513 raise ValueError("At least one DIM in the input df is not part of the grid.") 514 515 # --- Prepare basic components ------------------------------------- 516 df = df.sort_values("GridDay") 517 518 # days missing from measurement → need prediction 519 days_to_predict = [day for day in grid if day not in measured_days] 520 pred_idx = [list(grid).index(d) for d in days_to_predict] 521 522 # mean expected milk yield from standard curve 523 mean_lc = pd.Series([_curve_value(standard_lc_series, g) for g in grid], index=grid) 524 525 # --- Compute known production (sum of actual measurements) --------- 526 if 290 in measured_days: 527 # special last interval: 25 days 528 known_prod = df["MilkYieldInterp"][:-1].sum() * 20 + df["MilkYieldInterp"].iloc[-1] * 25 529 else: 530 known_prod = df["MilkYieldInterp"].sum() * 20 531 532 # --- Compute population mean for missing days ---------------------- 533 pop_mean_missing = mean_lc.loc[days_to_predict] 534 535 if 290 in pop_mean_missing.index: 536 population_mean = pop_mean_missing.iloc[:-1].sum() * 20 + pop_mean_missing.iloc[-1] * 25 537 else: 538 population_mean = pop_mean_missing.sum() * 20 539 540 # --- Compute correlation-based correction -------------------------- 541 days_to_predict_np = np.array(days_to_predict) 542 measured_days_np = np.array(measured_days) 543 544 # for each missing day: find closest measured day 545 closest_measured = [ 546 measured_days_np[np.argmin(np.abs(measured_days_np - g))] for g in days_to_predict_np 547 ] 548 closest_idx = [list(grid).index(d) for d in closest_measured] 549 550 b_star_list = [] 551 for pi, qi in zip(closest_idx, pred_idx): 552 bpj = correlation_matrix.iloc[pi, qi] 553 stdj = std_per_grid_day[qi] 554 stdp = std_per_grid_day[pi] 555 b_star_list.append(bpj * stdj / stdp) 556 557 # scale correction by interval lengths 558 if 290 in measured_days: 559 # last interval belongs to prediction group 560 correction_weighted = sum(b_star_list[:-1]) * 20 561 else: 562 correction_weighted = sum(b_star_list[:-1]) * 20 + b_star_list[-1] * 25 563 564 last_day = df["GridDay"].iloc[-1] 565 last_yield = df["MilkYieldInterp"].iloc[-1] 566 mean_last = mean_lc.loc[last_day] 567 568 correction = correction_weighted * (last_yield - mean_last) 569 570 # --- Final sum ----------------------------------------------------- 571 final_milk_yield = known_prod + population_mean + correction 572 return float(final_milk_yield) 573 574 575def ISLC_original( 576 df: pd.DataFrame, 577 days_in_milk_col: str | None = None, 578 milking_yield_col: str | None = None, 579 test_id_col: str | None = None, 580 default_test_id: int = 0, 581 standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE, 582 correlation_matrix: pd.DataFrame = CORR_MATRIX, 583 std_per_grid_day: npt.NDArray[np.float64] = STDs, 584 fit_standard_lc_from_data: bool = False, 585 reference_df: pd.DataFrame | None = None, 586) -> pd.DataFrame: 587 """Compute estimated 305-day yields using the original grid-based ISLC variant 588 from the original paper by Dr. Wilmink in 1987. 589 590 The function groups standardized input by ``TestId``, interpolates to the 591 fixed GRID_DAYS grid, and applies :func:`ISLC_original_method` per lactation. 592 593 Args: 594 df: Input records with DIM and milk yield columns. 595 days_in_milk_col: Name of DIM column in ``df``. 596 milking_yield_col: Name of milk-yield column in ``df``. 597 test_id_col: Name of lactation identifier column in ``df``. 598 default_test_id: Fallback TestId when ``test_id_col`` is absent. 599 standard_lc_305: Standard lactation curve used for interpolation. 600 correlation_matrix: Correlation matrix over GRID_DAYS. 601 std_per_grid_day: Standard deviations per GRID_DAYS element. 602 fit_standard_lc_from_data: If True, refit standard curve and covariance 603 representation from ``reference_df`` before prediction. 604 reference_df: Reference population used when 605 ``fit_standard_lc_from_data=True``. The reference data is 606 standardized to ``DaysInMilk``/``MilkingYield`` and fit using 607 ``small_grid=True`` so correlation and standard-deviation assets are 608 aligned to ``GRID_DAYS``. 609 610 Returns: 611 DataFrame with columns ``LactationMilkYield`` and ``TestId``. 612 613 Raises: 614 ValueError: If required columns (DaysInMilk or MilkingYield) cannot be 615 found, or if ``fit_standard_lc_from_data=True`` and 616 ``reference_df`` is not provided. 617 618 Notes: 619 Records with DIM > 305 are dropped before interpolation and scoring. 620 """ 621 # Standardize columns and filter DIM <= 305 622 df = standardize_lactation_columns( 623 df, 624 days_in_milk_col=days_in_milk_col, 625 milking_yield_col=milking_yield_col, 626 test_id_col=test_id_col, 627 default_test_id=default_test_id, 628 max_dim=305, 629 ) 630 631 # Fit covariance/standard_lc if requested 632 if fit_standard_lc_from_data is True: 633 if reference_df is None: 634 raise ValueError("Provide reference_df to fit your own standard lactation curve.") 635 reference_df = standardize_lactation_columns( 636 reference_df, 637 days_in_milk_col=days_in_milk_col, 638 milking_yield_col=milking_yield_col, 639 test_id_col=test_id_col, 640 default_test_id=default_test_id, 641 max_dim=305, 642 ) 643 dim = reference_df["DaysInMilk"] 644 milk_yield = reference_df["MilkingYield"] 645 standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink") 646 standard_full_lc_305 = pd.Series(standard_full_lc_305) 647 648 # fit matrix 649 ( 650 correlation_matrix, 651 std_per_grid_day, 652 standard_curve_fitted, 653 ) = create_standard_lc_representation( 654 df=reference_df, 655 standard_lactation_curve=pd.Series(standard_full_lc_305), 656 # reference_df is standardized above, so fixed names are required here 657 days_in_milk_col="DaysInMilk", 658 milking_yield_col="MilkingYield", 659 interpolation_method=interpolation_standard_lc, 660 small_grid=True, 661 ) 662 663 # Align fitted curve with default internal convention (day 0 equals day 1). 664 standard_lc_305 = np.insert( 665 np.asarray(standard_curve_fitted, dtype=float), 666 0, 667 float(standard_curve_fitted.iloc[0]), 668 ) 669 rows: list[dict[str, Any]] = [] 670 for test_id, group in df.groupby("TestId", sort=False): 671 lactation = interpolation_standard_lc( 672 group, 673 days_in_milk_col="DaysInMilk", 674 milking_yield_col="MilkingYield", 675 standard_lc=standard_lc_305, 676 ) 677 if lactation.empty: 678 rows.append({"TestId": test_id, "305_milk_yield": np.nan}) 679 continue 680 681 cumulative_yield = ISLC_original_method( 682 df=lactation, 683 standard_lc=standard_lc_305, 684 correlation_matrix=correlation_matrix, 685 std_per_grid_day=std_per_grid_day, 686 ) 687 rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield}) 688 689 return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"])) 690 691 692""" 693Create your own standard lactation curve, correlation matrix, 694and standard deviations per grid day. 695 696step 1 interpolate to the days on the grid 697Based on the CRV E2 documents, interpolation is not linear and instead 698depends on the standard lactation curve. 699""" 700 701 702def interpolation_standard_lc( 703 group: pd.DataFrame, 704 days_in_milk_col: str, 705 milking_yield_col: str, 706 standard_lc: pd.Series | npt.NDArray[np.float64] | None = None, 707 small_grid: bool = False, 708) -> pd.DataFrame: 709 """Interpolate a single lactation onto the DIM grid guided by a standard curve. 710 711 The function interpolates measured yields for one animal onto the 712 predefined grid of DIM values. When a grid day coincides with a measured 713 DIM the observed yield is used; otherwise the interpolation adjusts the 714 linear slope between neighboring measurements by the difference in the 715 standard curve, inspired by the CRV E2 approach. 716 717 The interpolation formula applied for a grid day ``gday`` between two 718 measured points (x1,y1) and (x2,y2) is:: 719 720 slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1) 721 yi = gi + (slope * (gday - x1) + (y1 - g1)) 722 723 Args: 724 group: DataFrame with test-day records for a single animal. Must 725 contain columns named by ``days_in_milk_col`` and 726 ``milking_yield_col``. 727 days_in_milk_col: Name of the DIM column in ``group``. 728 milking_yield_col: Name of the milk-yield column in ``group``. 729 standard_lc: A Series indexed by DIM providing the expected yield on 730 each grid day; used to guide the interpolation shape. 731 small_grid: If True, interpolate on GRID_DAYS only; otherwise include 732 day 0 and day 305 in the interpolation target grid. 733 734 Returns: 735 A DataFrame with one row per interpolated grid day containing the 736 original identifying columns (from the first row of ``group``), plus 737 the columns ``GridDay`` and ``MilkYieldInterp``. 738 739 Notes: 740 - No interpolation is performed outside the range bounded by the 741 first and last measured DIM for the lactation (those grid days are 742 skipped). 743 - No interpolation is performed for grid days where the right 744 neighboring measured DIM is >= 305, because this lies outside the 745 intended standard-curve support. 746 - The implementation defines a local default ``grid`` of 747 [10, 30, ..., 290]. 748 749 750 """ 751 if small_grid: 752 grid = GRID_DAYS 753 else: 754 # add zero and 305 to the grid 755 grid = [0] + GRID_DAYS + [305] 756 757 if standard_lc is None: 758 raise ValueError("A standard lactation curve is required for interpolation_standard_lc.") 759 760 standard_lc_series = _curve_to_series(standard_lc) 761 762 # Sort and ensure unique DIMs 763 group = group.sort_values(days_in_milk_col).drop_duplicates(days_in_milk_col) 764 dims = group[days_in_milk_col].tolist() 765 766 rows = [] 767 768 # loop over the grid days 769 for gday in grid: 770 # --- CASE 1: Exact measured day --------------------------- 771 if gday in dims: 772 y_val = group.loc[group[days_in_milk_col] == gday, milking_yield_col].iloc[0] 773 774 row = group.iloc[0].to_dict() 775 row["GridDay"] = gday 776 row["MilkYieldInterp"] = float(y_val) 777 rows.append(row) 778 continue 779 780 # --- CASE 2: interpolate between measured days ------------- 781 before_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] < gday]) 782 after_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] > gday]) 783 before = before_df.tail(1) 784 after = after_df.head(1) 785 786 # CASE 3 No interpolation if grid day is outside the range of measured DIM 787 if before.empty or after.empty: 788 continue 789 790 # CASE 4: No interpolation if the day after is day 305 or higher, 791 # as this is outside the range of the standard curve 792 if float(after.iloc[0][days_in_milk_col]) >= 305: 793 continue 794 795 x1 = float(before.iloc[0][days_in_milk_col]) 796 y1 = float(before.iloc[0][milking_yield_col]) 797 x2 = float(after.iloc[0][days_in_milk_col]) 798 y2 = float(after.iloc[0][milking_yield_col]) 799 800 # expected yields from standard lactation curve 801 g1 = _curve_value(standard_lc_series, int(x1)) 802 g2 = _curve_value(standard_lc_series, int(x2)) 803 gi = _curve_value(standard_lc_series, int(gday)) 804 805 # Wilmink-based interpolation formula 806 slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1) 807 yi = gi + (slope * (gday - x1) + (y1 - g1)) 808 809 row = group.iloc[0].to_dict() 810 row["GridDay"] = gday 811 row["MilkYieldInterp"] = float(yi) 812 rows.append(row) 813 814 if not rows: 815 return pd.DataFrame( 816 columns=pd.Index([*group.columns.to_list(), "GridDay", "MilkYieldInterp"]) 817 ) 818 819 return pd.DataFrame(rows) 820 821 822def linear_interpd_all_to_grid( 823 group: pd.DataFrame, 824 days_in_milk_col: str, 825 milking_yield_col: str, 826 standard_lc: pd.Series | None = None, 827 small_grid: bool = False, 828) -> pd.DataFrame: 829 """Linearly interpolate all grid days for a lactation. 830 831 This helper uses linear interpolation (with extrapolation) to produce 832 milk-yield values for every grid day regardless of whether the grid day 833 lies between measured observations. 834 835 Args: 836 group: DataFrame containing measured DIM and yield values for one 837 lactation. 838 days_in_milk_col: Name of the DIM column. 839 milking_yield_col: Name of the milk-yield column. 840 small_grid: If True, interpolate on GRID_DAYS only; otherwise include 841 day 0 and day 305. 842 843 Returns: 844 A DataFrame with identifying columns copied from ``group``'s first 845 row and columns ``GridDay`` and ``MilkYieldInterp`` containing the 846 interpolated (or extrapolated) yields for every grid day. 847 """ 848 if small_grid: 849 grid = GRID_DAYS 850 else: 851 # add zero and 305 to the grid 852 grid = [0] + GRID_DAYS + [305] 853 854 group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col) 855 f = interp1d( 856 group[days_in_milk_col], 857 group[milking_yield_col], 858 kind="linear", 859 fill_value=cast(Any, "extrapolate"), 860 ) 861 862 base = { 863 col: group[col].iloc[0] 864 for col in group.columns 865 if col not in [days_in_milk_col, milking_yield_col] 866 } 867 base["GridDay"] = grid 868 base["MilkYieldInterp"] = f(grid) 869 870 return pd.DataFrame(base) 871 872 873# Create an interpolation function that only outputs grid days between two 874# milk measurements using linear interpolation. 875def linear_interpd_closest_to_grid( 876 group: pd.DataFrame, 877 days_in_milk_col: str, 878 milking_yield_col: str, 879 standard_lc: pd.Series | None = None, 880 small_grid: bool = False, 881) -> pd.DataFrame | None: 882 """Linearly interpolate grid days between measured observations. 883 884 This helper returns interpolated yields only for grid days that lie 885 between the first and last measured DIM for the lactation. If no grid 886 days fall within the measured range the function returns ``None``. 887 888 Args: 889 group: DataFrame for a single lactation. 890 days_in_milk_col: Name of the DIM column. 891 milking_yield_col: Name of the milk-yield column. 892 small_grid: If True, use GRID_DAYS only; otherwise include 893 day 0 and day 305. 894 895 Returns: 896 A DataFrame with identifying columns plus ``GridDay`` and 897 ``MilkYieldInterp``, or ``None`` if there are no grid days within 898 the measured range. 899 """ 900 if small_grid: 901 grid = GRID_DAYS 902 else: 903 # add zero and 305 to the grid 904 grid = [0] + GRID_DAYS + [305] 905 906 group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col) 907 908 # Find the range of interpolatable DIM 909 min_dim = group[days_in_milk_col].min() 910 max_dim = group[days_in_milk_col].max() 911 grid_in_range = [g for g in grid if min_dim <= g <= max_dim] 912 913 # apply linear interpolation 914 if not grid_in_range: 915 return None 916 f = interp1d( 917 group[days_in_milk_col], 918 group[milking_yield_col], 919 kind="linear", 920 fill_value=cast(Any, np.nan), 921 ) 922 923 # create a new dataframe with the newly created columns 924 base = { 925 col: group[col].iloc[0] 926 for col in group.columns 927 if col not in [days_in_milk_col, milking_yield_col] 928 } 929 base["GridDay"] = grid_in_range 930 base["MilkYieldInterp"] = f(grid_in_range) 931 932 return pd.DataFrame(base) 933 934 935def create_standard_lc_representation( 936 df: pd.DataFrame, 937 standard_lactation_curve: pd.Series, 938 days_in_milk_col: str, 939 milking_yield_col: str, 940 interpolation_method: InterpolationMethod = interpolation_standard_lc, 941 lc_model: str = "Wilmink", 942 small_grid: bool = False, 943) -> tuple[pd.DataFrame, npt.NDArray[np.float64], pd.Series]: 944 """Create a standard lactation curve with correlation matrix from data. 945 946 This function estimates the correlation structure and standard deviations 947 of milk yields across grid days, and refits a Wilmink lactation curve model to 948 the interpolated data. The Wilmink model is hereby the default, 949 but any model from the LC package can be used. 950 The outputs can be used as inputs to the 951 ``ISLC_method`` or ``ISLC_original`` functions for predicting 305-d yields. 952 953 The process standardizes yields per animal and computes row-wise 954 (per-animal) statistics, then derives relationships across grid days. 955 956 Args: 957 df: Input DataFrame containing test-day records. Must include a 958 ``TestId`` column to identify unique lactations. Also requires 959 columns specified by ``days_in_milk_col`` and ``milking_yield_col``. 960 standard_lactation_curve: A Series providing a reference lactation 961 curve (e.g., the fitted Wilmink curve on which to base 962 interpolation). 963 days_in_milk_col: Name of the column in ``df`` containing days in milk (DIM). 964 milking_yield_col: Name of the column in ``df`` containing measured 965 milk yields. 966 interpolation_method: An interpolation method conforming to 967 ``InterpolationMethod`` protocol (default: ``interpolation_standard_lc``). 968 Must be callable with signature 969 (group, days_in_milk_col, milking_yield_col, standard_lc). 970 lc_model: The lactation curve model to fit to the interpolated data. Default is "Wilmink". 971 small_grid: If True, build representation on GRID_DAYS only. 972 973 Returns: 974 A tuple of three elements: 975 - corr (pd.DataFrame): Correlation matrix between grid-day yields. 976 - std_per_grid_day (np.ndarray): Standard deviations of standardized 977 yields per grid day. 978 - standard_lactation_curve_grid (pd.Series): Refitted Wilmink model 979 indexed by DIM (1–305). 980 981 Notes: 982 - Expects ``df`` to have a ``TestId`` column to group lactations. 983 - Performs standardization per animal (row-wise), so output 984 correlations and standard deviations reflect between-day variation 985 within standardized animal profiles. 986 - Uses ``fit_lactation_curve`` from the ``lactationcurve`` package 987 with model='Wilmink' and fitting='frequentist'. 988 However other models can be used by specifying the ``lc_model`` argument. 989 """ 990 991 if "TestId" not in df.columns: 992 raise ValueError("DataFrame must contain a 'TestId' column.") 993 if days_in_milk_col not in df.columns or milking_yield_col not in df.columns: 994 raise ValueError(f"Columns '{days_in_milk_col}' and '{milking_yield_col}' must be in df.") 995 996 interpolated_groups: list[pd.DataFrame] = [] 997 for test_id, group in df.groupby("TestId", sort=False): 998 try: 999 interp_group = interpolation_method( 1000 group, 1001 days_in_milk_col=days_in_milk_col, 1002 milking_yield_col=milking_yield_col, 1003 standard_lc=standard_lactation_curve, 1004 small_grid=small_grid, 1005 ) 1006 except TypeError: 1007 # Backward compatibility for custom interpolation callables 1008 # that do not yet accept small_grid. 1009 interp_group = interpolation_method( 1010 group, 1011 days_in_milk_col=days_in_milk_col, 1012 milking_yield_col=milking_yield_col, 1013 standard_lc=standard_lactation_curve, 1014 ) 1015 if interp_group.empty: 1016 continue 1017 if "TestId" not in interp_group.columns: 1018 interp_group = interp_group.copy() 1019 interp_group["TestId"] = test_id 1020 interpolated_groups.append(interp_group) 1021 1022 if not interpolated_groups: 1023 raise ValueError("No interpolated rows were produced; cannot build representation.") 1024 1025 df_grid = pd.concat(interpolated_groups, ignore_index=True) 1026 1027 # define Znp 1028 # pivot df to make GridDay the columns 1029 # Z should be the unscaled pivot table (TestId × GridDay) 1030 Z = df_grid.pivot(index="TestId", columns="GridDay", values="MilkYieldInterp") 1031 1032 # Standardize records per cow (so each row has mean=0, std=1) 1033 Z_std = (Z.sub(Z.mean(axis=1), axis=0)).div(Z.std(axis=1), axis=0) 1034 1035 # Convert to NumPy array 1036 Znp = Z_std.to_numpy() 1037 1038 # calculate correlation matrix 1039 corr = pd.DataFrame(Znp, columns=Z.columns).corr() 1040 1041 # calculate standard deviations for each grid day from the Znp matrix 1042 std_per_grid_day = np.nanstd(Znp, axis=0) # ignores NaN 1043 1044 # fit Wilmink model to get mean values and std for each cow 1045 standard_lactation_curve_grid = pd.Series( 1046 fit_lactation_curve( 1047 df_grid["GridDay"].values, 1048 df_grid["MilkYieldInterp"].values, 1049 model=lc_model, 1050 fitting="frequentist", 1051 ), 1052 index=range(1, 306), 1053 ) 1054 return corr, std_per_grid_day, standard_lactation_curve_grid 1055 1056 1057def _set_doc_signatures() -> None: 1058 """Override displayed defaults in docs without changing runtime behavior.""" 1059 doc_defaults = { 1060 "standard_lc_305": _DOC_STANDARD_CURVE, 1061 "correlation_matrix": _DOC_CORR_MATRIX, 1062 "std_per_grid_day": _DOC_STDS, 1063 } 1064 1065 for func in (ISLC, ISLC_original): 1066 signature = inspect.signature(func) 1067 params = [ 1068 param.replace(default=doc_defaults[param.name]) if param.name in doc_defaults else param 1069 for param in signature.parameters.values() 1070 ] 1071 func.__signature__ = signature.replace(parameters=params) 1072 1073 1074_set_doc_signatures()
187class InterpolationMethod(Protocol): 188 """Protocol for milk-yield interpolation methods. 189 190 Defines the signature expected by functions that interpolate test-day 191 yields onto a predefined DIM grid. 192 """ 193 194 def __call__( 195 self, 196 group: pd.DataFrame, 197 days_in_milk_col: str, 198 milking_yield_col: str, 199 standard_lc: pd.Series | None = None, 200 small_grid: bool = False, 201 ) -> pd.DataFrame: 202 """Interpolate yields for a single lactation onto the DIM grid. 203 204 Args: 205 group: Test-day records for a single animal. 206 days_in_milk_col: Column name containing days in milk (DIM). 207 milking_yield_col: Column name containing milk yields. 208 standard_lc: Optional standard lactation curve to guide 209 interpolation (method-dependent). 210 small_grid: If True, interpolate only on GRID_DAYS; 211 otherwise use [0] + GRID_DAYS + [305]. 212 213 Returns: 214 A DataFrame with interpolated yields, typically including 215 columns ``GridDay`` and ``MilkYieldInterp``. 216 """ 217 ...
Protocol for milk-yield interpolation methods.
Defines the signature expected by functions that interpolate test-day yields onto a predefined DIM grid.
1771def _no_init_or_replace_init(self, *args, **kwargs): 1772 cls = type(self) 1773 1774 if cls._is_protocol: 1775 raise TypeError('Protocols cannot be instantiated') 1776 1777 # Already using a custom `__init__`. No need to calculate correct 1778 # `__init__` to call. This can lead to RecursionError. See bpo-45121. 1779 if cls.__init__ is not _no_init_or_replace_init: 1780 return 1781 1782 # Initially, `__init__` of a protocol subclass is set to `_no_init_or_replace_init`. 1783 # The first instantiation of the subclass will call `_no_init_or_replace_init` which 1784 # searches for a proper new `__init__` in the MRO. The new `__init__` 1785 # replaces the subclass' old `__init__` (ie `_no_init_or_replace_init`). Subsequent 1786 # instantiation of the protocol subclass will thus use the new 1787 # `__init__` and no longer call `_no_init_or_replace_init`. 1788 for base in cls.__mro__: 1789 init = base.__dict__.get('__init__', _no_init_or_replace_init) 1790 if init is not _no_init_or_replace_init: 1791 cls.__init__ = init 1792 break 1793 else: 1794 # should not happen 1795 cls.__init__ = object.__init__ 1796 1797 cls.__init__(self, *args, **kwargs)
220def ISLC( 221 df: pd.DataFrame, 222 days_in_milk_col: str | None = None, 223 milking_yield_col: str | None = None, 224 test_id_col: str | None = None, 225 default_test_id: int = 0, 226 standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE, 227 correlation_matrix: pd.DataFrame = CORR_MATRIX, 228 std_per_grid_day: npt.NDArray[np.float64] = STDs, 229 max_dim: int | str = 305, 230 fit_standard_lc_from_data: bool = False, 231 reference_df: pd.DataFrame | None = None, 232) -> pd.DataFrame: 233 """Estimate cumulative lactation yield per TestId using ISLC. 234 235 This variant applies :func:`ISLC_method` to standardized lactation records. 236 By default it uses the package-provided standard curve, correlation matrix, 237 and per-grid day standard deviations. Optionally, these assets can be refit 238 from a reference dataset by providing a pandas dataframe at 'reference_df =' 239 when ``fit_standard_lc_from_data`` is True. 240 Alternative for customization is to set standard_lc_305, 241 correlation_matrix, and std_per_grid_day directly. 242 243 Args: 244 df: Input test-day records. 245 days_in_milk_col: Name of DIM column in ``df``. 246 milking_yield_col: Name of milk-yield column in ``df``. 247 test_id_col: Name of lactation identifier column in ``df``. 248 default_test_id: Fallback TestId value when ``test_id_col`` is absent. 249 standard_lc_305: Standard lactation curve values used by ISLC. 250 correlation_matrix: Correlation matrix aligned with ISLC grid. 251 std_per_grid_day: Standard deviations aligned with ISLC grid. 252 max_dim: Maximum DIM to include 253 (or ``"max"`` so highest test day DIM is considered the final lactation day). 254 fit_standard_lc_from_data: If True, fit representation from ``reference_df``. 255 reference_df: Reference records used when fitting curve and covariance. 256 257 Returns: 258 DataFrame with columns ``TestId`` and ``LactationMilkYield``. 259 260 Raises: 261 ValueError: If required columns are missing or fit mode has no reference data. 262 """ 263 # Standardize columns and filter DIM <= 305 264 df = standardize_lactation_columns( 265 df, 266 days_in_milk_col=days_in_milk_col, 267 milking_yield_col=milking_yield_col, 268 test_id_col=test_id_col, 269 default_test_id=default_test_id, 270 max_dim=max_dim, 271 ) 272 273 # Fit covariance/standard_lc if requested 274 if fit_standard_lc_from_data is True: 275 if reference_df is None: 276 raise ValueError("Provide reference_df to fit your own standard lactation curve.") 277 reference_df = standardize_lactation_columns( 278 reference_df, 279 days_in_milk_col=days_in_milk_col, 280 milking_yield_col=milking_yield_col, 281 test_id_col=test_id_col, 282 default_test_id=default_test_id, 283 max_dim=305, 284 ) 285 dim = reference_df["DaysInMilk"] 286 milk_yield = reference_df["MilkingYield"] 287 standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink") 288 # convert to pandas series 289 standard_full_lc_305 = pd.Series(standard_full_lc_305) 290 291 # fit matrix 292 ( 293 correlation_matrix, 294 std_per_grid_day, 295 standard_curve_fitted, 296 ) = create_standard_lc_representation( 297 df=reference_df, 298 standard_lactation_curve=pd.Series(standard_full_lc_305), 299 # reference_df is standardized above, so fixed names are required here 300 days_in_milk_col="DaysInMilk", 301 milking_yield_col="MilkingYield", 302 interpolation_method=interpolation_standard_lc, 303 small_grid=False, 304 ) 305 306 # Align fitted curve with default internal convention (day 0 equals day 1). 307 standard_lc_305 = np.insert( 308 np.asarray(standard_curve_fitted, dtype=float), 309 0, 310 float(standard_curve_fitted.iloc[0]), 311 ) 312 313 rows: list[dict[str, Any]] = [] 314 for test_id, group in df.groupby("TestId", sort=False): 315 if group.empty: 316 rows.append({"TestId": test_id, "LactationMilkYield": np.nan}) 317 continue 318 319 try: 320 cumulative_yield = ISLC_method( 321 df=group, 322 standard_lc=standard_lc_305, 323 correlation_matrix=correlation_matrix, 324 std_per_grid_day=std_per_grid_day, 325 days_in_milk_col="DaysInMilk", 326 milking_yield_col="MilkingYield", 327 ) 328 except ValueError: 329 cumulative_yield = np.nan 330 331 rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield}) 332 333 return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"]))
Estimate cumulative lactation yield per TestId using ISLC.
This variant applies ISLC_method() to standardized lactation records.
By default it uses the package-provided standard curve, correlation matrix,
and per-grid day standard deviations. Optionally, these assets can be refit
from a reference dataset by providing a pandas dataframe at 'reference_df ='
when fit_standard_lc_from_data is True.
Alternative for customization is to set standard_lc_305,
correlation_matrix, and std_per_grid_day directly.
Arguments:
- df: Input test-day records.
- days_in_milk_col: Name of DIM column in
df. - milking_yield_col: Name of milk-yield column in
df. - test_id_col: Name of lactation identifier column in
df. - default_test_id: Fallback TestId value when
test_id_colis absent. - standard_lc_305: Standard lactation curve values used by ISLC.
- correlation_matrix: Correlation matrix aligned with ISLC grid.
- std_per_grid_day: Standard deviations aligned with ISLC grid.
- max_dim: Maximum DIM to include
(or
"max"so highest test day DIM is considered the final lactation day). - fit_standard_lc_from_data: If True, fit representation from
reference_df. - reference_df: Reference records used when fitting curve and covariance.
Returns:
DataFrame with columns
TestIdandLactationMilkYield.
Raises:
- ValueError: If required columns are missing or fit mode has no reference data.
336def ISLC_method( 337 df: pd.DataFrame, 338 standard_lc: pd.Series | npt.NDArray[np.float64], 339 correlation_matrix: pd.DataFrame, 340 std_per_grid_day: npt.NDArray[np.float64], 341 days_in_milk_col: str = "DaysInMilk", 342 milking_yield_col: str = "MilkingYield", 343) -> float: 344 """Estimate total lactation yield by predicting missing grid days, then integrating. 345 346 The method fills missing grid-day yields with the ICAR-style relation: 347 348 yp = gp + b* (yi - gi) 349 350 where for each missing day p, the nearest measured day i is used and 351 b* = corr(i,p) * std(p) / std(i). The resulting complete grid profile is 352 integrated with the test-interval style weighting used elsewhere in this 353 package. 354 355 Notes: 356 The integration is performed over the days present after merging 357 measured and predicted points. If the input contains DIM beyond 305, 358 total yield can reflect that extended horizon. 359 """ 360 # add zero and 305 to the grid 361 grid = [0] + GRID_DAYS + [305] 362 dim_col = days_in_milk_col 363 yield_col = milking_yield_col 364 365 standard_lc_series = _curve_to_series(standard_lc) 366 367 # Extract the measured days 368 measured_days = df[dim_col].to_numpy() 369 if measured_days.size == 0: 370 raise ValueError("Input df must contain at least one measured day.") 371 372 # interpolate the measured days to the grid using the standard lactation curve 373 interpolated_df = interpolation_standard_lc( 374 group=df, 375 days_in_milk_col=dim_col, 376 milking_yield_col=yield_col, 377 standard_lc=standard_lc_series, 378 ) 379 380 # drop unnecessary columns 381 interpolated_df = interpolated_df.drop(columns=[dim_col, yield_col]) 382 383 # rename grid colum 384 interpolated_df = interpolated_df.rename( 385 columns={"GridDay": dim_col, "MilkYieldInterp": yield_col} 386 )[[dim_col, yield_col]] 387 388 measured_subset = cast(pd.DataFrame, df.loc[:, [dim_col, yield_col]]) 389 interpolated_subset = cast(pd.DataFrame, interpolated_df.loc[:, [dim_col, yield_col]]) 390 391 # join the interpolated values with the original measurements, keeping all measurements 392 merged_df = cast( 393 pd.DataFrame, 394 pd.merge( 395 measured_subset, 396 interpolated_subset, 397 on=dim_col, 398 how="outer", 399 suffixes=("_meas", "_interp"), 400 ), 401 ) 402 meas_series = cast(pd.Series, merged_df[f"{yield_col}_meas"]) 403 interp_series = cast(pd.Series, merged_df[f"{yield_col}_interp"]) 404 merged_df[yield_col] = cast(pd.Series, meas_series.combine_first(interp_series)) 405 df = cast( 406 pd.DataFrame, 407 merged_df.loc[:, [dim_col, yield_col]].sort_values(by=dim_col), 408 ) 409 410 # days missing from measurement → need prediction 411 days_to_predict = [day for day in grid if day not in df[dim_col].values] 412 413 predicted_rows: list[dict[str, float | int]] = [] 414 415 for day in days_to_predict: 416 # identify closest measured day 417 dim_series = cast(pd.Series, interpolated_df[dim_col]) 418 grid_vals = np.asarray(dim_series.to_numpy(dtype=float), dtype=float) 419 closest_idx = int(np.argmin(np.abs(grid_vals - float(day)))) 420 closest_day = int(dim_series.iloc[closest_idx]) 421 422 # extract mean milk yield from the standard lactation curve for the day to predict 423 expected_yield_grid_day = _curve_value(standard_lc_series, day) 424 expected_yield_measured_day = _curve_value(standard_lc_series, closest_day) 425 426 # calculate difference between measured and expected yield at closest measured day 427 diff = ( 428 interpolated_df.loc[interpolated_df[dim_col] == closest_day, yield_col].iloc[0] 429 - expected_yield_measured_day 430 ) 431 432 # calculate the correlation-based correction factor b* 433 # b* = corr(i,p) * std(p) / std(i) 434 pi = list(grid).index(closest_day) 435 qi = list(grid).index(day) 436 bpj = correlation_matrix.iloc[pi, qi] 437 std_q = std_per_grid_day[qi] 438 std_p = std_per_grid_day[pi] 439 b_star = 0.0 if std_p == 0 else float(bpj * std_q / std_p) 440 441 # predict the yield for the missing day 442 predicted_yield = expected_yield_grid_day + b_star * diff 443 predicted_rows.append({dim_col: int(day), yield_col: float(predicted_yield)}) 444 445 if predicted_rows: 446 df = pd.concat([df, pd.DataFrame(predicted_rows)], ignore_index=True) 447 448 # apply the test interval method to calculate total milk yield over 305 days 449 # sort the dataframe by days in milk and keep one value per day 450 df = df.drop_duplicates(subset=[dim_col], keep="last") 451 df = df.sort_values(by=dim_col) 452 # Trapezoidal contributions 453 df["width"] = df[dim_col].diff().shift(-1) 454 df["avg_yield"] = (df[yield_col] + df[yield_col].shift(-1)) / 2 455 df["trapezoid_area"] = df["width"] * df["avg_yield"] 456 457 trapezoid_values = np.asarray(pd.to_numeric(df["trapezoid_area"], errors="coerce"), dtype=float) 458 total_yield = float(np.nansum(trapezoid_values)) 459 460 return total_yield
Estimate total lactation yield by predicting missing grid days, then integrating.
The method fills missing grid-day yields with the ICAR-style relation:
yp = gp + b* (yi - gi)
where for each missing day p, the nearest measured day i is used and
b* = corr(i,p) * std(p) / std(i). The resulting complete grid profile is
integrated with the test-interval style weighting used elsewhere in this
package.
Notes:
The integration is performed over the days present after merging measured and predicted points. If the input contains DIM beyond 305, total yield can reflect that extended horizon.
463def ISLC_original_method( 464 df: pd.DataFrame, 465 standard_lc: pd.Series | npt.NDArray[np.float64], 466 correlation_matrix: pd.DataFrame, 467 std_per_grid_day: npt.NDArray[np.float64], 468) -> float: 469 """Estimate 305-day milk yield for a single interpolated lactation 470 from the original paper by Dr. Wilmink in 1987. 471 472 This function computes an estimated 305-day yield by combining three 473 components: known production from interpolated measurements, the 474 population mean for missing grid days (from ``standard_lc``), and a 475 correlation-based deviation correction derived from ``correlation_matrix``. 476 477 Args: 478 df: DataFrame containing at least the columns ``GridDay`` and 479 ``MilkYieldInterp`` (values already interpolated onto the grid). 480 standard_lc: Standard lactation curve values indexed by grid day. 481 correlation_matrix: Correlation coefficients between grid-day yields 482 (matrix-like, rows/columns ordered as the grid). 483 std_per_grid_day: 1-D array of standard deviations for each grid day. 484 485 Returns: 486 A float with the estimated 305-day milk yield. 487 488 Raises: 489 ValueError: if any measured day in ``df`` is not present in the 490 expected global ``grid``. 491 492 Notes: 493 - The routine assumes fixed recording intervals of 20 days, except 494 that the final interval (if day 290 is present) is 25 days. 495 - The implementation expects a global ``grid`` variable to be 496 defined and aligned with ``standard_lc`` and 497 ``std_per_grid_day``. 498 - The amount of estimated milk yields depends on the 499 availability of measured milk yields. 500 - Lactation yield is: 501 sum(known measurements (p) + estimated measurements (q)). 502 - In the implementation according to the CRV E2 document, 503 there should also be a correction factor for the difference between 504 the expected and actual complete lactation yield of the previous lactation. 505 However, this correction factor is not included in the current implementation 506 as it requires additional data (previous lactation yield) that is not always available. 507 """ 508 grid = GRID_DAYS 509 standard_lc_series = _curve_to_series(standard_lc) 510 511 # --- Validate that all measurements lie on the expected grid ------- 512 measured_days = df["GridDay"].values 513 if any(day not in grid for day in measured_days): 514 raise ValueError("At least one DIM in the input df is not part of the grid.") 515 516 # --- Prepare basic components ------------------------------------- 517 df = df.sort_values("GridDay") 518 519 # days missing from measurement → need prediction 520 days_to_predict = [day for day in grid if day not in measured_days] 521 pred_idx = [list(grid).index(d) for d in days_to_predict] 522 523 # mean expected milk yield from standard curve 524 mean_lc = pd.Series([_curve_value(standard_lc_series, g) for g in grid], index=grid) 525 526 # --- Compute known production (sum of actual measurements) --------- 527 if 290 in measured_days: 528 # special last interval: 25 days 529 known_prod = df["MilkYieldInterp"][:-1].sum() * 20 + df["MilkYieldInterp"].iloc[-1] * 25 530 else: 531 known_prod = df["MilkYieldInterp"].sum() * 20 532 533 # --- Compute population mean for missing days ---------------------- 534 pop_mean_missing = mean_lc.loc[days_to_predict] 535 536 if 290 in pop_mean_missing.index: 537 population_mean = pop_mean_missing.iloc[:-1].sum() * 20 + pop_mean_missing.iloc[-1] * 25 538 else: 539 population_mean = pop_mean_missing.sum() * 20 540 541 # --- Compute correlation-based correction -------------------------- 542 days_to_predict_np = np.array(days_to_predict) 543 measured_days_np = np.array(measured_days) 544 545 # for each missing day: find closest measured day 546 closest_measured = [ 547 measured_days_np[np.argmin(np.abs(measured_days_np - g))] for g in days_to_predict_np 548 ] 549 closest_idx = [list(grid).index(d) for d in closest_measured] 550 551 b_star_list = [] 552 for pi, qi in zip(closest_idx, pred_idx): 553 bpj = correlation_matrix.iloc[pi, qi] 554 stdj = std_per_grid_day[qi] 555 stdp = std_per_grid_day[pi] 556 b_star_list.append(bpj * stdj / stdp) 557 558 # scale correction by interval lengths 559 if 290 in measured_days: 560 # last interval belongs to prediction group 561 correction_weighted = sum(b_star_list[:-1]) * 20 562 else: 563 correction_weighted = sum(b_star_list[:-1]) * 20 + b_star_list[-1] * 25 564 565 last_day = df["GridDay"].iloc[-1] 566 last_yield = df["MilkYieldInterp"].iloc[-1] 567 mean_last = mean_lc.loc[last_day] 568 569 correction = correction_weighted * (last_yield - mean_last) 570 571 # --- Final sum ----------------------------------------------------- 572 final_milk_yield = known_prod + population_mean + correction 573 return float(final_milk_yield)
Estimate 305-day milk yield for a single interpolated lactation from the original paper by Dr. Wilmink in 1987.
This function computes an estimated 305-day yield by combining three
components: known production from interpolated measurements, the
population mean for missing grid days (from standard_lc), and a
correlation-based deviation correction derived from correlation_matrix.
Arguments:
- df: DataFrame containing at least the columns
GridDayandMilkYieldInterp(values already interpolated onto the grid). - standard_lc: Standard lactation curve values indexed by grid day.
- correlation_matrix: Correlation coefficients between grid-day yields (matrix-like, rows/columns ordered as the grid).
- std_per_grid_day: 1-D array of standard deviations for each grid day.
Returns:
A float with the estimated 305-day milk yield.
Raises:
- ValueError: if any measured day in
dfis not present in the expected globalgrid.
Notes:
- The routine assumes fixed recording intervals of 20 days, except that the final interval (if day 290 is present) is 25 days.
- The implementation expects a global
gridvariable to be defined and aligned withstandard_lcandstd_per_grid_day.- The amount of estimated milk yields depends on the availability of measured milk yields.
- Lactation yield is: sum(known measurements (p) + estimated measurements (q)).
- In the implementation according to the CRV E2 document, there should also be a correction factor for the difference between the expected and actual complete lactation yield of the previous lactation. However, this correction factor is not included in the current implementation as it requires additional data (previous lactation yield) that is not always available.
576def ISLC_original( 577 df: pd.DataFrame, 578 days_in_milk_col: str | None = None, 579 milking_yield_col: str | None = None, 580 test_id_col: str | None = None, 581 default_test_id: int = 0, 582 standard_lc_305: pd.Series | npt.NDArray[np.float64] = STANDARD_CURVE, 583 correlation_matrix: pd.DataFrame = CORR_MATRIX, 584 std_per_grid_day: npt.NDArray[np.float64] = STDs, 585 fit_standard_lc_from_data: bool = False, 586 reference_df: pd.DataFrame | None = None, 587) -> pd.DataFrame: 588 """Compute estimated 305-day yields using the original grid-based ISLC variant 589 from the original paper by Dr. Wilmink in 1987. 590 591 The function groups standardized input by ``TestId``, interpolates to the 592 fixed GRID_DAYS grid, and applies :func:`ISLC_original_method` per lactation. 593 594 Args: 595 df: Input records with DIM and milk yield columns. 596 days_in_milk_col: Name of DIM column in ``df``. 597 milking_yield_col: Name of milk-yield column in ``df``. 598 test_id_col: Name of lactation identifier column in ``df``. 599 default_test_id: Fallback TestId when ``test_id_col`` is absent. 600 standard_lc_305: Standard lactation curve used for interpolation. 601 correlation_matrix: Correlation matrix over GRID_DAYS. 602 std_per_grid_day: Standard deviations per GRID_DAYS element. 603 fit_standard_lc_from_data: If True, refit standard curve and covariance 604 representation from ``reference_df`` before prediction. 605 reference_df: Reference population used when 606 ``fit_standard_lc_from_data=True``. The reference data is 607 standardized to ``DaysInMilk``/``MilkingYield`` and fit using 608 ``small_grid=True`` so correlation and standard-deviation assets are 609 aligned to ``GRID_DAYS``. 610 611 Returns: 612 DataFrame with columns ``LactationMilkYield`` and ``TestId``. 613 614 Raises: 615 ValueError: If required columns (DaysInMilk or MilkingYield) cannot be 616 found, or if ``fit_standard_lc_from_data=True`` and 617 ``reference_df`` is not provided. 618 619 Notes: 620 Records with DIM > 305 are dropped before interpolation and scoring. 621 """ 622 # Standardize columns and filter DIM <= 305 623 df = standardize_lactation_columns( 624 df, 625 days_in_milk_col=days_in_milk_col, 626 milking_yield_col=milking_yield_col, 627 test_id_col=test_id_col, 628 default_test_id=default_test_id, 629 max_dim=305, 630 ) 631 632 # Fit covariance/standard_lc if requested 633 if fit_standard_lc_from_data is True: 634 if reference_df is None: 635 raise ValueError("Provide reference_df to fit your own standard lactation curve.") 636 reference_df = standardize_lactation_columns( 637 reference_df, 638 days_in_milk_col=days_in_milk_col, 639 milking_yield_col=milking_yield_col, 640 test_id_col=test_id_col, 641 default_test_id=default_test_id, 642 max_dim=305, 643 ) 644 dim = reference_df["DaysInMilk"] 645 milk_yield = reference_df["MilkingYield"] 646 standard_full_lc_305 = fit_lactation_curve(dim, milkrecordings=milk_yield, model="wilmink") 647 standard_full_lc_305 = pd.Series(standard_full_lc_305) 648 649 # fit matrix 650 ( 651 correlation_matrix, 652 std_per_grid_day, 653 standard_curve_fitted, 654 ) = create_standard_lc_representation( 655 df=reference_df, 656 standard_lactation_curve=pd.Series(standard_full_lc_305), 657 # reference_df is standardized above, so fixed names are required here 658 days_in_milk_col="DaysInMilk", 659 milking_yield_col="MilkingYield", 660 interpolation_method=interpolation_standard_lc, 661 small_grid=True, 662 ) 663 664 # Align fitted curve with default internal convention (day 0 equals day 1). 665 standard_lc_305 = np.insert( 666 np.asarray(standard_curve_fitted, dtype=float), 667 0, 668 float(standard_curve_fitted.iloc[0]), 669 ) 670 rows: list[dict[str, Any]] = [] 671 for test_id, group in df.groupby("TestId", sort=False): 672 lactation = interpolation_standard_lc( 673 group, 674 days_in_milk_col="DaysInMilk", 675 milking_yield_col="MilkingYield", 676 standard_lc=standard_lc_305, 677 ) 678 if lactation.empty: 679 rows.append({"TestId": test_id, "305_milk_yield": np.nan}) 680 continue 681 682 cumulative_yield = ISLC_original_method( 683 df=lactation, 684 standard_lc=standard_lc_305, 685 correlation_matrix=correlation_matrix, 686 std_per_grid_day=std_per_grid_day, 687 ) 688 rows.append({"TestId": test_id, "LactationMilkYield": cumulative_yield}) 689 690 return pd.DataFrame(rows, columns=pd.Index(["TestId", "LactationMilkYield"]))
Compute estimated 305-day yields using the original grid-based ISLC variant from the original paper by Dr. Wilmink in 1987.
The function groups standardized input by TestId, interpolates to the
fixed GRID_DAYS grid, and applies ISLC_original_method() per lactation.
Arguments:
- df: Input records with DIM and milk yield columns.
- days_in_milk_col: Name of DIM column in
df. - milking_yield_col: Name of milk-yield column in
df. - test_id_col: Name of lactation identifier column in
df. - default_test_id: Fallback TestId when
test_id_colis absent. - standard_lc_305: Standard lactation curve used for interpolation.
- correlation_matrix: Correlation matrix over GRID_DAYS.
- std_per_grid_day: Standard deviations per GRID_DAYS element.
- fit_standard_lc_from_data: If True, refit standard curve and covariance
representation from
reference_dfbefore prediction. - reference_df: Reference population used when
fit_standard_lc_from_data=True. The reference data is standardized toDaysInMilk/MilkingYieldand fit usingsmall_grid=Trueso correlation and standard-deviation assets are aligned toGRID_DAYS.
Returns:
DataFrame with columns
LactationMilkYieldandTestId.
Raises:
- ValueError: If required columns (DaysInMilk or MilkingYield) cannot be
found, or if
fit_standard_lc_from_data=Trueandreference_dfis not provided.
Notes:
Records with DIM > 305 are dropped before interpolation and scoring.
703def interpolation_standard_lc( 704 group: pd.DataFrame, 705 days_in_milk_col: str, 706 milking_yield_col: str, 707 standard_lc: pd.Series | npt.NDArray[np.float64] | None = None, 708 small_grid: bool = False, 709) -> pd.DataFrame: 710 """Interpolate a single lactation onto the DIM grid guided by a standard curve. 711 712 The function interpolates measured yields for one animal onto the 713 predefined grid of DIM values. When a grid day coincides with a measured 714 DIM the observed yield is used; otherwise the interpolation adjusts the 715 linear slope between neighboring measurements by the difference in the 716 standard curve, inspired by the CRV E2 approach. 717 718 The interpolation formula applied for a grid day ``gday`` between two 719 measured points (x1,y1) and (x2,y2) is:: 720 721 slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1) 722 yi = gi + (slope * (gday - x1) + (y1 - g1)) 723 724 Args: 725 group: DataFrame with test-day records for a single animal. Must 726 contain columns named by ``days_in_milk_col`` and 727 ``milking_yield_col``. 728 days_in_milk_col: Name of the DIM column in ``group``. 729 milking_yield_col: Name of the milk-yield column in ``group``. 730 standard_lc: A Series indexed by DIM providing the expected yield on 731 each grid day; used to guide the interpolation shape. 732 small_grid: If True, interpolate on GRID_DAYS only; otherwise include 733 day 0 and day 305 in the interpolation target grid. 734 735 Returns: 736 A DataFrame with one row per interpolated grid day containing the 737 original identifying columns (from the first row of ``group``), plus 738 the columns ``GridDay`` and ``MilkYieldInterp``. 739 740 Notes: 741 - No interpolation is performed outside the range bounded by the 742 first and last measured DIM for the lactation (those grid days are 743 skipped). 744 - No interpolation is performed for grid days where the right 745 neighboring measured DIM is >= 305, because this lies outside the 746 intended standard-curve support. 747 - The implementation defines a local default ``grid`` of 748 [10, 30, ..., 290]. 749 750 751 """ 752 if small_grid: 753 grid = GRID_DAYS 754 else: 755 # add zero and 305 to the grid 756 grid = [0] + GRID_DAYS + [305] 757 758 if standard_lc is None: 759 raise ValueError("A standard lactation curve is required for interpolation_standard_lc.") 760 761 standard_lc_series = _curve_to_series(standard_lc) 762 763 # Sort and ensure unique DIMs 764 group = group.sort_values(days_in_milk_col).drop_duplicates(days_in_milk_col) 765 dims = group[days_in_milk_col].tolist() 766 767 rows = [] 768 769 # loop over the grid days 770 for gday in grid: 771 # --- CASE 1: Exact measured day --------------------------- 772 if gday in dims: 773 y_val = group.loc[group[days_in_milk_col] == gday, milking_yield_col].iloc[0] 774 775 row = group.iloc[0].to_dict() 776 row["GridDay"] = gday 777 row["MilkYieldInterp"] = float(y_val) 778 rows.append(row) 779 continue 780 781 # --- CASE 2: interpolate between measured days ------------- 782 before_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] < gday]) 783 after_df = cast(pd.DataFrame, group.loc[group[days_in_milk_col] > gday]) 784 before = before_df.tail(1) 785 after = after_df.head(1) 786 787 # CASE 3 No interpolation if grid day is outside the range of measured DIM 788 if before.empty or after.empty: 789 continue 790 791 # CASE 4: No interpolation if the day after is day 305 or higher, 792 # as this is outside the range of the standard curve 793 if float(after.iloc[0][days_in_milk_col]) >= 305: 794 continue 795 796 x1 = float(before.iloc[0][days_in_milk_col]) 797 y1 = float(before.iloc[0][milking_yield_col]) 798 x2 = float(after.iloc[0][days_in_milk_col]) 799 y2 = float(after.iloc[0][milking_yield_col]) 800 801 # expected yields from standard lactation curve 802 g1 = _curve_value(standard_lc_series, int(x1)) 803 g2 = _curve_value(standard_lc_series, int(x2)) 804 gi = _curve_value(standard_lc_series, int(gday)) 805 806 # Wilmink-based interpolation formula 807 slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1) 808 yi = gi + (slope * (gday - x1) + (y1 - g1)) 809 810 row = group.iloc[0].to_dict() 811 row["GridDay"] = gday 812 row["MilkYieldInterp"] = float(yi) 813 rows.append(row) 814 815 if not rows: 816 return pd.DataFrame( 817 columns=pd.Index([*group.columns.to_list(), "GridDay", "MilkYieldInterp"]) 818 ) 819 820 return pd.DataFrame(rows)
Interpolate a single lactation onto the DIM grid guided by a standard curve.
The function interpolates measured yields for one animal onto the predefined grid of DIM values. When a grid day coincides with a measured DIM the observed yield is used; otherwise the interpolation adjusts the linear slope between neighboring measurements by the difference in the standard curve, inspired by the CRV E2 approach.
The interpolation formula applied for a grid day gday between two
measured points (x1,y1) and (x2,y2) is::
slope = ((y2 - y1) - (g2 - g1)) / (x2 - x1)
yi = gi + (slope * (gday - x1) + (y1 - g1))
Arguments:
- group: DataFrame with test-day records for a single animal. Must
contain columns named by
days_in_milk_colandmilking_yield_col. - days_in_milk_col: Name of the DIM column in
group. - milking_yield_col: Name of the milk-yield column in
group. - standard_lc: A Series indexed by DIM providing the expected yield on each grid day; used to guide the interpolation shape.
- small_grid: If True, interpolate on GRID_DAYS only; otherwise include day 0 and day 305 in the interpolation target grid.
Returns:
A DataFrame with one row per interpolated grid day containing the original identifying columns (from the first row of
group), plus the columnsGridDayandMilkYieldInterp.
Notes:
- No interpolation is performed outside the range bounded by the first and last measured DIM for the lactation (those grid days are skipped). - No interpolation is performed for grid days where the right neighboring measured DIM is >= 305, because this lies outside the intended standard-curve support.
- The implementation defines a local default
gridof [10, 30, ..., 290].
823def linear_interpd_all_to_grid( 824 group: pd.DataFrame, 825 days_in_milk_col: str, 826 milking_yield_col: str, 827 standard_lc: pd.Series | None = None, 828 small_grid: bool = False, 829) -> pd.DataFrame: 830 """Linearly interpolate all grid days for a lactation. 831 832 This helper uses linear interpolation (with extrapolation) to produce 833 milk-yield values for every grid day regardless of whether the grid day 834 lies between measured observations. 835 836 Args: 837 group: DataFrame containing measured DIM and yield values for one 838 lactation. 839 days_in_milk_col: Name of the DIM column. 840 milking_yield_col: Name of the milk-yield column. 841 small_grid: If True, interpolate on GRID_DAYS only; otherwise include 842 day 0 and day 305. 843 844 Returns: 845 A DataFrame with identifying columns copied from ``group``'s first 846 row and columns ``GridDay`` and ``MilkYieldInterp`` containing the 847 interpolated (or extrapolated) yields for every grid day. 848 """ 849 if small_grid: 850 grid = GRID_DAYS 851 else: 852 # add zero and 305 to the grid 853 grid = [0] + GRID_DAYS + [305] 854 855 group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col) 856 f = interp1d( 857 group[days_in_milk_col], 858 group[milking_yield_col], 859 kind="linear", 860 fill_value=cast(Any, "extrapolate"), 861 ) 862 863 base = { 864 col: group[col].iloc[0] 865 for col in group.columns 866 if col not in [days_in_milk_col, milking_yield_col] 867 } 868 base["GridDay"] = grid 869 base["MilkYieldInterp"] = f(grid) 870 871 return pd.DataFrame(base)
Linearly interpolate all grid days for a lactation.
This helper uses linear interpolation (with extrapolation) to produce milk-yield values for every grid day regardless of whether the grid day lies between measured observations.
Arguments:
- group: DataFrame containing measured DIM and yield values for one lactation.
- days_in_milk_col: Name of the DIM column.
- milking_yield_col: Name of the milk-yield column.
- small_grid: If True, interpolate on GRID_DAYS only; otherwise include day 0 and day 305.
Returns:
A DataFrame with identifying columns copied from
group's first row and columnsGridDayandMilkYieldInterpcontaining the interpolated (or extrapolated) yields for every grid day.
876def linear_interpd_closest_to_grid( 877 group: pd.DataFrame, 878 days_in_milk_col: str, 879 milking_yield_col: str, 880 standard_lc: pd.Series | None = None, 881 small_grid: bool = False, 882) -> pd.DataFrame | None: 883 """Linearly interpolate grid days between measured observations. 884 885 This helper returns interpolated yields only for grid days that lie 886 between the first and last measured DIM for the lactation. If no grid 887 days fall within the measured range the function returns ``None``. 888 889 Args: 890 group: DataFrame for a single lactation. 891 days_in_milk_col: Name of the DIM column. 892 milking_yield_col: Name of the milk-yield column. 893 small_grid: If True, use GRID_DAYS only; otherwise include 894 day 0 and day 305. 895 896 Returns: 897 A DataFrame with identifying columns plus ``GridDay`` and 898 ``MilkYieldInterp``, or ``None`` if there are no grid days within 899 the measured range. 900 """ 901 if small_grid: 902 grid = GRID_DAYS 903 else: 904 # add zero and 305 to the grid 905 grid = [0] + GRID_DAYS + [305] 906 907 group = group.sort_values(days_in_milk_col).drop_duplicates(subset=days_in_milk_col) 908 909 # Find the range of interpolatable DIM 910 min_dim = group[days_in_milk_col].min() 911 max_dim = group[days_in_milk_col].max() 912 grid_in_range = [g for g in grid if min_dim <= g <= max_dim] 913 914 # apply linear interpolation 915 if not grid_in_range: 916 return None 917 f = interp1d( 918 group[days_in_milk_col], 919 group[milking_yield_col], 920 kind="linear", 921 fill_value=cast(Any, np.nan), 922 ) 923 924 # create a new dataframe with the newly created columns 925 base = { 926 col: group[col].iloc[0] 927 for col in group.columns 928 if col not in [days_in_milk_col, milking_yield_col] 929 } 930 base["GridDay"] = grid_in_range 931 base["MilkYieldInterp"] = f(grid_in_range) 932 933 return pd.DataFrame(base)
Linearly interpolate grid days between measured observations.
This helper returns interpolated yields only for grid days that lie
between the first and last measured DIM for the lactation. If no grid
days fall within the measured range the function returns None.
Arguments:
- group: DataFrame for a single lactation.
- days_in_milk_col: Name of the DIM column.
- milking_yield_col: Name of the milk-yield column.
- small_grid: If True, use GRID_DAYS only; otherwise include day 0 and day 305.
Returns:
A DataFrame with identifying columns plus
GridDayandMilkYieldInterp, orNoneif there are no grid days within the measured range.
936def create_standard_lc_representation( 937 df: pd.DataFrame, 938 standard_lactation_curve: pd.Series, 939 days_in_milk_col: str, 940 milking_yield_col: str, 941 interpolation_method: InterpolationMethod = interpolation_standard_lc, 942 lc_model: str = "Wilmink", 943 small_grid: bool = False, 944) -> tuple[pd.DataFrame, npt.NDArray[np.float64], pd.Series]: 945 """Create a standard lactation curve with correlation matrix from data. 946 947 This function estimates the correlation structure and standard deviations 948 of milk yields across grid days, and refits a Wilmink lactation curve model to 949 the interpolated data. The Wilmink model is hereby the default, 950 but any model from the LC package can be used. 951 The outputs can be used as inputs to the 952 ``ISLC_method`` or ``ISLC_original`` functions for predicting 305-d yields. 953 954 The process standardizes yields per animal and computes row-wise 955 (per-animal) statistics, then derives relationships across grid days. 956 957 Args: 958 df: Input DataFrame containing test-day records. Must include a 959 ``TestId`` column to identify unique lactations. Also requires 960 columns specified by ``days_in_milk_col`` and ``milking_yield_col``. 961 standard_lactation_curve: A Series providing a reference lactation 962 curve (e.g., the fitted Wilmink curve on which to base 963 interpolation). 964 days_in_milk_col: Name of the column in ``df`` containing days in milk (DIM). 965 milking_yield_col: Name of the column in ``df`` containing measured 966 milk yields. 967 interpolation_method: An interpolation method conforming to 968 ``InterpolationMethod`` protocol (default: ``interpolation_standard_lc``). 969 Must be callable with signature 970 (group, days_in_milk_col, milking_yield_col, standard_lc). 971 lc_model: The lactation curve model to fit to the interpolated data. Default is "Wilmink". 972 small_grid: If True, build representation on GRID_DAYS only. 973 974 Returns: 975 A tuple of three elements: 976 - corr (pd.DataFrame): Correlation matrix between grid-day yields. 977 - std_per_grid_day (np.ndarray): Standard deviations of standardized 978 yields per grid day. 979 - standard_lactation_curve_grid (pd.Series): Refitted Wilmink model 980 indexed by DIM (1–305). 981 982 Notes: 983 - Expects ``df`` to have a ``TestId`` column to group lactations. 984 - Performs standardization per animal (row-wise), so output 985 correlations and standard deviations reflect between-day variation 986 within standardized animal profiles. 987 - Uses ``fit_lactation_curve`` from the ``lactationcurve`` package 988 with model='Wilmink' and fitting='frequentist'. 989 However other models can be used by specifying the ``lc_model`` argument. 990 """ 991 992 if "TestId" not in df.columns: 993 raise ValueError("DataFrame must contain a 'TestId' column.") 994 if days_in_milk_col not in df.columns or milking_yield_col not in df.columns: 995 raise ValueError(f"Columns '{days_in_milk_col}' and '{milking_yield_col}' must be in df.") 996 997 interpolated_groups: list[pd.DataFrame] = [] 998 for test_id, group in df.groupby("TestId", sort=False): 999 try: 1000 interp_group = interpolation_method( 1001 group, 1002 days_in_milk_col=days_in_milk_col, 1003 milking_yield_col=milking_yield_col, 1004 standard_lc=standard_lactation_curve, 1005 small_grid=small_grid, 1006 ) 1007 except TypeError: 1008 # Backward compatibility for custom interpolation callables 1009 # that do not yet accept small_grid. 1010 interp_group = interpolation_method( 1011 group, 1012 days_in_milk_col=days_in_milk_col, 1013 milking_yield_col=milking_yield_col, 1014 standard_lc=standard_lactation_curve, 1015 ) 1016 if interp_group.empty: 1017 continue 1018 if "TestId" not in interp_group.columns: 1019 interp_group = interp_group.copy() 1020 interp_group["TestId"] = test_id 1021 interpolated_groups.append(interp_group) 1022 1023 if not interpolated_groups: 1024 raise ValueError("No interpolated rows were produced; cannot build representation.") 1025 1026 df_grid = pd.concat(interpolated_groups, ignore_index=True) 1027 1028 # define Znp 1029 # pivot df to make GridDay the columns 1030 # Z should be the unscaled pivot table (TestId × GridDay) 1031 Z = df_grid.pivot(index="TestId", columns="GridDay", values="MilkYieldInterp") 1032 1033 # Standardize records per cow (so each row has mean=0, std=1) 1034 Z_std = (Z.sub(Z.mean(axis=1), axis=0)).div(Z.std(axis=1), axis=0) 1035 1036 # Convert to NumPy array 1037 Znp = Z_std.to_numpy() 1038 1039 # calculate correlation matrix 1040 corr = pd.DataFrame(Znp, columns=Z.columns).corr() 1041 1042 # calculate standard deviations for each grid day from the Znp matrix 1043 std_per_grid_day = np.nanstd(Znp, axis=0) # ignores NaN 1044 1045 # fit Wilmink model to get mean values and std for each cow 1046 standard_lactation_curve_grid = pd.Series( 1047 fit_lactation_curve( 1048 df_grid["GridDay"].values, 1049 df_grid["MilkYieldInterp"].values, 1050 model=lc_model, 1051 fitting="frequentist", 1052 ), 1053 index=range(1, 306), 1054 ) 1055 return corr, std_per_grid_day, standard_lactation_curve_grid
Create a standard lactation curve with correlation matrix from data.
This function estimates the correlation structure and standard deviations
of milk yields across grid days, and refits a Wilmink lactation curve model to
the interpolated data. The Wilmink model is hereby the default,
but any model from the LC package can be used.
The outputs can be used as inputs to the
ISLC_method or ISLC_original functions for predicting 305-d yields.
The process standardizes yields per animal and computes row-wise (per-animal) statistics, then derives relationships across grid days.
Arguments:
- df: Input DataFrame containing test-day records. Must include a
TestIdcolumn to identify unique lactations. Also requires columns specified bydays_in_milk_colandmilking_yield_col. - standard_lactation_curve: A Series providing a reference lactation curve (e.g., the fitted Wilmink curve on which to base interpolation).
- days_in_milk_col: Name of the column in
dfcontaining days in milk (DIM). - milking_yield_col: Name of the column in
dfcontaining measured milk yields. - interpolation_method: An interpolation method conforming to
InterpolationMethodprotocol (default:interpolation_standard_lc). Must be callable with signature (group, days_in_milk_col, milking_yield_col, standard_lc). - lc_model: The lactation curve model to fit to the interpolated data. Default is "Wilmink".
- small_grid: If True, build representation on GRID_DAYS only.
Returns:
A tuple of three elements:
- corr (pd.DataFrame): Correlation matrix between grid-day yields.
- std_per_grid_day (np.ndarray): Standard deviations of standardized yields per grid day.
- standard_lactation_curve_grid (pd.Series): Refitted Wilmink model indexed by DIM (1–305).
Notes:
- Expects
dfto have aTestIdcolumn to group lactations.- Performs standardization per animal (row-wise), so output correlations and standard deviations reflect between-day variation within standardized animal profiles.
- Uses
fit_lactation_curvefrom thelactationcurvepackage with model='Wilmink' and fitting='frequentist'. However other models can be used by specifying thelc_modelargument.