Lecture 4
2025-02-10
Variables: Year and Value
A regression model is a function that describes the relationship between the outcome, \(Y\), and the predictor, \(X\).
\[\begin{aligned} Y &= \color{black}{\textbf{Model}} + \text{Error} \\[8pt] &= \color{black}{\mathbf{f(X)}} + \epsilon \\[8pt] &= \color{black}{\boldsymbol{\mu_{Y|X}}} + \epsilon \end{aligned}\]
Use simple linear regression to model the relationship between a quantitative outcome (\(Y\)) and a single quantitative predictor (\(X\)): \[\Large{Y = \beta_0 + \beta_1 X + \epsilon}\]
\[\Large{\hat{Y} = b_0 + b_1 X}\]
\[\text{residual} = \text{observed} - \text{predicted} = y - \hat{y}\]
\[e_i = \text{observed} - \text{predicted} = y_i - \hat{y}_i\]
The least squares line is the one that minimizes the sum of squared residuals:
\[e_1^2 + e_2^2 + \dots + e_n^2\]
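As a rough illustration of the sum of squared residuals above, the sketch below fits a line to a small made-up data set using the usual closed-form least squares estimates, then checks that nudging the intercept makes the sum of squared residuals larger. The data values and the function names `least_squares_fit` and `sum_squared_residuals` are assumptions for illustration only; they do not come from the lecture.

```python
from statistics import mean


def least_squares_fit(x, y):
    """Return (b0, b1), the intercept and slope that minimize the sum of squared residuals."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Return e_1^2 + e_2^2 + ... + e_n^2 for the line y_hat = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical toy data (not the lecture's data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_fit(x, y)
ssr_fit = sum_squared_residuals(x, y, b0, b1)

# Any other line, here one with a shifted intercept, has a larger sum
ssr_other = sum_squared_residuals(x, y, b0 + 0.1, b1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSR at least squares line: {ssr_fit:.4f}")
print(f"SSR with shifted intercept: {ssr_other:.4f}")
```

In practice a statistics package does this fitting for you; the point here is only to show what is being minimized.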
# A tibble: 2 × 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept) -16.1     0.510        -31.7 3.85e-39
2 Year          0.00879 0.000256      34.3 4.03e-41
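One way to read the table above: in regression output of this shape, the `statistic` column is the estimate divided by its standard error. The short check below uses the rounded values printed for the `Year` row, so it can only reproduce the printed statistic approximately. The variable names are illustrative.

```python
# Rounded values copied from the Year row of the output above
estimate = 0.00879
std_error = 0.000256

# Test statistic = estimate / standard error
statistic = estimate / std_error

print(f"statistic is approximately {statistic:.1f}")
```

Applying the same division to the intercept row gives a value slightly different from the printed one, which is expected because the displayed estimate and standard error are themselves rounded.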
The regression line goes through the center of mass point (the coordinates corresponding to average \(X\) and average \(Y\)): \(b_0 = \bar{Y} - b_1~\bar{X}\)
Slope has the same sign as the correlation coefficient: \(b_1 = r \frac{s_Y}{s_X}\)
Sum of the residuals is zero: \(\sum_{i = 1}^n e_i = 0\)
Residuals and \(X\) values are uncorrelated
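The properties listed above can be checked numerically. The sketch below uses a small made-up data set (an assumption, not the lecture's data), computes the slope from \(b_1 = r \frac{s_Y}{s_X}\) and the intercept from \(b_0 = \bar{Y} - b_1~\bar{X}\), and then evaluates each property. All variable names are illustrative.

```python
from statistics import mean, stdev

# Hypothetical toy data (not the lecture's data set)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations

# Sample correlation coefficient r
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

# Slope and intercept from the formulas in the list above
b1 = r * s_y / s_x
b0 = y_bar - b1 * x_bar

fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Property 1: the line passes through (x_bar, y_bar)
line_at_x_bar = b0 + b1 * x_bar

# Property 2: the slope has the same sign as r
same_sign = (b1 > 0) == (r > 0)

# Property 3: the residuals sum to zero (up to floating-point error)
sum_resid = sum(residuals)

# Property 4: residuals are uncorrelated with x, so this cross-product sum is zero
resid_x_cross = sum((xi - x_bar) * ei for xi, ei in zip(x, residuals))

print(f"line at x_bar: {line_at_x_bar:.4f}, y_bar: {y_bar:.4f}")
print(f"same sign: {same_sign}")
print(f"sum of residuals: {sum_resid:.2e}")
print(f"residual-x cross-product: {resid_x_cross:.2e}")
```

The last two quantities print as values very close to zero rather than exactly zero because of floating-point rounding.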
\[\widehat{\text{Total number of cows}} = -16.1 + 0.008785434 \times \text{Year}\]
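For each one-unit increase in Year, we expect the total number of cows to be higher by 0.008785434 units, on average.
When Year is 0, we expect the total number of cows to be -16.1.
✅ The intercept is meaningful in the context of the data if the predictor can feasibly take values equal to or near zero, or the observed data include predictor values near zero.
🛑 Otherwise, it might not be meaningful!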
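To make the slope and intercept interpretations concrete, the sketch below plugs values into the fitted equation \(-16.1 + 0.008785434 \times \text{Year}\). The function name `predict_cows` and the input year 2000 are assumptions chosen for illustration; the lecture does not say which years the data cover.

```python
# Coefficients copied from the fitted equation above
B0 = -16.1         # intercept
B1 = 0.008785434   # slope for Year


def predict_cows(year):
    """Predicted total number of cows for a given year, using the fitted line."""
    return B0 + B1 * year


# Prediction at a hypothetical year
pred_2000 = predict_cows(2000)

# The slope is the difference between predictions one year apart
one_year_change = predict_cows(2001) - predict_cows(2000)

# The intercept is the prediction when Year is 0
at_zero = predict_cows(0)

print(f"prediction for 2000: {pred_2000:.6f}")
print(f"change for one more year: {one_year_change:.9f}")
print(f"prediction for Year = 0: {at_zero}")
```

The prediction at Year = 0 is negative, which is not a possible count. This is the kind of case the 🛑 note warns about.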