Research Question: Is the adult height of a student determined by the height of the mother? In other words, what is the relationship between a student’s height and mother’s height for all BYU students?
Population: All BYU students.
Parameter of Interest:
Sample: A convenience sample of 1727 BYU students who are in Stat 121.
Are there any issues with this study setup?
Our model: \(y_i = \beta_0 + \beta_1 x_i + \epsilon_i\).
Considering the research question, what would it mean if \(\beta_1 = 0\)?
Our fitted model: \(\hat{y} = 35.653 + 0.503\times x\)
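The fitted coefficients come from least squares: \(\hat{\beta}_1 = \sum (x_i-\bar{x})(y_i-\bar{y}) / \sum (x_i-\bar{x})^2\) and \(\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x}\). Below is a minimal Python sketch. The function names `fit_least_squares` and `predict` are hypothetical, and the five-point dataset is made up for illustration (it is not the Stat 121 data). The last lines plug a hypothetical mother's height of 64 inches into this section's fitted model.

```python
from statistics import fmean


def fit_least_squares(x, y):
    """Return (b0, b1), the least-squares intercept and slope for y = b0 + b1*x."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted response y-hat = b0 + b1 * x_new."""
    return b0 + b1 * x_new


# Made-up toy data (NOT the BYU sample): x = mother's height, y = student's height (inches).
toy_x = [60, 62, 64, 66, 68]
toy_y = [64, 66, 67, 69, 70]
b0, b1 = fit_least_squares(toy_x, toy_y)  # b0 = 19.2, b1 = 0.75 for this toy data

# This section's fitted model, evaluated for a mother who is 64 inches tall.
y_hat_64 = predict(35.653, 0.503, 64)  # 35.653 + 0.503 * 64 = 67.845
```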
So, doesn’t this mean that \(\beta_1 \neq 0\) because \(\hat{\beta}_1 = 0.503\)?
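One way to see why \(\hat{\beta}_1 = 0.503\) by itself does not settle the question is to simulate data in which \(\beta_1 = 0\) is true and watch \(\hat{\beta}_1\) vary from sample to sample. The sketch below uses made-up settings: intercept 68, \(\sigma = 3\), mothers' heights drawn from a normal distribution with mean 64 and SD 2.5, and samples of 50. None of these values come from the Stat 121 data, and `simulate_slopes` is a hypothetical helper.

```python
import random
from statistics import fmean, stdev


def slope(x, y):
    """Least-squares slope S_xy / S_xx."""
    x_bar, y_bar = fmean(x), fmean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx


def simulate_slopes(reps=2000, n=50, beta0=68.0, beta1=0.0, sigma=3.0, seed=121):
    """Draw `reps` samples from y = beta0 + beta1*x + error and return each fitted slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(64.0, 2.5) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, sigma) for xi in x]
        slopes.append(slope(x, y))
    return slopes


# Here beta1 = 0 is true in every simulated sample.
slopes_under_h0 = simulate_slopes()
print(f"mean of fitted slopes: {fmean(slopes_under_h0):.3f}")
print(f"SD of fitted slopes:   {stdev(slopes_under_h0):.3f}")
```

The fitted slopes center near 0 but are essentially never exactly 0. The real question is whether 0.503 is larger than chance variation of this kind would produce.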
Research Question: Does mother’s height impact a child’s height?
Steps of hypothesis testing:
Knowing what we did with other hypothesis tests, select the correct hypotheses: \[ \begin{aligned} H_0: & \\ H_a: & \end{aligned} \]
Step 2 - Compare our data result with what we expect to see if the null hypothesis is true.
From our sample, we have \(\hat{\beta}_1 = 0.503\). Is this “different enough” from \(0\) to conclude \(H_a: \beta_1 \neq 0\)?
First, standardize using the formula (or let the computer do this for you): \[ t = \frac{\hat{\beta}_1 - \overbrace{\beta_1}^{0}}{\frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}}} = 15.914 \] Interpret \(t\) as the number of standard errors that \(\hat{\beta}_1\) lies from the hypothesized value of \(\beta_1\).
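The standardization can be sketched in Python. The sketch assumes the usual residual standard error for simple linear regression, \(\hat{\sigma} = \sqrt{\sum (y_i - \hat{y}_i)^2/(n-2)}\). The helper name `slope_t_statistic` and the five-point dataset are hypothetical, chosen so that the arithmetic can be checked by hand.

```python
import math
from statistics import fmean


def slope_t_statistic(x, y, beta1_null=0.0):
    """Return (b1, se_b1, t) for testing H0: beta1 = beta1_null in simple linear regression."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual standard error: sqrt(SSE / (n - 2)), since two parameters were estimated.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sigma_hat = math.sqrt(sse / (n - 2))
    # Standard error of the slope: sigma_hat / sqrt(S_xx).
    se_b1 = sigma_hat / math.sqrt(s_xx)
    t = (b1 - beta1_null) / se_b1
    return b1, se_b1, t


# Hypothetical toy data, small enough to verify by hand.
b1, se_b1, t = slope_t_statistic([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{b1:.3f} {se_b1:.3f} {t:.3f}")  # prints: 0.600 0.283 2.121
```

Working backward from this section's reported values, \(t = 15.914\) with \(\hat{\beta}_1 = 0.503\) implies a standard error of about \(0.503/15.914 \approx 0.0316\).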
Is \(t = 15.914\) “different enough” from the null hypothesis to make us think the null is wrong?
What statistical tool do we use to assess this question?
If the LINE assumptions of the regression model are appropriate, then \[ t = \frac{\hat{\beta}_1 - \overbrace{\beta_1}^{0}}{\frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}}} \] is a standardized statistic that follows a \(t\) distribution with \(n-2\) degrees of freedom. This distribution is centered at \(0\), and its spread is slightly larger than \(1\), approaching \(1\) as \(n\) grows.
Note that above we set \(\beta_1 = 0\) because we assume \(H_0\) is true unless the data give strong evidence against it.
If the LINE assumptions hold, the values of \(t\) that are consistent with the claim \(H_0: \beta_1 = 0\) are given by the distribution (curve):
Reminder: the LINE assumptions are:
How would we see if there is a linear relationship between \(x\) and \(y\)?
Is this (approximately) linear for the bulk of the data?
How would we see if there is independence? In other words, how can we “check” if one observation doesn’t influence another?
How would we see if the residuals are normal?
Is this approximately normal?
How would we see if there is “equal spread” of the residuals about the fitted line?
Is this roughly “equal spread”?
Examples of NOT equal spread
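Residual plots are the main tool for the normality and equal-spread checks. As a rough numeric companion, the sketch below (hypothetical helper `residual_checks`) computes the residuals and then two summaries. The first is the ratio of residual SDs between the lower and upper halves of the fitted values; a ratio near 1 is consistent with equal spread. The second is the share of standardized residuals within \(\pm 2\); roughly 95% is expected for normal errors. The simulated data are assumptions for illustration. These numbers are not formal tests and do not replace looking at the plots.

```python
import math
import random
from statistics import fmean, stdev


def residual_checks(x, y):
    """Return (spread_ratio, share_within_2) for a simple linear regression fit."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Equal spread: compare residual SDs for the lower vs upper half of the fitted values.
    order = sorted(range(n), key=lambda i: fitted[i])
    lower = [resid[i] for i in order[: n // 2]]
    upper = [resid[i] for i in order[n // 2 :]]
    sd_lo, sd_hi = stdev(lower), stdev(upper)
    spread_ratio = max(sd_lo, sd_hi) / min(sd_lo, sd_hi)  # near 1 suggests equal spread
    # Normality (rough): share of standardized residuals inside +/- 2.
    sigma_hat = math.sqrt(sum(r * r for r in resid) / (n - 2))
    share_within_2 = sum(abs(r / sigma_hat) <= 2 for r in resid) / n
    return spread_ratio, share_within_2


# Simulated (made-up) data: one set with constant error SD, one "fan" shape.
rng = random.Random(2024)
x = [rng.uniform(1, 10) for _ in range(400)]
y_equal = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]          # constant error SD
y_fan = [1 + 2 * xi + rng.gauss(0, 0.1 * xi**2) for xi in x]  # error SD grows with x

print(residual_checks(x, y_equal))
print(residual_checks(x, y_fan))
```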
Melanoma is strongly related to sun exposure. Hence, areas with more sun have a greater risk of melanoma.
Use the analysis tool to check the following assumptions for the Melanoma example. Which of the following assumptions hold (check all that apply):
Back to Step 2: gather the data and see whether our sample data match (or don’t match) the null hypothesis (note: do this only if the LINE assumptions are valid).
Measuring if our data is consistent with the null hypothesis:
Step 3: Draw a conclusion about \(H_0: \beta_1=0\). Using \(\alpha = 0.05\), what do we conclude about \(\beta_1\)?
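Statistical software reports the p-value for you. To show what it computes, here is a standard-library sketch of the two-sided p-value \(P(|T| \ge |t|)\), where \(T\) follows a \(t\) distribution with \(n-2\) degrees of freedom. It uses the identity \(P(|T| \ge |t|) = I_{\nu/(\nu+t^2)}(\nu/2, 1/2)\), where \(I\) is the regularized incomplete beta function. The sketch evaluates \(I\) with the modified Lentz continued-fraction method popularized by *Numerical Recipes*. The function names are hypothetical. In practice a library routine such as SciPy's `scipy.stats.t.sf` would be the usual choice.

```python
import math


def _betacf(a, b, x, max_iter=1000, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300  # guards against division by zero
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_incomplete_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use the continued fraction where it converges quickly; otherwise use symmetry.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def two_sided_p_value(t, df):
    """P(|T| >= |t|) for T following a t distribution with df degrees of freedom."""
    return reg_incomplete_beta(df / 2.0, 0.5, df / (df + t * t))


# Values from this section: t = 15.914 with n = 1727, so df = n - 2 = 1725.
alpha = 0.05
p = two_sided_p_value(15.914, 1727 - 2)
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p-value = {p:.3g}; {decision}")
```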
If we reject \(H_0: \beta_1 = 0\) and conclude \(H_a: \beta_1 \neq 0\), then we really haven’t concluded anything other than that there is an effect; the test alone does not tell us how large the effect is.
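Using the same ideas for building a confidence interval as before, a C% confidence interval for \(\beta_1\) is: \[ \hat{\beta}_1 \pm t^\star\frac{\hat{\sigma}}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}} \] where \(t^\star\) is the critical value from a \(t\) distribution with \(n-2\) degrees of freedom.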
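Once \(\hat{\beta}_1\), its standard error, and \(t^\star\) are known, the interval is simple arithmetic. The sketch below backs the standard error out of this section's reported values (\(\hat{\beta}_1 = 0.503\), \(t = 15.914\)). It assumes \(t^\star \approx 1.961\), the approximate 97.5th percentile of a \(t\) distribution with \(1725\) degrees of freedom. The helper `slope_confidence_interval` is hypothetical.

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Return (lower, upper) for b1 +/- t_star * se_b1."""
    margin = t_star * se_b1
    return b1 - margin, b1 + margin


b1_hat = 0.503
se_b1 = b1_hat / 15.914  # because t = (b1_hat - 0) / se_b1
t_star = 1.961           # assumed: approx. 95% t critical value, df = 1725

lower, upper = slope_confidence_interval(b1_hat, se_b1, t_star)
print(f"95% CI for beta_1: ({lower:.3f}, {upper:.3f})")  # prints: 95% CI for beta_1: (0.441, 0.565)
```

The result matches the interval (0.441, 0.565) given for this example, which is a useful consistency check between the test statistic and the interval.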
Research Question: As the mother’s height increases, what happens to the child’s height?
Answer:
A 95% confidence interval for \(\beta_1\) is calculated as (0.441,0.565).
How do we interpret this interval?
Research Question: If the mother’s height goes up by 1 inch, can we expect the student’s height to change by 1 inch?
Answer:
Suppose that someone wanted to use longitude instead of latitude to explain melanoma mortality. Assuming all the LINE assumptions hold, can we conclude that longitude has a linear effect on melanoma mortality?
What do we do if the LINE assumptions aren’t quite appropriate?
Measuring possum head size can be difficult. However, measuring total possum length is easier. What is the relationship between possum length and head size? Use a simple linear regression model (and the course app) to answer the following questions:
Do the LINE assumptions all hold for this example?
Does total length have a linear effect on head length?
What would a Type 1 Error be for the hypothesis test in #1?
If the total length goes up by 1, how much do we expect the head length to change?