Comments on HW 8
Problem 2
(a) Remember, whenever you describe a scatterplot (the association between two quantitative variables), touch on at least direction, form, and strength.
(c) Whenever you report a (sample) regression line, use ŷ (y-hat) instead of y in the equation. Also, when you interpret the slope, it is the predicted change in the response for each one-unit increase in the explanatory variable.
Problem 3
When working with two quantitative variables, the appropriate graphical summary is a scatterplot (which is discussed based on direction, form, and strength) and the appropriate numerical summary is just r, the correlation coefficient, which tells you how strong the linear relationship is. Many of you jumped into inference, but this problem was all “descriptive.”
Problem 4
(b) When checking
the technical conditions, make sure you clarify which graph you are looking at
each time and exactly what you are seeing in the graph to help you come to your
conclusion.
(c) Report both the
test statistic and p-value so it’s completely clear to me which p-value (from
the slope coefficient row not the intercept row!) you are using.
(d) There was
nothing in the research questions stating a “one-sided” alternative was
requested.
(e) Make sure your
explanations of a confounding variable tie it to both the explanatory and the
response.
Problem 5
Here are the parts of the SAS output that are most relevant to us. (You might want to review pp. 391-4, which discuss sample computer printouts.)
                      Sum of        Mean
Source      DF       Squares      Square    F value    Prob > F
Model        2      1825.969     912.985     31.191      0.0001
Error       20       585.424      29.271
C Total     22      2411.393

Root MSE     5.410     R-square     0.7572

                Parameter    Standard    T for H0:
Variable         Estimate       Error    Parameter = 0    Prob > |T|
INTERCEP           61.713      5.2453           11.765        0.0001
ECON               -0.171      0.0640           -2.676        0.0145
LITER              -0.404      0.0720           -5.616        0.0001
(iii) R² = .7572 or 75.72%
(x) t for H0: b1 = 0
b1 refers to the population slope of the econ variable. This is found in the second row of the Variable table. The parameter estimate for b1 is b = -0.171. The standard error of this estimate (a measure of the sampling variability of this sample slope from repeated random samples) is .0640. So the t statistic is -.171/.0640 = -2.676. The first row corresponding to INTERCEP is for the intercept or constant term (a).
(xi) Following along the same row, the p-value for this test is .0145. Both SAS and Minitab automatically report a two-sided p-value. Since Ha: b1 ≠ 0, this is what we want. The phrase “Prob > |T|” is intended to convey that they found both tail probabilities and summed them together.
(xii) Now we want the one-sided p-value. Since the estimate of b1 was negative, the observed value is in the direction conjectured by the alternative hypothesis, so we just have to divide the p-value given to us by the program by 2: .0145/2 = .00725.
(xiii) Now we need to look at the overall F or model utility test above. SAS reports F = 31.191
(xiv) and the corresponding p-value is .0001. We never divide this p-value in half, because the F test does not distinguish a direction.
The conclusion that we draw from the F test is, since the p-value is small, to reject H0: b1 = b2 = 0 in favor of Ha: at least one bi ≠ 0. So we just say that at least one of these variables (econ and liter) is helpful in predicting the response variable.
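If you want to check how the pieces of the ANOVA portion of the output fit together, the following is a minimal sketch that recomputes the mean squares, F, R-square, and Root MSE from the sums of squares and degrees of freedom. The function name `anova_summary` is just an illustrative choice, and because the inputs are the rounded values printed by SAS, the results should agree with the printout only up to rounding.

```python
import math

# Values copied from the ANOVA portion of the SAS output.
SS_MODEL, DF_MODEL = 1825.969, 2
SS_ERROR, DF_ERROR = 585.424, 20
SS_TOTAL = 2411.393


def anova_summary(ss_model, df_model, ss_error, df_error, ss_total):
    """Recompute the derived quantities of a regression ANOVA table."""
    ms_model = ss_model / df_model   # mean square = sum of squares / df
    ms_error = ss_error / df_error
    return {
        "ms_model": ms_model,
        "ms_error": ms_error,
        "f_value": ms_model / ms_error,    # overall F (model utility) statistic
        "r_square": ss_model / ss_total,   # proportion of variation explained
        "root_mse": math.sqrt(ms_error),   # estimate of the error std. deviation
    }


summary = anova_summary(SS_MODEL, DF_MODEL, SS_ERROR, DF_ERROR, SS_TOTAL)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The printed F, R-square, and Root MSE should land close to the 31.191, 0.7572, and 5.410 that SAS reports.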
The t test for econ has a p-value of .0145 < .05 so we will conclude that b1 ≠ 0, indicating that econ is a statistically significant predictor of birth rate, even with liter in the model. Similarly, liter is a significant predictor of birth rate even after adjusting for econ in the model.
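The t statistics and the one-sided p-value discussed above can be reproduced by hand from the parameter table. The sketch below is only an illustration: the helper names `t_statistic` and `one_sided_p` are made up for this example, and since the estimates and standard errors are the rounded values from the printout, the t values come out slightly different from SAS (about -2.67 rather than -2.676 for econ).

```python
# (estimate, standard error) pairs copied from the SAS parameter table.
coefficients = {
    "INTERCEP": (61.713, 5.2453),
    "ECON": (-0.171, 0.0640),
    "LITER": (-0.404, 0.0720),
}


def t_statistic(estimate, std_error):
    """t for H0: parameter = 0 is the estimate divided by its standard error."""
    return estimate / std_error


def one_sided_p(two_sided_p, estimate, alternative):
    """Convert a two-sided p-value into a one-sided p-value.

    alternative is "less" (Ha: slope < 0) or "greater" (Ha: slope > 0).
    """
    if alternative == "less":
        in_conjectured_direction = estimate < 0
    else:
        in_conjectured_direction = estimate > 0
    half = two_sided_p / 2
    # Halve the p-value only when the estimate falls on the side named by Ha.
    return half if in_conjectured_direction else 1 - half


t_values = {name: t_statistic(b, se) for name, (b, se) in coefficients.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.3f}")

# Two-sided p-value for econ from the printout, with Ha: slope < 0.
p_econ_one_sided = one_sided_p(0.0145, coefficients["ECON"][0], "less")
print(f"one-sided p-value for ECON: {p_econ_one_sided:.5f}")
```

If the estimate had come out positive while the alternative said "less," the same function would return 1 minus half of the two-sided p-value instead.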
From the SAS output, we can reproduce the regression equation using the given parameter estimates: predicted birth rate = 61.713 - .171 econ - .404 liter.
61.713 represents the predicted birth rate if econ = 0 and liter = 0
-.171 represents the change in predicted birth rate if econ increases by 1 and liter is held constant (so comparing countries with the same literacy rate but where the women’s economic activity differs by one unit)
-.404 represents the change in predicted birth rate if liter increases by 1 and econ is held constant
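To see these interpretations in action, here is a small sketch that evaluates the fitted equation. The function name `predicted_birth_rate` and the input values (econ = 40, liter = 60) are hypothetical, chosen only to illustrate the arithmetic; they are not taken from the data set.

```python
def predicted_birth_rate(econ, liter):
    """Fitted equation built from the SAS parameter estimates."""
    return 61.713 - 0.171 * econ - 0.404 * liter


# Hypothetical country: econ = 40, liter = 60 (illustrative values only).
base = predicted_birth_rate(40, 60)

# Raise econ by one unit while holding liter constant.
econ_plus_one = predicted_birth_rate(41, 60)

# Raise liter by one unit while holding econ constant.
liter_plus_one = predicted_birth_rate(40, 61)

print(f"baseline prediction:       {base:.3f}")
print(f"change for econ + 1:       {econ_plus_one - base:.3f}")
print(f"change for liter + 1:      {liter_plus_one - base:.3f}")
print(f"prediction at econ=liter=0: {predicted_birth_rate(0, 0):.3f}")
```

Whatever starting values you pick, the two differences come back as the slopes, which is exactly what "holding the other variable constant" means.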
Note that the SAS output also provides the summary statistics for the individual variables and the pairwise correlation coefficients. The correlation coefficients look at only two variables at a time (e.g., births and econ, r = -.61181, p-value = .0019), but these correlations are not “adjusted” for the other variables in the model. That is why, in multiple regression, the p-values for r may not match the p-values for the t statistics for the slopes.
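One way to see the mismatch is to turn the pairwise correlation into its own t statistic, t = r·sqrt(n - 2)/sqrt(1 - r²), and compare it with the t for econ in the multiple regression. This is only a sketch: the function name `t_from_correlation` is made up here, and it assumes the correlation was computed on the same n = 23 countries implied by the C Total degrees of freedom (n - 1 = 22).

```python
import math


def t_from_correlation(r, n):
    """t statistic (df = n - 2) for testing zero correlation from sample r."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


n = 22 + 1                # assumption: C Total DF = n - 1 = 22, so n = 23
r_births_econ = -0.61181  # pairwise correlation quoted from the SAS output

t_pairwise = t_from_correlation(r_births_econ, n)
t_adjusted = -2.676       # t for econ from the multiple regression table

print(f"t from pairwise correlation (df = {n - 2}): {t_pairwise:.3f}")
print(f"t for econ adjusted for liter (df = 20):   {t_adjusted:.3f}")
```

The pairwise t is larger in magnitude than the adjusted one, which is consistent with the smaller p-value (.0019 versus .0145) reported for the correlation.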