Comments on HW 8

 

Problem 2

(a) Whenever you describe a scatterplot (the association between two quantitative variables), remember to touch on at least direction, form, and strength.

(c) Whenever you report a (sample) regression line, use ŷ (y-hat) instead of y in the equation.  Also, when you interpret the slope, it is the predicted change in the response for each one-unit increase in the explanatory variable.

 

Problem 3

When working with two quantitative variables, the appropriate graphical summary is a scatterplot (which is discussed in terms of direction, form, and strength) and the appropriate numerical summary is just r, the correlation coefficient, which tells you how strong the linear relationship is.  Many of you jumped into inference, but this problem was all “descriptive.”

 

Problem 4

(b) When checking the technical conditions, make sure you clarify which graph you are looking at each time and exactly what you are seeing in the graph to help you come to your conclusion.

(c) Report both the test statistic and p-value so it’s completely clear to me which p-value (from the slope coefficient row not the intercept row!) you are using.

(d) There was nothing in the research questions stating a “one-sided” alternative was requested.

(e) Make sure your explanation of a confounding variable ties it to both the explanatory variable and the response variable.

 

Problem 5

Here are the parts of the SAS output that are most relevant to us.  (You might want to review p. 391-4, which discusses sample computer printouts.)

 

                        Sum of            Mean
Source      DF          Squares           Square            F value           Prob > F
Model       2           1825.969          912.985           31.191            0.0001
Error       20          585.424           29.271
C Total     22          2411.393

 

Root MSE  5.410         R-square  0.7572
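
The entries in the analysis of variance table are tied together by simple arithmetic, and it can help to rebuild them once by hand.  The short Python sketch below recomputes the mean squares, F value, Root MSE, and R-square from the sums of squares and degrees of freedom printed above.  The variable names are ours, not SAS's, and the results agree with the printout only up to rounding.

```python
import math

# Values copied from the SAS analysis of variance table.
ss_model, df_model = 1825.969, 2
ss_error, df_error = 585.424, 20
ss_total = 2411.393  # "C Total" row; equals ss_model + ss_error

# Mean square = sum of squares / degrees of freedom.
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# Overall F statistic, Root MSE, and R-square.
f_value = ms_model / ms_error
root_mse = math.sqrt(ms_error)
r_square = ss_model / ss_total

print("Mean squares:", round(ms_model, 3), round(ms_error, 3))
print("F value:", round(f_value, 3))
print("Root MSE:", round(root_mse, 3))
print("R-square:", round(r_square, 4))
```

Note that R-square is just the Model sum of squares as a fraction of the total, which is why it is read as the proportion of variability in the response explained by the model.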

 

                        Parameter         Standard          T for H0:
Variable                Estimate          Error             Parameter = 0     Prob > |T|
INTERCEP                61.713            5.2453            11.765            0.0001
ECON                    -0.171            0.0640            -2.676            0.0145
LITER                   -0.404            0.0720            -5.616            0.0001

 

(iii) R² = .7572, or 75.72%

(x) t for H0: b1 = 0

b1 refers to the population slope of the econ variable.  Its row is the second row of the Variable table.  The parameter estimate for b1 is b = -0.171.  The standard error of this estimate (a measure of the sampling variability of the sample slope across repeated random samples) is .0640.  So the t statistic is the estimate divided by its standard error, -.171/.0640, which SAS reports as -2.676.  (Dividing the rounded values shown gives about -2.672; SAS works from the unrounded values.)  The first row, INTERCEP, is for the intercept or constant term (a).

(xi) Following along the same row, the p-value for this test is .0145.  Both SAS and Minitab automatically report a two-sided p-value.  Since Ha: b1 ≠ 0, this is what we want.  The phrase “Prob > |T|” is intended to convey that they found both tail probabilities and summed them together.

(xii) Now we want the one-sided p-value.  Since the estimated slope was negative, the observed value is in the direction conjectured by the alternative hypothesis, so we just have to divide the p-value given to us by the program by 2.

(xiii) Now we need to look at the overall F or model utility test above.  SAS reports F = 31.191

(xiv) and the corresponding p-value is .0001.  We never need to divide this p-value in half: only large values of F count as evidence against H0, so there is no one-sided version of this test.
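
To see where the numbers in the ECON row come from, here is a minimal Python sketch of the t statistic and its p-values.  It uses only the standard library, so the t distribution tail area is approximated with Simpson's rule instead of a statistics package; the function names t_pdf and two_sided_p are ours.  The one-sided calculation assumes the alternative is Ha: b1 < 0, which is the direction described in (xii).

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: area in both tails beyond |t|.

    Integrates the density from 0 to |t| with Simpson's rule
    (steps must be even), then subtracts twice that area from 1.
    """
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

# Values from the ECON row; error df comes from the ANOVA table.
estimate, std_error, df_error = -0.171, 0.0640, 20

t_rounded = estimate / std_error   # from the rounded printout values
t_sas = -2.676                     # what SAS reports (unrounded inputs)

p_two = two_sided_p(t_sas, df_error)
# Assumed one-sided alternative Ha: b1 < 0. Halve the two-sided
# p-value only when the estimate falls in that direction.
p_one = p_two / 2 if t_sas < 0 else 1 - p_two / 2

print("t from rounded values:", round(t_rounded, 3))
print("two-sided p-value:", round(p_two, 4))
print("one-sided p-value:", round(p_one, 4))
```

The two-sided value should land very close to the .0145 that SAS prints, and the one-sided value is half of it because the estimated slope is negative.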

 

The conclusion that we draw from the F test is, since the p-value is small, to reject H0: b1 = b2 = 0 in favor of Ha: at least one bi ≠ 0.  So we just say that at least one of these variables (econ and liter) is helpful in predicting the response variable.

The t test for econ has a p-value of .0145 < .05 so we will conclude that b1 ≠ 0, indicating that econ is a statistically significant predictor of birth rate, even with liter in the model.  Similarly, liter is a significant predictor of birth rate even after adjusting for econ in the model.

 

From the SAS output, we can reproduce the regression equation using the given parameter estimates: predicted birth rate = 61.713 - .171 econ - .404 liter.

61.713 represents the predicted birth rate if econ = 0 and liter = 0

-.171 represents the change in predicted birth rate if econ increases by 1 and liter is held constant (so comparing countries with the same literacy rate but where the women’s economic activity differs by one unit).

-.404 represents the change in predicted birth rate if liter increases by 1 and econ is held constant
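
The three interpretations above can be checked directly from the fitted equation.  In the sketch below, the function name predicted_birth_rate and the input values 40 and 60 are hypothetical, chosen only for illustration; they are not from the data set.

```python
def predicted_birth_rate(econ, liter):
    """Fitted equation from the SAS parameter estimates."""
    return 61.713 - 0.171 * econ - 0.404 * liter

# Intercept: the prediction when both explanatory variables are 0.
at_zero = predicted_birth_rate(0, 0)

# Hypothetical country used only to illustrate the slopes.
base = predicted_birth_rate(40, 60)

# Raise econ by 1 with liter held constant, and vice versa.
econ_effect = predicted_birth_rate(41, 60) - base
liter_effect = predicted_birth_rate(40, 61) - base

print("prediction at econ = 0, liter = 0:", at_zero)
print("change for +1 econ, liter fixed:", round(econ_effect, 3))
print("change for +1 liter, econ fixed:", round(liter_effect, 3))
```

Whatever starting values you pick, the two differences come out to the slope coefficients, which is exactly what "holding the other variable constant" means.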

 

Note that the SAS output also provides summary statistics for the individual variables and the pairwise correlation coefficients.  Each correlation coefficient looks at only two variables at a time (e.g., births and econ, r = -.61181, p-value = .0019), so these correlations are not “adjusted” for the other variables in the model.  That is why, in multiple regression, the p-values for r may not match the p-values for the t statistics for the slopes.
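
To make the contrast concrete, the test for a correlation uses t = r·sqrt(n − 2)/sqrt(1 − r²) with n − 2 degrees of freedom, and involves only the two variables in question.  The sketch below applies this to r = -.61181.  It assumes n = 23 countries, inferred from the C Total df of 22, and assumes no observations were dropped in the correlation calculation.  As before, the helper functions are ours and approximate the t distribution with Simpson's rule.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|]."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3)

r = -0.61181   # correlation between births and econ (from SAS)
n = 23         # assumption: C Total df + 1

# Test of zero correlation: only births and econ are involved.
t_corr = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
p_corr = two_sided_p(t_corr, n - 2)

# Slope test for econ in the two-variable model (adjusted for liter).
t_slope, df_slope = -2.676, 20
p_slope = two_sided_p(t_slope, df_slope)

print("correlation test: t =", round(t_corr, 3), " p =", round(p_corr, 4))
print("slope test:       t =", round(t_slope, 3), " p =", round(p_slope, 4))
```

The correlation test gives the smaller p-value here because it credits econ with everything it shares with liter; the slope test asks only what econ adds once liter is already in the model.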