Stat 322 – HW 6

Due Friday, Feb. 23

 

Remember to include your Minitab output.

 

1) Parents of children who speak at a young age like to believe that this bodes well for the child exhibiting high intelligence later in life. To investigate this possibility, researchers collected data on the age of first speaking (in months) and score on the Gesell aptitude test taken later in life for a sample of 22 children. The data can be found in gesell.mtw.

(a) Produce and describe (direction, strength, and form) a scatterplot of Gesell score vs. age of first speaking.

(b) Determine the regression equation for predicting a child’s Gesell score from the age at which he/she first speaks. Report the equation, along with the value of R², and superimpose the line on the scatterplot. Provide an interpretation for the R² value.

(c) Provide interpretations in context of the estimated slope and intercept coefficients.

(d) Do any of the children appear to be outliers in the age variable? If so, what is the ID number for this child? How long did it take him/her to speak? Also report the residual value for this child, and comment on whether it is exceptionally large (in absolute value) compared to other residual values.

(e) Remove this child from the analysis. Then reproduce a scatterplot and recalculate the regression equation and value of R². Comment on how these have changed.

(f) Now also remove the child who took the next longest time to speak. Again examine the scatterplot, the regression equation, and the value of R². Comment again on how these have changed.

(g) Write a paragraph explaining (as if to someone with no formal knowledge of statistics) why these summary statistics have changed so much and summarizing what these data reveal concerning the relationship between age of first speaking and aptitude for children.
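
For parts (b) through (f), the quantities that Minitab reports can also be checked by hand. The following is a minimal Python sketch, not part of the assignment, of how a least-squares line, R², and the residuals are computed. The function name fit_line and the toy data are hypothetical; they are not the values in gesell.mtw.

```python
def fit_line(x, y):
    """Least-squares fit of y on x.

    Returns (slope, intercept, r_squared, residuals).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of squares and cross-products about the means
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual = observed response minus fitted response
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)

    # R-squared = proportion of variation in y explained by the line
    r_squared = 1 - sse / syy
    return slope, intercept, r_squared, residuals


# Hypothetical toy data (NOT from gesell.mtw): age in months, test score
age = [9, 10, 11, 12, 15, 20]
score = [105, 100, 102, 95, 90, 88]

slope, intercept, r_squared, residuals = fit_line(age, score)
print("slope:", slope)
print("intercept:", intercept)
print("R-squared:", r_squared)
print("residuals:", residuals)
```

Refitting after dropping one (age, score) pair from the lists mirrors the removals in parts (e) and (f) and shows how much a single point with an extreme age can move the slope and R².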

 

2) Problem 20 (p. 518)

 

3) The file TVlife.mtw lists the life expectancy and the number of people per television set in a sample of 22 countries.

(a) Produce and describe a scatterplot of life expectancy vs. people per television set.

(b) Take a log transformation of the people per TV variable.  Would it be appropriate to use the regression model with life expectancy and log(people per television)? (Discuss the residual plots.)

(c) Is the relationship between life expectancy and log(people per television) statistically significant? (State the hypotheses and report the test statistic, p-value, decision, and conclusion.)

(d) Since the association is so strongly negative, one might conclude that simply sending television sets to the countries with lower life expectancies would cause their inhabitants to live longer.  Comment on this argument.

 

4) Problem 73 (p. 551)

 

5) The data in mammals.mtw report the average gestation period (in days) and the average longevity (in years) for a variety of mammals. 

(a) Produce and discuss a scatterplot of gestation period vs. longevity. 

(b) Determine the regression equation for predicting gestation period from longevity (Fitted Line Plot). Comment on how well the line seems to describe the relationship between the variables.

(c) Conduct a residual analysis to investigate whether the assumptions of the regression model are satisfied here.  Comment on your findings.

(d) Take the logarithm (base 10) of each variable, and examine a scatterplot of log(gestation) vs. log(longevity).  Does the relationship appear to be roughly linear?

(e) Determine the regression equation for predicting log(gestation) from log(longevity).  Also report the value of R².  Then conduct a residual analysis, and comment on your findings.

(f) Use this model to form a 95% prediction interval for the gestation period for a species of mammal whose longevity is 12 years.  [Hint: First find this prediction interval for log(gestation) using “Options” under Stat > Regression > Regression, then “back-transform” to find the interval for gestation.]

(g) Use this model to form a 95% prediction interval for the gestation period for a mammal whose longevity is 20 years.  How does this interval compare (center and width) to the previous one?

(h) Use this model to predict (with a point estimate and with a 95% prediction interval) the gestation period for the human species of mammal, whose longevity is about 75 years.  Is your prediction reasonable?  Explain.
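
The "back-transform" step in the hint for part (f) can be sketched as follows. This is a minimal Python illustration, not part of the assignment. The function name back_transform_interval and the interval endpoints are hypothetical; the real endpoints come from your own Minitab output for log(gestation).

```python
import math


def back_transform_interval(log_lower, log_upper, base=10.0):
    """Convert an interval on the log scale back to the original scale.

    Raising the base to a power is an increasing function, so the
    endpoints keep their order: lower stays lower, upper stays upper.
    """
    return base ** log_lower, base ** log_upper


# The model is fit on the log scale, so the predictor value entered
# for prediction is log10(longevity), not longevity itself.
log_longevity = math.log10(12)
print("value to enter for log(longevity):", log_longevity)

# Hypothetical 95% prediction interval for log10(gestation);
# replace these with the endpoints reported by Minitab.
log_pi = (1.9, 2.5)

lower_days, upper_days = back_transform_interval(*log_pi)
print("prediction interval for gestation (days):", lower_days, upper_days)
```

Note that an interval that is symmetric about its center on the log scale is no longer symmetric after back-transforming, which is worth keeping in mind when comparing centers and widths in part (g).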