Stat 324 – HW 3
Due by 2pm Friday, April 17
1) Exercise 2.13 (p. 59) (data: http://www.biz.uiowa.edu/faculty/jledolter/RegressionModeling/)
2) Exercise 2.20 (p. 61-62) Use the data and background description of the study as in the text, but answer my questions below instead of the ones in the book.
(a) First, I am going to ask you to type in the data, so I can make sure you have seen this trick:
Choose Calc > Make Patterned Data > Arbitrary Set of Numbers

Indicate that you want to store the data in C1 and then enter the 4 temperatures set by the engineers in the “Arbitrary set of numbers” box. Indicate that you want to repeat each value 6 times but that you want to repeat the sequence just once.

Now in C2 enter the lifetimes corresponding to the temperatures.
Make a screen capture (e.g., use the Prnt Scrn button on your keyboard) of the resulting data, at least the first 10 rows, so I can see how it was entered, and include a copy in your write-up.
(b) Produce a scatterplot and describe the direction, strength, and form. Note: You may want to add a smoother to the scatterplot (you can right click on the plot and choose Add > Smoother)
(c) Produce the four-in-one residual plots and comment on each basic regression model condition (LINE) and whether or not you believe it is satisfied for these data.
(d) Use Minitab to carry out a Lack of Fit test. State the hypotheses, test statistic, and p-value (include output). What is your conclusion? Is this conclusion consistent with what you viewed in the scatterplot and residual plots? Explain.
(e) Transform the lifetimes to ln(lifetime) and reproduce the scatterplot and residual plots. Describe the direction, strength, and form of the scatterplot and comment as in (c) on each of the basic regression model conditions.
Hints: If you use Calc > Calculator to create the new column, you can use the Functions pull-down menu to select “natural log” instead of “log (base 10)”. You can also type directly at the MTB prompt:
MTB> let c3=logten(c2)   # to get log base 10
MTB> let c3=ln(c2)       # all of the next three lines give you the natural log
MTB> let c3=loge(c2)
MTB> let c3=ln(c2)
You will get the same “effect” from using logten or ln but you need to be consistent when you back-transform etc.
(f) Carry out the lack of fit test for the transformed data, include the output and state your conclusion.
(g) If we were willing to perform inference with this model, would the relationship between ln(lifetime) and temperature (and therefore lifetime and temperature) be considered statistically significant? (What is your supporting evidence?)
(h) Is this study an observational study or an experiment? If the relationship turns out to be statistically significant, would you be willing to say temperature affects lifetime of this type of heater?
(i) Are you willing to generalize your conclusions to all heaters made by the current production process? Why or why not?
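For intuition about what Minitab computes in parts (d) and (f), here is a minimal Python sketch of the lack of fit F statistic for a design like the one in part (a), where each x value is repeated. The helper names (patterned, lack_of_fit) and the numbers are made up for illustration. They are not the heater data, and this does not replace the Minitab output you are asked to include.

```python
def patterned(values, each):
    """Repeat each value `each` times, with the whole sequence repeated once."""
    return [v for v in values for _ in range(each)]


def lack_of_fit(x, y):
    """Return (F, df_lack_of_fit, df_pure_error) for a straight-line fit."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar

    # Residual (error) sum of squares around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Pure error: variation of the replicates around their own group mean.
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append(yi)
    ss_pe = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_pe += sum((yi - mean) ** 2 for yi in ys)

    # Lack of fit is what remains of SSE after removing pure error.
    df_lof = len(groups) - 2
    df_pe = n - len(groups)
    ss_lof = sse - ss_pe
    f_stat = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f_stat, df_lof, df_pe


# Made-up illustration data with a curved trend (NOT the textbook data).
x_demo = patterned([1, 2, 3, 4], 2)
y_demo = [1.0, 1.2, 3.9, 4.1, 9.0, 9.2, 15.9, 16.1]
f_demo, df1_demo, df2_demo = lack_of_fit(x_demo, y_demo)
print(f_demo, df1_demo, df2_demo)
```

The p-value is the upper-tail area of the F distribution with (df_lack_of_fit, df_pure_error) degrees of freedom. A large F says the group means do not follow a straight line.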
While the next two problems discuss transformations, you should be able to complete much of them before Wednesday based on producing and interpreting residual plots. We will officially discuss transformations Wednesday and Thursday.
3) The data in stopping.mtw shows the stopping distances (feet) for cars traveling at the indicated speeds (miles per hour) (Snee, 1986). Find an appropriate model (linearity, normality, equal variance) that explains the stopping distance in terms of the traveled speed by doing the following:
(a) Comment on which model assumption(s) appear violated based on the scatterplot. How does it suggest changing the power on the response variable and/or the explanatory variable? Also comment on what is revealed by the residual plots and a lack of fit test (yes, there will be a fair bit of output for you to include!).
(b) Comment on which model assumption(s) appear violated (and how it compares to the previous model) based on a scatterplot of log(distance) vs. speed. (You can use either log10 or ln but be clear.) How does it suggest changing the power on the response and/or the explanatory variable? Also comment on what is revealed by the residual plots and a lack of fit test.
(c) Repeat (b) for a scatterplot of log(distance) vs. log(speed).
(d) Repeat (c) for a scatterplot of sqrt(distance) vs. speed. Does this model appear reasonable?
(e) Summarize how this analysis proceeded from one transformation to the next and how this sequence was suggested by the scatterplots and residual analyses. In other words, I stepped you through it this time; how could you get to this point yourself in the future? Why did what you learned at each step make the next step a logical one to try?
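Part (b) allows either log10 or ln. A minimal Python sketch of why the choice does not change the fit, as long as you back-transform with the matching base: the two logs differ only by the constant factor ln(10), so the slopes differ by that factor and the back-transformed predictions agree. The helper fit_line and the speed and distance values are made up for illustration (they are not the contents of stopping.mtw).

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Made-up values for illustration only (NOT stopping.mtw).
speed = [10, 20, 30, 40, 50]
distance = [8.0, 30.0, 65.0, 120.0, 190.0]

# Same model fitted twice, once per log base.
b0_10, b1_10 = fit_line(speed, [math.log10(d) for d in distance])
b0_e, b1_e = fit_line(speed, [math.log(d) for d in distance])

# Back-transform with the base that matches the fit.
new_speed = 35
pred_from_log10 = 10 ** (b0_10 + b1_10 * new_speed)
pred_from_ln = math.exp(b0_e + b1_e * new_speed)

print(b1_10, b1_e / math.log(10))     # the two slopes agree after rescaling
print(pred_from_log10, pred_from_ln)  # the two predictions agree
```

Mixing bases (for example, fitting with logten and back-transforming with exp) is the inconsistency to avoid.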
4) Biologists have noticed a consistent relation between the area of islands and the number of animal and plant species living on them. If S is the number of species and A is the area, then S ≈ CA^g where C is a constant and g is a biologically meaningful parameter that depends on the group of organisms (birds or grasses for example). Estimates of this relationship are useful in conservation biology for predicting species extinction rates due to diminishing habitat. Example data are in islands.mtw for the number of reptile and amphibian species and the island areas for seven islands in the West Indies (Wilson et al., 1992). The goal is to estimate g.
(a) Discuss at least one reason why you might consider a log-log model for these data. Hint: If the proposed model is correct, what will be the form of the log-log model?
(b) Is a log-log model appropriate for these data? Include and discuss in detail residual plots to support your conclusion.
(c) Provide an interpretation of the slope coefficient of the log-log model in this context in terms of doubling the island area (to stay consistent with the explanatory variable range.)
(d) Summarize the conclusions you would draw from this study, including:
- is the relationship statistically significant (in a way that makes biological sense)?
- can a cause and effect relationship be drawn between these variables?
- can these results be generalized to islands in other parts of the world?
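As a check on the mechanics of problem 4, here is a minimal Python sketch that generates synthetic data exactly following a power law S = C * A**g and fits a straight line to ln(S) versus ln(A). The values of C_true and g_true, the areas, and the helper fit_line are all made up for illustration. They are not the islands.mtw data and say nothing about what you will find there.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


# Synthetic power-law data with made-up parameters (NOT islands.mtw).
C_true, g_true = 3.0, 0.25
area = [1.0, 10.0, 100.0, 1000.0, 10000.0]
species = [C_true * a ** g_true for a in area]

# Fit the straight line on the log-log scale.
intercept, slope = fit_line(
    [math.log(a) for a in area],
    [math.log(s) for s in species],
)

g_hat = slope                  # slope on the log-log scale
C_hat = math.exp(intercept)    # back-transformed intercept
doubling_factor = 2 ** g_hat   # multiplicative change in S when A doubles

print(g_hat, C_hat, doubling_factor)
```

With real data the points will scatter around the line, so the residual plots in part (b) are still needed before trusting the fitted slope.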