Stat 324 – HW 7

Due Tuesday, June 2

 

1) The data in challenger.mtw are from the Presidential Commission on the Space Shuttle Challenger Accident (1986).  They cover the 23 space shuttle flights (with recoverable data) prior to the launch of the Challenger shuttle.  The second column gives the temperature at the time of launch, and the third column indicates whether or not there was damage to the O-ring seals.

(a) Produce dotplots of the temperatures for those flights with damage vs. those flights without damage (on the same scale).  Does there appear to be a difference in these two groups?

(b) Fit a logistic regression model (Stat > Regression > Binary Logistic Regression) and provide an interpretation of both of the coefficients in context.

(c) What are the predicted odds of O-ring failure at 75°F? Predicted probability?

(d) What are the predicted odds of O-ring failure at 31°F (the launch temperature on Jan. 28, 1986)? Predicted probability?  Any cautions about making this prediction?

(e) Calculate and interpret the odds ratio of O-ring failure between 31°F and 75°F.  Include a 95% confidence interval for this odds ratio in your interpretation.

(f) To get some idea of why NASA made the decision to launch the Challenger at 31°F, create a coded scatterplot graphing the number of failures (out of six) versus the temperature, using the O-ring failure? variable as the categorical variable.  Do you see a relationship between O-ring failure and temperature?  What if you ignored all of the launches that had 0 failures?
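Optional check for parts (c) through (e): if you want to verify your hand calculations outside Minitab, the conversions from fitted coefficients to odds, probabilities, and an odds ratio can be sketched in Python. The coefficients, standard error, and predictor values below are made-up numbers for illustration only (not the Challenger fit), and the Wald-style interval is an assumption about how the interval is formed; substitute the values from your own output.

```python
import math

def predicted_odds(b0, b1, x):
    """Odds of the event at predictor value x: exp(b0 + b1*x)."""
    return math.exp(b0 + b1 * x)

def odds_to_probability(odds):
    """Convert odds to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)

def odds_ratio_with_ci(b1, se_b1, x_num, x_den, z=1.96):
    """Odds ratio comparing x_num to x_den, with a Wald-style interval.

    The interval exponentiates (b1 +/- z*se_b1) times the difference
    in x; the endpoints are sorted because that difference may be negative.
    """
    delta = x_num - x_den
    estimate = math.exp(b1 * delta)
    end_a = math.exp((b1 - z * se_b1) * delta)
    end_b = math.exp((b1 + z * se_b1) * delta)
    return estimate, min(end_a, end_b), max(end_a, end_b)

# Hypothetical values, NOT taken from challenger.mtw.
b0, b1, se_b1 = 2.0, -0.05, 0.02

odds_40 = predicted_odds(b0, b1, 40)
prob_40 = odds_to_probability(odds_40)
or_40_vs_60, ci_low, ci_high = odds_ratio_with_ci(b1, se_b1, 40, 60)

print("odds at x = 40:", round(odds_40, 3))
print("probability at x = 40:", round(prob_40, 3))
print("odds ratio (40 vs. 60):", round(or_40_vs_60, 3),
      "interval:", (round(ci_low, 3), round(ci_high, 3)))
```

Note that the odds ratio depends only on the slope and the difference between the two predictor values, not on the intercept.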

 

2) Exercise 11.2 (p. 379)

Hints: You are going to have to do some work in (a). I think the easiest approach might be:

Cross tabulate c3 and c5 (e.g., table c3 c5) and then copy this table into empty columns of a worksheet. Then use Minitab to compute the proportion of 1’s over the total number of observations at each incentives value.  Then plot these proportions vs. the incentives variable.  Then repeat this for the other explanatory variables.   If you come up with a slicker way of doing this, please email the course listserv!
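One possible alternative to the copy-and-paste route, for anyone comfortable working outside Minitab: the "proportion of 1's at each value" step can be sketched in a few lines of Python. The two lists below are made-up stand-ins for an explanatory column and the 0/1 response column, and the function name is hypothetical.

```python
from collections import defaultdict

def proportion_of_ones(x_values, y_values):
    """Return {x value: proportion of 1's among responses at that x value}."""
    counts = defaultdict(lambda: [0, 0])  # x value -> [number of 1's, total]
    for x, y in zip(x_values, y_values):
        counts[x][0] += y
        counts[x][1] += 1
    return {x: ones / total for x, (ones, total) in sorted(counts.items())}

# Hypothetical data, NOT from the textbook exercise.
incentive = [0, 0, 1, 1, 1, 2]
response = [0, 1, 1, 1, 0, 1]

proportions = proportion_of_ones(incentive, response)
for value, prop in proportions.items():
    print(value, round(prop, 3))
```

The resulting pairs are what you would then plot (proportion vs. explanatory value), repeating the call for each explanatory variable.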

 

3) The file DeathPenalty.mtw provides information on 362 death penalty cases, including the outcome of the case (death penalty or not), the race of the victim (white/black), and the aggravation level of the crime.  The cases with the lowest aggravation level (level 1) involve bar room brawls, liquor-induced arguments, and lovers’ quarrels.  Level 6 comprises the most vicious, cruel, cold-blooded, unprovoked crimes.  Notice that the data file just gives the 12 different explanatory variable combinations (m = 12), the number of cases for each combination, and how many of those cases resulted in death penalties.

(a) Choose Stat > Regression > Binary Logistic Regression and, using the second row, specify C4 as the column indicating the number of successes for that combination and C3 as indicating the number of observations at each combination.  Specify both C1 and C2 in both the model and the factors boxes, and under Storage, store the “event probability” estimates.

 



 

Does this model appear useful?  Which variables appear significant? Does the model appear to be appropriate?

(b) Now fit the model that treats the aggravation of the crime as a quantitative variable (again store the estimated probabilities).  How do you interpret the coefficient?  What is the downside to this model?  Does there appear to be much difference in the deviance measures?  How do the percentages of concordant pairs compare?

(c) Graph (Scatterplot With Connect and Groups) the estimated probabilities for the two models vs. the aggravation index, coded by race.  Compare the behavior of the two models.

 

 

(d) Pick your favorite of the two models and produce diagnostic graphs. Do any constellations stand out as being unusual/influential?  How does the observed proportion given the death penalty compare to predicted?  What does this indicate?
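Optional: to see what the deviance is measuring when the data are stored as events out of trials, here is a minimal Python sketch that compares observed and fitted proportions and computes the usual binomial deviance, 2 Σ [y ln(y / (n p)) + (n − y) ln((n − y) / (n (1 − p)))], with zero-count terms treated as 0. The events, trials, and fitted probabilities are made-up numbers, not values from DeathPenalty.mtw, and the function names are hypothetical.

```python
import math

def _term(observed, expected):
    """observed * ln(observed / expected), taken to be 0 when observed is 0."""
    return 0.0 if observed == 0 else observed * math.log(observed / expected)

def binomial_deviance(events, trials, fitted_probs):
    """Deviance for grouped (events/trials) data given fitted probabilities."""
    total = 0.0
    for y, n, p in zip(events, trials, fitted_probs):
        total += _term(y, n * p) + _term(n - y, n * (1.0 - p))
    return 2.0 * total

# Hypothetical grouped data and fitted probabilities.
events = [2, 8, 15]
trials = [20, 20, 20]
fitted = [0.12, 0.40, 0.72]

for y, n, p in zip(events, trials, fitted):
    print("observed:", round(y / n, 3), "fitted:", p)

deviance = binomial_deviance(events, trials, fitted)
print("deviance:", round(deviance, 4))
```

A combination whose observed proportion sits far from its fitted probability contributes a large term to this sum, which is one way to spot the combinations a model describes poorly.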

 

4) A study was conducted to investigate new automobile purchases.  A random sample of 20 families was selected and each family surveyed to determine the age of their oldest vehicle and their total family income.  A follow-up survey was conducted 6 months later to determine whether they had actually purchased a new vehicle during that time.  Data are in newcar.mtw with new car? = 1 when a car was purchased.

(a) Fit a logistic regression model using age and income and use “model deviance” to decide whether the model is adequate.

(b) Interpret the model coefficients (3 of them) in context.

(c) What is the estimated probability that a family with an income of $45,000 and a car that is 5 years old will purchase a new vehicle in the next 6 months?

(d) Expand the model to include an interaction term between age and income.  Is there any evidence that this term is required?  Explain, in context, what it would mean for there to be an interaction between income and age in this context.

(e) Expand the model to include a quadratic term on age.  Is there any evidence that this term is required?  Explain, in context, what it would mean for there to be a quadratic term for age in this context.

(f) Fit your final choice for a logistic regression model and store the estimated probabilities.  Are all values reported?  If not, you need to find the explanatory variable combination and copy the calculated probability into the corresponding cell(s), or calculate the estimated probabilities from the regression equation yourself.  Then create a prediction table and calculate the classification rate.  Is the classification rate higher than the proportion of successes observed (if I had just blindly predicted that everyone would buy a car, I would have been correct this percentage of the time!)?
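Optional: the prediction table in part (f) can also be tallied outside Minitab. The sketch below cross-classifies observed outcomes against predictions and reports the classification rate next to the "predict everyone buys" baseline. The observed values and probabilities are made up (not from newcar.mtw), and the 0.5 cutoff is an assumption you may want to change.

```python
def classification_table(observed, probs, cutoff=0.5):
    """Count cases by (observed, predicted); predicted is 1 when prob >= cutoff."""
    table = {(obs, pred): 0 for obs in (0, 1) for pred in (0, 1)}
    for y, prob in zip(observed, probs):
        predicted = 1 if prob >= cutoff else 0
        table[(y, predicted)] += 1
    return table

def classification_rate(table):
    """Proportion of cases where the prediction matches the observed outcome."""
    correct = table[(0, 0)] + table[(1, 1)]
    return correct / sum(table.values())

# Hypothetical outcomes and estimated probabilities.
observed = [1, 0, 1, 1, 0, 0, 1, 0]
probs = [0.8, 0.3, 0.6, 0.4, 0.2, 0.7, 0.9, 0.1]

table = classification_table(observed, probs)
rate = classification_rate(table)
baseline = sum(observed) / len(observed)  # rate if everyone is predicted to buy

print("table (observed, predicted) -> count:", table)
print("classification rate:", rate)
print("baseline (predict all 1's):", baseline)
```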

 

5) Early in the production of the Ford Explorer SUV, concern arose over a potential accident risk associated with the tires when the vehicle was carrying heavy loads.  The risk was thought to be acceptable if a low tire pressure was recommended.  The problem was apparently exacerbated by a particular type of Firestone tire that was overly prone to tread separation, especially in warm temperatures.  This type of tire was commonly used on 1995 and later model years.  By the end of 1999, more than 30 lawsuits had been filed over accidents thought to be associated with this problem.  U.S. federal data on fatal car accidents were examined, showing that the odds of a fatal accident being associated with tire failure were three times as great for Explorers as for other sports utility vehicles.  The file FordExplorers.mtw gives data on 1995 and later model year compact SUVs involved in fatal accidents in the US between 1995 and 1999, excluding those struck by another car or involving alcohol (from National Highway Traffic Safety Administration, Fatality Analysis Reporting System, www-fars.nhtsa.dot.gov).  It is of interest to see whether the odds that a fatal accident is tire-related depend on whether the vehicle is a Ford, after accounting for age of the car and number of passengers.  Since the Ford tire problem may be due to the load carried, there is some interest in seeing whether the odds associated with a Ford depend on the number of passengers.

Notes: Presumably, older tires are more likely to fail than newer ones.  Although tire age is not available, vehicle age is an approximate substitute for it.  Since many car owners replace their tires after the car is 3 to 5 years old, however, we may expect the odds of tire failure to increase with age up to some number of years, and then perhaps decrease after that.

(a) Is there evidence that the odds that a fatal accident was due to tire failure were greater if the SUV was a Ford than if it was another SUV, after adjusting for age and number of passengers?  [Hint: What term is the last sentence asking you to create and include in the model?!]

(b) Is there evidence that the Ford effect increased with increasing number of passengers? [Hint: Add a new term to the model to address this question and assess its significance.]

(c) Write out the equation for the predicted log odds for the model in (b) for each value of passenger (at least through passenger = 4).  What is the odds ratio of a tire-related fatal accident for a Ford vs. another SUV in each case?  By what factor is this odds ratio increasing for each additional passenger?  How does this relate to one of the parameters in your model?  Sketch a graph of the estimated probability of a tire-related accident vs. age, showing separate curves for the number of passengers (I’m assuming by hand, but if you can get Minitab to do it, even better), reflecting the equations you have written.
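Optional: if you want to check your arithmetic in part (c), the sketch below shows how an interaction coefficient turns into a passenger-specific odds ratio. It assumes a model whose log odds include a Ford indicator term and a Ford × passengers term, so the Ford vs. non-Ford odds ratio at a given passenger count is exp(b_ford + b_interaction × passengers). The two coefficients are made-up values, not estimates from FordExplorers.mtw, and the names are hypothetical.

```python
import math

def ford_odds_ratio(b_ford, b_interaction, passengers):
    """Ford vs. non-Ford odds ratio at a given number of passengers."""
    return math.exp(b_ford + b_interaction * passengers)

# Hypothetical coefficients for the Ford indicator and Ford x passengers terms.
b_ford, b_interaction = 0.5, 0.25

ratios = [ford_odds_ratio(b_ford, b_interaction, p) for p in range(5)]
growth = [ratios[i + 1] / ratios[i] for i in range(4)]

for p, ratio in enumerate(ratios):
    print("passengers =", p, "odds ratio =", round(ratio, 3))
print("factor per additional passenger:", [round(g, 3) for g in growth])
```

Compare the printed growth factors with the exponentiated interaction coefficient to see how the two are related.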