Stat 324 – HW 6
Due by 2pm, Friday, May 15
1) The data in telemarketing.mtw represents data on the time spent on the phone by telemarketers and the number of months of employment (taken from Dielman, 2001). Suppose a division manager wants to examine a relationship between time on the job and number of calls. As time on the job increases, the employee becomes more familiar with the calling system and the correct procedures to use on the phone and also begins to acquire more clients.
(a) Examine a scatterplot of calls vs. months. If I was going to try a log transformation, which variable would I choose? Explain.
(b) Show that the log transformation is not sufficient. Include residual plots and a lack-of-fit test to support your conclusion.
(c) Fit a regression model of calls on 1/months, and under the Storage button store the Fits. Does this model appear appropriate? Justify your answer.
(d) Use the model in (c) to estimate the number of calls for an employee with 16 months of experience with 95% confidence.
(e) Now fit a quadratic model for predicting calls on months, storing the fits. Is the quadratic term statistically significant?
(f) Compare the quadratic model to a cubic model. In particular, examine the statistical significance of the cubic term, R2(adj), and s. Which model would you recommend?
(g) Use the quadratic model to estimate the number of calls for an employee with 16 months experience with 95% confidence. How similar is it to the one in (d)?
(h) On your scatterplot of calls vs. months, right click and choose Add > Calculated Line. Specify the first column of FITS as the Y column and months as the X column. Then add the calculated line using the second column of FITS as the Y column. Do you prefer one model to the other? How does each behave as the number of months increases – which seems more realistic to you? Is it reasonable to compare the R2 values between the two models?
(i) Now instead create a new variable, months – 16. Fit the quadratic model of calls on (months-16) and (months-16)^2. Now how would you estimate the average number of calls for all employees with 16 months experience? Construct a 95% confidence interval for this parameter.
Extra credit using Calculus ideas: Suggest a way to use the original quadratic model to obtain a (point) estimate of the number of months at which the number of calls is maximized.
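One way to see what centering at 16 in part (i) accomplishes is to compare the two parameterizations numerically. The sketch below is only an illustration in Python with NumPy, not the Minitab steps the assignment asks for. The data are synthetic values made up for this example (they are NOT the telemarketing data), and all variable names are assumptions.

```python
import numpy as np

# Synthetic, made-up data purely for illustration (NOT telemarketing.mtw).
rng = np.random.default_rng(0)
months = np.arange(10, 31, dtype=float)
calls = 5 + 2.0 * months - 0.03 * months**2 + rng.normal(0, 0.5, months.size)

# Quadratic fit on the original scale; np.polyfit returns the
# coefficients ordered from highest power to the constant.
b2, b1, b0 = np.polyfit(months, calls, 2)
fit_at_16 = b0 + b1 * 16 + b2 * 16**2

# Quadratic fit on the centered scale, (months - 16).
c2, c1, c0 = np.polyfit(months - 16, calls, 2)

# The centered intercept matches the original model's fitted value at
# 16 months, and the quadratic coefficient is unchanged by centering.
print(fit_at_16, c0)
print(b2, c2)
```

Because the centered intercept is itself the fitted mean at 16 months, regression software that reports a standard error for the intercept reports the standard error of that fitted mean directly.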
2) The data in SkiSales.mtw represents ski sales (in millions of dollars) and the personal disposable income for the same period (PDI, a leading economic indicator) for 10 years (taken from Chatterjee and Hadi, 2006).
(a) Regress sales on PDI and store the residuals. Report the R2 value and the estimate for the increase in sales for each additional dollar unit of PDI. Does the Durbin-Watson statistic indicate any autocorrelation?
(b) Examine the residual plots, in particular residuals vs. observation order. Also look at the residuals you stored when fitting the model. Describe a pattern you see in the residuals and how it relates to this context.
(c) Create an indicator variable called season that equals one for quarters 2 and 3 (the cold weather quarters) and equals zero for quarters 1 and 4 (the warm weather quarters). Add this variable to the model. Interpret the resulting model coefficients and evaluate the residual plots. Is this variable statistically significant? (State hypotheses, test statistic, and p-value.) Does this appear to have fixed the autocorrelation problem?
Hint:
MTB > set c6
DATA> 10(1 0 0 1)
DATA> end
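For readers checking their work outside Minitab, the patterned-data hint above can be sketched in Python. This is only an illustration that assumes NumPy is available. The pattern (1 0 0 1) is copied from the hint exactly as given, and the name season is the variable from part (c).

```python
import numpy as np

# Repeat the four-quarter pattern from the hint once per year (10 years).
pattern = [1, 0, 0, 1]
season = np.tile(pattern, 10)

print(len(season))   # number of quarterly observations
print(season[:8])    # indicator values for the first two years
```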
(d) Produce the coded scatterplot incorporating this seasonal variable. Based on this graph, does there appear to be an interaction between PDI and season? Explain.
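As a reminder of what the Durbin-Watson statistic in part (a) measures, here is a minimal sketch in Python of the standard formula, the sum of squared successive differences of the residuals divided by the sum of squared residuals. The function name and the residual values are hypothetical and chosen only to show the two extremes. They are not output from SkiSales.mtw.

```python
import numpy as np

def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Hypothetical residuals, in observation order, for illustration only.
alternating = [1, -1, 1, -1, 1, -1]   # sign flips every observation
trending = [-3, -2, -1, 1, 2, 3]      # neighbors resemble each other

print(durbin_watson(alternating))
print(durbin_watson(trending))
```

Values near 2 are consistent with no first-order autocorrelation. Values well below 2 point toward positive autocorrelation, and values well above 2 point toward negative autocorrelation.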
3) Exercise 5.6 (p. 164)
Remember to state your hypotheses, test statistic, and p-value.
4) Exercise 5.13 (p. 166-7) but answer the following questions:
(a) Produce the coded scatterplot as discussed and fit the “separate lines” model. Write out the separate equations to predict height from age for each diet. Interpret the coefficient of the interaction term and discuss what it means to have an interaction in this context.
(b) Compare the “separate lines” model to a “coincident lines” model using a partial F test. (This corresponds to question b in the text). Be sure to state the relevant Ho and Ha.
(c) Also evaluate a “concurrent lines” model. Do so both by fitting the model and evaluating it (state Ho and Ha to compare this model to the separate lines model) and by considering the implications of this model in this context.
(d) Create a new column by subtracting one from the value in the diet column to produce a 0/1 indicator variable. Compare the separate lines model produced here to that in (a). In particular, what are the resulting separate lines that are estimated vs. those in (a)?
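The partial F test in part (b) compares the error sum of squares of a reduced model with that of a full model. The sketch below shows the arithmetic in Python with NumPy under stated assumptions: the data are synthetic values made up for this example (not the Exercise 5.13 data), the helper names sse and partial_f are hypothetical, and both design matrices are assumed to have full column rank.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from a least-squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_f(X_reduced, X_full, y):
    """Partial F statistic with its numerator and denominator degrees of freedom."""
    sse_r, sse_f = sse(X_reduced, y), sse(X_full, y)
    df_num = X_full.shape[1] - X_reduced.shape[1]   # number of extra terms tested
    df_den = len(y) - X_full.shape[1]               # error df of the full model
    f = ((sse_r - sse_f) / df_num) / (sse_f / df_den)
    return f, df_num, df_den

# Synthetic, made-up data purely for illustration.
rng = np.random.default_rng(1)
age = np.tile(np.arange(1.0, 11.0), 2)
group = np.repeat([0.0, 1.0], 10)   # 0/1 indicator for the second group
height = 50 + 3 * age + 4 * group + 1.5 * age * group + rng.normal(0, 1, 20)

ones = np.ones_like(age)
X_coincident = np.column_stack([ones, age])                    # one common line
X_separate = np.column_stack([ones, age, group, age * group])  # separate lines

f_stat, df_num, df_den = partial_f(X_coincident, X_separate, height)
print(f_stat, df_num, df_den)
```

The statistic is then compared with an F distribution on df_num and df_den degrees of freedom. Swapping in a different reduced design matrix gives the corresponding test for a different pair of nested models.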
5) Many large corporations and government agencies administer a preemployment test in an attempt to screen job applicants. The test is supposed to measure an applicant’s aptitude for the job and the results are used as part of the information for making a hiring decision. The federal government has ruled (Tower amendment to Title VII, Civil Rights Act of 1964) that these tests (1) must measure abilities that are directly related to the job under consideration and (2) must not discriminate on the basis of race or national origin.
Data were collected using a special employment program. Twenty applicants were hired on a trial basis for six weeks. One week was spent in a training class. The remaining five weeks were spent on the job. The participants were selected from a pool of applicants by a method that was not related to the preemployment test scores. A test was given at the end of the training period and a work performance evaluation was developed at the end of the six-week period. These two scores were combined to form an index of job performance. The data in preemp.mtw concern race represented as two groups (white =0 and minority=1). The goal is to determine whether there are two distinct relationships or whether the relationship is the same for both groups (Source: Chatterjee and Hadi, 2006).
(a) Create a coded scatterplot of job performance vs. test score, using different symbols for the two groups. Describe the relationship between job performance and the preemployment test scores and whether visually the relationship seems to differ for the two ethnic groups.
(b) Fit a “parallel lines” regression model. Is there evidence that the mean performance score for whites differs from the mean performance score for minorities? (carry out a test, stating Ho and Ha)
(c) Fit the “separate lines” regression model. Is there evidence that the effect of test score depends on the ethnic group? (carry out a test, stating Ho and Ha)
(d) If we go with the separate lines model, how do the slopes and intercepts differ between whites and minorities (which is bigger)? Which line below represents the whites, which the minorities, and which the line that fails to distinguish between the two groups?

(e) Suppose we want employees to achieve a certain threshold in job performance (Y*).

- Do an informal reverse prediction to determine how the cut-off values for the preemployment test scores compare between the whites and the minorities and the “pooled” line.
- If we use the cutoff for the “pooled” line for everyone, explain how this relates to issues of discrimination.
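The informal reverse prediction in part (e) amounts to solving a fitted line for the test score at which predicted performance reaches Y*. A minimal sketch in Python follows. The function name and the coefficient values are hypothetical placeholders, not estimates from preemp.mtw.

```python
def cutoff_score(intercept, slope, y_star):
    """Test score x at which the fitted line intercept + slope * x equals y_star."""
    if slope == 0:
        raise ValueError("A flat line never crosses the threshold.")
    return (y_star - intercept) / slope

# Hypothetical coefficients for illustration only.
print(cutoff_score(intercept=1.0, slope=2.0, y_star=5.0))
```

Applying the same function to each group's fitted line and to the pooled line, with the same Y*, gives the cutoffs to compare.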