Stat 322 – HW 3

Due Friday, Jan. 26

 

1) Researchers studied the behavior of drivers on a rural interstate highway in Maryland where the speed limit was 55 miles per hour. They measured speed with an electronic device hidden in the pavement and, to eliminate large trucks, considered only vehicles less than 20 feet long. Suppose that the researchers want to test whether their sample data suggest that the proportion of speeders in the population differs from one-half.

(a) Specify the null and alternative hypotheses. Is this a one-sided or a two-sided test?

(b) The researchers found that 5,690 of 12,931 vehicles in their sample were exceeding the speed limit. Calculate an appropriate test statistic and p-value.  Show your work/include output.

(c) Are the technical conditions for this procedure satisfied?

(d) Would you reject H0 at the α = .01 significance level? How about at the α = .0001 significance level? Would you say that the data provide very strong evidence against H0? Explain.

(e) Does the test result say anything about how much the proportion of speeders in the population differs from one-half?

(f) Determine and interpret a 99% confidence interval for the proportion of speeders in the population.

(g) Explain why it was important for the design of the study that the device measuring speed was hidden.

(h) Would you generalize the results of this study to all drivers on all roads in the U.S.? Explain briefly.
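
For checking hand calculations of the kind asked for in (b) and (f), the one-proportion z procedures can be sketched in Python using only the standard library. This is a minimal sketch and not a substitute for the requested work or output. The function names are made up for illustration, the counts in the usage lines (60 of 100) are hypothetical rather than the highway data, and the interval shown is the usual Wald interval, which is an assumption about the method intended here.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0 (hypothetical helper)."""
    p_hat = successes / n
    # The standard error uses the null value p0, as in the usual z-test.
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def wald_interval(successes, n, confidence):
    """Return (lower, upper) Wald confidence limits for p (hypothetical helper)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # The margin of error uses the sample proportion, not the null value.
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical counts, NOT the highway data: 60 "successes" in 100 trials.
z, p_value = one_proportion_z_test(60, 100, 0.5)
lower, upper = wald_interval(60, 100, 0.99)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.4f}")
print(f"99% CI: ({lower:.4f}, {upper:.4f})")
```

Note that the test and the interval use different standard errors: the test plugs in the hypothesized value, while the interval plugs in the sample proportion.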

 

2) Problem 38 (p. 343-4), parts a-c

 

3) Consider four samples of hypothetical sleeping times. 

Sample number    Sample size    Sample mean    Sample std. dev.
1                10             6.6            .825
2                10             6.6            1.597
3                30             6.6            .825
4                30             6.6            1.597

(a) Between samples 1 and 2, which do you think supplies stronger evidence that μ ≠ 7 (that the population mean sleep time differs from 7 hours)?  In other words, which sample (1 or 2) would produce a smaller p-value for the appropriate test of significance? Explain.

(b) Between samples 1 and 3, which do you think supplies stronger evidence that μ ≠ 7 (that the population mean sleep time differs from 7 hours)?  In other words, which sample (1 or 3) would produce a smaller p-value for the appropriate test of significance? Explain.

(c) For each of these four samples, use Minitab to calculate the p-value for testing whether the population mean differs from 7 hours.

(d) For which of the samples do you have enough evidence to reject the null hypothesis at the .05 level and conclude that the mean sleeping time is in fact different from 7 hours?

(e) Comment on whether your conjectures in (a) and (b) are confirmed by the test results.
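
The test statistic behind these comparisons is t = (x̄ − μ0)/(s/√n), with n − 1 degrees of freedom. A minimal Python sketch of that formula follows, as an optional way to see how the sample size and the sample standard deviation enter the calculation. The helper name and the summary values are hypothetical (they are not the values in the table above), and the p-value itself would still come from Minitab or a t table.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0):
    """Return (t statistic, degrees of freedom) from summary statistics."""
    standard_error = sd / sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical summaries, NOT the sleep samples in the table.
t_base, df_base = one_sample_t(mean=6.5, sd=1.0, n=16, mu0=7)
t_more_spread, _ = one_sample_t(mean=6.5, sd=2.0, n=16, mu0=7)
t_larger_n, _ = one_sample_t(mean=6.5, sd=1.0, n=64, mu0=7)

print("baseline:    t =", t_base, "df =", df_base)
print("larger sd:   t =", t_more_spread)
print("larger n:    t =", t_larger_n)
```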

 

4) The 2001-2002 National Health and Nutrition Examination Survey (NHANES) asked subjects about their smoking habits.  One of the questions was whether the person had smoked at least 100 cigarettes in his/her life.  The 2328 people who answered “yes” were asked to report the age at which they started smoking.  The responses are in SmokingStart.mtw.  Suppose we want to test whether the mean age at which smokers begin to smoke differs from 18 years.

(a) Produce and describe a dotplot or histogram of these data.  In particular, do they appear to follow a normal distribution?

(b) One way to visually assess whether a normal model can reasonably be applied to a sample of data is through a probability plot.  Choose Graph > Probability Plot, leave “Single” selected and click OK, then enter C1 in the Graph variables box and click OK.  If the data behave like a normal distribution, the points will fall roughly along a straight line.  It can be visually easier to assess the fit of a straight line than that of a curve.  Do the red dots follow a linear pattern?

(c) Are the technical conditions met for a one-sample t-test for these data?

(d) Carry out a one-sample t-test: state the hypotheses in symbols and in words, and calculate the test statistic and p-value.  Include a well-labeled sketch of the sampling distribution for the test statistic, and indicate the area represented by the p-value.  Also indicate whether the sample mean differs significantly from 18 at the .10 level.

(e) Summarize what you learned in this study. Your summary should touch on describing the sample data, whether the technical conditions are met, and if so, the conclusion you would draw, in English, from this inference procedure.
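
The idea behind the probability plot in (b) can also be sketched outside Minitab: sort the data, pair each value with the normal quantile for its rank, and see whether the pairs fall near a line. The Python sketch below uses the plotting positions (i − 0.5)/n, which is an assumption and may differ from the positions Minitab uses. The helper names and the short list of ages are hypothetical and are not taken from SmokingStart.mtw.

```python
from math import sqrt
from statistics import NormalDist, mean


def normal_plot_points(data):
    """Pair each sorted value with a standard normal quantile (hypothetical helper)."""
    ordered = sorted(data)
    n = len(ordered)
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def correlation(pairs):
    """Pearson correlation of (x, y) pairs; values near 1 suggest a nearly linear plot."""
    xs, ys = zip(*pairs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up ages for illustration only, with a long right tail.
ages = [14, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 21, 25, 40]
points = normal_plot_points(ages)
r = correlation(points)
print(f"correlation of plot points: {r:.3f}")
```

The correlation is only a rough numerical companion to the plot; the visual check of whether the points bend away from a line is still the main tool.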

 

5) To consider whether there was evidence of sex discrimination in the starting salaries offered to men and women, the beginning salaries for all 32 male and all 61 female skilled, entry-level clerical employees hired by the Harris Trust and Savings Bank between 1969 and 1977 were obtained (BankSalary.mtw).

(a) Produce a graphical summary to compare the two salary distributions and comment on what it reveals.

(b) In this context, what are the type I and type II errors?  Which do you consider more serious?

(c) Can the difference in the average starting salaries for males and females be reasonably attributed to random chance? (Hint: Conduct a test of significance.)

(d) Comment on the technical conditions for the procedure used in (c).

(e) Remove the male with the highest salary and repeat the analysis in (c).  Do your conclusions change?

(f) Sometimes when we have skewed data, the inference procedure can instead be applied to transformed data.  The most useful transformation is the log transformation, particularly applicable to positively skewed data.  Take the natural log of each group (let c3=loge(c1)).  Would a two-sample t-test be appropriate for these transformed data?

(g) Carry out the two-sample t-test on the transformed data.  Do your conclusions about the statistical significance of the difference between the two groups change?

(h) Does this study provide evidence of gender discrimination?  (Hint: Even if you have eliminated random chance as an explanation, does this study establish a cause-and-effect relationship? If not, suggest another explanation for the tendency for higher salaries among the males.)
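
The mechanics of applying a two-sample t procedure to log-transformed data, as in (f) and (g), can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the unpooled (Welch) form of the statistic, which may or may not be the version intended for this problem, the helper name is made up, and the salaries are invented for illustration rather than taken from BankSalary.mtw. The p-value would still come from Minitab or a t table.

```python
from math import log, sqrt
from statistics import mean, variance


def welch_t(group1, group2):
    """Return (t, approximate df) for the unpooled two-sample t statistic."""
    n1, n2 = len(group1), len(group2)
    # Estimated variance of each sample mean.
    v1, v2 = variance(group1) / n1, variance(group2) / n2
    t = (mean(group1) - mean(group2)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up salaries in dollars, NOT the bank data; group_a has one large value.
group_a = [5400, 5700, 6000, 6300, 8100]
group_b = [4800, 5100, 5100, 5400, 5700, 6000]

t_raw, df_raw = welch_t(group_a, group_b)
t_log, df_log = welch_t([log(x) for x in group_a], [log(x) for x in group_b])
print(f"raw scale: t = {t_raw:.3f}, df = {df_raw:.1f}")
print(f"log scale: t = {t_log:.3f}, df = {df_log:.1f}")
```

On the log scale the comparison is between mean log salaries, so any conclusion is about the ratio of typical salaries rather than their difference in dollars.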