Stat 217 – HW 6 Solutions

 

1) Activity 20-14 (p. 406)

(a) The distribution is mound-shaped with a slight skew to the left and two potential outliers at 48 years and 50 years. In general, students tended to underestimate this instructor’s age. Only 4 of the 44 students guessed the correct age, and only another 9 of the 44 overestimated his age. There were two main peaks of the underestimates, one around 37 years and one at about 43 years. Most of the guesses tended to fall between 41 and 46 years of age.

(b) These values are statistics: n, x̄, and s.

(c) The population would be all students at this school. The parameter would be μ, the mean guess of this instructor’s age for this population.

(d)

1. The parameter, μ, is the mean guess of this instructor’s age for all students at this school.

2. H0: μ = 44 (the average guess of his age equals his age)

    Ha: μ ≠ 44 (the population mean age guess differs from the instructor’s actual age)

3. Since the sample size n = 44 > 30, we will consider the sample size condition met.

These were students in his class, so not really a random sample from the population, but I don’t see an obvious reason why these students would be better or worse guessers than others (once they meet the professor).

4.-5. Using Test of Significance Calculator with One Mean:



6. With p-value < .0001, we will reject the null hypothesis at the α = .01 level of significance.

We have convincing evidence that the mean guess of this instructor’s age differs from his actual age.

Note: We should probably worry a bit about the randomness condition, as this sample is perhaps not representative of the guesses of other students at the school: these students might feel the need to err on the side of underestimating the instructor’s age (rather than risk offending their instructor).
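The calculator output for steps 4-5 is not reproduced here, but the test statistic can be roughly reconstructed from numbers reported elsewhere in these solutions. The Python sketch below is only an illustration: it backs the standard error out of the 95% confidence interval reported in problem 2 (39.967 to 42.397), and it assumes a t critical value of about 2.0167 for 43 degrees of freedom, which is not stated in the text.

```python
import math

n = 44                          # sample size (from the text)
mu0 = 44                        # hypothesized mean guess = instructor's actual age
lower, upper = 39.967, 42.397   # 95% CI reported in problem 2

# Assumption: t* for 95% confidence with n - 1 = 43 degrees of freedom.
T_STAR_95_DF43 = 2.0167

x_bar = (lower + upper) / 2          # sample mean is the CI midpoint
margin = (upper - lower) / 2         # half-width of the interval
std_error = margin / T_STAR_95_DF43  # s / sqrt(n), backed out of the margin
s_implied = std_error * math.sqrt(n) # implied sample standard deviation
t_stat = (x_bar - mu0) / std_error   # one-sample t statistic

print(f"x-bar = {x_bar:.3f}, SE = {std_error:.3f}, implied s = {s_implied:.2f}")
print(f"t = {t_stat:.2f} on {n - 1} degrees of freedom")
```

A t statistic this far below zero on 43 degrees of freedom is consistent with the reported two-sided p-value below .0001.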

 

2) Continuation of previous activity

(a) Produce a 95% confidence interval for the population parameter using the information in 20-14.

(b) We are 95% confident that the mean guess of this instructor’s age by all students at this school (assuming we have a representative sample) is between 39.967 years and 42.397 years.

(c) Changing the confidence level and pressing the button again:

The midpoint will still be 41.182 (the sample mean) but the width (42.806-39.558 = 3.248 years) is greater than with the 95% confidence interval (42.397-39.967 = 2.43 years).

To compute the midpoint, average the two endpoints: (39.558+42.806)/2 = 41.182 = (39.967+42.397)/2
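The midpoint and width arithmetic above can be checked with a few lines of Python. This is just a sketch using the endpoints quoted in the text; the helper name `midpoint_and_width` is made up for illustration.

```python
def midpoint_and_width(lower, upper):
    """Return the midpoint and the width of an interval (lower, upper)."""
    return (lower + upper) / 2, upper - lower

ci_95 = (39.967, 42.397)      # 95% interval from part (b)
ci_part_c = (39.558, 42.806)  # interval from part (c)

mid_95, width_95 = midpoint_and_width(*ci_95)
mid_c, width_c = midpoint_and_width(*ci_part_c)

print(f"95% interval:      midpoint {mid_95:.3f}, width {width_95:.3f}")
print(f"part (c) interval: midpoint {mid_c:.3f}, width {width_c:.3f}")
```

Both intervals share the same midpoint because each is centered at the sample mean; only the margin of error changes with the confidence level.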

 

(d) This would increase the mean by one but not change the standard deviation (since all the data values shift by the same amount, no change in spread). Therefore, the midpoint would shift up by one but the width of the interval would not change.
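A quick way to convince yourself of this is to add one to every value in a small data set and compare the summaries. The guesses below are hypothetical values invented for the illustration, not the class data.

```python
import statistics

# Hypothetical age guesses (made up for this sketch, not the actual sample).
guesses = [38, 40, 41, 42, 43, 44, 46]
shifted = [g + 1 for g in guesses]   # every guess increased by one year

mean_before = statistics.mean(guesses)
mean_after = statistics.mean(shifted)
sd_before = statistics.stdev(guesses)   # sample standard deviation
sd_after = statistics.stdev(shifted)

print(f"mean: {mean_before} -> {mean_after}")
print(f"sd:   {sd_before:.4f} -> {sd_after:.4f}")
```

Because the half-width of the interval depends only on s, n, and the critical value, an unchanged standard deviation means an unchanged width.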

 

3) Activity 22-14 (p. 455-6) parts (e)-(h) using the data table above (e) and then add

(i) Calculate and interpret a 95% confidence interval for the parameter.

 

(e) The null hypothesis is the population mean number of hours of television watched is the same under both the control and intervention conditions after the intervention. In symbols, H0: μcontrol = μintervention.

The alternative hypothesis is the population mean number of hours of television watched is greater under the control condition than under the intervention condition after the intervention. In symbols, Ha: μcontrol > μintervention.

 

 

Since p-value = .0008 < .05, we will reject the null hypothesis at the .05 level. We have convincing evidence that the mean number of hours of television viewing per week is higher in the control group than in the intervention group.

 

(f) In the intervention group, since the standard deviation is larger than the mean, this distribution could not be normal.  An interval that includes values one standard deviation below the mean would include negative numbers, which make no sense in this context (hours of television watched), so the distribution must be skewed to the right.  Similarly with the control group, since the mean and standard deviation are so similar, an interval of values 2 standard deviations above and below the mean would include many negative hours of television watched.  So this distribution could not be symmetrical either.
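The argument in (f) can be made concrete: if hours watched really were normally distributed, a noticeable share of children would have to watch a negative number of hours. The sketch below uses hypothetical means and standard deviations, since the summary table is not reproduced in these solutions.

```python
from statistics import NormalDist

def prob_negative_if_normal(mean, sd):
    """Probability of a value below 0 under a normal model with this mean and sd."""
    return NormalDist(mu=mean, sigma=sd).cdf(0)

# Hypothetical summaries (not the study's actual values).
p_sd_equals_mean = prob_negative_if_normal(mean=10, sd=10)
p_sd_exceeds_mean = prob_negative_if_normal(mean=9, sd=10)

print(f"sd equal to mean:    P(X < 0) = {p_sd_equals_mean:.3f}")
print(f"sd larger than mean: P(X < 0) = {p_sd_exceeds_mean:.3f}")
```

Whenever the standard deviation is at least as large as the mean, a normal model puts roughly 16% or more of its probability below zero, which is impossible for hours of television.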

(g) The non-normality of these distributions does not hinder the validity of using this test procedure because the sample sizes are both well above 30 (so the Central Limit Theorem will still apply and the sampling distribution of the difference in sample means will still be approximately normal).

(h) Since the random assignment should have evened out the groups before the intervention (and the t-test from the previous questions confirms no significant difference between the groups in terms of hours of television watched), we are safe in concluding that the significant difference in the mean hours of television watched by the control and intervention groups was caused by the curriculum intervention.  However, as noted in previous activities, we should be cautious in generalizing these results to all elementary schools in the San Jose area as the subjects were not randomly selected, but instead selected from two schools.  We definitely would not extend these results beyond San Jose elementary schools, and we should keep in mind that the children in both groups self-reported the amount of television they were watching.

(i) Calculate and interpret a 95% confidence interval for the parameter.

We are 95% confident that the mean number of hours of television viewing per week in the control population is 2.22 to 9.10 hours larger than the mean number of hours in the intervention population.

(We can say the intervention decreased the mean by 2.22 to 9.10 hours.)

 

4) Activity 21-11 (p. 431) but replace part (c) with

(a) πg − πb represents the difference between the population proportion of girls who have televisions in their bedrooms and the population proportion of boys who have televisions in their bedrooms.

(b) Using the Test of Significance Calculator applet:

We are 95% confident that the proportion of all girls who have televisions in their bedrooms is somewhere between .04 and .12 less than the proportion of all boys who have televisions in their bedrooms.  The fact that all the values in our confidence interval are negative indicates that the proportion of girls is strictly less than the proportion of boys.

(c) This interval would simply change signs from the one we calculated, becoming (.04, .12) and indicating that the male proportion is .04 to .12 greater than the female proportion. (The standard error is exactly the same and the midpoint, the difference in sample proportions, just changes sign to .08 instead of -.08.)
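The sign flip can be illustrated with a small Python sketch of the two-sample z-interval for a difference in proportions. The sample proportions and sample sizes below are hypothetical, since the applet input is not shown in these solutions; only the symmetry between the two orderings is the point.

```python
import math
from statistics import NormalDist

def diff_proportions_ci(p1, n1, p2, n2, confidence=0.95):
    """Two-sample z-interval for p1 - p2 (unpooled standard error)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se

# Hypothetical inputs (not the actual survey results).
p_girls, n_girls = 0.60, 1000
p_boys, n_boys = 0.68, 1000

girls_minus_boys = diff_proportions_ci(p_girls, n_girls, p_boys, n_boys)
boys_minus_girls = diff_proportions_ci(p_boys, n_boys, p_girls, n_girls)

print("girls - boys:", tuple(round(x, 3) for x in girls_minus_boys))
print("boys - girls:", tuple(round(x, 3) for x in boys_minus_girls))
```

Swapping the order negates the midpoint while leaving the standard error untouched, so the endpoints trade places and change sign.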

 

5) Activity 21-25 (p. 436)

(a) We need to know how many girls were interviewed.

(b) Let πmother smoked − πmother didn’t smoke be the difference in the population probabilities of having a smoking daughter.

H0: πmother smoked = πmother didn’t smoke

Ha: πmother smoked > πmother didn’t smoke (considering daughters of women who smoked during pregnancy to be more likely to smoke themselves)

Note: if the sample size is 50, we don’t quite have 5 successes in the non-smoking mothers group (.04 × 50 = 2), so the two-sample z-procedure is not valid! We also don’t know how the teenagers were selected for the study. All in all, the technical conditions are suspect here.

The p-value is .001 and we would reject at the .05 level. We conclude that there is strong statistical evidence that daughters of mothers who smoke during pregnancy are more likely to smoke themselves than daughters of mothers who do not smoke during pregnancy.
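As a rough check on this p-value, the two-sample z-test can be sketched in Python. The sketch assumes 50 daughters in each group (the text mentions a sample size of 50 but does not say how it is split) and uses the sample proportions .26 and .04 quoted in these solutions; the function name is made up for illustration.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(p1, n1, p2, n2):
    """One-sided (Ha: p1 > p2) two-sample z-test using the pooled proportion."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

P_SMOKED, P_NOT_SMOKED = 0.26, 0.04   # sample proportions from the text
n = 50                                 # assumption: 50 daughters per group

z, p_value = two_proportion_z_test(P_SMOKED, n, P_NOT_SMOKED, n)
successes_not_smoked = P_NOT_SMOKED * n   # expected count for the conditions check

print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
print(f"successes in non-smoking group: {successes_not_smoked:.0f} (want at least 5)")

# Same proportions with larger (hypothetical) group sizes give a larger z.
z_larger, p_larger = two_proportion_z_test(P_SMOKED, 100, P_NOT_SMOKED, 100)
```

With the same sample proportions, a larger sample size shrinks the standard error, so the z statistic grows and the p-value gets smaller.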

 

(c)

With such a small p-value, reject H0 at the α = .05 significance level.  (The difference is statistically significant.) We conclude that there is very strong statistical evidence that daughters of mothers who smoke during pregnancy are more likely to smoke themselves than daughters of mothers who do not smoke during pregnancy.

 

 

(d)

With such a small p-value, reject H0 at the α = .05 significance level.  (The difference is statistically significant.) We conclude that there is extremely strong statistical evidence that daughters of mothers who smoke during pregnancy are more likely to smoke themselves than daughters of mothers who do not smoke during pregnancy.

(e)

  

The appearance of this graph does not change as the sample size increases because the graph displays the sample proportions (.26 and .04) rather than the sample number/counts of smoking daughters.

(f) This is an observational study because the researchers did not decide who would/would not smoke.  This explanatory variable was determined by the mothers themselves.

(g) Since this is an observational study and not an experiment, although the results are statistically significant, we cannot conclude that the pregnant mothers’ smoking causes the daughters’ tendency to smoke.  Possible confounding variables include whether or not their fathers or some other household member smoked during the daughters’ childhood.