Stat 217 – HW 6 Solutions
1)
Activity 20-14 (p. 406)
(a) The distribution is mound-shaped with a slight skew to the left and
two potential outliers at 48 years and 50 years. In general, students
tended to underestimate this instructor’s age. Only 4 of the 44 students
guessed the correct age, and only another 9 of the 44 overestimated his age.
There were two main peaks of the underestimates, one around 37 years and one at
about 43 years. Most of the guesses tended to fall between 41 and 46 years of
age.
(b) These values are statistics: n, x̄, and s.
(c) The population would be all students at this school. The parameter would be μ,
the mean guess of this instructor’s age for this population.
(d)
1. The parameter, μ, is the mean guess of this instructor’s age for all students at this school.
2. H0: μ = 44 (the population mean guess of his age equals his actual age)
Ha: μ ≠ 44 (the population mean age guess differs from the instructor’s actual age)
3. Since the sample size n = 44
> 30, we will consider the sample size condition met.
These were students in his class, so not really a random sample from the
population, but I don’t see an obvious reason why these students would be
better or worse guessers than others (once they meet the professor).
4.-5. Using Test of Significance Calculator with One Mean:

6. With p-value < .0001, we will reject the null hypothesis at the α = .01 level of significance.
We have convincing evidence that the mean guess of this instructor’s age
differs from his actual age.
Note: We should probably worry a bit about the randomness condition, as this
sample is perhaps not representative of the guesses of other students at the
school: these students might feel the need to err on the side of
underestimating the instructor’s age (rather than risk offending their
instructor).
2) Continuation of
previous activity
(a) Produce a
95% confidence interval for the population parameter using the information in
20-14.

(b) We are 95% confident that the mean guess of this instructor’s age by all
students at this school (assuming we have a representative sample) is between
39.967 years and 42.397 years.
(c) Changing
the confidence level and pressing the button again:

The midpoint
will still be 41.182 (the sample mean) but the width (42.806-39.558 = 3.248
years) is greater than with the 95% confidence interval (42.397-39.967 = 2.43
years).
To compute
the midpoint, average the two endpoints: (39.558+42.806)/2 = 41.182 =
(39.967+42.397)/2
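The midpoint and width arithmetic above can be checked with a short sketch. The helper names `midpoint` and `width` are illustrative and are not part of the applet; the endpoints are the ones reported above.

```python
def midpoint(lo, hi):
    """Center of an interval; for a t-interval this is the sample mean."""
    return (lo + hi) / 2


def width(lo, hi):
    """Full width of an interval (twice the margin of error)."""
    return hi - lo


ci_95 = (39.967, 42.397)      # 95% interval reported in part (b)
ci_wider = (39.558, 42.806)   # interval after changing the confidence level

for name, (lo, hi) in (("95%", ci_95), ("wider", ci_wider)):
    print(name, round(midpoint(lo, hi), 3), round(width(lo, hi), 3))
```

Both intervals share the same center and lie entirely below 44, the instructor’s actual age, which is in line with the test result in Problem 1.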
(d) This
would increase the mean by one but not change the standard deviation (since all
the data values shift by the same amount, no change in spread). Therefore, the
midpoint would shift up by one but the width of the interval would not change.
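A quick sketch of why adding a constant moves the mean but leaves the standard deviation alone. The guesses below are made-up values for illustration only; they are not the class data.

```python
import statistics

# Hypothetical guesses (NOT the actual class data), used only to show the effect.
guesses = [37, 39, 40, 41, 42, 43, 43, 44, 46, 48]
shifted = [g + 1 for g in guesses]  # every guess raised by one year

mean_before = statistics.mean(guesses)
mean_after = statistics.mean(shifted)
sd_before = statistics.stdev(guesses)
sd_after = statistics.stdev(shifted)

print(mean_after - mean_before)  # the mean moves up by the shift
print(sd_after - sd_before)      # the spread does not change
```

Since the interval is the sample mean plus or minus a multiple of s/√n, only its center moves.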

3) Activity 22-14 (p. 455-6) parts (e)-(h) using the data table above (e)
and then add
(i) Calculate and interpret a 95% confidence interval
for the parameter.
(e) The null
hypothesis is the population mean number of hours of television watched is the
same under both the control and intervention conditions after the intervention.
In symbols, H0: μcontrol = μintervention.
The
alternative hypothesis is the population mean number of hours of television
watched is greater under the control condition than under the treatment
condition after the intervention. In symbols, Ha: μcontrol > μintervention.

Since p-value
= .0008 < .05, we will reject the null hypothesis at the .05 level. We have
convincing evidence that the mean number of hours of television viewing per
week is higher in the control group than in the intervention group.
(f) In the
intervention group, since the standard deviation is larger than the mean, this
distribution could not be normal. An
interval that includes values one standard deviation below the mean would
include negative numbers, which make no sense in this context (hours of
television watched), so the distribution must be skewed to the right. Similarly with the control group, since the mean
and standard deviation are so similar, an interval of values 2 standard
deviations above and below the mean would include many negative hours of
television watched. So this distribution
could not be symmetrical either.
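The argument in (f) can be made concrete with a small sketch. If hours watched really followed a normal distribution whose standard deviation equals its mean, a sizable share of values would have to be negative. The mean and standard deviation below are placeholder values chosen only so that the two are equal; they are not the study’s summary statistics.

```python
from statistics import NormalDist

# Placeholder values (assumed for illustration): sd equal to the mean.
mean_hours = 10.0
sd_hours = 10.0

# Share of a normal(mean, sd) distribution that falls below zero hours.
share_negative = NormalDist(mu=mean_hours, sigma=sd_hours).cdf(0.0)
print(round(share_negative, 3))
```

With the standard deviation equal to the mean, the share below zero is about 16%, and it grows as the standard deviation exceeds the mean. Since negative hours are impossible, a normal model cannot fit these data.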
(g) The non-normality
of these distributions does not hinder the validity of using this test
procedure because the sample sizes are both well above 30 (so the Central Limit
Theorem will still apply and the sampling distribution of the difference in
sample means will still be approximately normal).
(h) Since the
random assignment should have evened out the groups
before the intervention (and the t-test
from the previous
questions confirms no significant difference between the groups in
terms of hours of television watched), we are safe in concluding that the
significant difference in the mean hours of television watched by the control
and intervention groups was caused by the curriculum intervention. However, as noted in previous activities, we
should be cautious in generalizing these results to all elementary schools in
the San Jose area as the subjects were not randomly selected, but instead
selected from two schools. We definitely
would not extend these results beyond San Jose elementary schools, and we
should keep in mind that the children in both groups self-reported the amount
of television they were watching.
(i) Calculate and interpret a 95% confidence interval
for the parameter.

We are 95% confident that the mean number of hours of television viewing per
week in the control population is 2.22 to 9.10 hours larger than the mean
number of hours in the intervention population.
(We can say the intervention decreased the mean by 2.22 to 9.10 hours.)
4) Activity 21-11 (p. 431) but replace part (c) with
(a) πg – πb represents the difference between the population proportion of
girls who have televisions in their bedrooms and the population proportion of
boys who have televisions in their bedrooms.
(b) Using the
Test of Significance Calculator applet:


We are 95% confident that the proportion of
all girls who have televisions in their bedrooms is somewhere between .04 and
.12 less than the proportion of all boys who have televisions in their
bedrooms. The fact that all the values
in our confidence interval are negative indicates that the proportion of girls
is strictly less than the proportion of boys.
(c) This interval would simply change signs from the one we calculated,
becoming (.04, .12), indicating that the male proportion is .04 to .12 greater
than the female proportion. (The standard error is exactly the same, and the
midpoint, p̂1 – p̂2, just changes sign to .08 instead of -.08.)
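Reversing the order of subtraction negates both endpoints and swaps their roles. A minimal sketch, where `reverse_difference_ci` is a hypothetical helper name and the endpoints are the rounded values from part (b):

```python
def reverse_difference_ci(lo, hi):
    """Interval for (group2 - group1) given the interval for (group1 - group2)."""
    return (-hi, -lo)


girls_minus_boys = (-0.12, -0.04)  # rounded interval from part (b)
boys_minus_girls = reverse_difference_ci(*girls_minus_boys)
print(boys_minus_girls)
```

The width of the interval is unchanged; only its sign flips.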
5) Activity 21-25 (p. 436)
(a) We need to know
how many girls were interviewed.
(b) Let πmother smoked – πmother didn’t smoke be the difference in the population
probabilities of having a smoking daughter.
H0: πmother smoked = πmother didn’t smoke
Ha: πmother smoked > πmother didn’t smoke (considering daughters
of women who smoked during pregnancy to be more likely to smoke themselves)
Note: if the sample size is 50, we do not have at least 5 successes in the
non-smoking mothers group (.04 × 50 = 2), so the two-sample z-procedure is not
valid! We also don’t know how the teenagers were selected for the study. All in
all, the technical conditions are suspect here.
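The sample-size check in the note can be sketched as follows, assuming (as the note does) 50 daughters in each group. The sample proportions .26 and .04 come from the activity; the function names and the `minimum` argument are illustrative.

```python
def implied_counts(p_hat, n):
    """Successes and failures implied by a sample proportion and group size."""
    successes = p_hat * n
    return successes, n - successes


def meets_condition(p_hat, n, minimum=5):
    """True when both successes and failures reach the minimum count."""
    return all(count >= minimum for count in implied_counts(p_hat, n))


n_per_group = 50  # assumed group size, matching the note above
for label, p_hat in (("mother smoked", 0.26), ("mother didn't smoke", 0.04)):
    print(label, implied_counts(p_hat, n_per_group), meets_condition(p_hat, n_per_group))
```

Only the group whose mothers did not smoke fails the check, with about 2 implied successes.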

The p-value is .001 and
we would reject at the .05 level. We conclude that there is strong statistical
evidence that daughters of mothers who smoke during pregnancy are more likely
to smoke themselves than daughters of mothers who do not smoke during
pregnancy.
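For reference, a minimal sketch of a pooled two-sample z-test that yields a p-value like the one above, assuming 50 daughters per group and using the sample proportions .26 and .04. The function name is illustrative and this is not the applet’s code; as noted above, the technical conditions are doubtful at this sample size.

```python
import math


def two_proportion_z_test(p1_hat, n1, p2_hat, n2):
    """One-sided (p1 > p2) pooled two-sample z-test; returns (z, p_value)."""
    pooled = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / std_error
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal area
    return z, p_value


# Assumed group size of 50 each; the proportions are from the activity.
z, p_value = two_proportion_z_test(0.26, 50, 0.04, 50)
print(round(z, 2), round(p_value, 3))
```

Keeping the proportions fixed and increasing `n1` and `n2` shrinks the standard error and raises z, which is why the p-values in the later parts get smaller.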
(c)

With such a small p-value, reject H0 at the
α = .05 significance level. (The
difference is statistically significant.) We conclude that there is very strong
statistical evidence that daughters of mothers who smoke during pregnancy are
more likely to smoke themselves than daughters of mothers who do not smoke
during pregnancy.
(d)

With such a small p-value, reject H0 at the
α = .05 significance level. (The
difference is statistically significant.) We conclude that there is extremely
strong statistical evidence that daughters of mothers who smoke during
pregnancy are more likely to smoke themselves than daughters of mothers who do
not smoke during pregnancy.
(e) The appearance of this graph does not change as the sample size increases
because the graph displays the sample proportions (.26 and .04) rather than the
sample counts of smoking daughters.
(f) This is an observational study because the researchers did not decide who
would/would not smoke. This explanatory variable was determined by the mothers
themselves.
(g) Since this is an observational study and not an experiment, although the
results are statistically significant, we cannot conclude that the pregnant
mothers’ smoking causes the daughters’ tendency to smoke. Possible confounding
variables include whether or not their fathers or some other household member
smoked during their childhood.