**Stat 301 – Review 2 Solutions**

**(p. 264) #1, 2, 6, 17, 31, 38**

1) (a) The stated population of interest is the vehicles on the road in my hometown. The sample is the vehicles that I observed between 7 and 8 a.m. on those mornings. The parameter is the proportion of all vehicles on this road that are SUVs. The statistic is the proportion of the vehicles I observed that are SUVs.

(b) The vehicles that I observe between 7 and 8 a.m. may not be representative of all vehicles on this road. For example, I may overrepresent drivers with full-time jobs and underrepresent younger and older drivers and recreational vehicles.

2) (a) The sampling frame is the list of cars sold by that dealer.

(b) The recently purchased vehicles will probably not represent the vehicles on the road in my town. For example, there has been a backlash against SUVs recently and there may be fewer that have been purchased in the last year, yet many people still own them from several years ago.

6) (a) This is a parameter, because it pertains to the population of all players who have played in the major leagues. We would use the symbol μ to represent this value.

(b) This is likely to be an overestimate. To be selected for the Hall of Fame, a player must have had a long and distinguished career. This sample would ignore the many mediocre and poor players who have had short careers.

(c) Current players have not yet completed their careers, so the mean of their playing years so far will include many values smaller than those players' eventual career lengths, which will likely produce an underestimate.

17) (a) A Type I error would indicate that we decided children were more likely to choose one option over the other when in reality they are not.

(b) A Type II error would indicate that we failed to conclude that children prefer one option over the other when in reality they do.

(c) We have *n* = 284, α = .05.

Note that you could do the following calculations using either the binomial distribution or the normal distribution (here the sample size is large enough for the normal approximation to the binomial).

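Under the null hypothesis (p = .5), $\hat{p}$ is approximately normal with mean .5 and standard deviation $\sqrt{.5(.5)/284} \approx .0297$. So in order to reject the null hypothesis of no preference at the 5% level, we would need a sample proportion of .442 or less, or .558 or higher ($.5 \pm 1.96(.0297)$).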

If $p$ is actually equal to .6, then power $= P(\hat{p} \le .442 \text{ or } \hat{p} \ge .558)$, where $\hat{p}$ follows an approximately normal distribution with mean .6 and standard deviation $\sqrt{.6(.4)/284} \approx .0291$.

The above Minitab probability distribution plot shades the probability of a Type II error (not rejecting the null hypothesis that p = .5), so power = 1 − .0745 = .9255.
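As a check on the normal-approximation calculation in (c), the sketch below computes the rejection cutoffs and the power from n = 284, p0 = .5, and a true p of .6. It assumes Python 3.8+ (for `statistics.NormalDist`); the helper name `power_one_prop` is hypothetical and not part of Minitab.

```python
from math import sqrt
from statistics import NormalDist


def power_one_prop(n, p0, p_true, alpha=0.05):
    """Normal-approximation power of a two-sided one-proportion z test.

    Hypothetical helper: returns the rejection cutoffs for p-hat and the
    probability of landing in the rejection region when p = p_true.
    """
    z_star = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    sd_null = sqrt(p0 * (1 - p0) / n)              # SD of p-hat if H0 is true
    lower = p0 - z_star * sd_null
    upper = p0 + z_star * sd_null
    sd_true = sqrt(p_true * (1 - p_true) / n)      # SD of p-hat under p_true
    dist = NormalDist(p_true, sd_true)
    power = dist.cdf(lower) + (1 - dist.cdf(upper))  # P(reject H0 | p_true)
    return lower, upper, power


lower, upper, power = power_one_prop(n=284, p0=0.5, p_true=0.6)
print(f"reject if p-hat <= {lower:.3f} or >= {upper:.3f}; power = {power:.3f}")
```

Because the cutoffs are not rounded to .442 and .558 before the power is computed, the result lands near .925 rather than exactly .9255.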

(d) If the level of significance is smaller, then the power will decrease.

(e) If the sample size is larger, then the power will increase.

(f) If p is actually .55, the rejection region would stay the same, but now only about 39% of samples will fall in this rejection region.

power = 1 − .607 = .393

(g) The power is smaller. This makes sense, because we are less likely to detect a difference of .05 than a difference of .10.
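The directions claimed in (d) through (g) can be checked numerically with the same normal-approximation power calculation. The alternative settings α = .01 and n = 500 are arbitrary illustrative values, not part of the exercise, and `power_one_prop` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist


def power_one_prop(n, p0, p_true, alpha=0.05):
    """Normal-approximation power of a two-sided one-proportion z test."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    sd_null = sqrt(p0 * (1 - p0) / n)                  # SD of p-hat under H0
    lower, upper = p0 - z_star * sd_null, p0 + z_star * sd_null
    dist = NormalDist(p_true, sqrt(p_true * (1 - p_true) / n))
    return dist.cdf(lower) + (1 - dist.cdf(upper))


base = power_one_prop(284, 0.5, 0.6)                  # part (c)
smaller_alpha = power_one_prop(284, 0.5, 0.6, 0.01)   # part (d): alpha = .01 (illustrative)
larger_n = power_one_prop(500, 0.5, 0.6)              # part (e): n = 500 (illustrative)
closer_p = power_one_prop(284, 0.5, 0.55)             # parts (f)-(g): true p = .55

for label, value in [("base", base), ("alpha=.01", smaller_alpha),
                     ("n=500", larger_n), ("p=.55", closer_p)]:
    print(f"{label:>9}: power = {value:.3f}")
```

With p = .55 this gives a power near .39, consistent with part (f).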

**31)** (a) The population consists of all
physicians and head nurses in Israel at the time of the study. The sample consists of the 89 physicians and
head nurses at the two Israeli hospitals used in the study. This is not a random sample, so we should
proceed with caution.

(b) The parameter is the proportion of all Israeli physicians and head nurses who have administered a placebo to patients. The statistic is the proportion in this sample who have administered a placebo to patients; the value of this statistic is $\hat{p}$ = 53/89 = .596.

(c) We have a sample size of 89, so we need the population size to be at least 20(89) = 1780 in order for the binomial approximation to the hypergeometric distribution to be reasonably valid. Note that the sample size is not huge, so we may prefer to carry out these calculations using the binomial distribution instead of the normal distribution. However, $n\hat{p}$ = 89(.596) = 53 and $n(1-\hat{p})$ = 89(1 − .596) = 36 both exceed 10, so we could reasonably use the normal calculations as well. When in doubt, use the binomial results.

(d) To
find the “binomial confidence interval,” just don’t check the box next to “use
normal distribution” under the Option button in Stat > Basic Statistics >
1 Proportion.

Test and CI for One Proportion

| Sample | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| 1 | 53 | 89 | 0.595506 | (0.486178, 0.698292) |
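A minimal sketch of an exact (binomial) interval, assuming Minitab's exact interval is the Clopper-Pearson interval, a standard exact method. Each endpoint is found by bisection on the binomial CDF using only the standard library; `exact_ci`, `binom_cdf`, and `bisect` are hypothetical helper names.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect(f, lo=0.0, hi=1.0, iters=60):
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid          # root lies above mid
        else:
            hi = mid          # root lies at or below mid
    return (lo + hi) / 2


def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson interval (assumed to match Minitab's exact method)."""
    a = (1 - conf) / 2
    # Lower endpoint: p where P(X >= x) = a; P(X >= x) increases with p.
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - a)
    # Upper endpoint: p where P(X <= x) = a; P(X <= x) decreases with p.
    upper = 1.0 if x == n else bisect(lambda p: a - binom_cdf(x, n, p))
    return lower, upper


print(exact_ci(53, 89))        # 95% exact interval for part (d)
print(exact_ci(53, 89, 0.90))  # 90% exact interval for part (e)
```

The same function can be applied to the 2 successes in 16 trials of #38.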

We are 95% confident that between approximately 48.6% and 69.8% of physicians and head nurses in Israel have administered a placebo to patients.

Using the normal approximation:

| Sample | X | N | Sample p | 95% CI |
|---|---|---|---|---|
| 1 | 53 | 89 | 0.595506 | (0.493540, 0.697471) |
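The normal-approximation interval is the Wald interval $\hat{p} \pm z^*\sqrt{\hat{p}(1-\hat{p})/n}$. The sketch below, assuming that Wald form is what the normal option computes, also rechecks the counts from part (c); it agrees with the output above to several decimal places.

```python
from math import sqrt
from statistics import NormalDist

x, n = 53, 89
p_hat = x / n
# Normal-approximation conditions from part (c): both counts should exceed 10.
print(f"n*p_hat = {n * p_hat:.1f}, n*(1 - p_hat) = {n * (1 - p_hat):.1f}")
z_star = NormalDist().inv_cdf(0.975)      # about 1.96
se = sqrt(p_hat * (1 - p_hat) / n)        # estimated standard error of p-hat
lower, upper = p_hat - z_star * se, p_hat + z_star * se
print(f"95% Wald interval: ({lower:.6f}, {upper:.6f})")
```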

(e) A 90% CI for the parameter turns out to
be (.503, .683). This interval is a bit
narrower than the 95% CI.

Test and CI for One Proportion

| Sample | X | N | Sample p | 90% CI |
|---|---|---|---|---|
| 1 | 53 | 89 | 0.595506 | (0.502869, 0.683247) |

(f) No. There's no reason to believe that this sample of Israeli physicians and head nurses is representative of American physicians and head nurses on the issue of prescribing placebos to patients.

**38)** (a) Let p represent the proportion of all patients
given a placebo who would report a reduction in back pain. The hypotheses are H_{0}: p = 1/3 vs. H_{a}: p ≠ 1/3.

Since we have only 16 patients and only 2 “successes,” the normal approximation to the binomial is not appropriate here. So we should use the binomial distribution to calculate the *p*-value, and we won’t report a “test statistic” (*z*-value): the binomial distribution below is not especially symmetric, so *z*-scores are less meaningful.

Minitab reports the exact binomial *p*-value to be .075, as seen in the following output. This is somewhat small, but not terribly
small, so the data provide some evidence, but not strong evidence, that the
proportion who would experience back pain reduction from a placebo differs from
one-third.

With the 1 Proportion menu here,
Minitab calculates the *p*-value as the
probability of getting 2 or fewer, or 10 or more, successes in a random
sample of 16 trials with an underlying success probability of 1/3.
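The tail sum just described can be computed directly from the binomial probability function. This sketch takes the cutoffs 2 and 10 from the description above rather than reproducing Minitab's internal rule for choosing the opposite tail; `binom_pmf` is a hypothetical helper name.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p0 = 16, 1 / 3
# Cutoffs 2 and 10 are taken from the text, not derived here.
lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, 3))       # P(X <= 2)
upper_tail = sum(binom_pmf(k, n, p0) for k in range(10, n + 1))  # P(X >= 10)
p_value = lower_tail + upper_tail
print(f"P(X <= 2) = {lower_tail:.4f}, P(X >= 10) = {upper_tail:.4f}, "
      f"p-value = {p_value:.3f}")
```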

Test and CI for One Proportion

Test of p = 0.333333 vs p not = 0.333333

Exact

| Sample | X | N | Sample p | 95% CI | P-Value |
|---|---|---|---|---|---|
| 1 | 2 | 16 | 0.125000 | (0.015514, 0.383476) | 0.075 |

In the corresponding graph of this binomial distribution, the grey area is the *p*-value of interest.

(b) For testing H_{0}: p = .1 vs. H_{a}: p
≠ .1, the *p*-value turns out to
be 1.0. This is as large as a *p*-value can be, so the sample data
provide no evidence at all to doubt that one-tenth of all back pain patients
who could take a placebo would experience pain reduction.

Test and CI for One Proportion

Test of p = 0.1 vs p not = 0.1

Exact

| Sample | X | N | Sample p | 95% CI | P-Value |
|---|---|---|---|---|---|
| 1 | 2 | 16 | 0.125000 | (0.015514, 0.383476) | 1.000 |

(c) The Minitab output in (a) reveals a 95% confidence interval for p, the proportion of all patients given a placebo who would report a reduction in back pain, to be from .016 to .383.

(d) Both hypothesized values (1/3 and .1) are within the confidence interval. This makes sense because neither value was rejected at the .05 level.

(e) Both *p*-values
would be smaller, and the confidence interval would be narrower.

**(p. 387) #34, 53, 64**

**34)** (a) The
observational units are adult Americans, and the variable is whether the person
favors or opposes abolishing the penny.

(b) The statistic is .59 (the proportion in the sample of 2136 adult Americans who oppose abolishing the penny). The parameter can be defined as the proportion of the population of all adult Americans who oppose abolishing the penny.

(c) $n\hat{p} = (2136)(.59) = 1260 \ge 10$ and $n(1-\hat{p}) = 2136(.41) = 876 \ge 10$, and this was a random sample, so the technical conditions for the Wald procedure are met.

(d) Wald interval: $.59 \pm 1.96\sqrt{.59(.41)/2136} = .59 \pm .021$

We are 95% confident that between 56.9% and 61.1% of all adult Americans oppose abolishing the penny.
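The conditions check and the Wald margin of error can be computed directly from n = 2136 and $\hat{p}$ = .59. A minimal sketch using the rounded multiplier 1.96:

```python
from math import sqrt

n, p_hat = 2136, 0.59
# Technical conditions: at least 10 expected successes and failures.
print(n * p_hat >= 10, n * (1 - p_hat) >= 10)
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)   # 1.96 times the estimated SE
lower, upper = p_hat - margin, p_hat + margin
print(f"{p_hat} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
```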

(e) Since the interval lies entirely above .5, it provides convincing evidence that more than half of all American adults oppose abolishing the penny.

(f) With the smaller sample size, the width of the interval would be larger.

(g) The midpoint would remain at $\hat{p}$ = .59.

(h) No, we are 95% confident that p is in this interval, meaning that if we repeated this procedure many, many times, about 95% of constructed intervals would succeed in capturing p. We are not making statements about future sample proportions lying in this interval.

**53)** (a) Sample results: *n* = 2328 smokers, $\bar{x}$ = 18.197 years, *s* = 5.388 years

Let μ represent the mean age at which all smokers begin to smoke.

H_{0}: μ = 18

H_{a}: μ ≠ 18 (the mean age differs from 18)

Since the sample size *n*
is huge and the sample was selected at random, the technical conditions are
met.

The test statistic is $t = (18.197 - 18)/(5.388/\sqrt{2328}) \approx 1.76$. If μ = 18, we would observe a sample mean at least as extreme as 18.197 years in about 7.8% of random samples from this population.

With a *p*-value of
.0778, we would reject H_{0} at the .10 level but not the .05 level or
the .01 level. We have some, but not
strong, evidence that the mean age at which smokers start smoking in this
population is different from 18 years.
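The test statistic and *p*-value can be sketched from the summary statistics alone. The standard library has no *t* distribution, so this sketch assumes the standard normal CDF as a stand-in; with df = 2327 the two are practically indistinguishable.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s, mu0 = 2328, 18.197, 5.388, 18
se = s / sqrt(n)                 # standard error of the sample mean
t = (x_bar - mu0) / se
# Assumption: normal CDF approximates the t CDF because df = n - 1 is huge.
p_value = 2 * (1 - NormalDist().cdf(abs(t)))
print(f"t = {t:.3f}, two-sided p-value approx {p_value:.4f}")
```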

(b) With df = 2327, $t^*$ for 95% confidence would be about 1.96.

$18.197 \pm 1.96(5.388/\sqrt{2328}) = 18.197 \pm .219$

We are 95% confident that the mean age at which smokers begin smoking is between 17.978 years and 18.416 years.
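The interval arithmetic can be reproduced in a few lines, assuming $t^*$ = 1.96 as stated for df = 2327:

```python
from math import sqrt

n, x_bar, s = 2328, 18.197, 5.388
t_star = 1.96                    # t* for 95% confidence with df = 2327 (about the z value)
margin = t_star * s / sqrt(n)    # t* times the standard error
lower, upper = x_bar - margin, x_bar + margin
print(f"{x_bar} +/- {margin:.3f} -> ({lower:.3f}, {upper:.3f})")
```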

(c) No, this is an interval for the population mean, not for the ages of individual members of the sample.

(d) Because the sample distribution is skewed to the right, we have strong reason to believe that the population distribution is also skewed to the right. A prediction interval relies on the population being normally distributed, so it is not appropriate to calculate one from this sample.

**64)** (a) Subtracting the peanut butter times from the milk chocolate times, we can analyze the distribution of differences:

| Variable | N | Mean | StDev | Q1 | Median | Q3 | IQR |
|---|---|---|---|---|---|---|---|
| Diff | 20 | 10.90 | 26.51 | -6.50 | 16.00 | 29.50 | 36.0 |

The differences have a slight skew to the
left, but the normal probability plot suggests that a normal model is
reasonable. The mean difference in
melting times between the two kinds of chips is 10.9 seconds, with a standard
deviation of 26.51 seconds. Most of the
differences are positive, indicating that for most people the peanut butter
chip melted more quickly. The median
difference is 16 seconds, about 5 seconds larger than the mean, because there
are a few people with a fairly large negative difference.

(b) The distribution of sample differences is approximately normal, so if we treat this as a random sample from the population, the technical conditions are satisfied.

Letting μ_{d} represent the mean of the differences in melting times in the population, the hypotheses are:

H_{0}: μ_{d} = 0 (neither type of chip melts more quickly on average)

H_{a}: μ_{d} ≠ 0 (one type of chip melts more quickly on average)

The test statistic is $t_0 = (10.9 - 0)/(26.51/\sqrt{20}) = 1.84$.

The *p*-value, based
on 19 degrees of freedom, is .082.

This *p*-value is
somewhat but not very small, so the sample data provide some but not much
evidence to conclude that one type of chip melts more quickly on average than
the other.
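Since the standard library has no *t* distribution, the sketch below gets the two-sided *p*-value by integrating the *t* density numerically with Simpson's rule. It is an illustrative stand-in for Minitab's calculation, not its method; `t_pdf` and `t_two_sided_p` are hypothetical helper names.

```python
from math import gamma, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_two_sided_p(t, df, steps=2000):
    """1 - 2 * P(0 <= T <= |t|), with the integral done by Simpson's rule."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps               # steps must be even for Simpson's rule
    total = t_pdf(a, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(a + i * h, df)
    area = total * h / 3              # P(0 <= T <= |t|)
    return 1 - 2 * area


n, d_bar, s_d = 20, 10.9, 26.51
t0 = (d_bar - 0) / (s_d / sqrt(n))
p_value = t_two_sided_p(t0, n - 1)
print(f"t0 = {t0:.2f}, p-value = {p_value:.3f}")
```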

(c) A 90% confidence interval for μ_{d} is (.651, 21.149), as seen in the Minitab output:

One-Sample T: diff (milk choc - pb)

| Variable | N | Mean | StDev | SE Mean | 90% CI |
|---|---|---|---|---|---|
| diff | 20 | 10.9000 | 26.5070 | 5.9271 | (0.6512, 21.1488) |

We can be 90% confident that the mean
difference in melting times is between .65 and 21.15 seconds. Because this interval is entirely positive,
we have some evidence that the peanut butter chips melt more quickly on average
than the milk chocolate chips.
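The interval can be rebuilt from the summary statistics in the output. This sketch assumes the table value $t^*$ ≈ 1.7291 for 90% confidence with 19 degrees of freedom instead of computing the quantile.

```python
from math import sqrt

n, d_bar, s_d = 20, 10.9, 26.507
t_star = 1.7291                  # assumed t* for 90% confidence, 19 df (from a t table)
se = s_d / sqrt(n)               # standard error of the mean difference
lower, upper = d_bar - t_star * se, d_bar + t_star * se
print(f"SE = {se:.4f}; 90% CI = ({lower:.4f}, {upper:.4f})")
```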

(d) The test statistic would be the negative of what it was
before, and the *p*-value would be
unchanged. The confidence interval would
be the negative of what it was before.

**For each of the following scenarios, indicate whether you would use one-sample z procedures or one-sample t procedures:**

*(a) What proportion of
CP students study 25-35 hours per week?*

One-sample *z*
confidence interval for the population proportion

*(b) Does the average
weight gain by CP freshmen exceed 10 lbs?*

One-sample *t* test of H_{0}: μ = 10 vs. H_{a}: μ > 10

*(c) Do a majority of CP students plan to vote in the presidential election?*

One-sample *z* test
of H_{0}: p = .5 vs. H_{a}:
p > .5

*(d) Predict the guess of my age by a CP student.*

One-sample *t*
prediction interval

*(e) Does the average
guess of my age by all CP students increase by more than 2 years after meeting
me?*

One-sample *t* test of H_{0}: μ = 2 vs. H_{a}: μ > 2, where μ = average increase in age guess after meeting me for all Cal Poly students