Stat 301 – Review 2 Solutions

(p. 264) #1, 2, 6, 17, 31, 38

 

1) (a) The stated population of interest is the vehicles on the road in my hometown.  The sample is the vehicles that I observed between 7 and 8 am on those mornings.  The parameter is the proportion of all vehicles on this road that are SUVs.  The statistic is the proportion of the vehicles that I observe that are SUVs.

(b) The vehicles that I observe between 7-8am may not be representative of all vehicles on this road.  For example, I may overrepresent those with full-time jobs and underrepresent the younger/older drivers and the more recreational vehicles.

 

2) (a) The sampling frame is the list of cars sold by that dealer.

(b) The recently purchased vehicles will probably not represent the vehicles on the road in my town.  For example, there has been a backlash against SUVs recently and there may be fewer that have been purchased in the last year, yet many people still own them from several years ago.

 

6) (a) This is a parameter, because it pertains to the population of all players who have played in the major leagues.  We would use the symbol μ to represent this value.

(b) This is likely to be an overestimate.  To be selected for the Hall of Fame, a player must have had a long and distinguished career.  This sample would ignore the many mediocre and poor players who have had short careers.

(c) Current players have not yet completed their careers, so the mean of their playing years to date will include many values that are smaller than those players' eventual career lengths.  This will likely produce an underestimate.

 

17) (a) A Type I error would indicate that we decided children were more likely to choose one option over the other when in reality they are not.

(b) A Type II error would indicate that we failed to conclude that children prefer one option over the other when in reality they do.

(c) We have n = 284, α = .05.

Note that you could do the following calculations using either the binomial distribution or the normal distribution (here, the sample size is large enough for the normal approximation to the binomial).

 

Under H0: p = .5, the rejection cutoffs are .5 ± 1.96√(.5(.5)/284) ≈ .5 ± .058.  This indicates that in order to reject the null hypothesis of no preference at the 5% level, we would need a sample proportion of .442 or less, or .558 or higher.

 

If p is actually equal to .6, then power = P(p̂ < .442 or p̂ > .558), where p̂ follows an approximate normal distribution with mean .6 and standard deviation √(.6(.4)/284) ≈ .0291.

The Minitab probability distribution plot shades the probability of a Type II error (not rejecting the null hypothesis that p = .5), so power = 1 - .0745 = .9255.
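The cutoffs and power in part (c) can be checked with a short calculation.  The following is a minimal sketch using the normal approximation and Python's standard library; the variable names are illustrative and are not part of the original solution.  Because it keeps the cutoffs unrounded, it gives a power of about .925, which agrees with the hand calculation up to rounding.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the problem: n = 284, alpha = .05, H0: p = .5, alternative p = .6
n, alpha, p0, p_alt = 284, 0.05, 0.5, 0.6

# Rejection cutoffs under H0 (two-sided test, normal approximation)
z_star = NormalDist().inv_cdf(1 - alpha / 2)
margin = z_star * sqrt(p0 * (1 - p0) / n)
lower, upper = p0 - margin, p0 + margin

# Sampling distribution of the sample proportion if p is really .6
alt_dist = NormalDist(mu=p_alt, sigma=sqrt(p_alt * (1 - p_alt) / n))

# Type II error = probability the sample proportion lands between the cutoffs
type_2 = alt_dist.cdf(upper) - alt_dist.cdf(lower)
power = 1 - type_2

print(f"cutoffs: {lower:.3f}, {upper:.3f}")
print(f"P(Type II error) = {type_2:.4f}, power = {power:.4f}")
```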

 

(d) If the level of significance is smaller, then the power will decrease.

(e) If the sample size is larger, then the power will increase.

(f) The rejection region would stay the same, but now only about 39% of samples will fall in this rejection region.

power = 1 - .607 = .393

 

(g) The power is smaller.  This makes sense, because we are less likely to detect a difference of .05 than a difference of .10.
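The qualitative answers in parts (d) through (g) can be illustrated numerically.  This is a sketch under the same normal approximation; the helper name `power` and the comparison values α = .01 and n = 500 are arbitrary choices made for illustration, not values from the exercise.

```python
from math import sqrt
from statistics import NormalDist

def power(n, alpha, p_alt, p0=0.5):
    """Approximate power of the two-sided one-proportion z-test of H0: p = p0."""
    z_star = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_star * sqrt(p0 * (1 - p0) / n)
    alt = NormalDist(mu=p_alt, sigma=sqrt(p_alt * (1 - p_alt) / n))
    # Probability the sample proportion falls in either tail of the rejection region
    return alt.cdf(p0 - margin) + (1 - alt.cdf(p0 + margin))

baseline = power(284, 0.05, 0.60)       # part (c)
smaller_alpha = power(284, 0.01, 0.60)  # part (d): hypothetical alpha = .01
larger_n = power(500, 0.05, 0.60)       # part (e): hypothetical n = 500
closer_alt = power(284, 0.05, 0.55)     # parts (f) and (g): alternative p = .55

print(f"baseline {baseline:.3f}, smaller alpha {smaller_alpha:.3f}")
print(f"larger n {larger_n:.3f}, alternative .55 {closer_alt:.3f}")
```

The ordering of the four values matches the answers above: power drops with a smaller significance level or an alternative closer to .5, and rises with a larger sample.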

 

31) (a) The population consists of all physicians and head nurses in Israel at the time of the study.  The sample consists of the 89 physicians and head nurses at the two Israeli hospitals used in the study.  This is not a random sample, so we should proceed with caution.

(b) The parameter is the proportion of all Israeli physicians and head nurses who have administered a placebo to patients.  The statistic is the proportion in this sample who have administered a placebo to patients; the value of this statistic is p̂ = 53/89 = .596.

(c) We have a sample size of 89, so we need the population size to be at least 20(89) = 1780 in order for the binomial approximation to the hypergeometric distribution to be reasonably valid.  Note that the sample size is not huge, so we may prefer to carry out these calculations using the binomial distribution instead of the normal distribution, though np̂ = 89(.596) = 53 and n(1 - p̂) = 89(1 - .596) = 36 both exceed 10, so we could reasonably use the normal calculations as well.  When in doubt, use the binomial results.

(d) To find the “binomial confidence interval,” just don’t check the box next to “use normal distribution” under the Option button in Stat > Basic Statistics > 1 Proportion.

Test and CI for One Proportion

                                              

Sample   X   N  Sample p         95% CI       

1       53  89  0.595506  (0.486178, 0.698292)

We are 95% confident that between approximately 48.6% and 69.8% of physicians and head nurses in Israel have administered a placebo to patients.

 

Using the normal approximation.

Sample   X   N  Sample p         95% CI

1       53  89  0.595506  (0.493540, 0.697471)
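Both intervals can be reproduced outside Minitab.  The sketch below assumes that Minitab's exact interval is the Clopper-Pearson interval (the original does not name the method), found here by bisection on the binomial CDF; the function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for f(p) = target, where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson interval: invert the two one-sided binomial tests."""
    a = (1 - conf) / 2
    lower = solve_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - a)
    upper = solve_decreasing(lambda p: binom_cdf(x, n, p), a)
    return lower, upper

def wald_ci(x, n, conf=0.95):
    """Normal-approximation (Wald) interval for a proportion."""
    p_hat = x / n
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

exact = exact_ci(53, 89)
wald = wald_ci(53, 89)
print("exact:  (%.6f, %.6f)" % exact)
print("normal: (%.6f, %.6f)" % wald)
```

The exact interval is slightly wider and is not symmetric about the sample proportion, which is why the two sets of endpoints differ.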

 

(e) A 90% CI for the parameter turns out to be (.503, .683).  This interval is a bit narrower than the 95% CI.

Test and CI for One Proportion

                                                 

Sample   X   N  Sample p         90% CI       

1       53  89  0.595506  (0.502869, 0.683247)

 

(f) No.  There’s no reason to believe that this sample of Israeli physicians and head nurses is representative of American physicians and head nurses on this issue of prescribing placebos to patients.

 

38) (a) Let p represent the proportion of all patients given a placebo who would report a reduction in back pain.  The hypotheses are H0: p = 1/3 vs. Ha: p ≠ 1/3. 

Since we only have 16 patients and only 2 “successes,” the normal approximation to the binomial is not appropriate here.  So we should use the binomial distribution to calculate the p-value and we won’t report a “test statistic” (z-value) – the binomial distribution below is not especially symmetric so z-scores are less meaningful.

Minitab reports the exact binomial p-value to be .075, as seen in the following output.  This is somewhat small, but not terribly small, so the data provide some evidence, but not strong evidence, that the proportion who would experience back pain reduction from a placebo differs from one-third. 

With the 1 Proportion menu here, Minitab calculates the p-value as the probability of getting 2 or fewer, or 10 or more, successes in a random sample of 16 trials with an underlying success probability of 1/3.

 

Test and CI for One Proportion

Test of p = 0.333333 vs p not = 0.333333

                                                 Exact

Sample  X   N  Sample p         95% CI         P-Value

1       2  16  0.125000  (0.015514, 0.383476)    0.075
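The exact p-value can be checked by summing binomial probabilities over the two tail regions described above (2 or fewer, or 10 or more, successes).  This is a minimal sketch that follows that description directly; other software may define a two-sided exact p-value differently and report a different number.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0 = 16, 1 / 3

# Tail regions as stated in the text: X <= 2 or X >= 10
lower_tail = sum(binom_pmf(k, n, p0) for k in range(0, 3))
upper_tail = sum(binom_pmf(k, n, p0) for k in range(10, n + 1))
p_value = lower_tail + upper_tail

print(f"P(X <= 2) = {lower_tail:.4f}, P(X >= 10) = {upper_tail:.4f}")
print(f"two-sided exact p-value = {p_value:.3f}")
```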

 

Just to show you the corresponding graph, the grey area is the p-value of interest.

 

(b) For testing H0: p = .1 vs. Ha: p ≠ .1, the p-value turns out to be 1.0.  This is as large as a p-value can be, so the sample data provide no evidence at all to doubt that one-tenth of all back pain patients who could take a placebo would experience pain reduction. 

Test and CI for One Proportion

Test of p = 0.1 vs p not = 0.1

                                                 Exact

Sample  X   N  Sample p         95% CI         P-Value

1       2  16  0.125000  (0.015514, 0.383476)    1.000

(c) The Minitab output in (a) reveals a 95% confidence interval for p, the proportion of all patients given a placebo who would report a reduction in back pain, to be from .016 to .383.

(d) Both of these values are within the confidence interval.  This makes sense because neither value was rejected at the .05 level.

(e) Both p-values would be smaller, and the confidence interval would be narrower.

 

(p. 387) #34, 53, 64

 

34) (a) The observational units are adult Americans, and the variable is whether the person favors or opposes abolishing the penny.

(b) The statistic is .59 (the proportion in the sample of 2136 adult Americans who oppose abolishing the penny).  The parameter can be defined as the proportion of the population of all adult Americans who oppose abolishing the penny.

(c) np̂ = (2136)(.59) ≈ 1260 ≥ 10 and n(1 - p̂) = 2136(.41) ≈ 876 ≥ 10, and this was a random sample, so the technical conditions for the Wald procedure are met.

(d) Wald interval: .59 ± 1.96√(.59(.41)/2136) = .59 ± .021

We are 95% confident that between 56.9% and 61.1% of all adult Americans oppose abolishing the penny.
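The Wald formula in part (d) is easy to evaluate directly.  This is a small sketch using the values given in the problem (p̂ = .59, n = 2136); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, conf = 0.59, 2136, 0.95

z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96
std_error = sqrt(p_hat * (1 - p_hat) / n)
margin = z_star * std_error
interval = (p_hat - margin, p_hat + margin)

print(f"standard error = {std_error:.4f}, margin = {margin:.3f}")
print(f"95% Wald interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```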

(e) Since the interval lies entirely above .5, it provides convincing evidence that more than half of all American adults oppose abolishing the penny.

(f) With the smaller sample size, the width of the interval would be larger.

(g) The midpoint would remain at p̂ = .59.

(h) No, we are 95% confident that p is in this interval, meaning that if we repeated this procedure many, many times, about 95% of constructed intervals would succeed in capturing p.  We are not making statements about future sample proportions lying in this interval.

 

53) Sample results: n = 2328 smokers, x̄ = 18.197 years, s = 5.388 years

Let μ represent the mean age at which all smokers begin to smoke

H0: μ = 18

Ha: μ ≠ 18 (the mean age differs from 18)

Since the sample size n is huge and the sample was selected at random, the technical conditions are met.

If μ = 18, we would observe a sample mean at least as extreme as 18.197 years in about 7.8% of random samples from this population.

With a p-value of .0778, we would reject H0 at the .10 level but not the .05 level or the .01 level.  We have some, but not strong, evidence that the mean age at which smokers start smoking in this population is different from 18 years.

(b) With df = 2328 - 1 = 2327, t* for 95% confidence would be approximately 1.96

18.197 ± 1.96(5.388/√2328) = 18.197 ± .219

We are 95% confident that the mean age at which smokers begin smoking is between 17.978 years and 18.416 years.
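The test statistic, p-value, and interval for this problem can be reproduced as follows.  This sketch substitutes the standard normal distribution for the t distribution, an approximation that is reasonable only because df = 2327 is so large; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, x_bar, s, mu0 = 2328, 18.197, 5.388, 18

std_error = s / sqrt(n)
t_stat = (x_bar - mu0) / std_error

# Two-sided p-value, using the normal curve in place of t with 2327 df
p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))

# 95% confidence interval with t* taken as 1.96
margin = 1.96 * std_error
interval = (x_bar - margin, x_bar + margin)

print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
print(f"95% CI: ({interval[0]:.3f}, {interval[1]:.3f})")
```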

(c) No, this is an interval for the population mean, not for the ages of individual members of the sample.

(d) Because the sample is skewed to the right, we have strong reason to believe that the population distribution is skewed to the right so it is not appropriate to calculate a prediction interval from this sample.

 

64) (a) Subtracting the peanut butter times from the milk chocolate times, we can analyze the distribution of differences:

Variable  N   Mean  StDev     Q1  Median     Q3   IQR

Diff     20  10.90  26.51  -6.50   16.00  29.50  36.0

The differences have a slight skew to the left, but the normal probability plot suggests that a normal model is reasonable.  The mean difference in melting times between the two kinds of chips is 10.9 seconds, with a standard deviation of 26.51 seconds.  Most of the differences are positive, indicating that for most people the peanut butter chip melted more quickly.  The median difference is 16 seconds, about 5 seconds larger than the mean, because there are a few people with a fairly large negative difference.

(b) The distribution of sample differences is approximately normal, so if we treat this as a random sample from the population, the technical conditions are satisfied. 

Letting μd represent the mean of the differences in melting times in the population, the hypotheses are:

H0: μd = 0 (neither type of chip melts more quickly on average)

Ha: μd ≠ 0 (one type of chip melts more quickly on average)

The test statistic is: t0 = (10.9 - 0)/(26.51/√20) = 1.84.

The p-value, based on 19 degrees of freedom, is .082.

This p-value is somewhat but not very small, so the sample data provide some but not much evidence to conclude that one type of chip melts more quickly on average than the other.
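The paired t calculation can be checked from the summary statistics alone.  Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule; that is one simple way to get the tail area, not how Minitab computes it, and the function names are illustrative.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t| (Simpson's rule)."""
    b = abs(t_stat)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_b = total * h / 3
    return 1 - 2 * area_0_to_b

# Summary statistics of the differences (milk chocolate minus peanut butter)
n, mean_diff, sd_diff = 20, 10.9, 26.51

std_error = sd_diff / sqrt(n)
t_stat = (mean_diff - 0) / std_error
p_value = t_two_sided_p(t_stat, df=n - 1)

print(f"SE = {std_error:.4f}, t = {t_stat:.2f}, p-value = {p_value:.3f}")
```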

(c) A 90% confidence interval for μd is (.651, 21.149), as seen in the Minitab output:

One-Sample T: diff (milk choc - pb)

Variable   N     Mean    StDev  SE Mean        90% CI

diff      20  10.9000  26.5070   5.9271  (0.6512, 21.1488)

We can be 90% confident that the mean difference in melting times is between .65 and 21.15 seconds.  Because this interval is entirely positive, we have some evidence that the peanut butter chips melt more quickly on average than the milk chocolate chips.

(d) The test statistic would be the negative of what it was before, and the p-value would be unchanged.  The confidence interval would be the negative of what it was before.

 

For each of the following scenarios, indicate whether you would use one-sample z procedures or one-sample t procedures:

(a) What proportion of CP students study 25-35 hours per week?

One-sample z confidence interval for the population proportion

 

(b) Does the average weight gain by CP freshmen exceed 10 lbs?

One-sample t test of H0: μ = 10 vs. Ha: μ > 10

 

(c) Do a majority of CP students plan to vote in the presidential election?

One-sample z test of H0: p = .5 vs. Ha: p > .5

 

(d) Predict the guess of my age by a CP student 

One-sample t prediction interval

 

(e) Does the average guess of my age by all CP students increase by more than 2 years after meeting me?

One-sample t test of H0: μ = 2 vs. Ha: μ > 2, where μ = average increase in age guess after meeting me for all Cal Poly students