Stat 322 – HW 2

Due Friday, Jan. 19

 

Remember to include any relevant computer output and to always show your work.  Data files can be downloaded from the course webpage (e.g., http://statweb.calpoly.edu/bchance/stat322/data/)

 

1) Many medical trials randomly assign patients to either an active treatment or a placebo. These trials are always double-blind.  Sometimes the patients can tell whether or not they are getting the active treatment, which defeats the purpose of blinding!  Reports of medical research usually ignore this problem.  Investigators looked at a random sample of 97 articles reporting on placebo-controlled randomized trials in the top five general medical journals.  Only 7 of the 97 discussed the success of blinding.  Estimate, with 95% confidence, the proportion of all such articles that discuss the success of blinding.  Make sure you first define the population parameter of interest in words.  Also discuss the technical conditions of the procedure. In particular, if the Wald procedure is not valid, use the adjusted Wald procedure instead.
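As a reminder of how the two procedures named above differ, here is a minimal Python sketch. It assumes that "adjusted Wald" means the plus-four adjustment (add 2 successes and 2 failures before applying the Wald formula) and that the validity check is the common rule of thumb of at least 10 successes and 10 failures; check both against the definitions used in class. The function names and the counts are hypothetical and are not the data from this problem.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes, n, confidence=0.95):
    """Wald interval: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = successes / n
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def adjusted_wald_interval(successes, n, confidence=0.95):
    """Assumed plus-four adjustment: add 2 successes and 2 failures, then use Wald."""
    return wald_interval(successes + 2, n + 4, confidence)


def wald_condition_met(successes, n, minimum=10):
    """Assumed rule of thumb: at least `minimum` successes and `minimum` failures."""
    return successes >= minimum and (n - successes) >= minimum


# Hypothetical counts for illustration only, not the data from the problem.
successes, n = 5, 40
print("Wald condition met:", wald_condition_met(successes, n))
print("Wald interval:", wald_interval(successes, n))
print("Adjusted Wald interval:", adjusted_wald_interval(successes, n))
```

Note that the adjusted interval is centered at (successes + 2)/(n + 4) rather than at the sample proportion, which matters most when the count of successes is small.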

 

2) Nenana is a small town in interior Alaska that holds a famous competition to predict the exact moment that “spring arrives” every year.  The arrival of spring is defined to be the moment when the ice on the Tanana River breaks up, which is measured by a tripod erected on the ice and connected by a trigger to an official clock.  The minute at which the ice breaks has been recorded every year since 1917.  The Minitab worksheet NenanaIceBreak.mtw contains all of these data.

(a) Examine and comment on numerical summaries (e.g., Stat > Basic Statistics > Display Descriptive Statistics) and graphical displays (e.g., Graph > Histogram or Graph > Dotplot) of the “date” variable (C7), recorded in days, with April 1 being coded as 1.   Remember to comment on shape, center, and spread of the distribution, and to relate your comments to the context.

(b) Treat these data as a random sample from the process by which nature produces the ice-breaking dates each year.  Produce a 95% confidence interval for the population mean date.  Then translate the endpoints from the coded scale to the actual calendar, and interpret the interval.

(c) Provide an interpretation of “95% confidence” in this context.

(d) Produce a 95% prediction interval for the ice breakup date this year.  Again translate the endpoints from the coded scale to the actual calendar, and interpret the interval.

(e) Provide an interpretation of “95% confidence” in the context of this prediction interval.

(f) Comment on the technical conditions needed for each of these procedures and whether you believe they are reasonably met (and how you are deciding).

(g) Which interval do you think is of more interest here?  Explain.
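For reference, the two intervals in parts (b) and (d) can be sketched in Python as follows. This is a sketch under stated assumptions, not a replacement for the Minitab output: the critical value t_star is passed in by hand (from a t table with n - 1 degrees of freedom), the sample is a short hypothetical list rather than the worksheet data, and the decoding helper assumes that any fractional part of the coded date represents time of day. All function names are made up for illustration.

```python
from datetime import datetime, timedelta
from math import sqrt
from statistics import mean, stdev


def mean_confidence_interval(data, t_star):
    """One-sample t interval for the mean: xbar +/- t* s / sqrt(n)."""
    n = len(data)
    margin = t_star * stdev(data) / sqrt(n)
    return mean(data) - margin, mean(data) + margin


def prediction_interval(data, t_star):
    """Interval for one future observation: xbar +/- t* s sqrt(1 + 1/n)."""
    n = len(data)
    margin = t_star * stdev(data) * sqrt(1 + 1 / n)
    return mean(data) - margin, mean(data) + margin


def coded_day_to_date(code, year):
    """Decode a day count with April 1 coded as 1 (so code 0 is March 31)."""
    return datetime(year, 3, 31) + timedelta(days=code)


# Hypothetical coded dates for illustration only, not the worksheet data.
sample = [30.5, 35.2, 28.9, 41.0, 33.4, 36.7, 31.8, 38.1]
t_star = 2.365  # 95% critical value for 7 degrees of freedom, from a t table

ci = mean_confidence_interval(sample, t_star)
pi = prediction_interval(sample, t_star)
year = 2000  # hypothetical year, used only to show calendar dates
print("CI:", [coded_day_to_date(x, year).strftime("%B %d") for x in ci])
print("PI:", [coded_day_to_date(x, year).strftime("%B %d") for x in pi])
```

The only difference between the two margins is the factor sqrt(1 + 1/n) in place of 1/sqrt(n), which reflects the extra variability of a single future observation.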

 

3) Exercise 55 (p. 312)

 

4) The 2002 National Health Interview Survey (NHIS) took a representative sample of 31,044 American adults using a multistage cluster design (an alternative to a simple random sample in which regions are selected at random, then subregions are selected from within those regions, and so on, until finally groups, such as city blocks, are selected at random; such designs can make it easier to track down the individuals selected for the sample).  One of the findings was that 22.5% of the individuals sampled identified themselves as current smokers.  The report listed the standard error of this statistic as .0032.

(a) If the sampling design had been a simple random sample, what would the standard error have been?

(b) Is the reported standard error from the multistage cluster design larger or smaller than the standard error from a simple random sample of the same size?  Conjecture as to why this is true.

(c) Identify in words the parameter of interest in this study.

(d) Use the reported standard error (.0032) to produce a 99% confidence interval for this parameter.

(e) Does this interval provide evidence that fewer than 25% of American adults were smokers in 2002?  Explain your reasoning.

(f) What is the largest confidence level for which you would conclude that fewer than 25% of American adults were smokers in 2002?

(g) The report also mentioned that 25.2% of the males interviewed and 20.0% of the females interviewed identified themselves as current smokers.  Do you expect the standard errors of these statistics to be less than, greater than, or equal to .0032?  Explain.
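The calculations in this problem rely on three pieces of arithmetic: the standard error of a proportion under simple random sampling, an interval built directly from a reported standard error, and the confidence level at which the upper endpoint of a two-sided interval just reaches a given threshold. The Python sketch below illustrates each one. It assumes a normal (z) critical value and a two-sided interval; the function names and the numbers are hypothetical and are not the NHIS figures.

```python
from math import sqrt
from statistics import NormalDist


def srs_standard_error(p_hat, n):
    """Standard error of a sample proportion under simple random sampling."""
    return sqrt(p_hat * (1 - p_hat) / n)


def interval_from_standard_error(estimate, standard_error, confidence):
    """Two-sided z interval: estimate +/- z* times the reported standard error."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * standard_error
    return estimate - margin, estimate + margin


def boundary_confidence_level(estimate, standard_error, threshold):
    """Level at which the upper endpoint equals `threshold`.

    Any smaller confidence level gives an interval lying entirely below it.
    """
    if estimate >= threshold:
        raise ValueError("estimate must be below the threshold")
    z = (threshold - estimate) / standard_error
    return 2 * NormalDist().cdf(z) - 1


# Hypothetical values for illustration only, not the NHIS figures.
print(srs_standard_error(0.30, 1000))
print(interval_from_standard_error(0.30, 0.01, 0.99))
print(boundary_confidence_level(0.30, 0.01, 0.32))
```

Comparing a design-based standard error with the value from srs_standard_error for the same estimate and sample size shows how much the sampling design changes the precision.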

 

5) Assuming all else stays the same, what can you say about the relative ordering of the margins of error for the following intervals?

            95% confidence interval for μ

            95% prediction interval

            90% confidence interval for μ

            90% prediction interval

 

6) Problem 2 (p. 324)