Stat 217 – HW 3 Solutions

 

1) (6 pts)

(a) Histogram

While there is no perfect number of bins, you might want to aim for 10-15 bars.  Also notice here that I moved in the axis limits to focus more on the distribution.
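To see what a 10-15 bar choice looks like in practice, here is a minimal sketch that sorts values into equal-width bins. The margins listed are made up for illustration (they are not the homework data), the helper name `bin_counts` is hypothetical, and the limits of -10 and 60 are simply the approximate range described in part (b).

```python
def bin_counts(values, num_bins, low, high):
    """Count how many values fall in each of num_bins equal-width bins on [low, high]."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        if low <= v <= high:
            # A value equal to `high` goes in the last bin rather than past the end.
            idx = min(int((v - low) / width), num_bins - 1)
            counts[idx] += 1
    return counts

# Hypothetical margins of victory, NOT the actual data set.
margins = [-7, 3, 5, 8, 10, 12, 14, 17, 18, 21, 24, 31, 35, 38, 42, 45, 58]

# 14 bins of width 5 between -10 and 60 falls inside the 10-15 bar guideline.
counts = bin_counts(margins, num_bins=14, low=-10, high=60)
for i, c in enumerate(counts):
    left = -10 + 5 * i
    print(f"[{left:>3}, {left + 5:>3}) {'*' * c}")
```

Rerunning with a different `num_bins` shows how too few bars can hide the two clusters and too many can leave mostly gaps.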

 

(b) In my display the margins of victory are skewed to the right with two main clusters, one between 0-20 points and the second around 30-50 points.  The margins range from about negative 10 up to around 60 points.

 

(c) I would expect the margins to be much smaller, with many more negative values, as the top 25 teams play each other and tougher opponents later in the season.  It would also be reasonable to expect a smaller spread and maybe even a more normal distribution, rather than the large margins seen in these games.

 

2) Activity 2-18 (p. 30)

After six weeks, 45% of those using the nicotine lozenge had successfully quit while only 30% of those using the placebo had quit smoking. So smokers using the lozenge are 1.5 times as likely to quit smoking. However, after 52 weeks (a year later), only about 18% of those using the nicotine lozenge were still not smoking, compared to 10% of those using the placebo. This still makes the lozenge users more likely to quit smoking (1.8 times as likely now), but the overall chance that a member of either group will successfully refrain from smoking has dropped substantially.
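The 1.5 and 1.8 figures come from dividing the lozenge group's success rate by the placebo group's. A quick check of that arithmetic, using only the percentages quoted above (the function name `relative_risk` is just a label chosen for this sketch):

```python
def relative_risk(p_treatment, p_control):
    """Ratio of the treatment group's success proportion to the control group's."""
    return p_treatment / p_control

six_weeks = relative_risk(0.45, 0.30)   # lozenge vs. placebo after 6 weeks
one_year = relative_risk(0.18, 0.10)    # lozenge vs. placebo after 52 weeks

print(round(six_weeks, 2))  # 1.5
print(round(one_year, 2))   # 1.8
```

Notice the ratio went up even though both success rates went down, which is why it helps to report the percentages along with the ratio.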

 

Wow, I really meant to assign Activity 2-17.

(a) roller coasters

(b) quantitative variables: Height, Length, Speed, Number of inversions

categorical variables: Type of coaster (wooden or steel; binary), Design (sit down, stand up, inverted)

(c) The heights of the steel coasters appear to have both a larger center and much greater variability than the wooden coasters. The steel coasters also seem to have a couple of high outliers at 420 ft.

(d) Example answers: steel, 148 ft; wooden, 100 ft.

(e) The steel coasters tend to be taller than the wooden. Most of the steel coasters are taller than most of the wooden coasters.

(f) No, one type of coaster is not always taller. There are some very short steel coasters and some relatively tall wooden coasters.

 

3) The file Yankees09.xls contains data on a starting lineup for the 2009 Yankees.

(a)-(b) Dotplot with mean and median

(c) Using Dotplot Summaries output

(d) We need the middle 50% of the values (the middle 5) to be closer together, but we can then move the 1st, 2nd, 9th, and 10th values further out to raise the SD.

(e) The measures of center will shift up by the same amount, 2 years.

(f) Shifting all the data values by the same amount will not change our measures of spread. We have moved everything up, but we haven’t changed any distances between values.

(g) The IQR will not change since the smallest data value was not involved in the calculation. The standard deviation will be larger since the outlier will increase our measure of typical distances from the mean.
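Parts (e) through (g) can be checked numerically. The sketch below uses made-up ages (not the values in Yankees09.xls), and for part (g) it assumes the change is to make the smallest value more extreme. The helper `iqr` is defined here for illustration, and your software may compute quartiles slightly differently than Python's `statistics.quantiles`.

```python
import math
import statistics

# Hypothetical ages for illustration only, NOT the Yankees09.xls data.
ages = [25, 26, 28, 29, 30, 31, 33, 34, 35, 37]

def iqr(values):
    """Interquartile range using the default quartile method in statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# (e)-(f): add 2 years to everyone.
shifted = [a + 2 for a in ages]
print("mean:  ", statistics.mean(ages), "->", statistics.mean(shifted))
print("median:", statistics.median(ages), "->", statistics.median(shifted))
print("SD:    ", statistics.stdev(ages), "->", statistics.stdev(shifted))
print("IQR:   ", iqr(ages), "->", iqr(shifted))

# (g): assumed scenario, the smallest value is replaced by a low outlier.
with_outlier = [15] + ages[1:]
print("SD with outlier: ", statistics.stdev(with_outlier))
print("IQR with outlier:", iqr(with_outlier))
```

The mean and median each move up by 2 while the SD and IQR stay put, and the outlier raises the SD without touching the IQR.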

 

4) Activity 7-21 (p. 138)

(a) Three students received a score of 77 on the exam.

(b) Fifteen students scored 90 or greater.  This is 15/62 = .242

(c) Ten students scored less than 70.  This is 10/62 = .161

(d) Ninety is the score that appeared most often.

(e) The two values that no one obtained are 78 and 86.
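The proportions in (b) and (c) are just counts divided by the 62 students used as the denominator above. A short check of the rounding (the variable names are mine):

```python
total_students = 62   # denominator used in parts (b) and (c)
at_least_90 = 15      # count from part (b)
below_70 = 10         # count from part (c)

print(round(at_least_90 / total_students, 3))  # 0.242
print(round(below_70 / total_students, 3))     # 0.161
```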

 

5) Activity 9-7 (p. 176) part (a) only but explain your reasoning

(a) The observations for Lincoln are more spread out, cover a wider range, and tend to lie further from the mean temperature.  The observations for Sedona are closely clumped together, cover a small range, and tend to fall close to the mean temperature.

Largest: Lincoln

Smallest: Sedona

 

6) Activity 9-25 (p. 182-3)

(a) Data A: The center is probably around 65. The middle chunk of the data seems to lie between about 57 and 75. So a standard deviation somewhere between 5 and 10 seems reasonable.

Data B: The biggest chunk seems to fall between 160 and 250 so something around 40 seems reasonable.

Data C: The middle 2/3 of the data seems to fall between .95 and 1.06, so something around .05 seems reasonable.

Data D: The middle 2/3 seems to fall between -8 and 0, so something around 4 seems reasonable.
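The reasoning for Data C and Data D leans on a rule of thumb: for a roughly mound-shaped distribution, about two-thirds of the values fall within one standard deviation of the mean, so the SD is about half the width of that middle interval. A rough sketch of that calculation, with endpoints taken from the eyeball estimates above (the function name is hypothetical, and the rule is only an approximation):

```python
def sd_from_middle_two_thirds(low, high):
    """Rough SD guess: half the width of the interval holding the middle 2/3 of the data."""
    return (high - low) / 2

print(round(sd_from_middle_two_thirds(0.95, 1.06), 3))  # Data C: 0.055
print(round(sd_from_middle_two_thirds(-8, 0), 3))       # Data D: 4.0
```

These guesses can be compared with the computed standard deviations reported in part (b).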

 

(b) Data A:  mean = 64.454, standard deviation = 9.60

Data B:  mean = 202.52, standard deviation = 51.88

Data C:  mean = .99947, standard deviation = .05 (a good example of needing more bins!)

Data D:  mean = 5.405, standard deviation = 4.71