Stat 217 – HW 3
Solutions
1)
(6 pts)
(a) Histogram

While there is no perfect number of bins,
you might want to aim for 10-15 bars.
Also notice that I moved in the axis limits to focus more
on the distribution.
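As a rough check on the 10-15 bar guideline, the sketch below counts how many bars different bin widths would give. It assumes the margins span roughly -10 to 60 points, as described in part (b); the function name `bin_count` and the candidate widths are illustrative choices, not part of the assignment.

```python
import math

def bin_count(low, high, width):
    """Number of equal-width bins needed to cover the interval [low, high]."""
    return math.ceil((high - low) / width)

# Approximate range of the margins of victory, taken from part (b).
low, high = -10, 60

for width in (2, 5, 10):
    print(f"bin width {width:>2}: {bin_count(low, high, width)} bars")
```

Under these assumptions a bin width of 5 points gives 14 bars, which falls inside the suggested range, while widths of 2 or 10 give too many or too few.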
(b) In my display the margins of
victory are skewed to the right with two main clusters, one between 0-20 points
and the second around 30-50 points. The
margins range from about negative 10 up to around 60 points.
(c) I would expect the margins to be
much smaller, with many more negative values, as the top 25 teams play each
other and tougher opponents later in the season. It would also be reasonable to expect a smaller spread and maybe even a more
normal distribution, rather than the large margins seen in these
games.
2) Activity 2-18 (p. 30)
After
six weeks, 45% of those using the nicotine lozenge had successfully quit, while
only 30% of those using the placebo had quit smoking. So smokers using the
lozenge were 1.5 times as likely to quit smoking. However, after 52 weeks (a
year later), only about 18% of those using the nicotine lozenge were still not
smoking, compared to 10% of those using the placebo. Lozenge users are still
more likely to have quit (1.8 times as likely now), but the overall
chance that a member of either group successfully refrains from smoking has
dropped substantially.
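The two ratios quoted above are just the lozenge success proportion divided by the placebo success proportion. A minimal sketch of that arithmetic follows; the function name `relative_risk` is an illustrative label, and the proportions are the ones stated in the solution.

```python
def relative_risk(p_treatment, p_control):
    """Ratio of the treatment group's success proportion to the control group's."""
    return p_treatment / p_control

# Proportions quoted in the solution: (lozenge, placebo).
six_weeks = relative_risk(0.45, 0.30)
one_year = relative_risk(0.18, 0.10)

print(f"6 weeks:  {six_weeks:.1f} times as likely")
print(f"52 weeks: {one_year:.1f} times as likely")
```

The ratio rises from 1.5 to 1.8 even though both success proportions fall, which is why the ratio and the raw percentages should be reported together.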
Wow, I really
meant to assign 2-17
(a) roller coasters
(b) quantitative variables: Height, Length, Speed, Number of inversions
categorical variables: Type of coaster (wooden or steel; binary), Design (sit down, stand
up, inverted)
(c) The heights of
the steel coasters appear to have both a larger center and much greater
variability than the wooden coasters. The steel coasters also seem to have a
couple of high outliers at 420 ft.
(d) Example answers: steel, 148 ft; wooden, 100 ft.
(e) The steel
coasters tend to be taller than the wooden. Most of the steel coasters are taller
than most of the wooden coasters.
(f) No, one type of
coaster is not always taller. There are some very short steel coasters and some
relatively tall wooden coasters.
3) The file Yankees09.xls
contains data on a starting lineup for the 2009 Yankees.
(a)-(b) Dotplot
with mean and median

(c)
Using Dotplot Summaries output

(d) We need
the middle 50% of the values (the middle 5) to be closer together, but we can then
move the 1st, 2nd, 9th, and 10th values
further out to raise the SD.
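The idea in (d) can be checked with a small sketch. The ages below are hypothetical (they are not the values in Yankees09.xls), and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data; other software may use a slightly different quartile rule.

```python
from statistics import median, stdev

def iqr(values):
    """IQR using quartiles defined as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])

# Hypothetical ages for 10 players (NOT the actual Yankees09.xls data).
original = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34]

# Pull the middle values together; push the 1st, 2nd, 9th, 10th values outward.
modified = [15, 16, 28, 29, 29, 30, 30, 31, 43, 44]

print("original: IQR =", iqr(original), " SD =", round(stdev(original), 2))
print("modified: IQR =", iqr(modified), " SD =", round(stdev(modified), 2))
```

With these made-up values the IQR shrinks while the SD grows, because the IQR ignores how far out the extreme values sit and the SD does not.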

(e) The
measures of center will shift up by the same amount, 2 years.

(f) Shifting
all the data values by the same amount will not change our measures of spread. We have moved everything up, but we
haven’t changed any distances between values.
(g) The IQR
will not change since the smallest data value was not involved in the
calculation. The standard deviation will be larger since the outlier will
increase our measure of typical distances from the mean.
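Parts (e) through (g) can be verified numerically. The sketch below again uses hypothetical ages rather than the Yankees09.xls values, takes quartiles as medians of the lower and upper halves (an assumed convention), and uses an arbitrary low outlier of 10 for illustration.

```python
from statistics import mean, median, stdev

def iqr(values):
    """IQR using quartiles defined as medians of the lower and upper halves."""
    s = sorted(values)
    half = len(s) // 2
    return median(s[-half:]) - median(s[:half])

# Hypothetical ages for 10 players (NOT the actual Yankees09.xls data).
ages = [25, 26, 27, 28, 29, 30, 31, 32, 33, 34]

# (e)-(f): add 2 years to every value.
shifted = [a + 2 for a in ages]
print("mean change:  ", mean(shifted) - mean(ages))
print("median change:", median(shifted) - median(ages))
print("SD change:    ", round(stdev(shifted) - stdev(ages), 10))
print("IQR change:   ", iqr(shifted) - iqr(ages))

# (g): replace the smallest value with a low outlier (10 is an arbitrary choice).
with_outlier = [10] + ages[1:]
print("IQR before/after outlier:", iqr(ages), iqr(with_outlier))
print("SD before/after outlier: ", round(stdev(ages), 2), round(stdev(with_outlier), 2))
```

Shifting moves the mean and median by exactly 2 and leaves both spread measures alone, while the outlier leaves the IQR untouched and inflates the SD.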
4) Activity 7-21 (p. 138)
(a) Three
students received a score of 77 on the exam.
(b) Fifteen
students scored 90 or greater. This is
15/62 = .242
(c) Ten
students scored less than 70. This is
10/62 = .161
(d) Ninety
is the score that appeared most often.
(e) The
two values that no one obtained are 78 and 86.
5) Activity 9-7 (p. 176) part (a) only but explain your reasoning
(a) The
observations for Lincoln are more spread out, cover a wider range, and tend to lie
further from the mean temperature. The
observations for Sedona are closely clumped together, cover a small range, and tend
to fall close to the mean temperature.
Largest:
Lincoln
Smallest:
Sedona
6) Activity 9-25 (p. 182-3)
(a) Data A: The
center is probably around 65. The middle chunk of the data seems to lie between
about 57 and 75. So something between 5 and 10 seems reasonable.
Data B: The biggest
chunk seems to fall between 160 and 250 so something around 40 seems
reasonable.
Data C: The middle
2/3 of the data seems to fall between .95 and 1.06, so something around .05
seems reasonable.
Data D: The middle
2/3 seems to fall between -8 and 0, so something around 4 seems reasonable.
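The guesses in (a) follow a rule of thumb: for a roughly mound-shaped distribution, about two-thirds of the values fall within one standard deviation of the mean, so the SD is about half the width of the interval holding the middle two-thirds. A minimal sketch of that arithmetic is below; the function name is illustrative, the intervals are the ones read off the graphs above, and for Data B the "biggest chunk" is treated as the middle two-thirds, which is an assumption.

```python
def sd_from_middle_two_thirds(low, high):
    """Rough SD guess: half the width of the interval holding the middle ~2/3 of the data."""
    return (high - low) / 2

# Intervals eyeballed from the graphs in part (a).
intervals = {
    "A": (57, 75),
    "B": (160, 250),   # assumes the "biggest chunk" is about the middle 2/3
    "C": (0.95, 1.06),
    "D": (-8, 0),
}

estimates = {name: sd_from_middle_two_thirds(lo, hi) for name, (lo, hi) in intervals.items()}
for name, guess in estimates.items():
    print(f"Data {name}: SD guess of about {guess:.3g}")
```

These half-widths (about 9, 45, 0.055, and 4) are only ballpark figures, which is why the written answers round them to convenient values.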
(b) Data A: mean = 64.454, standard deviation = 9.60
Data B: mean = 202.52, standard deviation = 51.88
Data C: mean = .99947, standard deviation = .05 (a good example of needing more
bins!)
Data D: mean = 5.405, standard deviation = 4.71