Stat 322 – Review Exercise
Scenario 1: Dansinger, Griffith, Gleason, et al. (2005) report on a randomized,
comparative experiment in which 160 subjects were randomly assigned to one of
four popular diet plans: Atkins, Ornish, Weight Watchers, and Zone (40
subjects per diet). These subjects were recruited through newspaper and
television advertisements in the greater
(a) Do the average weight losses after 12 months differ significantly
across the four diet plans?
The explanatory variable is diet plan, which is categorical. The response variable is amount of weight loss after 12 months, which is quantitative. Boxplots and numerical summaries follow:

diet        N   Mean  StDev  Median   IQR
Atkins     21   3.92   6.05    3.90  9.15
Ornish     20   6.56   9.29    5.45  6.81
Wgt Watch  26   4.59   5.39    3.60  6.85
Zone       26   4.88   6.92    3.40  8.63
(all weight losses in kilograms)
These boxplots and statistics seem to indicate that the four diets do not differ substantially with regard to weight loss after twelve months. The mean and median weight loss are both positive for all four diets, indicating that subjects did tend to lose some weight on these diets, roughly 4-6 kilograms on average. The boxplots also show substantial overlap between the four distributions. The means and medians are very similar for three of the diets, with the Ornish diet having a somewhat larger mean and median weight loss (6.56 and 5.45 kilograms, respectively) than the others. All four distributions of weight loss appear to be fairly symmetric, perhaps a bit skewed to the right. The variability in weight losses is also similar across all four diet plans, with the Ornish diet having the most variability, largely due to its one small and three large outliers.
Because we have a categorical explanatory variable and a quantitative response variable, we will apply ANOVA to these data. The technical conditions appear to be met: the subjects were randomly assigned to diet plans, the distributions look fairly normal (see the following normal probability plots), and the standard deviations are similar (ratio of largest to smallest is 9.29/5.39, which is less than 2).

The hypotheses are:
H0: μA = μO = μW = μZ, where μi represents the underlying treatment mean weight loss after 12 months with diet i. This hypothesis says that the treatment mean is the same for all four diets.
Ha: at least two of the treatment means differ; in other words, at least one diet has a different treatment mean than the others.
Minitab produces the following ANOVA table:
Source  DF      SS    MS     F      P
diet     3    77.6  25.9  0.54  0.659
Error   89  4293.7  48.2
Total   92  4371.3
The small F-statistic (F = 0.54) and large p-value (.659) reveal that the experimental data provide essentially no evidence against the null hypothesis. The p-value reveals that differences among the group means at least as big as those found in this experiment would occur about 66% of the time by randomization alone even if there were no true differences among the diets. In other words, the treatment means do not differ significantly, and there is no evidence that these four diets produce different average amounts of weight loss.
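The ANOVA table above can be approximately rebuilt from the summary statistics alone. The following is a minimal sketch, not Minitab's own procedure: the function and variable names are illustrative, and because the means and standard deviations are rounded to two decimals, the sums of squares agree with the Minitab output only up to rounding.

```python
# One-way ANOVA F statistic from per-group summary statistics.
# Each entry is (n, mean, standard deviation) as reported in the summary table.
groups = {
    "Atkins": (21, 3.92, 6.05),
    "Ornish": (20, 6.56, 9.29),
    "Wgt Watch": (26, 4.59, 5.39),
    "Zone": (26, 4.88, 6.92),
}


def anova_from_summary(groups):
    """Return (SS between, SS within, df between, df within, F)."""
    n_total = sum(n for n, _, _ in groups.values())
    grand_mean = sum(n * mean for n, mean, _ in groups.values()) / n_total
    # Between-group variation: how far each group mean sits from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
    # Within-group variation: pooled from the group standard deviations.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups.values())
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


ss_b, ss_w, df_b, df_w, f_stat = anova_from_summary(groups)
print(f"SS(diet) = {ss_b:.1f} on {df_b} df, SS(error) = {ss_w:.1f} on {df_w} df")
print(f"F = {f_stat:.2f}")
```

An F statistic well below 1 means the group means vary less than would be expected from the within-group variability alone, which is why the p-value is so large.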
(b) Is there a significant difference in the completion/dropout rates
across the four diet plans?
The explanatory variable is diet plan, which is categorical. The response variable is whether the subject completed the study or dropped out, which is also categorical.
The two-way table of completion/dropout status by diet plan, followed by completion proportions and a segmented bar graph:
| | Atkins | Ornish | Weight Watchers | Zone |
| Completed | 21 | 20 | 26 | 26 |
| Dropped out | 19 | 20 | 14 | 14 |
| Completion proportion | .525 | .500 | .650 | .650 |

This preliminary analysis appears to reveal that the completion rates are very similar across the four diet plans. Weight Watchers and Zone tied for the highest completion rate (65%), with Ornish having the lowest completion rate (50%), but these do not seem to differ substantially.
To test whether these differences in the distributions of the categorical response variable are statistically significant, we can apply a chi-square test of the hypotheses:
H0: pA = pO = pW = pZ, where pi represents the underlying completion rate after 12 months with diet i (diet does not have an effect on completion rate)
Ha: at least two of the underlying completion rates differ (there is a difference in underlying completion rates across the 4 diets)
Minitab produces the following output:
        Atkins  Ornish  Weight Watchers   Zone  Total
    1       21      20               26     26     93
         23.25   23.25            23.25  23.25
         0.218   0.454            0.325  0.325

    2       19      20               14     14     67
         16.75   16.75            16.75  16.75
         0.302   0.631            0.451  0.451

Total       40      40               40     40    160

Chi-Sq = 3.158, DF = 3, P-Value = 0.368
Checking the technical conditions of the chi-square procedure, we note that the subjects were randomly assigned to a diet plan group and that all expected counts in the table are larger than 5 (smallest is 16.75), so we are justified in applying the chi-square test. The p-value of .368 says that if there were no difference in the underlying completion rates among the four diet plans (i.e., no treatment effect), then it would not be surprising (probability .368) to obtain experimental completion proportions that differ at least as much as these do. Because this p-value is not small, we conclude that the experimental data do not provide evidence that the completion proportions differ across these four diet plans.
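The chi-square statistic and p-value can be checked by hand from the two-way table. This is a sketch under stated assumptions: the helper names are illustrative, and the p-value uses a closed-form upper-tail formula that holds only for 3 degrees of freedom (the case here), so it is not a general chi-square routine.

```python
import math

# Observed counts: rows are completed / dropped out,
# columns are Atkins, Ornish, Weight Watchers, Zone.
observed = [
    [21, 20, 26, 26],
    [19, 20, 14, 14],
]


def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (obs - expected) ** 2 / expected
    return stat


def chi_square_upper_tail_df3(x):
    """P(X >= x) for a chi-square variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


chi_sq = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = chi_square_upper_tail_df3(chi_sq)
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}, P-Value = {p_value:.3f}")
```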
(c) Is there a significant positive association between a subject’s
adherence level and his or her amount of weight loss? Do we need to control for
which diet plan the subject was on?
The explanatory variable is adherence level. This variable is quantitative. (It could be considered categorical if it were simply on a 1-10 scale, but it is the average of 12 such values and so should be treated as quantitative.) The response variable is weight loss, which is quantitative.
A scatterplot of weight loss vs. adherence level follows:

This graph reveals a moderately strong, positive, linear relationship between weight loss and adherence level. The correlation coefficient can be found to be r=.518. The scatterplot and correlation coefficient both suggest that there is a positive association between these variables, that subjects with higher adherence levels tend to lose more weight.
We can fit a regression line for predicting weight loss from
adherence level:

This model indicates that for each additional step on the adherence level scale, the subject is predicted to lose an additional 2.4 kilograms of weight.
While there is a moderately strong positive linear relationship between weight loss and adherence level, we cannot draw a causal link between these two variables. Even though the study was a randomized comparative experiment, the variable imposed by the researchers was the diet plan, not the adherence level. Therefore, for the purpose of relating adherence level and weight loss, this study is essentially observational. We still might be interested in investigating whether the relationship observed in this sample is strong enough to convince us that it did not arise by chance. These subjects, however, were also not a random sample from a larger population. (They volunteered for this study in response to advertisements.) Still, we might cautiously consider them representative of overweight men and women from the Northeast who would consider enrolling in these diets. With this consideration, we can proceed with a test to determine whether the observed level of association is higher than would be expected by random variation alone.
H0: β1 = 0, where β1 represents the population slope coefficient. This null hypothesis indicates that there is no linear relationship between adherence level and amount of weight loss.
Ha: β1 > 0 (there is a positive linear relationship in the population)
Minitab produces the following output:
Predictor          Coef  SE Coef      T      P
Constant         -8.432    2.306  -3.66  0.000
adherence level  2.3876   0.3971   6.01  0.000
Checking the other technical conditions for the regression model, we find the following residual plots:


The plot of residuals vs. the explanatory variable does not reveal any serious problems, although there is a suggestion of increasing variability with larger values. The normal probability plot suggests a bit of a skew to the right, but not too much. This last condition is a bit less problematic with the large sample size in this study. We could consider a transformation to account for the increasing variability, but the increase does not seem substantial enough to warrant a transformation here. These technical conditions seem to be fairly well met.
The test statistic is very large (t=6.01) and the p-value very small (.000 to three decimal places, and smaller still after dividing by 2 for the one-sided p-value), so we can conclude that the data provide extremely strong evidence that there is a positive association between adherence level and weight loss. The p-value reveals that it would be almost impossible to obtain such a large sample slope coefficient from random sampling variation alone. At any reasonable significance level, we conclude that this association is statistically significant.
We can follow up with a 95% confidence interval for the population slope β1. We obtain
2.388 ± 1.986(0.397), which is 2.388 ± 0.788, or (1.600, 3.176). This interval suggests that the additional predicted weight loss for each additional step of adherence to diet is between 1.6 and 3.2 kilograms.
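The test statistic and the interval for the slope both follow directly from the coefficient and its standard error in the Minitab output. A minimal sketch, taking the critical value 1.986 from the text rather than computing it (the variable names are illustrative):

```python
# Slope estimate and standard error from the Minitab regression output.
coef = 2.3876
se_coef = 0.3971
t_star = 1.986  # 95% critical value as quoted in the text

# Test statistic for H0: slope = 0.
t_stat = coef / se_coef

# 95% confidence interval: estimate plus or minus t* times the standard error.
margin = t_star * se_coef
ci = (coef - margin, coef + margin)

print(f"t = {t_stat:.2f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```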
But remember the caveat that we mentioned earlier: the subjects’ adherence levels were observed and not imposed, so we can not draw a cause-and-effect conclusion between adherence level and weight loss. Furthermore, we must be cautious in stating to what population we are willing to generalize these conclusions.
(d) Is there strong evidence that dieters actually tend to lose weight
on one of these popular diet plans?
To address the question of whether dieters who complete 12 months on one of these popular diet plans actually tend to lose weight, we can begin by combining the weight loss data across all four diet plans. This step seems reasonable because of our conclusion in (a) that there is no evidence of an effect of diet plan on weight loss. While the adherence level does appear to be related to amount of weight lost, the completion rate did not vary significantly across the diets, providing further justification for pooling across the diets. A histogram of the weight loss amounts (in kilograms) for the 93 subjects who completed the 12-month study follows:

This histogram reveals that the distribution of weight loss amounts is a bit skewed to the right. The mean weight loss is 4.95 kilograms, with a standard deviation of 6.89 kilograms. The median is 3.90 kilograms, and most of the subjects had a positive weight loss; in fact, 71 of 93 (76.3%) did.
To perform statistical inference with these data, we need to again consider the volunteer nature of the sample and that any randomness here is hypothetical. We will proceed to conduct tests and make inferences, which will tell us whether the sample results are extreme enough to be unlikely to occur by random variation alone, but we need to keep in mind that the sample may not be representative of any population.
Even though the distribution is a bit skewed, the large sample size (n=93) allows us to perform a t-test of the hypotheses:
H0: μ = 0 (the mean weight loss in the population of dieters who could use one of these popular plans is zero)
Ha: μ > 0 (the mean weight loss in the population of dieters who could use one of these popular plans is positive)
The test statistic turns out to be t0 = (4.95 - 0)/(6.89/√93) ≈ 6.93, producing a p-value of essentially zero. This suggests that the sample data provide overwhelming evidence that the population mean weight loss exceeds zero; i.e., that dieters on these plans do tend to lose weight on average. A 95% confidence interval for μ turns out to be (3.53, 6.36), so we can be 95% confident that the population mean weight loss is between 3.53 and 6.36 kilograms.
We can cautiously follow this up with a 95% prediction interval for the weight loss of an individual dieter: 4.95 ± 1.986(6.89)√(1 + 1/93), which is 4.95 ± 13.76, or (-8.81, 18.71). This interval implies that, with 95% confidence, we can only assert that an individual dieter is predicted to see a weight change anywhere between a gain of 8.8 kilograms and a loss of 18.7 kilograms. However, the slight skewness in the sample data leads us to question the validity of this prediction interval, since the normality condition is essential for this procedure.
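The one-sample test statistic, confidence interval, and prediction interval can all be recomputed from the three summary values reported above. This is a sketch under the assumption that the 95% critical value is about 1.986 (92 degrees of freedom); the results match the reported intervals up to rounding of the inputs.

```python
import math

# Summary statistics for the 93 subjects who completed the study.
n = 93
mean = 4.95  # kilograms
sd = 6.89    # kilograms
t_star = 1.986  # assumed 95% critical value for 92 degrees of freedom

# Test statistic for H0: population mean = 0.
se_mean = sd / math.sqrt(n)
t_stat = (mean - 0) / se_mean

# Confidence interval for the population mean.
ci = (mean - t_star * se_mean, mean + t_star * se_mean)

# Prediction interval for one individual: the extra "1 +" under the square
# root accounts for person-to-person variability, not just uncertainty in the mean.
pi_margin = t_star * sd * math.sqrt(1 + 1 / n)
pi = (mean - pi_margin, mean + pi_margin)

print(f"t = {t_stat:.2f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"95% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The prediction interval is roughly ten times as wide as the confidence interval because it describes a single dieter rather than the average of 93.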
We could also perform a test of the hypotheses:
H0: p = .5 (half of the population of all potential dieters would lose positive weight on one of these diet plans)
Ha: p > .5 (more than half of the population of all potential dieters would lose positive weight on one of these diet plans)
The data reveal that 71 of 93 subjects had positive weight loss. The binomial distribution (with parameters n=93 and p = .5) reveals that the p-value, P(X≥71), is essentially zero. Thus, this test leads to a similar conclusion: overwhelming evidence that more than half of the population would lose positive weight.
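The exact binomial p-value is the probability of 71 or more successes in 93 trials when p = .5. A short sketch using only the standard library (the function name is illustrative):

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


# 71 of the 93 completers had a positive weight loss.
p_value = binomial_upper_tail(71, 93)
print(f"P(X >= 71) = {p_value:.2e}")
```

Note that the tail starts at 71 itself: the p-value counts outcomes at least as extreme as the one observed.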
Because of the volunteer nature of the sample, it is not completely clear to what population we can generalize these results. Moreover, even though we concluded that the mean weight loss is significantly larger than zero, we can not attribute the cause to the diet. Without the use of a comparison group of people who did not participate in a diet plan, we cannot conclude that the diet alone is responsible for the tendency to lose weight, a lesson that you first encountered in Investigation 0. Perhaps even the power of suggestion from being in the study was a sufficient cause for these individuals to lose weight on average.
Summarizing our findings from this study:
Scenario 2: Do “better” movies earn more money at the box office?

We see a weak (r=.424), positive,
linear relationship between the box office revenue and the critics’ composite
scores. We do have several outliers
(Pirates of the

This indicates that only 8.9% of the variability in box office revenues is explained by this regression on critics’ scores. Even so, this relationship is statistically significant.
H0: β = 0 (there is no relationship between revenue and score for all movies that earn under $200 million)
Ha: β > 0 (there is a positive relationship; movies with higher critics’ scores tend to earn more money)

With a one-sided p-value (.000/2) of approximately zero, we do not believe the observed association is only due to “random chance.” As long as the movies in this year are representative of “all movies (earning less than $200 million),” we can conclude that there is a weak positive linear relationship. However, it doesn’t tell us a whole lot (low r² value).
To see if
running time is a useful predictor even after adjusting for critics score,
using the original data file we can obtain:
The regression equation is
box office = - 95.2 + 1.56 score + 0.641 running time

Predictor       Coef  SE Coef      T      P
Constant      -95.18    26.84  -3.55  0.001
score         1.5599   0.3470   4.50  0.000
running time  0.6406   0.2316   2.77  0.006

S = 55.9131   R-Sq = 22.3%   R-Sq(adj) = 21.2%
Running time does appear to be a useful predictor even after adjusting for critics’ score (though again we might want to investigate the relationship without the extreme observations). One caution: this is not to say that longer movies cause higher box office revenue, since this was an observational study and not an experiment.
We could
also look at labeled scatterplots using the ratings and genre as categorical
variables in the graph.
Scenario 3: A physical education teacher at a junior high school in Central California wanted to determine whether there is a relationship between seventh-graders’ times to run a mile and how many push-ups they can do under controlled conditions. She collected the data in PEClass.mtw as part of the mandated state physical-fitness testing program. Analyze these data to address the question of whether they provide evidence of a relationship between these two variables and whether this relationship differs for males and females.
Since we have two quantitative variables here (time to run 1 mile and number of push-ups), the appropriate analysis would be correlation/regression. However, we do not know much about how the sample was selected. It appears to be all students at a particular school and we must be very cautious in generalizing the results beyond this particular group of students.
We first want to examine numerical and graphical summaries. When you open PEClass.mtw you will notice that the mile run times have been recorded in “time format.” We want to convert this to numerical values. Choose Data > Change Data Type > Date/Time to Numeric. Specify C3 as the column to be changed and C4 as the storage location for the converted values. The data are now numeric, but in terms of a 24-hour day. To convert these back to minutes, type
MTB> let c5=c4*24
Now you should have the number of minutes (including the fraction of minute) for each student.
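The arithmetic behind this conversion can be illustrated outside Minitab. The sketch below rests on an assumption about how the times were stored: that a run of 9 minutes 30 seconds was entered as 9:30 and read as a clock time of 9 hours 30 minutes, so the numeric value in C4 is the fraction of a 24-hour day. The function name and example entry are hypothetical.

```python
def clock_entry_to_minutes(entry):
    """Convert a 'mm:ss' run time that was stored as an 'h:mm' clock time.

    The clock time becomes a fraction of a 24-hour day (the value held in C4),
    and multiplying by 24 recovers decimal minutes (the value held in C5).
    """
    first, second = entry.split(":")
    fraction_of_day = (int(first) + int(second) / 60) / 24  # numeric date/time value
    return fraction_of_day * 24  # same arithmetic as: let c5=c4*24


# A hypothetical mile time of 9 minutes 30 seconds.
print(clock_entry_to_minutes("9:30"))
```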
Since we aren’t considering either of these as a response variable, it does not matter which variable we denote as the y-variable and which as the x-variable. If we plot mile time vs. push-ups, we see there is a strong negative association.

Students who do more push-ups also tend to run the mile in faster times. However, there is some evidence that the relationship is not linear. There is also an unusual observation, a student who did a larger number of push-ups but was one of the slowest runners. Carrying out the regression and examining residual plots confirms these observations.

The residual vs. explanatory variable graph also reveals some differences in the amount of variation in the residuals at different values of the explanatory variable (indicating a violation of the constant variance condition).
It appears that transforming these data might be helpful. Since both distributions appear skewed to the right (if you looked at histograms of each variable individually), we could consider taking the log of each variable. The log-log scatterplot does appear more well behaved. (We have used log base ten but natural logs would also work.)

We still have some outliers but appear to now have a linear relationship. If we also examine the normality of the residuals:

This condition also seems to be reasonably met for the transformed data. There is slight evidence of skewness to the right in the residuals but coupled with the large sample size we will not be concerned with this minor deviation.
The correlation coefficient for the transformed variables is -.624, indicating a moderately strong negative linear relationship between log(time) and log(push-ups). The least-squares regression equation is computed by Minitab to be predicted log(time) = 1.23 - .213 log(push-ups). The intercept coefficient here indicates that the predicted log-time for a student who completes only 1 push-up (so log(push-ups) = 0) is 1.23. This corresponds to a time of 10^1.23, or about 17 minutes. The slope coefficient gives the predicted change in log(time) for each unit increase in log(push-ups). A unit increase in log(push-ups) corresponds to the number of push-ups increasing by a factor of 10. So for each 10-fold increase in the number of push-ups (e.g., 1 push-up to 10 push-ups), the predicted mile time is multiplied by 10^(-.213) = .61, a decrease of about 39%. [Note: our prediction for 10 push-ups is 1.017, corresponding to 10^1.017 = 10.4 minutes, which is .61(17).]
If we test the significance of this association:
Let β represent the true population slope between log(time) and log(push-ups)
H0: β = 0 (there is no association between these two variables)
Ha: β ≠ 0 (there is an association)
We find a test statistic of t = -10.48 and a two-sided p-value of approximately 0.

Note: This p-value is the same as reported by Minitab with the sample correlation coefficient.
With such a small p-value (< .001) we would consider the relationship between log(time) and log(push-ups) to be statistically significant. While we need to have some caution in generalizing these results to other schools, we can eliminate “random chance” as an explanation for the strong log-log relationship observed in this sample.
If we wanted to use this model to carry out predictions, we need to keep the transformed nature of the variables in mind. For example, if a student completed 25 push-ups during that portion of the test, we would predict 1.23 - .213 log10(25) = .932 for the log-time and therefore 10^.932, or about 8.56 minutes, for the mile time. We also need to start being a little cautious in predicting the mile time for such a large number of push-ups, as we do not have a large amount of data in this region and our estimate will not be as precise.
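Back-transforming predictions from the log-log model is easy to get wrong, so a small sketch may help. It uses the rounded coefficients 1.23 and -.213 from the fitted equation; the function name is illustrative.

```python
import math

INTERCEPT = 1.23   # predicted log10(time) at 1 push-up
SLOPE = -0.213     # change in log10(time) per unit increase in log10(push-ups)


def predicted_mile_time(push_ups):
    """Predicted mile time in minutes from the log-log regression equation."""
    log_time = INTERCEPT + SLOPE * math.log10(push_ups)
    return 10 ** log_time  # undo the base-10 log


for push_ups in (1, 10, 25):
    print(f"{push_ups:>2} push-ups: {predicted_mile_time(push_ups):.2f} minutes")

# Multiplicative change in predicted time for a 10-fold increase in push-ups.
tenfold_factor = 10 ** SLOPE
print(f"10-fold factor: {tenfold_factor:.2f}")
```

Because the slope acts on the log scale, its effect on the original scale is multiplicative: every 10-fold increase in push-ups multiplies the predicted time by the same factor, regardless of the starting point.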
In summary, there is a strong negative relationship between number of push-ups and time to run a mile for these 7th and 8th graders, as we would expect (those students who do more than the average number of push-ups tend to be the same students who complete a mile in a below-average time). This relationship can be considered statistically significant after performing log transformations. However, we have to be cautious in generalizing these results beyond this sample, as the students were not randomly selected from a larger population of junior high students. Since this is an observational study, we are not claiming that doing more push-ups will cause the mile time to decrease.
We also saw in an earlier example that there was a statistically significant difference between males and females on these tasks and we might want to consider incorporating that variable into our analysis as well. For example, a labeled scatterplot shows that the males tended to do more push-ups but were not noticeably different than the females in the mile times.

Adding a coded variable for gender (1 = female and 0 = male), we see that the gender differences (assuming parallel lines) are not statistically significant.
The regression equation is
1 Mile Run Walk Time = 12.7 - 0.176 Push-Up + 0.023 Gender

Predictor      Coef  SE Coef      T      P
Constant    12.6992   0.3706  34.26  0.000
Push-Up    -0.17617  0.01980  -8.90  0.000
Gender       0.0231   0.2930   0.08  0.937

S = 1.86215   R-Sq = 33.2%   R-Sq(adj) = 32.4%
An added interaction term is also not statistically significant.
The regression equation is
1 Mile Run Walk Time = 12.3 - 0.149 Push-Up + 0.815 Gender - 0.0595 push*gender

Predictor        Coef  SE Coef      T      P
Constant      12.2716   0.4662  26.32  0.000
Push-Up      -0.14855  0.02696  -5.51  0.000
Gender         0.8154   0.6025   1.35  0.178
push*gender  -0.05947  0.03956  -1.50  0.135

S = 1.85541   R-Sq = 34.1%   R-Sq(adj) = 32.9%
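To see what the interaction model says for each group, substitute the gender code (1 = female, 0 = male) into the fitted equation to get a separate line for each. A minimal sketch with illustrative names, using the coefficients from the output above:

```python
# Coefficients from the interaction model (gender coded 1 = female, 0 = male).
coef = {
    "constant": 12.2716,
    "push_up": -0.14855,
    "gender": 0.8154,
    "push_x_gender": -0.05947,
}


def group_line(gender):
    """Return (intercept, slope) of the fitted mile-time line for one gender code."""
    intercept = coef["constant"] + coef["gender"] * gender
    slope = coef["push_up"] + coef["push_x_gender"] * gender
    return intercept, slope


for label, code in (("male", 0), ("female", 1)):
    intercept, slope = group_line(code)
    print(f"{label}: time = {intercept:.3f} + ({slope:.4f}) push-ups")
```

The two fitted lines differ in both intercept and slope, but neither difference is statistically significant (p-values .178 and .135), so the data do not give convincing evidence that the relationship differs for males and females.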
Scenario 4: With the proliferation of the Internet and 24-hour cable news outlets,
it has become much easier for people to hear much more information, much more
quickly. However, this has led to speculation that news organizations attempt
to convey information before it has been properly verified in an effort to feed
our impatience. USA Today reported that newspapers appear to be losing
credibility over time (March, 2004). A nationwide sample of 1002 adults was
interviewed via telephone during May 6-16, 2002 and asked to rate the
believability of different news organizations on a scale of 1-4. When asked
about “The daily newspaper you are most familiar with,” the percentage
distribution of the 1002 responses was:
| Believe all or almost all – 4 | 3 | 2 | Believes almost nothing – 1 | Can’t rate |
| 20% | 39% | 25% | 9% | 7% |
A similar study conducted in May 1998 yielded the following results
(981 responses):
| 4 | 3 | 2 | 1 | Can’t rate |
| 27% | 36% | 24% | 7% | 6% |
First we need to create a two-way
table. I did this by taking the sample
sizes and multiplying by the percentages and then rounding those values to
integers. I also decided to remove the “can’t
rate” responses from the analysis. Since
we are treating “time” (year of survey) as the explanatory variable, I put it
in the columns and the ratings as the rows.
| | 2002 | 1998 |
| Believe all or almost all – 4 | 200 | 265 |
| 3 | 391 | 353 |
| 2 | 251 | 235 |
| Believes almost nothing – 1 | 90 | 69 |
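The counts in this table can be reproduced from the reported percentages and sample sizes. The sketch below assumes halves were rounded up (1002 × 25% = 250.5 appears as 251 in the table); integer arithmetic is used because Python's built-in round() would give 250 for that cell. The function name is illustrative.

```python
def counts_from_percentages(n, percentages):
    """Convert whole-number percentages of n into counts, rounding halves up."""
    return [(n * pct + 50) // 100 for pct in percentages]


# Ratings 4, 3, 2, 1 in that order; the "can't rate" responses are dropped.
pct_2002 = [20, 39, 25, 9]
pct_1998 = [27, 36, 24, 7]

counts_2002 = counts_from_percentages(1002, pct_2002)
counts_1998 = counts_from_percentages(981, pct_1998)

for rating, c02, c98 in zip((4, 3, 2, 1), counts_2002, counts_1998):
    print(f"rating {rating}: 2002 = {c02}, 1998 = {c98}")
```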
Also looking at a graphical summary
from this two-way table:

We do
see some small differences between the two samples. The “believe all or almost all” category had
a higher proportion of votes in 1998 than in 2002, while the “believes almost
nothing” and rating category 2 received a higher proportion of the 2002 votes.
To see
if any of the differences in these samples are statistically significant, we
will run a chi-square test:
H0: The population
distribution of ratings in 1998 is the same as in 2002
Ha: There is at least one
difference between the two population distributions
Running
the analysis:

We see that all of the expected
counts are at least 5. We also clearly
have two independent random samples (one from 2002 and one from 1998).
With such a small p-value (.003 < .05), we easily reject the null hypothesis and conclude that there has been some change in the believability ratings between the two years. Looking at the cell contributions, we see that the biggest discrepancies were found in the first row of the table, the “believe all or almost all” rating of 4 (contributions of 4.874 and 4.927), with fewer of those votes in 2002 than expected (200 < 233.75) and more observed votes in that category in 1998 than expected (265 > 231.25).
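The expected counts and cell contributions quoted here can be recomputed from the two-way table. This sketch uses illustrative helper names, and the p-value relies on a closed-form tail formula that is valid only for 3 degrees of freedom, which is the case for this 4 × 2 table.

```python
import math

# Rows are ratings 4, 3, 2, 1; columns are 2002 and 1998.
observed = [
    [200, 265],
    [391, 353],
    [251, 235],
    [90, 69],
]


def expected_and_contributions(table):
    """Return (expected counts, per-cell chi-square contributions)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    contributions = [
        [(obs - exp) ** 2 / exp for obs, exp in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(table, expected)
    ]
    return expected, contributions


expected, contributions = expected_and_contributions(observed)
chi_sq = sum(sum(row) for row in contributions)
df = (len(observed) - 1) * (len(observed[0]) - 1)
# Upper-tail probability for a chi-square variable with exactly 3 df.
p_value = math.erfc(math.sqrt(chi_sq / 2)) + math.sqrt(2 * chi_sq / math.pi) * math.exp(-chi_sq / 2)

print(f"Expected, rating 4: {expected[0][0]:.2f} (2002), {expected[0][1]:.2f} (1998)")
print(f"Contributions, rating 4: {contributions[0][0]:.3f}, {contributions[0][1]:.3f}")
print(f"Chi-Sq = {chi_sq:.2f}, DF = {df}, P-Value = {p_value:.3f}")
```

The two rating-4 cells account for well over half of the total chi-square statistic, which is why that row drives the conclusion.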
While this difference is
statistically significant, we can’t isolate the cause since this was an
observational study, but we are willing to say there is a difference in the two
populations (of those who are willing to rate their local paper’s
believability) since we have random samples from each population.