Stat 322 – Review II Solutions
updated 3/5, 9:15pm
1) (a) What are the characteristics of a data set that calls for ANOVA as the inference procedure?
Comparing several population means, that is, one categorical (explanatory) variable and one
quantitative (response) variable.
(b) Repeat (a) for regression.
Deciding if there is a relationship between two
quantitative variables.
(c) Define the terms: factor, interaction, levels, treatment, explanatory variable, response variable, predictor.
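factor = explanatory variable = predictor; levels =
the possible values of the factor; treatment = a combination of factor levels;
interaction = the effect of one factor on the response depends on the level of another factor;
response variable = the one we are trying to
explain.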
2) What is the relationship between Sxx and sx²?
Sxx = Σ(xi - x̄)²
sx² = Sxx/(n-1)
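A quick numerical check of this relationship, as a minimal sketch. The data values are made up for illustration, and `sxx` is a hypothetical helper name, not something from the text or from Minitab.

```python
import statistics


def sxx(xs):
    """Sum of squared deviations from the sample mean: Sxx = sum((xi - xbar)^2)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)


# Made-up sample, for illustration only.
xs = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

s_xx = sxx(xs)
sample_var = statistics.variance(xs)  # sample variance, divisor n - 1

# The two quantities differ only by the factor (n - 1).
print(s_xx, sample_var, s_xx / (len(xs) - 1))
```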
3) problem 3, p. 420. State the hypotheses in symbols and in words. Provide a complete ANOVA table.
H0: μ1 = μ2 = μ3 (the mean output is the same for each brand)
Ha: at least one brand has a different mean output.
| source | df | SS | MS | F |
| brand | I-1 = 2 | 591.2 | 591.2/2 = 295.6 | 295.6/227.3 = 1.30 |
| error | 23-2 = 21 | 4773.3 | 4773.3/21 = 227.3 | |
| total | 24-1 = 23 | | | |
Compare F = 1.30 to F(.05, 2, 21) = 3.47.
Since our observed test statistic is smaller than
the table value, our p-value > .05 and we fail to reject H0 at
the 5% level. In fact, F < F(.1, 2, 21) = 2.57, indicating that the p-value > .10.
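The table arithmetic can be reproduced with a short sketch. The sums of squares, group count, sample size, and critical value are the ones quoted above; the function name `one_way_anova_f` is an illustrative assumption.

```python
def one_way_anova_f(ss_treatment, ss_error, num_groups, num_obs):
    """Fill in df, MS, and F for a one-way ANOVA table from the sums of squares."""
    df_treatment = num_groups - 1
    df_error = num_obs - num_groups
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    f_stat = ms_treatment / ms_error
    return df_treatment, df_error, ms_treatment, ms_error, f_stat


# Values from the solution: I = 3 brands, n = 24 observations.
df_t, df_e, ms_t, ms_e, f_stat = one_way_anova_f(591.2, 4773.3, num_groups=3, num_obs=24)

f_crit_05 = 3.47  # F(.05, 2, 21) from the table, as quoted in the solution
reject_h0 = f_stat > f_crit_05

print(df_t, df_e, round(ms_t, 1), round(ms_e, 1), round(f_stat, 2), reject_h0)
```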
4) (a) problem 9 (p. 421) in Minitab. Remember to check and discuss the technical conditions for the validity of your procedure.
Analysis of Variance for thiamin
| Source | DF | SS | MS | F | P |
| grain | 3 | 8.983 | 2.994 | 3.96 | 0.023 |
| Error | 20 | 15.137 | 0.757 | | |
| Total | 23 | 24.120 | | | |
Since the p-value < .05, we reject the
null hypothesis and conclude that the mean thiamin content differs between
at least two of the grains. The histogram of residuals looks bizarre, but the normal
probability plot is fine. Variances appear equal (1.04 vs. .669).
(b) Produce and discuss a visual display of these data.

Visually, barley and oats appear to be above the
other two grains.
(b) If the data are statistically significant, perform Tukey’s multiple comparison to see which means are different at the 5% (overall) level. Produce an underscoring graph to show the significant and insignificant differences.
Individual confidence level = 98.89%

type = Barley subtracted from:
type     Lower    Center   Upper
Maize   -2.5064  -1.1000  0.3064
Oats    -1.0231   0.3833  1.7898
Wheat   -2.2898  -0.8833  0.5231

type = Maize subtracted from:
type     Lower    Center   Upper
Oats     0.0769   1.4833  2.8898
Wheat   -1.1898   0.2167  1.6231

type = Oats subtracted from:
type     Lower    Center   Upper
Wheat   -2.6731  -1.2667  0.1398
This indicates that only maize and oats differ.

(c) What is the individual error rate used by Minitab? How does this compare to a Bonferroni correction? Which procedure is more “conservative”?
Individual error rate = 0.0111
A Bonferroni correction would have used α/6 = .05/6 =
.0083, making the Bonferroni procedure more conservative (more likely to fail to reject).
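The comparison of the two individual error rates can be sketched as follows. The overall level .05 and the Tukey individual rate .0111 come from the problem and the Minitab output; the variable names are illustrative.

```python
from math import comb

num_groups = 4        # barley, maize, oats, wheat
overall_alpha = 0.05  # family (overall) error rate

# Number of pairwise comparisons among 4 groups: C(4, 2) = 6.
num_comparisons = comb(num_groups, 2)

# Bonferroni splits the overall error rate evenly across the comparisons.
bonferroni_alpha = overall_alpha / num_comparisons

tukey_individual_alpha = 0.0111  # reported by Minitab (1 - .9889)

# The smaller individual error rate gives wider intervals, i.e. the more conservative procedure.
more_conservative = "Bonferroni" if bonferroni_alpha < tukey_individual_alpha else "Tukey"
print(num_comparisons, round(bonferroni_alpha, 4), more_conservative)
```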
(d) Explain the consequences of making the individual error rate smaller.
Making the individual error rate smaller makes the
confidence intervals wider, so we are less likely to declare significant
differences.
5) problem 3, p. 454
General Linear Model: C1 versus C2, C3

Factor  Type   Levels  Values
C2      fixed  4       1(200), 2(400), 3(700), 4(1100)
C3      fixed  4       1(190), 2(250), 3(300), 4(400)

Analysis of Variance for C1, using Adjusted SS for Tests

Source  DF  Seq SS  Adj SS  Adj MS  F       P
C2       3  324082  324082  108027  105.31  0.000
C3       3   39934   39934   13311   12.98  0.001
Error    9    9232    9232    1026
Total   15  373248
With p-value ≈ 0 (F = 105.31), we reject the null
hypothesis of no gas-rate effect and conclude that at least one gas rate does
have a different mean heat transfer coefficient.
With a p-value = .001 < .01, we reject the null
hypothesis of no liquid-rate effect and conclude that at least one liquid rate
has a different mean heat transfer coefficient.
S = 32.0279   R-Sq = 97.53%   R-Sq(adj) = 95.88%

Unusual Observations for C1

Obs  C1       Fit      SE Fit  Residual  St Resid
16   733.000  683.438  21.184  49.563    2.06 R

R denotes an observation with a large standardized residual.
Tukey Simultaneous Tests
Response Variable C1
All Pairwise Comparisons among Levels of C2

C2 = 1(200) subtracted from:
C2       Difference of Means  SE of Difference  T-Value  Adjusted P-Value
2(400)    93.50               22.65              4.129   0.0113
3(700)   209.25               22.65              9.240   0.0000
4(1100)  381.50               22.65             16.845   0.0000

C2 = 2(400) subtracted from:
C2       Difference of Means  SE of Difference  T-Value  Adjusted P-Value
3(700)   115.8                22.65              5.111   0.0029
4(1100)  288.0                22.65             12.717   0.0000

C2 = 3(700) subtracted from:
C2       Difference of Means  SE of Difference  T-Value  Adjusted P-Value
4(1100)  172.3                22.65              7.606   0.0002
This indicates that, at the 1% level, the mean heat transfer
coefficients for gas rates 200 and 400 do not differ significantly (adjusted
p-value = .0113), but every other pair of gas rates differs (so we
would underscore 200 and 400 and that’s it).
Tukey Simultaneous Tests
Response Variable C1
All Pairwise Comparisons among Levels of C3

C3 = 1(190) subtracted from:
C3      Difference of Means  SE of Difference  T-Value  Adjusted P-Value
2(250)   45.50               22.65             2.009    0.2535
3(300)   82.50               22.65             3.643    0.0229
4(400)  136.25               22.65             6.016    0.0009

C3 = 2(250) subtracted from:
C3      Difference of Means  SE of Difference  T-Value  Adjusted P-Value
3(300)   37.00               22.65             1.634    0.4085
4(400)   90.75               22.65             4.007    0.0134

C3 = 3(300) subtracted from:
C3      Difference of Means  SE of Difference  T-Value  Adjusted P-Value
4(400)   53.75               22.65             2.373    0.1522
This indicates that, at the 1% level, only liquid rates 190 and
400 differ (one underscore would join 190, 250, and 300, and a second would join 250, 300, and 400).
6) problem 14, p. 456
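E(X̄i. - X̄..) = E(X̄i.) - E(X̄..)
E(X̄i.) = (1/J) Σj E(Xij) = (1/J) Σj (μ + αi + βj) = (1/J)(Jμ + Jαi + 0) = μ + αi   (since Σj βj = 0)
E(X̄..) = (1/IJ) Σi Σj E(Xij) = (1/IJ) Σi Σj (μ + αi + βj) = (1/IJ)(IJμ + 0 + 0) = μ   (since Σi αi = 0 and Σj βj = 0)
E(X̄i.) - E(X̄..) = μ + αi - μ = αi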
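A small numerical check of this result, as a sketch under stated assumptions: the values of the grand mean, row effects, and column effects below are made up, chosen only so that the row effects sum to zero and the column effects sum to zero, as the additive model requires.

```python
# Made-up parameters for an additive two-factor model (illustration only).
mu = 10.0
alpha = [-2.0, 0.5, 1.5]      # row effects, sum to 0
beta = [1.0, -3.0, 0.0, 2.0]  # column effects, sum to 0
I, J = len(alpha), len(beta)

# Expected cell values under the additive model: mu + alpha_i + beta_j.
expected = [[mu + a + b for b in beta] for a in alpha]

# Expected row means and expected grand mean.
row_means = [sum(row) / J for row in expected]
grand_mean = sum(sum(row) for row in expected) / (I * J)

# Row mean minus grand mean should recover each row effect alpha_i.
diffs = [rm - grand_mean for rm in row_means]
print(diffs)
```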
7) problem 19, p. 464. Be very explicit about the hypothesis statements and check of technical conditions.
Analysis of Variance
Source       DF  SS       MS        F      P
coal          2  1.00241  0.501206  29.49  0.000
NaOH          2  0.12431  0.062156   3.66  0.069
Interaction   4  0.01456  0.003639   0.21  0.924
Error         9  0.15295  0.016994
Total        17  1.29423
(a) With p-value = .924 we fail to reject H0: no interaction between coal
type and NaOH concentration in favor of Ha: there is an interaction.
Thus we can examine the main effects. With p-value
= .000 we reject H0: μ1 = μ2 = μ3 in favor of Ha: the mean acidity differs for
at least one coal type. With p-value = .069 we fail to reject H0: β1 = β2 = β3 = 0 (at the .01 level), so there is
not convincing evidence that the NaOH concentration makes a difference.
Examining residual plots: There is some concern
about the technical conditions, though that is mainly due to two outliers.
(b) Could use the two-way ANOVA for (a), but here we need to use General Linear Model (to obtain the Tukey comparisons).
Tukey Simultaneous Tests
Response Variable acidity
All Pairwise Comparisons among Levels of coal

coal = Maddingley subtracted from:
coal      Difference of Means  SE of Difference  T-Value  Adjusted P-Value
morwell   0.2117               0.07526           2.812    0.0485
yallourn  0.5717               0.07526           7.595    0.0001

coal = morwell subtracted from:
coal      Difference of Means  SE of Difference  T-Value  Adjusted P-Value
yallourn  0.3600               0.07526           4.783    0.0026
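If we use α = .01, then coal types 1 and 3 (Maddingley and Yallourn) differ, and coal
types 2 and 3 (Morwell and Yallourn) differ; types 1 and 2 (Maddingley and Morwell) do not
differ significantly (adjusted p-value = .0485).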
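The decision rule applied to the Tukey output can be written out as a short sketch: a pair is declared different when its adjusted p-value falls below α. The p-values are copied from the Minitab output above; the dictionary layout is an illustrative choice.

```python
# Adjusted p-values from the Tukey simultaneous tests for coal type.
adjusted_p = {
    ("Maddingley", "Morwell"): 0.0485,
    ("Maddingley", "Yallourn"): 0.0001,
    ("Morwell", "Yallourn"): 0.0026,
}

alpha = 0.01

# A pair is declared significantly different when its adjusted p-value is below alpha.
significant = [pair for pair, p in adjusted_p.items() if p < alpha]
not_significant = [pair for pair, p in adjusted_p.items() if p >= alpha]

print("differ:", significant)
print("no significant difference:", not_significant)
```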
8) problem 19, p. 518
(a) The regression equation is NOx-hat = -45.6 + 1.71 burner
(b) NOx-hat = -45.6 + 1.71(225) = 339.15
(c) -50(1.71) ≈ -85.5 (the value -85.57 results when the slope is not rounded to 1.71)
(d) No, the value 500 is too far outside the range of x values used to construct the
regression line.
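Parts (b) and (c) can be checked with a minimal sketch that uses the rounded coefficients from the fitted equation. The function name `predict_nox` is hypothetical.

```python
def predict_nox(burner_area, intercept=-45.6, slope=1.71):
    """Predicted NOx from the fitted line, using the rounded coefficients in (a)."""
    return intercept + slope * burner_area


# (b) Point prediction at burner = 225.
y_hat_225 = predict_nox(225)

# (c) Expected change in NOx when burner decreases by 50: the intercept cancels,
# so the change is just -50 times the slope.
change_for_50_decrease = predict_nox(175) - predict_nox(225)

print(round(y_hat_225, 2), round(change_for_50_decrease, 2))
```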
9) problem 33, p. 528
(a) CI for β1: .10748 ± (t.025,25 = 2.06)(.0128) = (.081, .134). We are 95% confident that the true average change in
strength associated with a 1 GPa increase in modulus of elasticity is between
.081 MPa and .134 MPa.
(b) Since .1 is contained in the interval, we know we would fail to reject H0 against a two-sided
alternative at the 5% level. But since they said "at most" we really
want a one-sided alternative.
H0: β1 = .1 (at most .1)
Ha: β1 > .1 (more than .1)
t = (.10748 - .1)/.0128 = .58; table A.5 gives p-value > .10, thus we fail to reject H0:
there is not enough evidence to contradict the prior belief.
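Both parts can be reproduced with a short sketch. The estimate, standard error, and two-sided critical value are from the solution; the one-sided critical value t.05,25 = 1.708 is a standard table value added here for the comparison.

```python
b1 = 0.10748      # estimated slope
se_b1 = 0.0128    # standard error of the slope
t_crit_two_sided = 2.06   # t(.025, 25), as used in the solution

# (a) 95% confidence interval for the slope.
half_width = t_crit_two_sided * se_b1
ci = (b1 - half_width, b1 + half_width)

# (b) One-sided test of H0: beta1 = .1 against Ha: beta1 > .1.
null_value = 0.1
t_stat = (b1 - null_value) / se_b1
t_crit_one_sided = 1.708  # t(.05, 25) from a standard t table
reject_h0 = t_stat > t_crit_one_sided

print(tuple(round(v, 3) for v in ci), round(t_stat, 2), reject_h0)
```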
10) problem 52, p. 537
The regression equation is y = 6.45 + 10.6 x
| Predictor | Coef | SE Coef | T | P |
| Constant | 6.449 | 2.795 | 2.31 | 0.054 |
| x | 10.6026 | 0.9985 | 10.62 | 0.000 |
S = 2.546 R-Sq = 94.2% R-Sq(adj) = 93.3%
Analysis of Variance
| Source | DF | SS | MS | F | P |
| Regression | 1 | 730.69 | 730.69 | 112.76 | 0.000 |
| Residual | 7 | 45.36 | 6.48 | | |
| Total | 8 | 776.06 | | | |
(a) With a p-value of .000 (F = 112.76), we would
strongly reject H0: β1 = 0, indicating that the model does specify a useful
relationship.
(b) Estimate for β1: 10.6026 ± (t.025,7 = 2.365)(.9985) =
(8.24, 12.96)
(c), (d) Estimate and prediction at x = 3:
New Obs  Fit     SE Fit  95.0% CI          95.0% PI
1        38.256  0.911   (36.100, 40.413)  (31.858, 44.655)
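The two intervals in this output can be rebuilt from the fit, its standard error, and the residual standard deviation S. This is a sketch of the standard simple-regression formulas, not a reproduction of Minitab's internals; small differences in the third decimal come from rounding in the printed inputs.

```python
from math import sqrt

fit = 38.256     # fitted value at x = 3
se_fit = 0.911   # standard error of the fitted value
s = 2.546        # residual standard deviation (S in the output)
t_crit = 2.365   # t(.025, 7)

# Confidence interval for the mean response: fit +/- t * SE(fit).
ci_half = t_crit * se_fit
ci = (fit - ci_half, fit + ci_half)

# Prediction interval for a single new observation adds the error variance s^2.
pi_half = t_crit * sqrt(s ** 2 + se_fit ** 2)
pi = (fit - pi_half, fit + pi_half)

print(tuple(round(v, 2) for v in ci), tuple(round(v, 2) for v in pi))
```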
(e) x̄ = 2.67. Since 2.5 is closer to x̄ than 3 is,
this will shrink the intervals.
(f) No, the value of 6.0 is not in the range of
observed x values, so predicting at that point would be unreliable extrapolation.
11) problem 71, p. 550
(a) R-square from SAS output: .5073
(b) correlation coefficient = sqrt(.5073) = .7122
(c) Overall F = 15.44 with p-value = .0013 < .01, so there is evidence the model is
useful.
(d) ŷ = .787218 + .00757(50) = 1.166
ŷ ± (t.025,15 = 2.131)(s = .20308) sqrt(1/17 + (x̄ - 50)²/(16(26.38)²)), with x̄ = 42.31
1.166 ± 2.131(.20308) sqrt(1/17 + .0053)
1.166 ± 2.131(.20308)(.253) = 1.166 ± .1096 = (1.056, 1.276)
(e) ŷ = .787218 + .00757(30) = 1.014
observed = .80
residual = .80 - 1.014 = -.214
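The interval in (d) can be verified step by step with the sketch below. It assumes the garbled term in the solution reads 16(26.38)², that is (n - 1) times the squared sample standard deviation of x, which equals Sxx.

```python
from math import sqrt

n = 17
b0, b1 = 0.787218, 0.00757   # fitted intercept and slope
s = 0.20308                  # residual standard deviation
x_bar = 42.31                # mean of the x values
s_x = 26.38                  # sample standard deviation of x (assumed reading)
t_crit = 2.131               # t(.025, 15)
x_star = 50

y_hat = b0 + b1 * x_star

# Sxx = (n - 1) * s_x^2, then the standard error of the estimated mean response.
sxx = (n - 1) * s_x ** 2
se_mean = s * sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)

half_width = t_crit * se_mean
ci = (y_hat - half_width, y_hat + half_width)

print(round(y_hat, 3), round(half_width, 4), tuple(round(v, 3) for v in ci))
```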
12) problem 67, p. 627
(a) For a one-minute increase in the 1-mile walk time, we
would expect VO2max to
decrease by .0996, holding the other predictor
variables fixed.
(b) We would expect males to have a VO2max that is .6566
higher than females, holding
the other predictor variables fixed.
(c) ŷ = 3.5959 + .6566(1) + .0096(170) - .0996(11) - .0080(140) = 3.67. The residual is (3.15 - 3.67) = -.52.
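A sketch of the prediction in (c). The last coefficient is taken as -.0080, the value for which the arithmetic reproduces the stated fit of 3.67; the predictor labels x1 to x4 are placeholders, since only the gender and walk-time predictors are named in the solution.

```python
intercept = 3.5959

# (coefficient, observed value) for each predictor; labels are placeholders.
terms = {
    "x1 (gender, 1 = male)": (0.6566, 1),
    "x2": (0.0096, 170),
    "x3 (1-mile walk time, min)": (-0.0996, 11),
    "x4": (-0.0080, 140),  # assumed value; see the note above
}

# Fitted value: intercept plus the sum of coefficient * value.
y_hat = intercept + sum(coef * value for coef, value in terms.values())

observed = 3.15
residual = observed - y_hat

print(round(y_hat, 2), round(residual, 2))
```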
(d) R² = 1 - SSE/SST = 1 - 30.1033/102.3922 = .706, so
70.6% of the observed variation in
VO2max can be attributed to the model relationship.
(e) H0: β1 = β2 = β3 = β4 = 0
Ha: at least one βi ≠ 0
F = (.706/4) / ((1 - .706)/15) = 9.005. This is greater than F.001, 4, 15 =
8.25 (so p-value < .001), so we reject H0 and conclude the model specifies a useful
relationship between VO2max and at least one of the predictors.
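Parts (d) and (e) follow directly from SSE and SST, as the sketch below shows. The critical value 8.25 is the one quoted in the solution; the variable names are illustrative.

```python
sse = 30.1033    # error sum of squares
sst = 102.3922   # total sum of squares
k = 4            # number of predictors
df_error = 15    # n - (k + 1)

# (d) Coefficient of multiple determination.
r_squared = 1 - sse / sst

# (e) Model utility F statistic written in terms of R^2.
f_stat = (r_squared / k) / ((1 - r_squared) / df_error)

f_crit = 8.25    # table value quoted in the solution
reject_h0 = f_stat > f_crit

print(round(r_squared, 3), round(f_stat, 3), reject_h0)
```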
13) problem 70, p. 628, interpret the coefficient of x3
as well.
(a) For the model excluding the interaction term, R²
= 1 - 5.18/8.55 = .394, so 39.4% of the observed
variation in lift/drag ratio can be explained by the model without the
interaction term. Including the interaction term increases
the amount of variation in lift/drag ratio that can be explained by the model to
R² = 1 - 3.07/8.55 = .641, or 64.1%.
(b) Without interaction, we are testing H0: β1 = β2 = 0 vs. Ha: either β1 or β2 ≠ 0.
The test statistic is F = (R²/k) / ((1 - R²)/(n - k - 1)).
F = (.394/2) / ((1 - .394)/6) = 1.95 < F.05, 2, 6 = 5.14, so we fail to
reject; the model does not appear to be useful.
With the interaction term, we are testing H0: β1 = β2 = β3 = 0 vs. Ha: at least one βi ≠ 0.
F = (.641/3) / ((1 - .641)/5) = 2.98 < F.05, 3, 5 = 5.41, so again we fail to reject.
Even with the interaction term, there is not a
significant relationship between lift/drag ratio and the two predictor
variables, so the model does not appear to be useful (a bit of a surprise!)
14) problem 75, p. 629
(a) H0: β1 = β2 = 0
Ha: at least one βi ≠ 0
Analysis of Variance
Source          DF  SS      MS      F      P
Regression       2  237.52  118.76  30.81  0.000
Residual Error   7   26.98    3.85
Total            9  264.50
Reject the null hypothesis (F = 30.81,
p-value < .001) and conclude the quadratic model is useful.
(b)
Predictor        Coef     SE Coef  T      P
Constant         41.7422  0.8522   48.98  0.000
log added Mn      6.581   1.002     6.57  0.000
log added Mn sq  -2.3621  0.3073   -7.69  0.000
The quadratic predictor is
significant (t = -7.69, p-value < .001) and should be retained in the
model.
(c)
New Obs  Fit     SE Fit  90% CI            90% PI
1        45.961  1.031   (44.007, 47.915)  (41.760, 50.162)

Values of Predictors for New Observations
New Obs  log added Mn  log added Mn sq
1        1.00          1.00

We are 90% confident that the
expected height for wheat treated with 10 mM of Mn is between 44.007 cm and
47.915 cm.
45.961 ± 1.895(1.031), where t.05, 7 = 1.895
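The fitted value and the 90% confidence interval can be reproduced from the coefficient table, as a minimal sketch. It takes both predictor values to be 1.00, as listed in the output for the new observation.

```python
# Coefficients of the fitted quadratic model from part (b).
b0, b1, b2 = 41.7422, 6.581, -2.3621

log_mn = 1.00            # "log added Mn" for the new observation
log_mn_sq = log_mn ** 2  # "log added Mn sq"

fit = b0 + b1 * log_mn + b2 * log_mn_sq

se_fit = 1.031   # SE Fit from the output
t_crit = 1.895   # t(.05, 7) for a 90% interval

half_width = t_crit * se_fit
ci_90 = (fit - half_width, fit + half_width)

print(round(fit, 3), tuple(round(v, 3) for v in ci_90))
```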