Stat 322 – Review II Solutions

updated 3/5, 9:15pm

 

1) (a) What are the characteristics of a data set that calls for ANOVA as the inference procedure?

We are comparing several population means. Equivalently, the data consist of one categorical variable (the factor) and one quantitative variable (the response).

(b) Repeat (a) for regression.

We want to decide whether there is a relationship between two quantitative variables.

(c) Define the terms: factor, interaction, levels, treatment, explanatory variable, response variable, predictor.

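factor = explanatory variable = predictor; levels = the possible values (categories) of a factor; treatment = a combination of factor levels; interaction = the effect of one factor on the response depends on the level of another factor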

response variable = the one we are trying to explain.

 

2) What is the relationship between $S_{xx}$ and $s_x^2$?

$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2$

$s_x^2 = S_{xx}/(n-1)$
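A quick numerical check of this identity, as a sketch on made-up data (the list `x` is hypothetical, not from the course). Python's `statistics.variance` uses the n − 1 divisor, so it should agree with S_xx/(n − 1).

```python
import statistics

def s_xx(xs):
    """Sum of squared deviations from the sample mean: S_xx = sum((x_i - xbar)^2)."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) ** 2 for x in xs)

def sample_variance(xs):
    """Sample variance s_x^2 = S_xx / (n - 1)."""
    return s_xx(xs) / (len(xs) - 1)

x = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical data for illustration only
print(s_xx(x), sample_variance(x), statistics.variance(x))
```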

 

3) problem 3, p. 420.  State the hypotheses in symbols and in words.  Provide a complete ANOVA table.

H0: μ1 = μ2 = μ3 (the mean output is the same for each brand)

Ha: at least one brand has a different mean output.

source   df          SS       MS                  F
brand    I-1 = 2     591.2    591.2/2 = 295.6     F = 295.6/227.3 = 1.30
error    23-2 = 21   4773.3   4773.3/21 = 227.3
total    24-1 = 23   5364.5

Compare F = 1.30 to F(.05, 2, 21) = 3.47.

Since our observed test statistic is smaller than the table value, the p-value > .05 and we fail to reject H0 at the 5% level. In fact, F < F(.10, 2, 21) = 2.57, indicating that the p-value > .10.
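The table entries follow mechanically from the two sums of squares. A minimal sketch, assuming 3 brands and 24 observations in total as in the table; the function name `one_way_anova` is ours, and the critical value 3.47 is taken from the F table rather than computed.

```python
def one_way_anova(ss_treat, ss_error, n_groups, n_total):
    """Build one-way ANOVA table entries from the treatment and error sums of squares."""
    df_treat = n_groups - 1          # I - 1
    df_error = n_total - n_groups    # n - I
    ms_treat = ss_treat / df_treat
    ms_error = ss_error / df_error
    return {
        "df": (df_treat, df_error, n_total - 1),
        "SS": (ss_treat, ss_error, ss_treat + ss_error),
        "MS": (ms_treat, ms_error),
        "F": ms_treat / ms_error,
    }

table = one_way_anova(591.2, 4773.3, n_groups=3, n_total=24)
print(table)
# Compare table["F"] with F(.05, 2, 21) = 3.47 from the F table
print(table["F"] < 3.47)
```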

 

4) (a) problem 9 (p. 421) in Minitab. Remember to check and discuss the technical conditions for the validity of your procedure.

Analysis of Variance for thiamin

Source   DF      SS      MS     F      P
grain     3   8.983   2.994  3.96  0.023
Error    20  15.137   0.757
Total    23  24.120

Since the p-value = .023 < .05, we reject the null hypothesis and conclude that the mean thiamin content differs for at least two of the grains. The histogram of residuals looks irregular, but the normal probability plot is fine. The variances appear similar enough to treat as equal (largest vs. smallest: 1.04 vs. .669).

(b) Produce and discuss a visual display of these data.

Visually, it appears that barley and oats have higher mean thiamin content than maize and wheat.

(b) If the data are statistically significant, perform Tukey’s multiple comparison to see which means are different at the 5% (overall) level.  Produce an underscoring graph to show the significant and insignificant differences.

Individual confidence level = 98.89%

 

type = Barley           subtracted from:

 

type                Lower   Center   Upper

Maize             -2.5064  -1.1000  0.3064

Oats              -1.0231   0.3833  1.7898

Wheat             -2.2898  -0.8833  0.5231

 

type = Maize            subtracted from:

 

type                Lower  Center   Upper

Oats               0.0769  1.4833  2.8898

Wheat             -1.1898  0.2167  1.6231

 

type = Oats             subtracted from:

 

type                Lower   Center   Upper

Wheat             -2.6731  -1.2667  0.1398

 

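This indicates that only maize and oats differ (theirs is the only interval that excludes 0). Ordering the means from smallest to largest gives Maize, Wheat, Barley, Oats, so the underscoring graph has one line under Maize, Wheat, Barley and a second line under Wheat, Barley, Oats.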

(c) What is the individual error rate used by Minitab? How does this compare to a Bonferroni correction? Which procedure is more “conservative”?

Individual error rate = 0.0111

A Bonferroni correction would use α/6 = .05/6 ≈ .0083 for each of the 6 pairwise comparisons among the 4 grains. Since .0083 < .0111, Bonferroni is the more conservative procedure (more likely to fail to reject).

(d) Explain the consequences of making the individual error rate smaller.

Making the individual error rate smaller makes the confidence intervals wider, so we are less likely to declare significant differences.
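The decisions in (b) and (c) can be reproduced from the printed output. A sketch, assuming each Tukey interval is for (second mean − first mean) as Minitab prints it; the names `intervals` and `significant_pairs` are ours. A pair is declared different when its simultaneous interval excludes 0, and the Bonferroni per-comparison rate is α divided by the number of pairs.

```python
from math import comb

# Tukey simultaneous intervals (lower, upper) copied from the output above
intervals = {
    ("Barley", "Maize"): (-2.5064, 0.3064),
    ("Barley", "Oats"): (-1.0231, 1.7898),
    ("Barley", "Wheat"): (-2.2898, 0.5231),
    ("Maize", "Oats"): (0.0769, 2.8898),
    ("Maize", "Wheat"): (-1.1898, 1.6231),
    ("Oats", "Wheat"): (-2.6731, 0.1398),
}

def significant_pairs(intervals):
    """Pairs whose simultaneous interval does not contain 0."""
    return [pair for pair, (lo, hi) in intervals.items() if lo > 0 or hi < 0]

print(significant_pairs(intervals))  # -> [('Maize', 'Oats')]

n_pairs = comb(4, 2)                  # 6 pairwise comparisons among 4 grains
bonferroni_alpha = 0.05 / n_pairs     # per-comparison rate, about .0083
tukey_individual_alpha = 1 - 0.9889   # from Minitab's 98.89% individual level
print(n_pairs, bonferroni_alpha, tukey_individual_alpha)
```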

 

5) problem 3, p. 454

 

General Linear Model: C1 versus C2, C3

 

Factor  Type   Levels  Values

C2      fixed       4  1(200), 2(400), 3(700), 4(1100)

C3      fixed       4  1(190), 2(250), 3(300), 4(400)

 

 

Analysis of Variance for C1, using Adjusted SS for Tests

 

Source  DF  Seq SS  Adj SS  Adj MS       F      P

C2       3  324082  324082  108027  105.31  0.000

C3       3   39934   39934   13311   12.98  0.001

Error    9    9232    9232    1026

Total   15  373248

 

With p-value ≈ 0 (F = 105.31), we reject the null hypothesis of no gas-rate effect and conclude that at least one gas rate does have a different mean heat transfer coefficient.

 

With a p-value = .001 < .01, we reject the null hypothesis of no liquid-rate effect and conclude that at least one liquid rate has a different mean heat transfer coefficient.

 

S = 32.0279   R-Sq = 97.53%   R-Sq(adj) = 95.88%

 

 

Unusual Observations for C1

 

Obs       C1      Fit  SE Fit  Residual  St Resid

 16  733.000  683.438  21.184    49.563      2.06 R

 

R denotes an observation with a large standardized residual.

 

Tukey Simultaneous Tests

Response Variable C1

All Pairwise Comparisons among Levels of C2

C2 = 1(200)  subtracted from:

 

         Difference       SE of           Adjusted

C2         of Means  Difference  T-Value   P-Value

2(400)        93.50       22.65    4.129    0.0113

3(700)       209.25       22.65    9.240    0.0000

4(1100)      381.50       22.65   16.845    0.0000

 

 

C2 = 2(400)  subtracted from:

 

         Difference       SE of           Adjusted

C2         of Means  Difference  T-Value   P-Value

3(700)        115.8       22.65    5.111    0.0029

4(1100)       288.0       22.65   12.717    0.0000

 

 

C2 = 3(700)  subtracted from:

 

         Difference       SE of           Adjusted

C2         of Means  Difference  T-Value   P-Value

4(1100)       172.3       22.65    7.606    0.0002

 

This indicates that, at the 1% level, the mean heat transfer coefficients for gas rates 200 and 400 are not significantly different (adjusted p-value = .0113), but every other pair of gas rates differs (so we would underscore 200 and 400 and that’s it).

 

Tukey Simultaneous Tests

Response Variable C1

All Pairwise Comparisons among Levels of C3

C3 = 1(190)  subtracted from:

 

        Difference       SE of           Adjusted

C3        of Means  Difference  T-Value   P-Value

2(250)       45.50       22.65    2.009    0.2535

3(300)       82.50       22.65    3.643    0.0229

4(400)      136.25       22.65    6.016    0.0009

 

 

C3 = 2(250)  subtracted from:

 

        Difference       SE of           Adjusted

C3        of Means  Difference  T-Value   P-Value

3(300)       37.00       22.65    1.634    0.4085

4(400)       90.75       22.65    4.007    0.0134

 

 

C3 = 3(300)  subtracted from:

 

        Difference       SE of           Adjusted

C3        of Means  Difference  T-Value   P-Value

4(400)       53.75       22.65    2.373    0.1522

 

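This indicates that, at the 1% level, only liquid rates 190 and 400 differ (adjusted p-value = .0009). The underscoring graph therefore has one line under 190, 250, 300 and a second line under 250, 300, 400.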
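The conclusion depends on the chosen family-wise level. A small sketch that filters the adjusted p-values copied from the C3 output; the function name `differing_pairs` is ours. At the 5% level, two more pairs would be declared different.

```python
# Tukey adjusted p-values for the liquid-rate (C3) comparisons, from the output above
p_adj = {
    (190, 250): 0.2535,
    (190, 300): 0.0229,
    (190, 400): 0.0009,
    (250, 300): 0.4085,
    (250, 400): 0.0134,
    (300, 400): 0.1522,
}

def differing_pairs(p_adj, alpha):
    """Pairs whose family-wise adjusted p-value is below alpha, sorted."""
    return sorted(pair for pair, p in p_adj.items() if p < alpha)

print(differing_pairs(p_adj, 0.01))  # -> [(190, 400)]
print(differing_pairs(p_adj, 0.05))  # -> [(190, 300), (190, 400), (250, 400)]
```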

 

6) problem 14, p. 456

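$E(\bar{X}_{i\cdot} - \bar{X}_{\cdot\cdot}) = E(\bar{X}_{i\cdot}) - E(\bar{X}_{\cdot\cdot})$

$E(\bar{X}_{i\cdot}) = \frac{1}{J}\sum_{j=1}^{J} E(X_{ij}) = \frac{1}{J}\sum_{j=1}^{J} (\mu + \alpha_i + \beta_j) = \frac{1}{J}(J\mu + J\alpha_i + 0) = \mu + \alpha_i$, since $\sum_j \beta_j = 0$

$E(\bar{X}_{\cdot\cdot}) = \frac{1}{IJ}\sum_{i=1}^{I}\sum_{j=1}^{J} (\mu + \alpha_i + \beta_j) = \frac{1}{IJ}(IJ\mu + 0 + 0) = \mu$, since $\sum_i \alpha_i = 0$ and $\sum_j \beta_j = 0$

$E(\bar{X}_{i\cdot}) - E(\bar{X}_{\cdot\cdot}) = \mu + \alpha_i - \mu = \alpha_i$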
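A numerical sanity check of the derivation, as a sketch with hypothetical effects (`mu`, `alpha`, `beta` are made up, chosen so each set of effects sums to 0 as the additive model requires). Averaging the expected cell values E(X_ij) = μ + α_i + β_j over j and over all cells should recover μ + α_i and μ.

```python
# Hypothetical effects satisfying sum(alpha) = 0 and sum(beta) = 0
mu = 10.0
alpha = [2.0, -0.5, -1.5]       # I = 3 row effects
beta = [1.0, -3.0, 0.5, 1.5]    # J = 4 column effects
I, J = len(alpha), len(beta)

# Expected cell values under the additive two-factor model
expected = [[mu + a + b for b in beta] for a in alpha]

row_means = [sum(row) / J for row in expected]            # E(Xbar_i.)
grand_mean = sum(sum(row) for row in expected) / (I * J)  # E(Xbar_..)

for i in range(I):
    # E(Xbar_i.) - E(Xbar_..) should equal alpha_i
    print(i + 1, row_means[i] - grand_mean, alpha[i])
```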

 

7) problem 19, p. 464. Be very explicit about the hypothesis statements and check of technical conditions.

Analysis of Variance

Source       DF       SS        MS      F      P

coal          2  1.00241  0.501206  29.49  0.000

NaOH          2  0.12431  0.062156   3.66  0.069

Interaction   4  0.01456  0.003639   0.21  0.924

Error         9  0.15295  0.016994

Total        17  1.29423

 

(a) With p-value = .924, we fail to reject H0: there is no interaction between coal type and NaOH concentration (Ha: there is an interaction).

Thus we can examine the main effects. With p-value = .000, we reject H0: μ1 = μ2 = μ3 in favor of Ha: the mean acidity differs for at least one coal type. With p-value = .069, we fail to reject H0: β1 = β2 = β3 = 0 (at the .01 level, and even at the .05 level), so we do not have convincing evidence that NaOH concentration affects the mean acidity.

Examining the residual plots: there is some concern about the technical conditions, though it is mainly due to two outliers.

(b) We could use Two-way ANOVA for (a), but here we need to use General Linear Model to obtain the Tukey comparisons.

Tukey Simultaneous Tests

Response Variable acidity

All Pairwise Comparisons among Levels of Coal

coal = Maddingley  subtracted from:

 

    Difference       SE of           Adjusted

coal  of Means  Difference  T-Value   P-Value

morwell  0.2117     0.07526    2.812    0.0485

yallourn 0.5717     0.07526    7.595    0.0001

 

coal = morwell  subtracted from:

 

    Difference       SE of           Adjusted

coal  of Means  Difference  T-Value   P-Value

yallourn 0.3600     0.07526    4.783    0.0026

 

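If we use α = .01, then coal types 1 and 3 (Maddingley and yallourn) differ, and coal types 2 and 3 (morwell and yallourn) differ; types 1 and 2 do not differ at this level (adjusted p-value = .0485 > .01).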

 

8) problem 19, p. 518

(a) The regression equation is NOx-hat = −45.6 + 1.71 burner

(b) ŷ = −45.6 + 1.71(225) = 339.15

(c) −50(1.71) = −85.5 (−85.57 with the unrounded slope)

(d) No; the value 500 is too far outside the range of x values used to construct the regression line, so predicting there would be extrapolation.
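Parts (b) and (c) are direct plug-ins to the fitted line. A sketch using the rounded coefficients from part (a); the function name `predict` is ours. A change of Δx in x changes the fitted value by b1·Δx.

```python
# Fitted line from part (a), with rounded coefficients
b0, b1 = -45.6, 1.71

def predict(x):
    """Fitted NOx value at burner value x."""
    return b0 + b1 * x

print(predict(225))   # point estimate at x = 225
print(-50 * b1)       # change in the fitted value when x decreases by 50
```

With the rounded slope the change is −85.5; the −85.57 in (c) presumably comes from the unrounded slope.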

 

9) problem 33, p. 528

(a) 95% CI for β1: .10748 ± (t25,.025 = 2.06)(.0128) = (.081, .134). We are 95% confident that the true average change in strength associated with a 1 GPa increase in modulus of elasticity is between .081 MPa and .134 MPa.

(b) Since .1 is contained in the interval, we know we would fail to reject H0: β1 = .1 against a two-sided alternative at the 5% level. But since the prior belief is that the change is "at most" .1, we really want a one-sided alternative.

H0: β1 = .1 (at most .1)

Ha: β1 > .1 (more than .1)

t = (.10748 − .1)/.0128 = .58 with 25 df; Table A.5 gives p-value > .10, so we fail to reject H0. There is not enough evidence to contradict the prior belief.
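Both the interval and the one-sided test use only the slope estimate and its standard error. A sketch with the values from the text; the critical values 2.060 (t.025,25) and 1.316 (t.10,25) are taken from a t table rather than computed.

```python
# Summary values from the text
b1_hat, se_b1 = 0.10748, 0.0128
t_crit = 2.060   # t_{.025, 25}

ci = (b1_hat - t_crit * se_b1, b1_hat + t_crit * se_b1)
print(ci)        # approximately (0.081, 0.134)

# One-sided test of H0: beta1 = .1 versus Ha: beta1 > .1
t_stat = (b1_hat - 0.1) / se_b1
print(t_stat, t_stat < 1.316)   # about 0.58, below t_{.10, 25} = 1.316
```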

 

10) problem 52, p. 537

The regression equation is y = 6.45 + 10.6 x

Predictor      Coef   SE Coef      T      P
Constant      6.449     2.795   2.31  0.054
x           10.6026    0.9985  10.62  0.000

 

S = 2.546 R-Sq = 94.2% R-Sq(adj) = 93.3%

 

Analysis of Variance

Source      DF      SS      MS       F      P
Regression   1  730.69  730.69  112.76  0.000
Residual     7   45.36    6.48
Total        8  776.06

 

(a) With a p-value ≈ 0 (reported as .000; F = 112.76), we strongly reject H0: β1 = 0, indicating that the model does specify a useful relationship.

(b) 95% CI for β1: 10.6026 ± (t7,.025 = 2.365)(.9985) = (8.24, 12.96)

(c), (d) Estimate of the mean response, and prediction for a single new observation, at x = 3:

New Obs     Fit         SE Fit      95.0% CI          95.0% PI

1           38.256      0.911 ( 36.100, 40.413) ( 31.858, 44.655)

(e) x̄ = 2.67. Since 2.5 is closer to x̄ than 3 is, both intervals will be narrower at x = 2.5.

(f) No; the value 6.0 is outside the range of observed x values, so predicting there would be extrapolation and the result could not be trusted.
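Part (e) follows from the standard error of the fitted mean, s·sqrt(1/n + (x* − x̄)²/S_xx), which grows with the distance from x̄. A sketch under one assumption: S_xx is not printed, so it is recovered from SE(b1) = s/sqrt(S_xx) using s = 2.546 and SE(b1) = .9985; n = 9 comes from the 8 total degrees of freedom. The function names are ours.

```python
import math

# Values from the output; S_xx is derived, not printed
s, se_b1, n, xbar = 2.546, 0.9985, 9, 2.67
sxx = (s / se_b1) ** 2   # since SE(b1) = s / sqrt(S_xx)

def se_fit(x_star):
    """Standard error of the estimated mean response at x_star."""
    return s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)

def se_pred(x_star):
    """Standard error for predicting a single new Y at x_star."""
    return math.sqrt(s ** 2 + se_fit(x_star) ** 2)

print(se_fit(3.0), se_fit(2.5))    # the second is smaller: 2.5 is closer to xbar
print(se_pred(3.0), se_pred(2.5))
```

The value of `se_fit(3.0)` should land close to the 0.911 SE Fit that Minitab reports, which is a useful check on the derived S_xx.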

11) problem 71, p. 550

(a) R-square from SAS output: .5073

(b) correlation coefficient r = sqrt(.5073) = .7122 (positive, since the estimated slope is positive)

(c) Overall F=15.44 with p-value=.0013<.01, so there is evidence the model is useful.

 

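(d) 95% CI for the mean response at x = 50: ŷ = .787218 + .00757(50) = 1.166

$\hat{y} \pm t_{15,.025}\, s \sqrt{\frac{1}{n} + \frac{(\bar{x} - x^*)^2}{S_{xx}}}$, with $t_{15,.025} = 2.131$, $s = .20308$, $n = 17$, $\bar{x} = 42.31$, $x^* = 50$, and $S_{xx} = 16(26.382)^2$

1.166 ± 2.131(.20308) sqrt(1/17 + .0053)

1.166 ± 2.131(.20308)(.253) = 1.166 ± .1096 = (1.056, 1.276)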

 

(e) ŷ = .787218 + .00757(30) = 1.014

observed y = .80

residual = y − ŷ = .80 − 1.014 = −.214
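Parts (b) through (e) can be checked together. A sketch, assuming s_x = 26.382 (reading the term 16(26.382) in part (d) as S_xx = (n − 1)s_x² with n = 17, which reproduces the .0053 shown) and taking t15,.025 = 2.131 from the t table. For simple regression, the model utility F equals (n − 2)r²/(1 − r²). The function name `mean_response_ci` is ours.

```python
import math

r_sq, n = 0.5073, 17
r = math.sqrt(r_sq)                     # positive root: the fitted slope is positive
f_stat = (n - 2) * r_sq / (1 - r_sq)    # model utility F for simple regression

b0, b1 = 0.787218, 0.00757
s, t_crit = 0.20308, 2.131              # t_{15, .025}
xbar, sx = 42.31, 26.382                # sx is an assumption (see lead-in)
sxx = (n - 1) * sx ** 2

def mean_response_ci(x_star):
    """95% CI for the mean response at x_star."""
    y_hat = b0 + b1 * x_star
    half = t_crit * s * math.sqrt(1 / n + (x_star - xbar) ** 2 / sxx)
    return y_hat - half, y_hat + half

print(r, f_stat)
print(mean_response_ci(50))
print(0.80 - (b0 + b1 * 30))            # residual at x = 30
```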

 

12) problem 67, p. 627

(a) For a one-minute increase in the 1-mile walk time, we would expect VO2max to decrease by .0996, holding the other predictor variables fixed.

(b) We would expect males to have a VO2max .6566 higher than females, holding the other predictor variables fixed.

(c) ŷ = 3.5959 + .6566(1) + .0096(170) − .0996(11) − .0080(140) = 3.67. The residual is 3.15 − 3.67 = −.52.

(d) R² = 1 − SSE/SST = 1 − 30.1033/102.3922 = .706, so 70.6% of the observed variation in VO2max can be attributed to the model relationship.

(e) H0: β1 = β2 = β3 = β4 = 0

Ha: at least one βi ≠ 0

F = (.706/4) / ((1 − .706)/15) = 9.005. This exceeds F.001,4,15 = 8.25, so the p-value < .001; we reject H0 and conclude the model specifies a useful relationship between VO2max and at least one of the predictors.
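Parts (c) through (e) as a short script. Two assumptions, labeled here and in the code: the x4 coefficient is taken as −.0080, the value that makes the arithmetic in (c) come out to 3.67; and n = 20 is inferred from the 15 error degrees of freedom used in (e) with k = 4. The predictors are named x1 to x4 because only x1 (male indicator) and x3 (walk time) are identified in the text.

```python
# Coefficients for part (c); the x4 coefficient -0.0080 is an assumption (see lead-in)
b = [3.5959, 0.6566, 0.0096, -0.0996, -0.0080]

def predict(x1, x2, x3, x4):
    """Fitted VO2max; x1 = 1 for male, x3 = 1-mile walk time (minutes)."""
    return b[0] + b[1] * x1 + b[2] * x2 + b[3] * x3 + b[4] * x4

def utility_f(r_sq, k, n):
    """Model utility F = (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r_sq / k) / ((1 - r_sq) / (n - k - 1))

y_hat = predict(1, 170, 11, 140)
print(y_hat, 3.15 - y_hat)                  # fitted value and residual, part (c)

r_sq = 1 - 30.1033 / 102.3922               # part (d)
print(r_sq, utility_f(0.706, k=4, n=20))    # part (e); n = 20 assumed
```

The same `utility_f` reproduces the statistics in problem 13(b): `utility_f(.394, 2, 9)` gives about 1.95 and `utility_f(.641, 3, 9)` about 2.98, with n = 9 inferred from the 5 error degrees of freedom there.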

 

13) problem 70, p. 628, interpret the coefficient of x3 as well.

(a) For the model excluding the interaction term, R² = 1 − 5.18/8.55 = .394, so 39.4% of the observed variation in lift/drag ratio can be explained by the model without the interaction term. Including the interaction term increases the amount of variation in lift/drag ratio explained by the model to R² = 1 − 3.07/8.55 = .641, or 64.1%.

 

(b) Without interaction, we are testing H0: β1 = β2 = 0 vs. Ha: at least one of β1, β2 ≠ 0.

The test statistic is $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$.

F = (.394/2) / ((1 − .394)/6) = 1.95 < F.05,2,6 = 5.14, so we fail to reject; the model does not appear to be useful.

With the interaction term, we are testing H0: β1 = β2 = β3 = 0 vs. Ha: at least one βi ≠ 0.

F = (.641/3) / ((1 − .641)/5) = 2.98 < F.05,3,5 = 5.41, so again we fail to reject.

Even with the interaction term, there is not enough evidence of a relationship between lift/drag ratio and the two predictor variables to conclude that the model is useful (a bit of a surprise given R² = .641, but with only n = 9 observations there are few error degrees of freedom).

 

14) problem 75, p. 629

(a) H0: β1 = β2 = 0

Ha: at least one βi ≠ 0

Analysis of Variance

 

Source          DF      SS      MS      F      P

Regression       2  237.52  118.76  30.81  0.000

Residual Error   7   26.98    3.85

Total            9  264.50

 

Reject the null hypothesis (F = 30.81, p-value < .001) and conclude the quadratic model is useful.

 

(b)

Predictor           Coef  SE Coef      T      P

Constant         41.7422   0.8522  48.98  0.000

log added Mn       6.581    1.002   6.57  0.000

log added Mn sq  -2.3621   0.3073  -7.69  0.000

 

The quadratic predictor is significant (t = -7.69, p-value < .001) and should be retained in the model.

 

(c)

New

Obs     Fit  SE Fit       90% CI            90% PI

  1  45.961   1.031  (44.007, 47.915)  (41.760, 50.162)

 

Values of Predictors for New Observations

 

       log    log

New  added  added

Obs     Mn  Mn sq

  1   1.00   1.00

 

We are 90% confident that the expected height for wheat treated with 10 mM of Mn is between 44.007 cm and 47.915 cm.

(The CI is 45.961 ± 1.895(1.031), where t.05,7 = 1.895.)
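Both intervals in (c) can be rebuilt from the fit, its standard error, and the residual mean square. A sketch with the values from the output; t.05,7 = 1.895 is taken from the t table. The prediction interval adds the residual variance to the variance of the fitted mean.

```python
import math

fit, se_fit = 45.961, 1.031
mse = 26.98 / 7          # residual mean square from the ANOVA table (df = 7)
t_crit = 1.895           # t_{.05, 7} for 90% intervals

ci_half = t_crit * se_fit
pi_half = t_crit * math.sqrt(mse + se_fit ** 2)
print((fit - ci_half, fit + ci_half))   # 90% CI for the mean response
print((fit - pi_half, fit + pi_half))   # 90% PI for a single new observation
```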