Stat 322 - Review II Draft (last updated March 3)
Format of Exam: The exam will be 50 minutes. It will be open book and open class notes, with calculator and Minitab allowed. Remember, “open book” means you still need to organize your notes and highlight key ideas for quick reference. You may be given computer output (to interpret and/or to fill in missing pieces), but you may also be expected to complete regression and ANOVA calculations using Minitab. You will not need to print out Minitab output, but you will need to be clear about what output/pictures you are seeing in Minitab. There will be a mixture of calculation and interpretation questions, including “what if” questions. The exam will cover Ch. 10, Sec. 11.1-11.2, Ch. 12, and Sec. 13.1-13.4 (HW 4-7, Quizzes 3-5, writing assignments).
One-Way Analysis of Variance (H0: μ1 = μ2 = … = μI)
· Be able to state hypotheses in symbols and in words
· Be able to produce and interpret “stacked” boxplots
· Be able to stack and check the technical conditions of the ANOVA procedure
· Be able to suggest, carry out, and evaluate transformations of response variable
· Understand what the test statistic in ANOVA is measuring
· what is the ratio comparing and what does this tell you?
· Be able to complete an ANOVA table given partial information
Degrees of freedom add, sums of squares add, MS = SS/df, F = MST/MSE
Including interpretation of F and/or p-value to assess significance
· Be able to think in terms of model equations, e.g., Xij = μi + εij = μ + αi + εij; H0: all αi = 0
Note: Section 10.1 assumes equal sample sizes; Section 10.3 allows unequal sample sizes. Minitab’s Stat > ANOVA > One-way command handles both cases.
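The ANOVA table relationships above (degrees of freedom add, sums of squares add, MS = SS/df, F = MST/MSE) can be checked by hand with a short sketch. This is only an illustration in Python using made-up data; the function name `one_way_anova` and the dictionary layout are assumptions for this example, not Minitab output. It works for unequal sample sizes.

```python
from statistics import mean

def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table for a list of samples."""
    all_obs = [x for g in groups for x in g]
    n_total, n_groups = len(all_obs), len(groups)
    grand_mean = mean(all_obs)

    # Treatment (between-group) SS: group means compared with the grand mean
    ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within-group) SS: each observation compared with its own group mean
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_treat, df_error = n_groups - 1, n_total - n_groups
    ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
    return {
        "df": (df_treat, df_error, n_total - 1),          # treatment, error, total
        "SS": (ss_treat, ss_error, ss_treat + ss_error),  # treatment, error, total
        "MS": (ms_treat, ms_error),
        "F": ms_treat / ms_error,
    }

# Made-up data: three groups of unequal size (hypothetical values)
table = one_way_anova([[4, 5, 6], [7, 8, 9, 10], [1, 2, 3]])
print(table)
```

A large F means the variation between group means is large relative to the variation within groups. The p-value would still come from Minitab or an F table with (I-1, N-I) degrees of freedom.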
Multiple Comparisons (Tukey’s Method)
· Be able to carry out the procedure (most applicable when ANOVA F is significant)
· Understand the Bonferroni correction for multiple intervals
· Understand difference between “individual” and “overall/experimentwise” error rate
· Be able to produce and interpret Minitab output
including ordering and underscoring the sample means
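To see why the “individual” and “overall/experimentwise” error rates differ, the following sketch counts the pairwise comparisons among I groups and shows how quickly the overall rate grows. It is an illustration only: the function name `error_rates` is hypothetical, and the uncorrected overall rate is computed under the simplifying assumption that the comparisons are independent (they are not exactly independent in practice). Tukey’s method uses a different critical value than Bonferroni; only the Bonferroni adjustment is shown here.

```python
from math import comb

def error_rates(num_groups, alpha=0.05):
    """Compare individual and overall error rates for all pairwise comparisons."""
    m = comb(num_groups, 2)  # number of pairwise comparisons among the groups
    # Overall rate if every comparison used level alpha
    # (assumes independent comparisons, a simplification)
    overall_uncorrected = 1 - (1 - alpha) ** m
    # Bonferroni: run each comparison at alpha/m so the overall rate is at most alpha
    bonferroni_alpha = alpha / m
    return m, overall_uncorrected, bonferroni_alpha

# Four groups give 6 pairwise comparisons
m, overall, per_comparison = error_rates(4)
print(m, round(overall, 3), round(per_comparison, 4))
```

With four groups, running each of the 6 comparisons at the 5% level gives an overall error rate of roughly 26% under the independence assumption, which is why a correction is needed.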
Two-Way Analysis of Variance: Xij = μ + αi + βj (+ γij) + εij
· Be able to state hypotheses in symbols and in words
· Understand advantages of two-way ANOVA vs. one-way ANOVA
· Know what a blocking design is and what its main advantages are
· Randomized block designs, matched-pairs designs vs. completely randomized design
· If there are repeat observations and no blocking factors, be able to test for interactions
Produce and interpret interaction plot (including information about main effects)
Check for interactions before interpreting main effects
Know what is meant by an “additive” model
· Be able to calculate main effects (e.g., α̂i = x̄i. − x̄.., the mean at level i minus the grand mean), fitted values, and residuals
· Be able to complete an ANOVA table given partial information
df(total)=N-1, df(A)=I-1, df(B)=J-1, df(AB)=(I-1)(J-1), df(error) = whatever is left over.
· Be able to produce and interpret Minitab output (Stat > ANOVA > Two-way if balanced)
For unbalanced designs or multiple comparisons: Stat > ANOVA > General Linear Model
Enter Model: c2|c3 if you want to use c2 and c3 as two factors along with their interaction. In the Comparisons window, enter the significant factors in the “terms” box. You can use the confidence interval output, the test output (p-values), or both.
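The main effects, fitted values, and residuals for the additive two-way model can be computed by hand from row means, column means, and the grand mean. The sketch below assumes one observation per cell (so no interaction term can be estimated) and uses made-up numbers; the function name `additive_fit` and the variable names are hypothetical.

```python
from statistics import mean

def additive_fit(table):
    """table[i][j] = response at level i of factor A and level j of factor B."""
    num_a, num_b = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(table[i][j] for i in range(num_a)) for j in range(num_b)]

    a_hat = [rm - grand for rm in row_means]  # estimated main effects of A
    b_hat = [cm - grand for cm in col_means]  # estimated main effects of B

    # Additive model: fitted value = grand mean + A effect + B effect
    fitted = [[grand + a_hat[i] + b_hat[j] for j in range(num_b)]
              for i in range(num_a)]
    resid = [[table[i][j] - fitted[i][j] for j in range(num_b)]
             for i in range(num_a)]
    return grand, a_hat, b_hat, fitted, resid

# Made-up 2 x 3 table: 2 levels of factor A (rows), 3 levels of factor B (columns)
data = [[10, 12, 14],
        [20, 22, 27]]
grand, a_hat, b_hat, fitted, resid = additive_fit(data)
print(a_hat, b_hat)
print(resid)
```

The estimated effects for each factor sum to zero, and the residuals sum to zero across every row and column, which is why df(error) is what is left over after the main effects are accounted for.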
Simple and Multiple Regression: Yij = β0 + β1xi + εij
· Produce and interpret scatterplot (direction, strength, linearity, outliers, influential obs)
· How are residuals and potentially influential observations identified?
· Association vs. causation
· Understand the Basic Regression Model
· E(Y|x) = β0 + β1x (it’s the mean value of Y, not Y itself, that is a linear function of x)
· V(Y|x) = σ² (homogeneity)
· Distribution of Y at each x is normal
· Understand the principle of least squares estimation
· Be able to produce and interpret Minitab regression output (Stat > Regression > Regression) for any number of predictors
· Be able to interpret regression coefficients in context (intercept is predicted value when x’s=0, slope is average/expected change in y for a one-unit increase in x, all else constant)
· simple vs. multiple regression
· Calculate fitted values and residuals
· Be wary of extrapolating to make predictions far outside given range of x values.
· Be able to interpret R² and recognize when the question is asking you to report R²
· Recognize s as a point estimate for σ and know how to interpret s
· Be able to interpret a Model Utility Test to judge if the model is useful (F value, hypotheses)
· Be able to carry out a test of significance to determine if βi (and thus the relationship) is significant, as well as construct a confidence interval for the slope
for any hypothesized value of the slope coefficient (e.g., H0: βi = 1)
one- or two-sided alternative
degrees of freedom = n − (k+1)
understand properties of the sampling distribution of the estimated slope β̂1, especially what SD(β̂1) represents (GPA vs. study hours simulation) and factors that affect its size (writing assignment)
· Be able to state and check technical conditions (be very clear which graph tells you what)
· Be able to make inferences for predicted values
· Be able to recognize whether a confidence interval (mean value) or a prediction interval (individual value) is being sought. Understand the different properties of these two intervals. Be able to use Minitab/Minitab output to make prediction and interval.
· Know and be able to apply properties of correlation coefficient (see p. 530)
· Be able to recognize the need for transformations, to suggest appropriate transformations, to compare transformed models for adequacy (comparing residual plots, R², s), and to back-transform predictions and intervals
· Understand how the adjusted R² works and why it might be useful
· Be able to include indicator variables, quadratic, and interaction terms in regression model
including interpretation of their coefficients
· Be able to read output from a non-Minitab package.
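The least squares quantities in the list above (slope, intercept, fitted values, residuals, R², s, and the standard error of the slope) can be reproduced by hand for simple regression. This is a minimal sketch with made-up data, meant as a check on what Minitab reports rather than a replacement for it; the function name `simple_regression` and the dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import mean

def simple_regression(x, y):
    """Least squares fit of y = b0 + b1*x with the usual summary quantities."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx              # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]

    sse = sum(e ** 2 for e in resid)          # error sum of squares
    sst = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares
    s = sqrt(sse / (n - 2))                   # estimate of sigma, df = n - 2
    se_b1 = s / sqrt(sxx)                     # standard error of the slope
    return {"b0": b0, "b1": b1, "fitted": fitted, "resid": resid,
            "r2": 1 - sse / sst, "s": s, "se_b1": se_b1,
            "t_b1": b1 / se_b1}               # t statistic for H0: slope = 0

# Made-up data (hypothetical values)
fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(fit["b0"], fit["b1"], fit["r2"], fit["s"])
```

Note that se_b1 shrinks when s is smaller, when n is larger, or when the x values are more spread out (larger Sxx), which are the factors that affect the size of SD(β̂1).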
Also make sure you can:
· Define variables
· Identify response and explanatory variables
· Relate conclusions to context
· Know when to wear hats and when not to!
Overview of Procedures
| Several independent samples | Means | Proportions* |
| --- | --- | --- |
| Graphical summary | As above but on same scale | Segmented bar graph (Excel) |
| Numerical summary | | |
| Inference procedure | ANOVA | Chi-square |
| Minitab | Stat > ANOVA > One-way (stacked or unstacked) | Stat > Tables > Chi-Square Test |
| Null hypothesis | H0: μ1 = μ2 = … = μI | H0: p1 = p2 = … = pI |
| Alternative hypothesis | Ha: at least one μ differs | Ha: at least one p differs |
| Test statistic | F = MST/MSE; df = I−1, N−I | expected = (row total × column total) / table total; df = (#rows−1)(#cols−1) |
| Technical conditions | Independent SRSs; populations normal (plot each sample, or large n’s); variances are equal (smax/smin < 2). With two-way ANOVA can analyze residuals instead | All expected counts > 5; data are independently chosen random samples or a randomized experiment |
| Follow-up analysis | Tukey’s multiple comparisons | Contributions to the chi-square sum |
Note: For a matched pairs design with quantitative data, we take the differences and perform a one-sample t-test on the differences (and check the technical conditions on the differences).
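The matched-pairs procedure in the note can be sketched as follows: compute the differences, then treat them as a single sample. The data are made up, and the function name `paired_t` is hypothetical; the p-value would still come from Minitab or a t table with n − 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """One-sample t statistic computed on the paired differences."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)   # mean difference
    s_d = stdev(diffs)    # sample SD of the differences
    t_stat = d_bar / (s_d / sqrt(n))
    return diffs, d_bar, s_d, t_stat, n - 1

# Made-up paired measurements on the same four subjects
diffs, d_bar, s_d, t_stat, df = paired_t([10, 12, 9, 11], [12, 15, 10, 14])
print(diffs, d_bar, t_stat, df)
```

The technical conditions (for example, normality) are checked on `diffs`, not on the two original columns.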
· Is there a relationship between gender and which tire a person picked? (chi-square)
· Is there a relationship between height and how far someone long-jumps? (regression)
| | Both Categorical* | Both Quantitative |
| --- | --- | --- |
| Graphical summary | Segmented bar graph | Scatterplot |
| Numerical summary | Conditional proportions | Correlation coefficient, r |
| Procedure | Chi-square | Regression |
| Minitab | Stat > Tables > Chi-Square Test | Stat > Regression > Regression |
| Null hypothesis | H0: no relationship between variable 1 and variable 2 | H0: no relationship (β = 0) between variable 1 and variable 2, or H0: β = hypothesized value |
| Test statistic | expected = (row total × column total) / table total; df = (#rows−1)(#cols−1) | t = (b − hypothesized value) / SEb; df = n−2 |
| Confidence interval | N/A | For β: b ± t(n−2) · SEb |
| Follow-up analysis | | Confidence interval (mean response at a given x); prediction interval (individual response at a given x) |
| Technical conditions | All expected counts > 5 (p. 560); data are a simple random sample classified by two categorical variables | Linear relationship (resids vs. x or fits); independent observations; normality of response at each x value (prob plot/hist of resids); equal variance of y at each x value (resids vs. x) |
*Not on Exam 2
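For the regression follow-up analysis, the difference between a confidence interval (for the mean response at a given x) and a prediction interval (for an individual response at that x) shows up in one extra term under the square root. The sketch below uses the standard simple-regression formulas with made-up data. The function name `intervals_at` is hypothetical, and the critical value 3.182 is the two-sided 95% t value for 3 degrees of freedom taken from a t table; in practice Minitab supplies both intervals.

```python
from math import sqrt
from statistics import mean

def intervals_at(x, y, x_star, t_crit):
    """Confidence and prediction intervals at x_star for simple regression."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))

    y_hat = b0 + b1 * x_star
    # Standard error for the mean response at x_star
    se_mean = s * sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    # Standard error for an individual response: the extra "1 +" term
    se_pred = s * sqrt(1 + 1 / n + (x_star - x_bar) ** 2 / sxx)

    ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
    pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
    return y_hat, ci, pi

# Made-up data; n = 5, so df = n - 2 = 3 and the 95% critical value is 3.182
y_hat, ci, pi = intervals_at([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182)
print(y_hat, ci, pi)
```

The prediction interval is always wider than the confidence interval at the same x, and both widen as x moves away from the mean of the observed x values.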