Stat 321 – Final Review

 

Final Exam Times: Monday, 10:10-1pm, Building 35, Room 111B (lab location)

 

Final Exam Format: This is a closed book exam. You may bring three 8½ x 11 pages (front and back).  You will turn in these pages with the exam.  I will also supply tables and formulas (samples posted on the web). The exam will be cumulative but will emphasize more recent material. There will be a mixture of interpretation and calculation questions. See me for unclaimed graded assignments (quizzes, labs, exams). I especially encourage review of “big picture” ideas from labs.

 

Additional Review Problems: p. 212: 43, p. 218: 53, 54, p. 222: 65, p. 224: 80, p. 225: 83, p. 242: 15 (hint: find E(ΣXi²), how would you adjust for its bias), p. 251: 20, p. 269: 23 Ans: (.446, .851) with simple formula; p. 281: 48, p. 281: 52(a),(c), 53, suppose you have a standard beta random variable with α = 2 but β unknown.  Derive the method of moments estimator for β.

 

From Chapter 5 (Combinations of Random Variables) you should:

- Sampling Distributions of Sample Statistics (functions of random variables)

            Be able to distinguish (in words and with symbols) between sample statistic vs. parameter (of population or of probability distribution)

                        Understand what is random and what is constant (though perhaps unknown)

            Understand the definition of (simple) random sample (SRS)

            Understand what is meant by “sampling distribution of a statistic”

            Know how to derive the exact sampling distribution of a statistic for small sample spaces (e.g., X discrete, n = 2)

            Understand how we approximate the sampling distribution of a statistic/estimator using simulation

            Be able to interpret a “repeated sampling” simulation done in Minitab

            Eat, drink, and breathe the Central Limit Theorem

Sampling Distribution of x̄: E(x̄) = μ (always true), SD(x̄) = σ/√n (if independent rv’s), normal (if population normal) or approximately normal (if n > 30)

Sampling Distribution of p̂: E(p̂ = X/n) = p (always true when X is binomial), SD(p̂) = √(p(1−p)/n) (if random sample), approximately normal (if np > 10 and n(1−p) > 10)

            Can pay less attention to the distribution of the sum or count (convert to x̄, p̂)

Be able to sketch the theoretical sampling distribution (scale and label the horizontal axis).

Be able to apply the formulas for expected value and variance of linear combination of (independent) random variables (p. 105-6, p. 219).

Know that a linear combination of (independent) normal random variables is itself normally distributed, and know how to calculate probabilities for it (p. 221).

Be able to calculate probabilities for sample statistics (e.g., probability we find a p̂ > .30 when p = .2) and how these probabilities are affected by changes in sample size.

            Be able to apply the empirical rule

            Be able to interpret these probability statements in context.  Be able to make decisions about “claimed” values of the parameter based on this probability.
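The “repeated sampling” idea and the p̂ > .30 calculation above can be sketched in a few lines of Python (the course itself uses Minitab, so this is only an illustration). The sample sizes, the number of repetitions, the seed, and the function names are assumptions chosen for the example; the sample sizes were picked so that np > 10 and n(1−p) > 10 hold.

```python
import math
import random
from statistics import NormalDist, mean, pstdev

def simulate_phat(p, n, reps, seed=321):
    """Draw `reps` random samples of size n; return the sample proportions."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

def prob_phat_exceeds(cutoff, p, n):
    """Normal (CLT) approximation to P(p-hat > cutoff)."""
    sd = math.sqrt(p * (1 - p) / n)  # SD(p-hat) = sqrt(p(1-p)/n)
    return 1 - NormalDist(mu=p, sigma=sd).cdf(cutoff)

p, n = 0.2, 100
phats = simulate_phat(p, n, reps=5000)

# The simulated p-hats should center near p with spread near sqrt(p(1-p)/n).
print("mean of simulated p-hats:", round(mean(phats), 4))
print("SD of simulated p-hats:  ", round(pstdev(phats), 4))
print("theoretical SD:          ", round(math.sqrt(p * (1 - p) / n), 4))

# The same event becomes less likely as the sample size grows.
for size in (60, 120, 240):
    print("n =", size, " P(p-hat > .30) approx", round(prob_phat_exceeds(0.30, p, size), 4))
```

A small probability here is what would lead you to doubt a “claimed” value of p = .2 after observing p̂ > .30.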

 

From Chapter 6 (Point Estimation) you should:

            Understand the difference between an estimator and an estimate

Know how to determine whether an estimator is unbiased by determining its expected value

            or by looking at the average of simulated values of the estimate

If choosing an estimator, pick among the unbiased estimators first

            What does “unbiased” mean? Why is this a desirable property of an estimator?  Always?

            Know how to find the variance (formula) of an estimator (e.g., using rules for variances)

            Understand the definition of bias in terms of overall systematic tendency

            Be able to compare performance of estimators using Minitab simulation results

            Understand the idea of method of moments and maximum likelihood estimators

Know how to determine estimators by method of moments and maximum likelihood (MLE)
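As one illustration of these ideas (not one of the assigned problems), take an exponential distribution with rate λ. Setting E(X) = 1/λ equal to x̄ gives the method of moments estimator 1/x̄, which is also the MLE. The Python sketch below averages many simulated estimates to check for bias; the values λ = 2 and n = 10, the seed, and the function names are assumptions made for the example.

```python
import random
from statistics import mean

def rate_estimate(sample):
    """Method of moments (and ML) estimate of the exponential rate: 1 / x-bar."""
    return 1 / mean(sample)

def simulate_estimates(lam, n, reps, seed=321):
    """Compute the estimate for `reps` independent random samples of size n."""
    rng = random.Random(seed)
    return [rate_estimate([rng.expovariate(lam) for _ in range(n)])
            for _ in range(reps)]

lam, n = 2.0, 10
estimates = simulate_estimates(lam, n, reps=20000)
avg = mean(estimates)

print("true rate:                  ", lam)
print("average of simulated values:", round(avg, 3))
# For this estimator E(1/x-bar) = n*lam/(n-1), so it overestimates on average.
print("theoretical expected value: ", round(n * lam / (n - 1), 3))
```

The average of the simulated estimates sits above the true rate, which is what “biased” means in the systematic-tendency sense; the bias shrinks as n grows.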

 

From Chapter 7 (Confidence Intervals) you should:

            Understand the principle of confidence intervals

                        General form: estimate ± (critical value)(standard error of estimate)

                        margin of error = half-width of interval

                                    measures amount of random sampling variability (only)

                        specifies range of plausible values for parameter based on sample statistic

            Know how to interpret “confidence” in your own words

                        without using the words “confidence” or “sure” or “chance” or “probability of parameter”… (lab 7)

Understand limitations, misinterpretations of confidence intervals

Know how intervals are affected by changes in sample size, confidence level, population size, parameter value

            Know how to calculate confidence intervals for population mean, population proportion

            Know the technical conditions for each method and how to check them

                        CI for μ: x̄ ± t_{α/2, n−1} · s/√n if SRS; n > 30 or pop normal (normal prob plot) -- t-interval

                        PI: x̄ ± t_{α/2, n−1} · s·√(1 + 1/n) if SRS; pop normal

                        CI for p: p̂ ± z_{α/2} · √(p̂(1−p̂)/n), if SRS; np̂ > 10, n(1−p̂) > 10 -- Wald

                                       (95%) p* ± 1.96 · √(p*(1−p*)/(n+4)) where p* = (X+2)/(n+4), if SRS -- Adjusted Wald

Be able to determine necessary sample size n to ensure desired width or half-width for given confidence level

            Know how and when to calculate a prediction interval for an individual value (recognize language asking for this)
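The proportion intervals and the sample-size calculation can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's Minitab procedure: the counts (40 successes in 100 trials), the target half-width of .03, and the function names are made up for the example, and the sample-size function uses the conservative guess p = .5 unless told otherwise.

```python
import math
from statistics import NormalDist

def z_critical(conf_level):
    """Upper alpha/2 critical value of the standard normal distribution."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)

def wald_interval(x, n, conf_level=0.95):
    """Wald interval: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    half = z_critical(conf_level) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

def adjusted_wald_95(x, n):
    """95% adjusted Wald interval: add 2 successes and 2 failures first."""
    p_star = (x + 2) / (n + 4)
    half = 1.96 * math.sqrt(p_star * (1 - p_star) / (n + 4))
    return p_star - half, p_star + half

def sample_size_for_proportion(half_width, conf_level=0.95, p_guess=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= half_width, using a guess for p."""
    z = z_critical(conf_level)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / half_width ** 2)

print("Wald:         ", wald_interval(40, 100))
print("Adjusted Wald:", adjusted_wald_95(40, 100))
print("n needed for half-width .03:", sample_size_for_proportion(0.03))
```

Note that the adjusted interval is centered at p* rather than at p̂, and that halving the desired half-width roughly quadruples the required n.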

 

Earlier material to be especially aware of:

            Describing distributions of data numerically and graphically (and in context)

What is probability? What is a random variable?

            What is the expected value of a random variable? Standard deviation?

            What is a pmf, cdf, pdf? How do I graph them? How do I express the function for all x?

            Being able to identify the appropriate discrete probability distribution.

Calculating probabilities (including tables) and expected value involving known  

    continuous and discrete distributions (e.g., binomial, normal) and generic distributions.

            Normal approximation to binomial, binomial approximation to hypergeometric

            Independence            

Conditional probability          

            The distinction between “data” and “model” and between a distribution of data and a probability distribution

 

SOME PROBLEM SOLVING STRATEGIES

If you are asked to determine a probability

1. Are you finding a conditional or an unconditional probability?

2. If it involves x̄ or p̂, can you use the central limit theorem to state the probability distribution?

3. If it involves a random variable which follows one of the common probability distributions (e.g., binomial, gamma) then use the formulas page and/or tables.

Look for the phrase “approximate probability” in case one of the approximations (e.g., normal to binomial, Poisson to binomial) might apply.

You may have to recognize if it belongs to a known discrete probability family yourself.

4. If it involves a random variable but you are given the pmf or pdf, determine the probability directly (summing or integrating).

5. If it does not involve a random variable, use techniques from chapter 2 (permutations, combinations, addition rule, multiplication rule). Make sure you are not applying any results without checking assumptions first (e.g., mutually exclusive, independent).

6. If you are told only a situation, you could be asked to perform or interpret a simulation to determine empirical probabilities.
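To make strategies 3 and 4 concrete, the Python sketch below finds a binomial probability two ways: exactly, by summing the pmf, and approximately, with the normal approximation. The values n = 60, p = .25, x = 18 and the function names are assumptions for illustration; the approximation here includes the usual continuity correction (x + 0.5).

```python
import math
from statistics import NormalDist

def binom_pmf(x, n, p):
    """P(X = x) for a binomial random variable."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_cdf(x, n, p):
    """Exact P(X <= x), found by summing the pmf."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def normal_approx_cdf(x, n, p):
    """Normal approximation to P(X <= x), with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(x + 0.5)

n, p, x = 60, 0.25, 18  # np = 15 and n(1-p) = 45, so both exceed 10
exact = binom_cdf(x, n, p)
approx = normal_approx_cdf(x, n, p)
print("exact P(X <= 18):        ", round(exact, 4))
print("normal approx P(X <= 18):", round(approx, 4))
```

When the technical conditions hold, the two answers should agree closely; when they do not, only the exact calculation is trustworthy.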

 

If you are asked to determine the expected value of an expression

1. If the expression is a linear function of random variable(s), first simplify using the rules for expected value

            e.g., E(2X+3Y) = 2E(X)+3E(Y)

2. Once you get to E(rv), is the rv a sample mean (x̄) or a sample proportion (p̂)?  If so, then E(x̄) = μ = E(X) or E(p̂) = p

3. Once you get to E(Y), is Y a random variable from a common probability distribution family? If so, use the formulas page to determine the expected value

            e.g., if Y is a binomial random variable, E(Y)=np

4. If Y is not from a common probability distribution family, determine E(Y) or E(h(Y)) directly given the pmf (summing) or pdf (integrating)

            discrete: E(Y) = Σ y P(Y=y)      E(h(Y)) = Σ h(y) P(Y=y)

            continuous: E(Y) = ∫ y f(y) dy    E(h(Y)) = ∫ h(y) f(y) dy
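For the discrete case in step 4, the summation can be sketched in Python as below. The pmf is a made-up example (values 0, 1, 2 with probabilities 1/4, 1/2, 1/4), and exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction

# Hypothetical pmf for illustration: y -> P(Y = y)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected_value(pmf, h=lambda y: y):
    """E(h(Y)) = sum of h(y) * P(Y = y); h defaults to the identity, giving E(Y)."""
    if sum(pmf.values()) != 1:
        raise ValueError("pmf must sum to 1")
    return sum(h(y) * prob for y, prob in pmf.items())

e_y = expected_value(pmf)                         # E(Y)
e_y2 = expected_value(pmf, lambda y: y ** 2)      # E(Y^2)
e_lin = expected_value(pmf, lambda y: 2 * y + 3)  # E(2Y + 3)

print("E(Y)    =", e_y)
print("E(Y^2)  =", e_y2)
print("E(2Y+3) =", e_lin, "which equals 2E(Y) + 3 =", 2 * e_y + 3)
```

The last line illustrates step 1: for a linear function the rules for expected value give the same answer as summing directly.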

 

If you are asked to determine the variance of an expression

1. If the expression is a linear function of random variable(s), first simplify using the rules for variance

            e.g., V(2X+3Y) = 4V(X)+9V(Y) if X and Y are independent

2. Once you get to V(rv), is the rv a sample mean (x̄) or a sample proportion (p̂)?  If so, then V(x̄) = σ²/n = V(X)/n or V(p̂) = p(1−p)/n

3. Once you get to V(Y), is Y a random variable from a common probability distribution family? If so, use the formulas page to determine the variance

            e.g., if Y is a binomial random variable, V(Y)=np(1-p)

4. If Y is not from a common probability distribution family, determine V(Y) directly given the pmf or pdf using V(Y) = E(Y²) − [E(Y)]²

            discrete: E(Y²) = Σ y² P(Y=y)               continuous: E(Y²) = ∫ y² f(y) dy
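The variance rules can be checked against a direct calculation. The Python sketch below uses a fair six-sided die as an illustrative pmf (an assumption for the example), finds V(Y) with the shortcut formula, and then builds the exact distributions of 2X + 3Y and of the mean of two independent rolls to confirm the rules in steps 1 and 2.

```python
from fractions import Fraction
from itertools import product

# Illustrative pmf: a fair six-sided die
die = {y: Fraction(1, 6) for y in range(1, 7)}

def mean_of(pmf):
    """E(Y) = sum of y * P(Y = y)."""
    return sum(y * p for y, p in pmf.items())

def variance_of(pmf):
    """Shortcut formula V(Y) = E(Y^2) - [E(Y)]^2."""
    e_y2 = sum(y ** 2 * p for y, p in pmf.items())
    return e_y2 - mean_of(pmf) ** 2

def pmf_of_combination(pmf_x, pmf_y, a, b):
    """Exact pmf of aX + bY for independent X and Y, by enumerating all pairs."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        value = a * x + b * y
        out[value] = out.get(value, 0) + px * py
    return out

v = variance_of(die)
v_combo = variance_of(pmf_of_combination(die, die, 2, 3))
v_xbar = variance_of(pmf_of_combination(die, die, Fraction(1, 2), Fraction(1, 2)))

print("V(X)       =", v)
print("V(2X + 3Y) =", v_combo, " rule gives", 4 * v + 9 * v)
print("V(x-bar)   =", v_xbar, " rule gives", v / 2)
```

The enumeration in `pmf_of_combination` multiplies probabilities, so it is valid only because the two rolls are independent, which is the same condition the variance rule needs.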

 

Note: the previous two sections discuss finding a “mean” or a “standard deviation.”  Remember, you could also be given a set of data and asked to use the techniques from chapter 1 to find x̄ and s.

 

If you are asked to compute a confidence interval

0. Define the parameter in words (e.g., let p=proportion of all Cal Poly students who…)

1. Is it for a population mean or a population proportion?

            If a population mean, are you told a value of σ? (if yes use z, otherwise use t)

2. Check the technical conditions to see if our formulas are valid

3. Calculate the interval and write a one sentence summary (e.g., “I’m 95% confident that” being clear what the parameter is you are estimating)

4. Be able to interpret the phrase “confidence” in your own words if asked (“95% of intervals..”)

5. Know what factors affect the behavior of the confidence intervals (width, midpoint, coverage)
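For a population mean with σ unknown, step 3 can be sketched in Python as below, together with the matching prediction interval for a single new observation. The data values are made up for illustration, and the critical value is assumed to be read from a t table (here t with 8 degrees of freedom and upper-tail area .025), since Python's standard library has no t distribution.

```python
import math
from statistics import mean, stdev

def t_interval(data, t_crit):
    """Confidence interval for the mean: x-bar +/- t * s / sqrt(n)."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)  # stdev uses the n - 1 denominator
    half = t_crit * s / math.sqrt(n)
    return x_bar - half, x_bar + half

def prediction_interval(data, t_crit):
    """Prediction interval for one new value: x-bar +/- t * s * sqrt(1 + 1/n)."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    half = t_crit * s * math.sqrt(1 + 1 / n)
    return x_bar - half, x_bar + half

data = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0]  # made-up measurements
t_crit = 2.306  # from a t table: 8 degrees of freedom, upper-tail area .025

ci = t_interval(data, t_crit)
pi = prediction_interval(data, t_crit)
print("95% CI for the mean:     ", tuple(round(v, 3) for v in ci))
print("95% PI for a new value:  ", tuple(round(v, 3) for v in pi))
```

Both intervals share the same midpoint x̄, but the prediction interval is always wider because it must also cover the variability of an individual observation; with only 9 observations, both also require the population to be approximately normal.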

 

·         Know the technical conditions required by different procedures and how to check them.

·         Be able to make interpretations and explanations of your calculations

·         Remember to follow the “of,” probability “of what”?!

 

WHAT YOU COULD USE MINITAB FOR

·         Numerical and graphical summaries of a distribution of data (e.g., histogram, interquartile range)

·         Calculate probabilities, cumulative probabilities for known probability distributions (e.g., binomial, normal, gamma, etc.)

·         Calculate confidence intervals for μ, p

·         Perform a small simulation, including through a macro

 

SOME GENERAL EXAM PREPARATION ADVICE:

·         Be prepared to think/explain/interpret

·         Understand, don’t memorize

·         Don’t plan to rely heavily on the notes pages

·         Reread handouts, earlier exams, the text

·         Rework examples from class, homework exercises, review problems

·         Consider the big ideas from labs

·         Make sure you can read/use Minitab output

 

Notation, Acronyms:

E(X)        expected value of the random variable X (aka μ)
V(X)        variance of the random variable X (aka σ²)
Z           standard normal random variable; number of standard deviations from mean, (X − μ)/σ
μ           population mean; expected value of a random variable; mean of normal distribution
σ           population standard deviation; standard deviation of a random variable; SD of normal
x̄           sample mean
s           sample standard deviation
p̂           sample proportion of successes
p           population proportion of successes; probability of success
θ           generic unknown parameter
θ̂           estimator of generic unknown parameter value
n           sample size (number of trials, number of observations recorded)
N           population size
q           1 − p
σ_x̄         standard deviation of sample means, SD(x̄) = σ/√n = SD(X)/√n
μ_x̄         mean of sample means, E(x̄) = μ = E(X)
α, β        parameters of distribution, e.g., Weibull, Gamma
λ           parameter of exponential, Poisson distributions
Γ           gamma function, see formulas page
Φ           cdf of standard normal distribution
rv          random variable
pmf p(x)    probability that a discrete random variable is equal to x
pdf f(x)    integrated to determine the probability for a continuous random variable over interval
cdf F(x)    cumulative distribution function, P(X ≤ x) for continuous or discrete random variable