Stat 324 – Applied Regression Analysis
Instructor: Dr. Beth Chance
Class Time: M, T, W, R 2:10-3:00, Studio Classroom (02-206)
Office:
Phone: 756-2961 (×62961 on campus)
Email: bchance@calpoly.edu (a very good way to reach me)
Office Hours: Mon 3-5 in 02-206, Tues 1:10-2, Wed 1:10-2, Thurs 12:10-1; by appointment; and anytime my door is open
Course Webpages: Blackboard (http://my.calpoly.edu)
http://statweb.calpoly.edu/bchance/stat324S09/
Course Listserv: stat-324-01-2094@calpoly.edu
Prerequisite: Stat 252 or Stat 322 or Stat 313. See me if you have not fulfilled this prerequisite.
Course Objective: To expose students to linear statistical models and show them real-world applications. The focus will be on selecting the appropriate model, assessing the fit of the model, and effectively communicating statistical results. Computer technology will be used to facilitate data analysis and explore concepts.
Texts/Materials:
Required: Introduction to Regression Modeling, Abraham and Ledolter, Cengage Publishing (2006)
Access to Minitab (or other statistical software), internet
Strongly Recommended: Applied Linear Regression Models, 4th edition, Kutner, Nachtsheim, and Neter; McGraw Hill/Irwin Press (2004)
You should also have a USB drive, a scientific calculator, an email address, and a large three-ring binder. You will need access to Minitab (or other statistical software) and the internet outside of class. Additional lecture handouts will be supplied in class; you are responsible for receiving and keeping these materials. Handouts from previous lectures will be available outside my office door and on the course web pages.
Statistical Packages: We will be predominantly using the Minitab software package (version 15) for data analysis and exploration. You will be given instructions for how to use Minitab and Microsoft Word as needed for this course. You will need access to a statistical analysis package like Minitab outside of class. Minitab is freely available in the Library Learning Commons, and you can also download a copy from the Technology panel of my.calpoly.edu (see handout). You are also required to bring a scientific calculator with you to each class session. Open Lab Hours for the Statistics studio will be posted as well. You are free to use other packages, but I may not be able to help you with them. I will also point you to some other packages and website possibilities (see green handout). I may also demonstrate Fathom and DataDesk and ask you to use them in class, but will not ask you to use them outside of class.
Grading:
| Component | With final | Without final |
| Class Participation/Examples | 5% | 5% |
| Homeworks | 20% | 20% |
| Term Project (3 parts) | 25% (5%, 10%, 10%) | 35% (7.5%, 12.5%, 15%) |
| Midterms | 30% (15% each) | 40% (15%, 25%) |
| Final | 20% | |
Class Participation/Examples: There will be numerous “discussion questions” assigned, as well as examples that you complete in class and turn in to me. These will be graded, but predominantly in terms of “was seriously attempted” vs. “was not really attempted.” You need to be prepared to discuss your analysis of a particular problem with the rest of the class.
Term Project: There will be a data collection project, due in several stages during the quarter, in which you will be asked to apply the methods we learn and to produce word-processed reports of your analyses. Details will appear on the web pages.
Homeworks: Completed assignments are to be handed in at the beginning of the class period. You are encouraged to work together on assignments but write up your solutions individually and hand in your own work. If I determine assignments are too similar, I will divide the grade among the individuals. You are also encouraged/expected to ask questions during and outside of class. You are expected to utilize the computer for much of your analysis and must INCLUDE all relevant output with your assignment for full credit. Graded assignments will be returned in class or can be picked up from me. Late homeworks will NOT be accepted, but I will review previous assignments with you if you complete them after the deadline. Your lowest homework score will be dropped. You are advised to not use this up too early in the quarter and to start your assignment early in the week. Homework solutions will be posted on the webpages.
The main requirement for all problems is that you EXPLAIN your answers. Often, questions may have more than one correct answer so several answers will be accepted as long as they are JUSTIFIED. You should also state any ASSUMPTIONS that you make. Soon you will be explaining your results to managers and people outside your discipline, so you need to get used to explaining and backing up the numbers in English! You’ll also be given partial credit for your work, so it is important to at least attempt each problem.
Exams: There will be two in-class exams and one (optional) comprehensive final. Graded exams will be returned in class or can be picked up from me.
Make-up Policy: Make-up oral exams will be given to students who notify me (with appropriate proof) at least two days before the exam of their unavoidable absence.
Classroom Culture: The textbook will serve as a guide, but I expect to supplement the material in the textbook extensively. It will be important for you to come to class, to participate fully, to ask questions, and to be responsible for all course handouts. This will allow us to focus more on the interpretation and presentation of statistical analyses. The secondary text, Applied Linear Regression Models, will also be helpful to you in filling in some of the gaps. I cannot overemphasize how important it is that you follow along with the reading assignments and ask me about any components that are not clear. I hope to create a collaborative learning environment, where you feel comfortable asking questions and working together. Still, I do ask that when I am lecturing you give me your complete and undivided (or at least silent) attention.
| Lect | Day | Date | Reading | Topic | Assignments Due |
| 1 | M | 3/30 | | Introduction to Regression models | |
| | T | 3/31 | | No Classes | |
| 2 | W | 4/1 | 2.1, 2.2 | Correlation, Simple Linear Regression | |
| 3 | R | 4/2 | 2.3, 2.4 | Least Squares Estimation | HW 1 (webpage) |
| 4 | M | 4/6 | 2.5 | The Regression Model | |
| 5 | T | 4/7 | 2.5, 2.6 | Inference for Regression | |
| 6 | W | 4/8 | 2.7, 2.8 | Inference for Regression (cont.) | |
| 7 | R | 4/9 | 2.9 | Regression through the origin | HW 2 |
| 8 | M | 4/13 | 6.1, 6.2 | Model Checking | |
| 9 | T | 4/14 | 6.4 | Lack of Fit Tests | |
| 10 | W | 4/15 | | Transformations | |
| 11 | R | 4/16 | 6.5 | Transformations (cont.) | HW 3 |
| 12 | M | 4/20 | | Calibration | |
| 13 | T | 4/21 | | Regression to the Mean | |
| | W | 4/22 | | Review | Project 1 |
| | R | 4/23 | | Exam 1 | |
| 14 | M | 4/27 | 4.1 | Multiple regression | |
| 15 | T | 4/28 | 4.3 | Inference for Multiple Regression | |
| 16 | W | 4/29 | 4.4 | Inference cont. | |
| 17 | R | 4/30 | 4.5 | Coefficient of Determination | HW 4 |
| 18 | M | 5/4 | 5.4, 6.2 | Model Diagnostics | |
| 19 | T | 5/5 | 6.3 | Case Influence Statistics | |
| 20 | W | 5/6 | | continued | |
| 21 | R | 5/7 | 6.2.4 | Durbin Watson Test | HW 5 |
| 22 | M | 5/11 | 5.1 | Polynomial Regression | |
| 23 | T | 5/12 | 5.2 | Indicator variables | |
| 24 | W | 5/13 | | Interactions | |
| 25 | R | 5/14 | 5.3 | Comparing several treatments | HW 6 |
| 26 | M | 5/18 | Ch. 7 | Variable selection | |
| 27 | T | 5/19 | | Model Building | |
| | W | 5/20 | | Review | Project 2 |
| | R | 5/21 | | Exam 2 (Ch. 3-10) | |
| | M | 5/25 | | No Classes | |
| 28 | T | 5/26 | | Logistic Regression | |
| 29 | W | 5/27 | | Logistic Regression cont | |
| 30 | R | 5/28 | | Logistic Regression cont | |
| 31 | M | 6/1 | | Logistic Regression cont | |
| 32 | T | 6/2 | | Presentations | HW 7 |
| | W | 6/3 | | Review | |
| | R | 6/4 | | No Class | Project 3 |
| | M | 6/8 | | Final Exam, 1:10-4:00 | |
Some Review Thoughts
· The null and alternative hypotheses are always statements about population parameters. You can think of the null hypothesis as the uninteresting case and the alternative as what you hope to show. With a single parameter, my convention will be to always use an equal sign in the null hypothesis, even with a one-sided alternative.
· Standard deviation and standard error are essentially equivalent terms. We call it a standard error when we want to admit it (the standard deviation of the statistic) was estimated from sample data.
· A test statistic measures the distance between what you observed in your sample and what the null hypothesis predicts, often in terms of the number of standard deviations/standard errors apart. Many test statistics have the form: test statistic = (observed - hypothesized)/std error
· The p-value measures the probability of getting a test statistic at least this extreme by random chance alone when the null hypothesis is true. If we assume the null hypothesis is true and we repeatedly draw samples from the model it specifies, the p-value measures how often we would get sample results at least this far from the null hypothesis. Small p-values give us strong evidence against the null hypothesis. If a level of significance is not specified, feel free to use α = .05.
If Ha is one-sided (< or >), we calculate the p-value as the probability below or above the test statistic, respectively.
If Ha is two-sided (≠), find the (smaller) tail probability and multiply by two to determine the p-value.
Ha: parameter > hypothesized    Ha: parameter < hypothesized    Ha: parameter ≠ hypothesized
To get Minitab to calculate cumulative probabilities for the t distribution, choose Graph > Probability Distribution Plot > View Probability > Distribution: t. Specify the degrees of freedom and select the Shaded Area tab. Select the radio button for X value, select the appropriate tail, and in the X value box enter the value of the t test statistic.
· To obtain critical values for a confidence interval, select the radio button for Probability, select Both Tails, and enter 1-C as the probability. A confidence interval tells you the range of plausible values for your parameter, based on what you observed in the sample. For example, I’m 95% confident that the population mean is in this interval that I have calculated (citing numerical values). The confidence level tells you how reliable the confidence interval method is: if we were to repeatedly draw samples and calculate intervals, what percentage of those intervals would you expect to capture the population parameter?
· A prediction interval is a much wider interval that takes the person-to-person variability into account and does give a range of plausible values for the next individual observation.