Stat 324 – Applied Regression Analysis

 

Instructor:      Dr. Beth Chance

Class Time:    M, T, W, R 2:10-3:00, Studio Classroom (02-206)

Office:             Faculty Office Building East 25-103

Phone:            756-2961 (×62961 on campus)

Email:             bchance@calpoly.edu   (a very good way to reach me)        

Office Hours: Mon 3-5 in 02-206, Tues 1:10-2, Wed 1:10-2, Thurs 12:10-1; by appointment; and anytime my door is open

 

Course Webpages:    Blackboard (http://my.calpoly.edu)                                                               

http://statweb.calpoly.edu/bchance/stat324S09/

Course Listserv:        stat-324-01-2094@calpoly.edu   

 

Prerequisite: Stat 252 or Stat 322 or Stat 313.  See me if you have not fulfilled this prerequisite.

 

Course Objective: To expose students to linear statistical models and show them real-world applications.  The focus will be on selecting the appropriate model, assessing the fit of the model, and effectively communicating statistical results.  Computer technology will be used to facilitate data analysis and explore concepts.

 

Texts/Materials:

Required:                                Introduction to Regression Modeling, Abraham and Ledolter, Cengage Publishing (2006)

                                                                                   

Access to Minitab (or other statistical software), internet

 

Strongly Recommended:        Applied Linear Regression Models, 4th edition

                                    Kutner, Nachtsheim, and Neter; McGraw Hill/Irwin Press (2004)

           

You should also have a USB drive, a scientific calculator, an email address, and a large three-ring binder.  You will need access to Minitab (or other statistical software) and the internet outside of class.  Additional lecture handouts will be supplied in class; you are responsible for receiving and keeping these materials.  Handouts from previous lectures will be available outside my office door and on the course web pages.

 

Statistical Packages: We will be predominantly using the Minitab software package (version 15) for data analysis and exploration.  You will be given instructions on how to use Minitab and Microsoft Word as needed for this course.  You will need access to a statistical analysis package like Minitab outside of class.  Minitab is freely available in the Library Learning Commons, and you can also download a copy from the Technology panel of my.calpoly.edu (see handout).  You are also required to bring a scientific calculator with you to each class session.  Open Lab Hours for the Statistics studio will be posted as well.  You are free to use other packages, but I may not be able to help you with them.  I will also point you to some other packages and website possibilities (see green handout).  I may also demonstrate Fathom and DataDesk and ask you to use them in class, but will not ask you to use them outside of class.


Grading:                                With final              Without final

Class Participation/Examples:           5%                      5%
Homeworks:                              20%                     20%
Term Project (3 parts):                 25% (5%, 10%, 10%)      35% (7.5%, 12.5%, 15%)
Midterms:                               30% (15% each)          40% (15%, 25%)
Final (optional):                       20%                     not taken

 

Class Participation/Examples: There will be numerous “discussion questions” assigned, as well as examples that you complete in class and turn in to me.  These will be graded, but predominantly in terms of “was seriously attempted” vs. “was not really attempted.”  You need to be prepared to discuss your analysis of a particular problem with the rest of the class.

 

Term Project: There will be a data collection project due in several stages during the quarter in which you will be asked to apply the methods we learn this quarter and to produce word processed reports of your analyses.  Details will appear on the web pages.

 

Homeworks: Completed assignments are to be handed in at the beginning of the class period.  You are encouraged to work together on assignments, but you must write up your solutions individually and hand in your own work.  If I determine assignments are too similar, I will divide the grade among the individuals.  You are also encouraged/expected to ask questions during and outside of class.  You are expected to utilize the computer for much of your analysis and must INCLUDE all relevant output with your assignment for full credit.  Graded assignments will be returned in class or can be picked up from me.  Late homeworks will NOT be accepted, but I will review previous assignments with you if you complete them after the deadline.  Your lowest homework score will be dropped.  You are advised not to use up this dropped score too early in the quarter and to start each assignment early in the week.  Homework solutions will be posted on the webpages.

     The main requirement for all problems is that you EXPLAIN your answers. Often, questions may have more than one correct answer so several answers will be accepted as long as they are JUSTIFIED. You should also state any ASSUMPTIONS that you make. Soon you will be explaining your results to managers and people outside your discipline, so you need to get used to explaining and backing up the numbers in English! You’ll also be given partial credit for your work, so it is important to at least attempt each problem.

 

Exams: There will be two in-class exams and one (optional) comprehensive final.  Graded exams will be returned in class or can be picked up from me.

Make-up Policy: Make-up oral exams will be given to students who notify me (with appropriate proof) at least two days before the exam of their unavoidable absence.

 

Classroom Culture:  The textbook will serve as a guide but I expect to supplement the material in the textbook extensively.  It will be important for you to come to class, to participate fully, to ask questions, and to be responsible for all course handouts.  This will allow us to focus more on the interpretation and presentation of statistical analyses.  The secondary text, Applied Linear Regression Models, will also be helpful to you in filling in some of the gaps.  I cannot overemphasize the importance of following along with the reading assignments and asking me about any components that are not clear.  I hope to create a collaborative learning environment, where you feel comfortable asking questions and working together.  Still, I do ask that when I am lecturing you give me your complete and undivided (or at least silent) attention.

Stat 324 - (Very) Rough Schedule

 

Lect  Day  Date   Reading    Topic                                  Assignments Due
1     M    3/30   Ch. 1      Introduction to Regression models
      T    3/31              No Classes
2     W    4/1    2.1, 2.2   Correlation, Simple Linear Regression
3     R    4/2    2.3, 2.4   Least Squares Estimation               HW 1 (webpage)
4     M    4/6    2.5        The Regression Model
5     T    4/7    2.5, 2.6   Inference for Regression
6     W    4/8    2.7, 2.8   Inference for Regression (cont.)
7     R    4/9    2.9        Regression through the origin          HW 2
8     M    4/13   6.1, 6.2   Model Checking
9     T    4/14   6.4        Lack of Fit Tests
10    W    4/15              Transformations
11    R    4/16   6.5        Transformations (cont.)                HW 3
12    M    4/20              Calibration
13    T    4/21              Regression to the Mean
      W    4/22              Review                                 Project 1
      R    4/23              Exam 1
14    M    4/27   4.1        Multiple regression
15    T    4/28   4.3        Inference for Multiple Regression
16    W    4/29   4.4        Inference cont.
17    R    4/30   4.5        Coefficient of Determination           HW 4
18    M    5/4    5.4, 6.2   Model Diagnostics
19    T    5/5    6.3        Case Influence Statistics
20    W    5/6               continued
21    R    5/7    6.2.4      Durbin Watson Test                     HW 5
22    M    5/11   5.1        Polynomial Regression
23    T    5/12   5.2        Indicator variables
24    W    5/13              Interactions
25    R    5/14   5.3        Comparing several treatments           HW 6
26    M    5/18   Ch. 7      Variable selection
27    T    5/19              Model Building
      W    5/20              Review                                 Project 2
      R    5/21              Exam 2 (Ch. 3-10)
      M    5/25              No Classes
28    T    5/26   Ch. 11     Logistic Regression
29    W    5/27              Logistic Regression cont.
30    R    5/28              Logistic Regression cont.
31    M    6/1               Logistic Regression cont.
32    T    6/2               Presentations                          HW 7
      W    6/3               Review
      R    6/4               No Class                               Project 3
      M    6/8               Final Exam, 1:10-4:00

Some Review Thoughts

 

·         The null and alternative hypotheses are always statements about population parameters.  You can think of the null hypothesis as the uninteresting case and the alternative as what you hope to show.  With a single parameter, my convention will be to always use an equal sign in the null hypothesis, even with a one-sided alternative.

·         Standard deviation and standard error are essentially equivalent terms.  We call it a standard error when we want to admit it (the standard deviation of the statistic) was estimated from sample data.

·         A test statistic measures the distance between what you observed in your sample and what you predicted, often in terms of number of standard deviations/standard errors apart.  Many test statistics have the form: test statistic = (observed – hypothesized)/std error

·         The p-value measures the probability of getting a test statistic at least this extreme by random chance alone when the null hypothesis is true.  If we assume the null hypothesis is true and we repeatedly draw samples from the model it specifies, the p-value measures how often we would get sample results at least this far from the null hypothesis.  Small p-values give us strong evidence against the null hypothesis.  If a level of significance is not specified, feel free to use α = .05.

If Ha is one-sided (< or >) we calculate the p-value as the probability below or above the test statistic, respectively.

If Ha is two-sided (≠), find the (smaller) tail probability and multiply by two to determine the p-value.

Ha: parameter > hypothesized   Ha: parameter < hypothesized   Ha: parameter ≠ hypothesized
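As a rough illustration of the tail rules above, the Python sketch below computes a one-sample t statistic of the form (observed - hypothesized)/std error and the matching p-value for each form of Ha.  The data values, the hypothesized mean of 5.0, and the helper names t_pdf, t_cdf, and p_value are all made up for this example.  The t cumulative probability is approximated by numerical integration using only the standard library, so treat it as a sketch rather than a replacement for Minitab.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) with Simpson's rule on [0, |x|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t_stat, df, alternative):
    """p-value for Ha of the form 'less', 'greater', or 'two-sided'."""
    lower = t_cdf(t_stat, df)   # probability below the test statistic
    upper = 1 - lower           # probability above the test statistic
    if alternative == "less":
        return lower
    if alternative == "greater":
        return upper
    if alternative == "two-sided":
        return 2 * min(lower, upper)  # smaller tail, multiplied by two
    raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")


# Hypothetical sample and hypothesized mean (illustration only).
sample = [5.1, 4.9, 5.6, 5.8, 5.2, 5.5, 5.0, 5.7]
mu0 = 5.0

n = len(sample)
std_error = statistics.stdev(sample) / math.sqrt(n)
t_stat = (statistics.mean(sample) - mu0) / std_error  # (observed - hypothesized)/std error

for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(t_stat, n - 1, alt), 4))
```

The two one-sided p-values always add to 1, and the two-sided p-value is twice the smaller of them, which is a quick sanity check on any software output.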

                                                        

To get Minitab to calculate cumulative probabilities for the t distribution, choose Graph > Probability Distribution Plot > View Probability > Distribution: t. Specify the degrees of freedom and select the Shaded Area tab.  Select the radio button for X value, select the appropriate tail and in the X value box enter t test statistic value. 

·         A confidence interval tells you the range of plausible values for your parameter, based on what you observed in the sample.  For example, I’m 95% confident that the population mean is in this interval that I have calculated (citing numerical values).  The confidence level tells you how reliable the confidence interval method is: if we were to repeatedly draw samples and calculate intervals, what percentage would you expect to capture the population parameter?  To obtain critical values for a confidence interval in Minitab, use the same Probability Distribution Plot dialog: select the radio button for Probability, select Both Tails, and enter 1-C as the probability.

·         A prediction interval is a much wider interval that takes the person-to-person variability into account and gives a range of plausible values for the next individual observation.
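The repeated-sampling meaning of the confidence level, and the extra width of a prediction interval, can be checked with a small simulation.  The Python sketch below is only an illustration under simplifying assumptions: the population is normal with a known standard deviation (so z critical values stand in for the t values you would normally use), and the population mean of 50, standard deviation of 10, sample size of 25, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean


def z_critical(confidence):
    """Two-sided critical value: both tails together hold 1 - C."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def confidence_interval(sample, sigma, confidence=0.95):
    """Interval for the population mean, assuming sigma is known."""
    half = z_critical(confidence) * sigma / math.sqrt(len(sample))
    center = mean(sample)
    return center - half, center + half


def prediction_interval(sample, sigma, confidence=0.95):
    """Interval for the next individual observation, assuming sigma is known.

    The extra '1 +' under the square root is the individual-to-individual
    variability, which is why this interval is wider.
    """
    half = z_critical(confidence) * sigma * math.sqrt(1 + 1 / len(sample))
    center = mean(sample)
    return center - half, center + half


def coverage(mu, sigma, n, reps, confidence=0.95, seed=324):
    """Fraction of simulated confidence intervals that capture mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = confidence_interval(sample, sigma, confidence)
        hits += (lo <= mu <= hi)
    return hits / reps


# Hypothetical population values (illustration only).
MU, SIGMA, N = 50.0, 10.0, 25

print("estimated coverage:", coverage(MU, SIGMA, N, reps=4000))

rng = random.Random(1)
one_sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
print("confidence interval:", confidence_interval(one_sample, SIGMA))
print("prediction interval:", prediction_interval(one_sample, SIGMA))
```

With a 95% confidence level the estimated coverage should land close to 0.95, and the prediction interval should be several times wider than the confidence interval computed from the same sample.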