| Question | Answer (if your question wasn't answered, it was just "too large" and we should talk in person or at the review session) |
| I would like to have a review of what tests to use with different types of data, and why we use a certain type of test over another | See the Overview of Statistical Procedures handout? The main things to consider are how many variables you have, how many samples, and whether the variables are quantitative or qualitative. |
| I would like to learn more about when and how to use multiple regression. | Multiple regression is an extension of simple linear regression, where we just add on more explanatory variables. There are entire courses in regression, including Stat 513 :) |
| 1. I would like to go over the fundamentals of a regression equation again. | |
| 1) I'm still having problems understanding how to read and interpret the output from regression data. For example, how do you know when to divide by 2 for the p value, and how do you properly interpret least squares line equation? | The computer output you have seen always reports a two-sided p-value, so if the alternative was specified as one-sided (looking for evidence of specifically a positive or specifically a negative relationship instead of just a relationship), you take the p-value given by the computer and divide it by two, as long as the sample slope is in the direction stated in the alternative. |
| 2) Can you explain what residual plots tell us about the data? | They tell us if the data met the conditions necessary for the inferential procedure we have learned for regression to be valid. Residuals vs. fitted values gives us information about whether y and x are linearly related and about whether the SD of the responses varies with x. The histogram of the residuals tells us whether the distribution of y at each x can be considered normal. |
| 3) Problem 5 from HW 8 confused me on how to set up the regression equation, and how to properly interpret each coefficient and how to understand the SAS output. | See the new "Commentary on HW 8" link online. |
| 4) Can you go over multicollinearity again please? | All I want you to realize at this stage is that if a multiple regression model contains explanatory variables that are highly related to each other, this can cause problems in the estimated slopes. In such a case, we should eliminate the "redundant" variables from the model. |
| 5) What is a good indicator if a sample is an independent random sample? | You can't have one independent random sample; you need at least 2. A good indication that samples are independent is that I tell you they are. Another indication is that the samples are literally selected at different places and/or times, for example one in 1998 and one in 2002. Or the problem might say "we took a sample of men and then we took a separate sample of women". Often, if the groups really are separate from each other, like men and women, we are willing to treat them as independent. |
| 6) What does the SSE tell us or how is it important for the regression line? | Remember the "least squares regression line" applet, where we looked at different lines and measured how far the points were from each line using the squared vertical distances between the points and the line. SSE is the sum of these squared distances. Regression lines with smaller SSE values are doing a better job of predicting the observations (smaller residual values overall). |
| An explanation between using a t-distribution and t-interval. I am not sure when to use either test. | When you have just one sample and just one quantitative variable, inference procedures will be carried out using the t distribution. A t-test compares the sample mean to a conjectured value for the population mean (numerator: xbar - mu_0). If we just want to estimate the population mean "mu" we use a t-interval. |
| Could you briefly review the meaning of each of the different symbols used this quarter? | See the last page of the Final Review Handout. Ask me about specific symbols that are still unclear? |
| When is it appropriate to use General Linear Model for ANOVA? | Always, but especially if you have an "unbalanced" design. |
| when you're doing a one-way ANOVA, when do you want to include Tukey's comparison? | The best answer is when the overall F test is statistically significant and you have more than 2 groups. |
| 1. Chi-square test: when stating hypotheses corresponding to how data were collected, what is the big difference between proportion and distribution? | If the variable is binary and I tell you the proportion of successes, that automatically tells you the proportion of failures. If we have more than 2 outcomes, then I want to describe the distribution across all the categories instead of just focusing on one category. |
| 2. How to interpret Tukey's Multiple Comparison (in context). | Each interval is for the difference between a pair of population means. Any interval that does not contain zero says those two population means are significantly different. |
| 3. With ANOVA, the benefit of reducing the "error" component. | Think about the situation where we have multiple boxplots and they overlap a lot because they are so wide. Even if the means of the samples are quite different, you will probably still fail to reject Ho since there is so much overlap. That overlap is coming from "unexplained variability", which is measured across the groups through SSE. If we can "explain" some of the variability, then the SSE term will go down and consequently the denominator of the F statistic goes down, increasing the value of F and giving us a more significant result. |
| When can we not use regression since it was stated last class time it could be used for almost anything in testing statistical methods. | To be honest, I am claiming that, if the technical conditions are met, you can use regression for anything! However, I don't expect you to know how and you should stick to the "Overview of Statistical Procedures" handout to decide which procedure to use. |
| When to use different tests of significance? How to interpret some of the minitab outputs especially for regression and multiple regression analysis. | See the Overview of Statistical Procedures handout? Review the Lecture examples? |
| Can you explain using Anova to analyze between group variability as opposed to within group variability? Can you give an example? | Go back to the "ANOVA simulation" applet, see how the p-value is affected by how different the population means are (between group variability) and how wide the boxes are (within group variability)? |
| How do you tell the difference between when you can use an ANOVA and when you can use a chi-squared. | ANOVA has a quantitative response and a qualitative explanatory variable (so one of each), while chi-square has two qualitative variables. |
| What is the difference between a t-interval and a t-test? | When you have just one sample and just one quantitative variable, inference procedures will be carried out using the t distribution. A t-test compares the sample mean to a conjectured value for the population mean (numerator: xbar - mu_0). If we just want to estimate the population mean "mu" we use a t-interval. |
| Please go over regression and multiple regression again. Thanks. | Please review the lecture notes and ask more specific questions? |
| Summary of all analysis methods covered this quarter...application, technical conditions, interpretation...the works! A handout would be ideal! | It's called the Overview of Statistical Procedures handout :) |
| Today you mentioned how regression is just a model of all the test we have done so far (or something like that). I would like it if you could explain this a bit more so I can see how it is connected to the other tests, and have a better understanding of regression itself. | It's really a little beyond this course… |
| Please re-explain what exactly the chi-squared test is measuring, and how one interprets the results. | It is measuring the significance of the association between two qualitative variables. If the p-value is small, we reject the null hypothesis of no association and then we can look at the "contributions to the chi-square sum" to see which cells have the biggest discrepancies between the observed and the expected counts (expected if no association between the variables). |
| I am still fuzzy on Chi Squared procedures. How to properly set up tables to evaluate data, the whole RV/EV which goes in rows/column. | We have generally been putting the EV as the column variable. Just make sure that the rows represent outcomes of one variable and the columns represent outcomes of the second variable. |
| Could you please give a quick review of the minitab steps for regression, it seems as if there are multiple pathways to obtain the same goal. | There are several ways to do the same thing: Stat > Regression > Regression and Stat > Regression > Fitted Line Plot are probably the best. The latter gives the pretty picture, the former gives more of the regression output details (individual t-tests). |
| Review the one-way and two-way ANOVA procedures. I am still not very clear about how to carry out these processes. Also, I'm not clear about what blocking means in a two-way ANOVA (does it mean to hold one variable constant?). | ANOVA arises when we want to compare multiple means. We may have just one qualitative explanatory variable (one-way anova) or two (two-way anova) or more. It just depends how many variables you enter into the "model" box when you run the ANOVA command. Often, the second variable could be considered a "blocking variable" if all subjects in the block were randomly assigned among all the treatments within a block (or the same units have multiple treatments). |
| Can you go over part (xiii) of problem 14 (pg. 428). I believe it was the one sided test of negative slope. | Yes, so since the sample slope was negative, to find the one-sided p-value, just divide the given two-sided p-value by 2. |
| When comparing two responses from one person which tests can we use? I think this is referred to the matched pair or randomized block design....I noticed in the chocolate chip example we used a two way ANOVA and in the practice problems we used a one sample t? | We need to include "subject" as a variable in the analysis. If there are more than two treatments (like with the chips), we need ANOVA. If there are only two treatments, then the ANOVA is equivalent to a one-sample t test on the differences. |
| 1) How do we decide when we should use Tukey test and two-sample t test to find the confidence interval because I feel that these two tests get the same results? | A two-sample t interval is used when we just have two population means (two groups to compare). If we have more than 2 groups but want to look at pairwise comparisons (all the possible two group comparisons) then we use Tukey's, as it adjusts the confidence level for each individual CI so that the overall confidence level is the claimed level (95%, for example). If there are only two groups, then they are essentially the same and Tukey's is not necessary. |
| 2) How do we determine generalizability and what does it mean? | Generalizability is what I call a willingness to take the results from the sample and apply them to the population in general (e.g., more people in the sample preferred Coke over Pepsi so I want to conclude that is true of all Cal Poly students as well). We will be willing to make such generalizations when the sample is representative of the population, and the best way to convince others that this is true is for the sample to have been randomly selected from the population. |
| 3) I was having a difficult time putting the data into minitab in the homework#6. Are we going to have to do that in the final? if so, could you explain what to look at and the fastest way to put on data? | It is possible I could give you a very small data set to input. In that case, it's probably fastest just to type it in. |
| 4) I don't know how to state the alternative hypothesis on regression. I know that I should look at the question and the question should tell me what to do....but I really have no clue when I look at the question. Is anything in particular to look for? | In general, the alternative is that there is an association. All you really need to get out of the question is whether they want "an association" or if they are looking specifically for a positive or for a negative association. These alternatives correspond to beta ≠ 0, beta > 0, or beta < 0, respectively (the null is always beta = 0). |
| Can you please go over the ENTIRE #5 from the last homework?? | See the new "Commentary on HW 8" link online. |
| What does constant refer to in the regression output? | The intercept term. |
| I would like you to address a little more about Tukey's multiple comparison procedure. Also, I know that you already addressed this in class, but maybe you can spend a little more time talking about the differences between a test and an interval. | Test: I have a specific yes/no question in mind (more than half, less than 10, the two parameters differ). Interval: I just want an estimate of the parameter (what proportion prefer Coke, what is the average reading time, how much more do men spend than women on average). |
| What does the least square regression line and R-sq say about? When to use two-way ANOVA? | R-sq is the proportion of variability in the response explained by the line. Two-way ANOVA arises when we have a quantitative response and two different qualitative explanatory variables. |
| Could you explain once again what blocking variables are and how they affect the design of a study? | Before we randomize subjects to treatments, we first split them into homogeneous groups (like twins or married couples), and then we essentially make our comparisons within these groups. Your prototypical example to remember here is the chip study we did in class: each person served as a block and then the order of the treatments was randomly assigned separately within each block. The analysis then takes the person into account and makes the melting time comparisons within each person rather than across the different people. |
| Also, could you explain the difference between homogeneity of proportions and independence? | In chi-square, if we have separate populations (e.g., separate judges, separate years) and we have measured a qualitative variable for each random sample from each population, then the test is comparing homogeneity of proportions/distributions. If we have just one sample, but we have recorded two qualitative variables on each member of the sample, we are testing for association/lack of independence between the variables. |
| I'm still confused about which procedure to use, especially when there are multiple EVs. I don't feel comfortable restacking the data to do a Chi. I'm not sure how to report the Ho and Ha or conclusion when we're working with logs (this may be answered with the final HW though). I'm not sure what all of the test statistics actually tell us - r-sq, t, z... | Multiple EVs only come up with ANOVA or Regression. Don't worry too much about logs. |
| a) Could we please go over chi-square again (technical conditions, when to use it, etc.) | Use a chi-square procedure when you have two qualitative variables. The main TC is that the expected cell counts all exceed 5. The randomness condition varies slightly depending on whether you are considering the data as arising from one population with 2 variables or if you really treat one of the variables as indicative of 2 or more populations and you have a random sample from each population. |
| I would really like you to explain, again, how to interpret (statistically and inferentially) regression analyses. Show us again how to do all of the basic functions on the computer as far as all the different features we have utilized in regards to regression. Briefly review the slope interactions - how to interpret (statistically and inferentially) slope interaction analyses. Show us again how to do all of the basic functions on the computer | One note, we didn't discuss how to work with "interactions" in regression. It's possible, but we didn't discuss it. Mostly you need to know how to use Stat > Regression > Regression or Stat > Regression > Fitted Line Plot. |
| I would like it if you can give some more examples as to the different types of test we should use such as our practice problem. I had trouble with that practice problem so if you can present some examples of each test that would be helpful. | also wait a few days and retest yourself on that particular practice problem? |
| when do you use r-squared and when do you use adjusted r-squared? for the chi-square procedure what is the difference between the 3 cases for use (lecture 14), and how does this change the conclusion? please explain the randomized block design some more, it confuses me. | Use r-sq with simple linear regression to learn what percentage of the variation in the response variable is explained by the regression on the explanatory variable. Use r-sq(adj) in multiple regression when you want to decide whether one model (with some collection of EVs) is doing better than another model (with a different collection of EVs). |
| What does LINE stand for? | The technical conditions in regression: L = linearity, I = independence of observations, N = normality, E = equal variance |
| instead of a question, i would like for you to go over when we need to use what statistical test given the type/amount of the variables. i am very confused as to when we need to use one way anova, general model anova, chi square, and other methods from the latter part of the course. | See also the Overview of Statistical Procedures Handout. The only one not on there is two-way anova. The "general linear model" feature in Minitab is a way of running either one or two-way anovas. It's a little more flexible and will sometimes produce output when the more basic one-way anova and two-way anova functions cannot. |
| I think the biggest problem i have is explaining the results of a given test. I would like to see what exactly needs to be addressed, what not to say, and possibly with specific examples | What needs to be addressed: is the result statistically significant, what population are you willing to apply the results to, are you willing to draw a cause and effect conclusion. Each of these statements should be supported by the appropriate evidence (was the p-value small, did we have random sample(s), did we have random assignment). |
| Can you please go over what the F-statistic means? Can you explain what the individual components are of the regression fit equation. In other words we have predicted cost (for example)= something what are the somethings in words? | The F statistic is a ratio that measures how far apart the sample means are compared to how much "random variation" there is in the data. A regression equation is y-hat = a + b x, where y-hat is the predicted response, a is the intercept, b is the slope and x is the explanatory variable. |
| Can you please review how to interpret both slope coefficients and the intercept coefficient in words. | intercept: the predicted response when EV(s) = 0; slope: the predicted change in the response if the EV increases by one unit (possibly holding other variables constant if multiple regression) |
| When and how do we use Tukey's method. | When you have more than 2 groups in ANOVA and want to follow up by looking at all possible pairs of means to see which are significantly different. |
| What axis do the EV and RV belong to, X or Y. | EV = x = horizontal axis; RV = y = vertical axis |
| What would you recommend as the best method to study for the final? E.g., just review the HW assignments and the final review you gave out, answering the questions and figuring out the ones you don't know by asking you? | Be comfortable with the homework problems. Compare your answers to the online solutions. Work problems from scratch and ask questions. |
| Please explain the process of predictor equations, when to use log, reading regression symbols, and interpreting r and r squared. | Only use logs if I tell you to (and realize I only will as a way to fix technical conditions for regression). R is a measure of the strength of a linear relationship. R^2 is the proportion of variability in the response explained by the regression model. |
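
Several answers above describe the least squares line y-hat = a + b x, SSE as the sum of squared vertical distances, and R^2 as the proportion of variability explained. The following is a minimal Python sketch of those three quantities, not the course's Minitab or SAS workflow. The data values and the function names (`least_squares`, `sse`, `r_squared`) are made up for illustration.

```python
# Hypothetical data, invented for illustration only (not from any homework).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable (horizontal axis)
y = [2.0, 4.0, 5.0, 4.0, 5.0]   # response variable (vertical axis)


def least_squares(x, y):
    """Return intercept a and slope b of the least squares line y-hat = a + b x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b


def sse(x, y, a, b):
    """Sum of squared vertical distances between the points and the line a + b x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def r_squared(x, y, a, b):
    """Proportion of variability in the response explained by the line."""
    ybar = sum(y) / len(y)
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse(x, y, a, b) / sst


a, b = least_squares(x, y)
print("intercept:", round(a, 4), "slope:", round(b, 4))
print("SSE of least squares line:", round(sse(x, y, a, b), 4))
print("SSE of another line (a=2, b=0.5):", round(sse(x, y, 2.0, 0.5), 4))
print("R-sq:", round(r_squared(x, y, a, b), 4))
```

Trying other values of a and b in `sse` mimics the applet: no other line gives a smaller SSE than the least squares line for the same data.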