Stat 324 – HW 1
Due Thursday, April 2
This homework is due at the beginning of class on Thursday, but I will probably be willing to grant no-penalty extensions to Friday this week. This does not mean you should wait until Wednesday night to start. Data can be accessed from statweb.calpoly.edu/bchance/stat324S09/data/. Once you have installed Minitab, just click on the file name in this directory or on the embedded links below to launch Minitab and open the data file.
You should clearly label all relevant computer output and include it with your explanations. I encourage you to always replicate at least some portion of the problem statement before each problem to aid your later review.
· If you plan to turn in a hard copy: please write on only the front side of each page and always staple all your pages together.
· If you plan to submit an electronic copy: email it to me as an attachment and make sure your name is clearly indicated in your email, in the file, and in the file name. You should use the subject line: Stat 324 HW submission.
1) Log into the 324 course page in Blackboard and complete the “Stat 324 Questionnaire” test (under the Assignments tab). Note: there are no right or wrong answers; this is just data collection for me.
2) Make sure you have access to Minitab for the entire quarter. My recommendation is to download a free version of Minitab 15 from Cal Poly. See the written instructions here and/or the video tutorial here.
If you are a Macintosh user, you can use Minitab in any of the IT labs on campus or you can install Minitab if you have a PC emulator (instructions). Also make sure you have a mechanism (e.g., USB drive) for transferring Minitab files you start in the Studio to your home computer.
3) Read Ch. 1 for background.
To turn in:
4) Read the article at statweb.calpoly.edu/bchance/stat324S09/hw/sowell.pdf. Suggest two situations from your own life experience where there was an association between variables but not necessarily a cause and effect relationship. (Clearly specify the two variables and suggest an alternative explanation for the relationship.)
5) Suppose we only had the 12 airfares from San Luis Obispo, $y_1, \ldots, y_{12}$:
$y_1 = 288$, $y_2 = 227$, $y_3 = 313$, $y_4 = 194$, $y_5 = 370$, $y_6 = 278$
$y_7 = 348$, $y_8 = 278$, $y_9 = 414$, $y_{10} = 253$, $y_{11} = 298$, $y_{12} = 249$
We suggested the mean or the median as a reasonable estimate of how much you expect to pay. Consider a single value, call it $c$, that we will use as our estimate. (So potentially $c = 292.5$, that is, $c = \bar{y}$ (the mean), or $c = 283$ (the median), or $c$ is some other value entirely.) How can we decide which single value of $c$ is best? Well, for one, it depends on what we mean by “best.”
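As a quick check on the two candidate estimates named above, the following Python sketch computes the mean and the median of the 12 airfares with the standard-library `statistics` module. Python is our choice for illustration only; the course itself uses Minitab and Excel.

```python
from statistics import mean, median

# The 12 San Luis Obispo airfares y1, ..., y12 listed above
airfares = [288, 227, 313, 194, 370, 278, 348, 278, 414, 253, 298, 249]

y_bar = mean(airfares)    # candidate estimate: the sample mean
y_med = median(airfares)  # candidate estimate: the median (average of the 6th and 7th sorted values)

print(f"mean = {y_bar}, median = {y_med}")
```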
(a) Suppose we want to minimize $\sum_{i=1}^{12} (y_i - c)$. Find the value of $c$ that minimizes this expression (in absolute value). Hint: Can we make this expression equal to zero? For what value of $c$? Set the expression equal to zero and solve for $c$ to find a general solution (you need to plug in the values above). Hint: Be careful with the notation here: $c$ is a single value, and $y_i$ refers to the $i$th value.
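If you want to check your algebra in part (a) numerically, here is a minimal Python sketch that evaluates the sum of deviations for any candidate value `c` (our name for the single-value estimate). It is only a numerical check; it does not replace the general solution the problem asks for.

```python
# The 12 airfares y1, ..., y12
airfares = [288, 227, 313, 194, 370, 278, 348, 278, 414, 253, 298, 249]

def sum_of_deviations(ys, c):
    """Return the sum of (y_i - c) over all values y_i in ys."""
    return sum(y - c for y in ys)

# The sum is linear in c: raising c by 1 lowers the sum by len(ys) = 12.
for c in (250, 283, 292.5, 320):
    print(f"c = {c}: sum of deviations = {sum_of_deviations(airfares, c)}")
```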
(b) Suppose we want to minimize $\sum_{i=1}^{12} |y_i - c|$. Find the value(s) of $c$ that minimize this expression. Hint: Open the Excel file airfare.xls. This formula has been entered into cell C1, to be evaluated for the different candidate values of $c$ in column B. Drag the formula from C1 down to the end (either by hand, or click on the cell once and then double-click on its lower right corner). Now look through these sums to find the smallest value. What is the minimum value of the sum? For what value(s) of $c$? How does your answer relate to the mean or the median of the airfares?
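The Excel hint amounts to a grid search: evaluate the sum of absolute deviations at each candidate value and keep the smallest. The Python sketch below does the same thing, assuming (hypothetically) that the candidates are every whole-dollar amount from the lowest to the highest fare; the actual values in column B of airfare.xls may differ.

```python
# The 12 airfares y1, ..., y12
airfares = [288, 227, 313, 194, 370, 278, 348, 278, 414, 253, 298, 249]

def sum_abs_deviations(ys, c):
    """Return the sum of |y_i - c| over all values y_i in ys."""
    return sum(abs(y - c) for y in ys)

# Hypothetical candidate grid: whole dollars from min(airfares) to max(airfares)
candidates = range(min(airfares), max(airfares) + 1)
sums = {c: sum_abs_deviations(airfares, c) for c in candidates}

smallest = min(sums.values())
# Several candidates can tie for the minimum, so collect all of them
best = [c for c, s in sums.items() if s == smallest]

print("minimum sum of absolute deviations:", smallest)
print("attained for c =", best)
```

Because only whole dollars are checked, a tie over an interval of values shows up as a run of consecutive candidates in `best`.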
(c) Suppose we want to minimize $\sum_{i=1}^{12} (y_i - c)^2$. Find the value of $c$ that minimizes this expression. Hint: Use calculus and differentiate with respect to $c$, and/or create another column in Excel. Provide all of the details of your approach. What is the minimum value? For what value(s) of $c$? How does your answer relate to the mean or median of the airfares?
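For the “create another column in Excel” route, the same grid-search idea applies to the sum of squared deviations. This Python sketch assumes a hypothetical half-dollar grid so that non-integer candidates can appear. A grid search only approximates the minimizer, so it complements rather than replaces the calculus argument.

```python
# The 12 airfares y1, ..., y12
airfares = [288, 227, 313, 194, 370, 278, 348, 278, 414, 253, 298, 249]

def sum_sq_deviations(ys, c):
    """Return the sum of (y_i - c)**2 over all values y_i in ys."""
    return sum((y - c) ** 2 for y in ys)

# Hypothetical candidate grid: steps of $0.50 from the lowest to the highest fare
lo, hi = min(airfares), max(airfares)
candidates = [lo + k / 2 for k in range(2 * (hi - lo) + 1)]
sums = {c: sum_sq_deviations(airfares, c) for c in candidates}

# The sum of squares has a single minimum, so min() with a key is enough here
best_c = min(sums, key=sums.get)
print(f"minimum sum of squares = {sums[best_c]} at c = {best_c}")
```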
6) Some have cited “Drive for show, putt for dough” as the oldest cliché in golf. The message is that the best way to improve one’s scoring average in golf is to focus on improving putting, as opposed to, say, distance off the initial drive, even though the latter usually garners more ooh’s and aah’s. To see if this philosophy has merit, we need to examine whether there is a relationship between putting ability and overall scoring, and whether that relationship is stronger than the relationship between scoring average and driving distance. The file golfers.mtw contains the 2004 statistics (through the Honda Classic on March 20) on the top 80 PGA golfers, downloaded from http://www.pgatour.com/stats/index on March 20, 2004. Three of the variables recorded include:
· Scoring average: A weighted average which takes the stroke average of the field into account. It is computed by adding a player’s total strokes to an adjustment, and dividing by the total rounds played. This average is subtracted from par to create an adjustment for each round. Keep in mind that in golf low scores, as measured by number of strokes, are better than high scores.
· Driving distance: Average number of yards per measured drive. These drives are measured on two holes per round, carefully selected to face in opposite directions to counteract the effects of wind. Drives are measured to the point where they come to rest, regardless of whether or not they hit the fairway.
· Putting average: On holes where the green is hit in regulation, the total number of putts is divided by the total holes played.
(a) Do you expect the relationship between scoring average and driving distance to be positive or negative? Explain.
(b) Do you expect the relationship between scoring average and putting average to be positive or negative? Explain.
(c) Open golfers.mtw and examine a scatterplot of average score (c2) vs. driving distance (c9) and average score vs. average putts (c10). Describe each scatterplot (direction, form, and strength). Do the relationships confirm your expectations in (a) and (b)? Does one relationship appear to be stronger than the other? If so, which?
(d) Calculate the correlation coefficients for each scatterplot. (In Minitab, choose Stat > Basic Statistics > Correlation, specify both variables in the Variables box, you can uncheck Display p-values for now, and press OK.) According to the correlation coefficients, which relationship is stronger? Remember to provide the supporting evidence for your conclusion.
Follow this link for more review of the correlation coefficient.
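Minitab computes the correlation coefficient for you, but it can help to see the formula it evaluates, $r = \sum (x_i - \bar{x})(y_i - \bar{y}) \big/ \sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}$. The Python sketch below implements that formula. How you get columns c2, c9, and c10 out of golfers.mtw and into Python lists is not shown and is left as an assumption.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Cross-products and sums of squares of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant (r is undefined)
    return sxy / math.sqrt(sxx * syy)
```

For example, `pearson_r(driving_distance, scoring_average)` (hypothetical list names holding c9 and c2) should agree with Minitab's output: the sign of r gives the direction, and |r| gives the strength of the linear relationship.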
7) The data in mitchell.mtw (Weisberg, 2005) gives average soil temperature in degrees C at 20 cm depth in Mitchell, Nebraska, for 17 years beginning January 1976.
(a) Use Minitab to create a scatterplot by choosing Graph > Scatterplot (Simple), entering Temp as the Y variable and Month as the X variable. What does the graph reveal about the dependence of soil temperature on month number (direction, form, and strength)?
(b) Now use the mouse to drag the lower right corner of the graph window to the right so the window spans your screen. Change the scaling of the scatterplot by double-clicking on the white space surrounding the plot. This should open the Edit Graph and Figure Regions menu (or right-click outside the plot and select Edit Figure Region). Select the Graph Size tab, then under True Size select Custom and change the width from 6 to 16. Do you learn any new information from this graph?
In general for HWs
· Document everything
· Label everything
· Support any statement you make.
· Be ready to interpret everything, including attaching measurement units.
· With a test of significance, always state the null and alternative hypotheses, the test statistic, the p-value, whether you consider the p-value large or small, and your conclusion in context. Ideally you will also define in words any symbols that you use in Ho and Ha. It’s also wonderful to include a probability distribution plot displaying the p-value, especially when you are not just using a p-value from Minitab’s regression output. It should also be very clear whether you are reporting a one-sided or two-sided p-value.
· In any “by hand” computations, state the formula, show the values plugged in, and display any in-between calculations (so I can see where and when you round).
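To make the one-sided versus two-sided distinction concrete, here is a minimal Python sketch for the case where the test statistic $z$ has a standard normal null distribution (an assumption; t-based tests need the t distribution instead). It uses only `math.erfc` from the standard library.

```python
import math

def upper_tail(z):
    """P(Z >= z) for a standard normal random variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def p_values(z):
    """Return (upper one-sided, lower one-sided, two-sided) p-values for a z statistic."""
    p_greater = upper_tail(z)        # Ha: parameter > hypothesized value
    p_less = upper_tail(-z)          # Ha: parameter < hypothesized value, i.e. P(Z <= z)
    p_two = 2 * upper_tail(abs(z))   # Ha: parameter != hypothesized value
    return p_greater, p_less, p_two
```

For example, with z = 1.96 the two-sided p-value is about 0.05, while the upper one-sided p-value is half of that, which is exactly why your write-up should say which one you are reporting.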