Stat 321 Exam 3
November 18, 2004
Please write clearly on your own paper. You may use a calculator, notes, and
text. You have 50 minutes to complete
the exam, except that you may choose any
one of these problems to submit by
For questions that call for calculations, present your method of solution in a clear, well-labeled manner and show the details of your calculations. For questions that ask for interpretations and explanations, explain your answers fully unless instructed otherwise.
1. (18 pts) Suppose that the wrapper of a certain candy bar lists its weight as 2.13 ounces. Suppose that the actual weights of these candy bars vary according to a normal distribution with mean mu = 2.2 ounces and standard deviation sigma = .06 ounces.
a) (6 pts) What proportion of candy bars weigh less than the advertised weight?
The z-score is (2.13-2.2)/.06 = -1.17. The table of standard normal probabilities
reveals the probability of weighing less than the advertised amount to be
.1210.
When the standard deviation is sigma = .04 ounces, the z-score is (2.13-2.2)/.04 = -1.75 and the probability is .0401.
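The table lookup in part a) can be checked numerically. The following is a minimal sketch using Python's standard library; the names advertised, mu, and prop_below are illustrative and not part of the exam. Because the code does not round the z-score to two decimals, the sigma = .06 result comes out slightly different from the table-based .1210.

```python
from statistics import NormalDist

advertised = 2.13  # ounces, weight listed on the wrapper
mu = 2.2           # ounces, mean of the actual weights

def prop_below(sigma):
    """Proportion of bars weighing less than the advertised weight."""
    return NormalDist(mu, sigma).cdf(advertised)

p_06 = prop_below(0.06)  # z is about -1.17
p_04 = prop_below(0.04)  # z is exactly -1.75
print(round(p_06, 4), round(p_04, 4))
```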
b) (6 pts) If the manufacturer wants to adjust the production process so that only 1 candy bar in 1000 weighs less than the advertised weight, what should the mean of the actual weights be (assuming that the standard deviation of the weights remains .06 ounces)?
The table of standard normal
probabilities reveals that the z-score should be -3.10 (or thereabouts) for
this probability to equal .001, so we need (2.13-mu)/.06 to equal -3.10, which
gives mu = 2.13+3.10(.06) = 2.316.
When the standard deviation is sigma = .04 ounces, this becomes mu = 2.13+3.10(.04) = 2.254.
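The same calculation can be sketched with an inverse normal cdf in place of the table. The function name required_mean is hypothetical. The computed z-score is not rounded to -3.10, so the means differ from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

advertised = 2.13                 # ounces, weight listed on the wrapper
z = NormalDist().inv_cdf(0.001)   # z-score with .001 probability below it

def required_mean(sigma):
    """Mean mu for which P(weight < advertised) = .001."""
    # Solve (advertised - mu) / sigma = z for mu.
    return advertised - z * sigma

print(round(z, 2), round(required_mean(0.06), 3), round(required_mean(0.04), 3))
```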
c) (6 pts) In a random sample of 5 candy bars, what is the probability that at least one of the candy bars weighs less than the advertised weight?
The probability that one randomly
selected candy bar weighs less than the advertised weight is .1210 (from part
a), so the probability that one randomly selected candy bar does not weigh less
than the advertised weight is 1-.1210 = .8790.
The probability that all five will not weigh less than the advertised
weight is therefore (.8790)^5, so the probability that at least one of the five
candy bars will weigh less than the advertised amount is 1-(.8790)^5, or about
.4753.
When the standard deviation is sigma = .04 ounces, this becomes 1-(1-.0401)^5, or about .1851.
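The complement rule used in part c) is easy to verify directly. This sketch takes the single-bar probabilities from part a) as given and assumes, as the solution does, that the five bars are independent; the function name is illustrative.

```python
def p_at_least_one(p_single, n=5):
    """P(at least one of n independent bars is underweight)."""
    p_none = (1 - p_single) ** n   # all n bars at or above the advertised weight
    return 1 - p_none

p_06 = p_at_least_one(0.1210)  # sigma = .06 version
p_04 = p_at_least_one(0.0401)  # sigma = .04 version
print(round(p_06, 4), round(p_04, 4))
```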
2. (14 pts) Suppose that three firms (A, B, C) are competing
for two construction contracts. Let the
random variable X be the number of contracts awarded to firm A, and let the
random variable Y be the number of contracts awarded to firm B. The joint probability mass function (pmf) of
X and Y is given in the following table:
| p(x,y) | y=0 | y=1 | y=2 |
| x=0    | 1/9 | 2/9 | 1/9 |
| x=1    | 2/9 | 2/9 | 0   |
| x=2    | 1/9 | 0   | 0   |
a) (4 pts) Determine the probability that firm A and firm B receive the same number of contracts.
P(X=Y) = p(0,0) + p(1,1) + p(2,2) =
1/9 + 2/9 + 0 = 3/9 = 1/3
b) (4 pts) Do you expect the covariance between X and Y to be positive, negative, or zero? Provide a brief intuitive explanation, without performing any calculations.
Because a large value of X makes a
large value of Y less likely (for example, knowing that X=2 tells us for sure
that Y=0), the covariance between X and Y should be negative.
c) (6 pts) Calculate the covariance between X and Y.
E(XY) = (0)(0)(1/9) + (0)(1)(2/9) + (0)(2)(1/9) + (1)(0)(2/9) + (1)(1)(2/9) + (1)(2)(0) + (2)(0)(1/9) + (2)(1)(0) + (2)(2)(0) = 2/9.
To find E(X), note that P(X=0)=4/9,
P(X=1)=4/9, and P(X=2)=1/9. Thus, E(X) =
(0)(4/9) + (1)(4/9) + (2)(1/9) = 2/3.
Similarly, P(Y=0)=4/9, P(Y=1)=4/9, and P(Y=2)=1/9, so E(Y) = 2/3.
Thus, Cov(X,Y) = E(XY)-E(X)E(Y) = 2/9 - (2/3)(2/3) = 2/9 - 4/9 = -2/9.
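The answers to parts a) and c) can be confirmed with exact fractions. This is a sketch only; the dictionary pmf simply transcribes the joint table above, and the variable names are illustrative.

```python
from fractions import Fraction

# Joint pmf p(x, y), keyed by (x, y), transcribed from the table.
pmf = {
    (0, 0): Fraction(1, 9), (0, 1): Fraction(2, 9), (0, 2): Fraction(1, 9),
    (1, 0): Fraction(2, 9), (1, 1): Fraction(2, 9), (1, 2): Fraction(0),
    (2, 0): Fraction(1, 9), (2, 1): Fraction(0),    (2, 2): Fraction(0),
}

total = sum(pmf.values())                              # should be 1
p_equal = sum(p for (x, y), p in pmf.items() if x == y)
e_x = sum(x * p for (x, y), p in pmf.items())
e_y = sum(y * p for (x, y), p in pmf.items())
e_xy = sum(x * y * p for (x, y), p in pmf.items())
cov = e_xy - e_x * e_y

print(p_equal, e_x, e_y, e_xy, cov)
```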
3. (12 pts) Suppose that scores on a midterm exam follow a normal distribution with mean 70 and standard deviation 8, while scores on a final exam follow an independent normal distribution with mean 60 and standard deviation 12. Suppose also that the instructor weights the final to count twice as much as the midterm. Course grades are therefore based on the random variable T = M + 2F, where T represents the total course score, M represents the midterm exam score, and F represents the final exam score.
a) (6 pts) Specify the probability distribution of T.
Because T is a linear combination of
two normal random variables, T will have a normal distribution. Its mean is E(T) = E(M)+2E(F) = 70+2(60) =
190. Its variance is V(T) = V(M) + 2^2
V(F) = 8^2 + 4*12^2 = 640, so SD(T) = sqrt(640), which is about 25.30.
When E(F)=55 and SD(M)=6, we get
E(T) = E(M)+2E(F) = 70+2(55) = 180 and V(T) = V(M) + 2^2 V(F) = 6^2 + 4*12^2 =
612, so SD(T) = sqrt(612), which is about 24.74.
b) (6 pts) Determine the probability that a student scores above 200 for his/her total course score.
P(T>200) =
P(Z>(200-190)/25.30) = P(Z>0.40) = 1-.6554 = .3446.
When E(F)=55 and SD(M)=6, we get P(T>200)
= P(Z>(200-180)/24.74) = P(Z>0.81) = 1-.7910 = .2090.
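Both parts of this problem can be checked with a short sketch. The helper total_score_dist is hypothetical; it relies on M and F being independent, as stated in the problem. The probabilities differ slightly from the answers above because the z-score is not rounded to two decimals.

```python
from math import sqrt
from statistics import NormalDist

def total_score_dist(mean_m, sd_m, mean_f, sd_f):
    """Distribution of T = M + 2F for independent normal M and F."""
    mean_t = mean_m + 2 * mean_f
    var_t = sd_m ** 2 + (2 ** 2) * sd_f ** 2   # V(M) + 2^2 V(F)
    return NormalDist(mean_t, sqrt(var_t))

t_main = total_score_dist(70, 8, 60, 12)   # E(F)=60, SD(M)=8
t_alt = total_score_dist(70, 6, 55, 12)    # E(F)=55, SD(M)=6

p_main = 1 - t_main.cdf(200)
p_alt = 1 - t_alt.cdf(200)
print(t_main.mean, round(t_main.stdev, 2), round(p_main, 4))
print(t_alt.mean, round(t_alt.stdev, 2), round(p_alt, 4))
```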
4. (18 pts) Suppose that the lifetime of a light bulb
follows an exponential distribution with mean 3000 hours.
a) (6 pts) What is the probability that a light bulb will last for less than 2750 hours?
The cdf of an exponential
distribution with parameter lambda is F(x) = 1-exp(-x*lambda), so P(X<2750)
= F(2750) = 1-exp(-2750/3000) = 1-exp(-11/12), or about .6001. Note that we know that lambda = 1/3000
because E(X)=1/lambda.
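As a quick numerical check of part a), the cdf can be evaluated directly. The function name exp_cdf is illustrative.

```python
from math import exp

mean_lifetime = 3000        # hours
lam = 1 / mean_lifetime     # rate parameter, since E(X) = 1/lambda

def exp_cdf(x, lam):
    """F(x) = 1 - exp(-lambda * x) for x >= 0."""
    return 1 - exp(-lam * x)

p_short = exp_cdf(2750, lam)
print(round(p_short, 4))
```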
b) (8 pts) In a random sample of 60 light bulbs, what is the
(approximate) probability that the sample mean lifetime will be less than 2750
hours?
With a sample size of n=60, the Central Limit Theorem establishes that the sample mean X-bar will follow (approximately) a normal distribution, with mean 3000 hours and standard deviation sigma/sqrt(n) = 3000/sqrt(60) = 387.3 hours. Thus, P(X-bar<2750) = P(Z<(2750-3000)/387.3) = P(Z<-0.65) = .2578. Note that we know that sigma=3000 because the variance of an exponential distribution is 1/lambda^2, so the std dev is 1/lambda, which is the same as its mean.
With a sample size of n=40, the Central Limit Theorem establishes that the sample mean X-bar will follow (approximately) a normal distribution, with mean 3000 hours and standard deviation sigma/sqrt(n) = 3000/sqrt(40) = 474.3 hours. Thus, P(X-bar<2750) = P(Z<(2750-3000)/474.3) = P(Z<-0.53) = .2981.
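The normal approximation in part b) can be sketched as follows. The function name p_mean_below is hypothetical. The results differ slightly from .2578 and .2981 because the z-score is not rounded before the lookup.

```python
from math import sqrt
from statistics import NormalDist

mu = 3000      # hours, mean of the exponential lifetime
sigma = 3000   # hours, std dev of an exponential equals its mean

def p_mean_below(threshold, n):
    """CLT approximation to P(X-bar < threshold) for sample size n."""
    return NormalDist(mu, sigma / sqrt(n)).cdf(threshold)

p_n60 = p_mean_below(2750, 60)
p_n40 = p_mean_below(2750, 40)
print(round(p_n60, 4), round(p_n40, 4))
```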
c) (4 pts) Would the probability in b) be larger, smaller,
or the same with a random sample of size 150?
Explain briefly, without performing any calculations.
The probability would be smaller. The larger sample size would mean that X-bar would have less variation about mu (a smaller std dev), so there would be a greater probability that X-bar would fall close to mu (3000) and a smaller probability that X-bar would fall below 2750.
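Although the question asks for no calculations, a small simulation illustrates the reasoning. This is only a sketch: the number of replications, the seed, and the function name are arbitrary choices, and the estimates are subject to simulation error.

```python
import random

def simulate_p_mean_below(n, threshold=2750, mean_life=3000,
                          reps=2000, seed=321):
    """Estimate P(X-bar < threshold) by drawing exponential samples of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # expovariate takes the rate lambda = 1 / mean
        xbar = sum(rng.expovariate(1 / mean_life) for _ in range(n)) / n
        if xbar < threshold:
            hits += 1
    return hits / reps

p_60 = simulate_p_mean_below(60)
p_150 = simulate_p_mean_below(150)
print(p_60, p_150)
```

The estimate for n=150 should come out clearly below the one for n=60, in line with the explanation above.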
5. (18 pts) The midrange of a set of data is defined
to be the average (mean) of the minimum and maximum values in the data. The midhinge is defined to be the
average of the lower quartile (lower fourth) and the upper quartile (upper
fourth). Consider the following data, which
are the calorie contents of 20 beef hot dogs:
111 131 132 135 139 141 148 149 149 152
153 157 158 175 176 181 184 186 190 190
a) (6 pts) Calculate the midrange and midhinge for these data.
The midrange is (111+190)/2 = 150.5.
The lower quartile is the average of
the 5th and 6th ordered values: (139+141)/2 = 140, and
the upper quartile is the average of the 15th and 16th ordered
values: (176+181)/2 = 178.5, so the midhinge is (140+178.5)/2 = 159.25.
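These two statistics can be computed with a short sketch. It assumes the textbook convention that each fourth is the median of one half of the ordered data, with the overall median included in both halves when n is odd; other quartile conventions give slightly different values. The function names are illustrative.

```python
from statistics import median

calories = [111, 131, 132, 135, 139, 141, 148, 149, 149, 152,
            153, 157, 158, 175, 176, 181, 184, 186, 190, 190]

def midrange(data):
    """Average of the minimum and maximum values."""
    return (min(data) + max(data)) / 2

def fourths(data):
    """Lower and upper fourths: medians of the lower and upper halves."""
    s = sorted(data)
    n = len(s)
    lower_half = s[:(n + 1) // 2]   # includes the median when n is odd
    upper_half = s[n // 2:]
    return median(lower_half), median(upper_half)

def midhinge(data):
    """Average of the lower and upper fourths."""
    lower, upper = fourths(data)
    return (lower + upper) / 2

print(midrange(calories), fourths(calories), midhinge(calories))
```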
b) (6 pts) Are these statistics (midhinge and midrange) measures of center or measures of spread? [Indicate one or the other: center or spread.] Explain your answer.
These are both measures of center,
as they produce estimates of where the data are located, not how spread out the
data are.
c) (6 pts) Which (or possibly both, or possibly neither) would you expect to be resistant to outliers? Explain briefly.
The midhinge is resistant to
outliers, because it is based on the quartiles.
A few extreme values do not typically affect quartiles, unless the
sample size is quite small. On the other
hand, the midrange considers only the smallest and largest values, so it is
extremely susceptible to outliers and therefore not resistant.
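A small experiment illustrates the difference in resistance. In this sketch the largest value is replaced by 900, a made-up outlier chosen purely for illustration, and the fourths are taken as medians of the lower and upper halves of the ordered data (an assumed convention).

```python
from statistics import median

calories = [111, 131, 132, 135, 139, 141, 148, 149, 149, 152,
            153, 157, 158, 175, 176, 181, 184, 186, 190, 190]

def midrange(data):
    return (min(data) + max(data)) / 2

def midhinge(data):
    s = sorted(data)
    n = len(s)
    lower = median(s[:(n + 1) // 2])   # lower fourth
    upper = median(s[n // 2:])         # upper fourth
    return (lower + upper) / 2

# Hypothetical contaminated data: the largest value becomes 900.
contaminated = calories[:-1] + [900]

print(midrange(calories), midrange(contaminated))   # midrange shifts a lot
print(midhinge(calories), midhinge(contaminated))   # midhinge is unchanged
```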
6. (20 pts) In a recent study, researchers purchased 40 food
items in New York City and determined the actual calorie content of each
through a laboratory analysis. They then
calculated the percentage difference between the actual calorie content and the
calorie count listed on the item’s label.
(A positive percentage difference corresponds to a food item whose
actual calorie content was higher than what the label claimed.) Each food item was also classified according
to whether it was marketed locally, nationally, or regionally. The boxplots below were constructed to
compare the distributions:

Indicate (with a simple “yes” or “no”) whether it is valid to conclude that the locally marketed food items:
a) (3 pts) have the largest median of percentage differences among the three groups of items?
Yes
b) (3 pts) have the largest IQR of percentage differences among the three groups of items?
Yes
c) (3 pts) have the largest lower quartile (lower fourth) of percentage differences among the three groups of items?
No (it’s hard to tell which has the larger lower quartile between the local and regional groups)
d) (3 pts) have the largest sample size among the three groups of items?
No (boxplots provide no information
about sample size)
e) (8 pts) Write a paragraph summarizing what these boxplots reveal about the percentage differences between actual and advertised calorie content among the three groups of food items.
The most striking aspect of these data is that locally marketed food items tend to have many more calories than advertised. The median discrepancy in this group is over 50%. There is also tremendous variability in these discrepancy percentages for the local items, ranging from close to zero to almost 250%. On the other extreme, the nationally marketed items tend to have calorie amounts very close to what is advertised, with very little variation. The median discrepancy percentage in this group is close to zero, and there is little variability except for an outlier that actually has fewer calories than advertised. The regionally marketed items fall in between local and national, both in terms of center and spread. The regional items do tend to have more calories than stated, but nowhere near as much as the local ones, and the variability is less than with the local items as well.