Stat 321                      Exam 3                        November 18, 2004

 

Please write clearly on your own paper.  You may use a calculator, notes, and text.  You have 50 minutes to complete the exam, except that you may choose any one of these problems to submit by 1:11pm on Monday.  Clearly indicate which problem you choose on the paper that you hand in. You are to work completely independently on all aspects of the exam.

 

For questions that call for calculations, present your method of solution in a clear, well-labeled manner and show the details of your calculations.  For questions that ask for interpretations and explanations, explain your answers fully unless instructed otherwise.

 

1. (18 pts) Suppose that the wrapper of a certain candy bar lists its weight as 2.13 ounces.  Suppose that the actual weights of these candy bars vary according to a normal distribution with mean mu = 2.2 ounces and standard deviation sigma = .06 ounces.

a) (6 pts) What proportion of candy bars weigh less than the advertised weight?

 

The z-score is (2.13-2.2)/.06 = -1.17.  The table of standard normal probabilities reveals the probability of weighing less than the advertised amount to be .1210.

 

When the standard deviation is sigma = .04 ounces, the z-score is (2.13-2.2)/.04 = -1.75 and the probability is .0401.
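The table lookup in part a) can be cross-checked numerically. The following is a minimal sketch using Python's standard library; the variable names are illustrative and not part of the exam. Because the table uses a z-score rounded to two decimals, the unrounded result for sigma = .06 comes out slightly above .1210.

```python
from statistics import NormalDist

ADVERTISED = 2.13  # ounces, from the wrapper

# Actual weights: normal with mean 2.2 and standard deviation .06 (or .04).
p_under_06 = NormalDist(mu=2.2, sigma=0.06).cdf(ADVERTISED)
p_under_04 = NormalDist(mu=2.2, sigma=0.04).cdf(ADVERTISED)

# z-scores as computed by hand in the solution
z_06 = (ADVERTISED - 2.2) / 0.06
z_04 = (ADVERTISED - 2.2) / 0.04

print(f"sigma=.06: z={z_06:.2f}, P(weight < 2.13)={p_under_06:.4f}")
print(f"sigma=.04: z={z_04:.2f}, P(weight < 2.13)={p_under_04:.4f}")
```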

 

b) (6 pts) If the manufacturer wants to adjust the production process so that only 1 candy bar in 1000 weighs less than the advertised weight, what should the mean of the actual weights be (assuming that the standard deviation of the weights remains .06 ounces)?

 

The table of standard normal probabilities reveals that the z-score should be -3.10 (or thereabouts) for this probability to equal .001, so we need (2.13-mu)/.06 to equal -3.10, which gives mu = 2.13+3.10(.06) = 2.316.

 

When the standard deviation is sigma = .04 ounces, this becomes mu = 2.13+3.10(.04) = 2.254.
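The required mean can also be found with an inverse normal cdf instead of a table. This is a sketch under the same assumptions as the solution (normal weights, fixed standard deviation); the helper name `required_mean` is hypothetical. The exact quantile is close to -3.09, so the results differ slightly from the table-based values that use -3.10.

```python
from statistics import NormalDist

ADVERTISED = 2.13   # ounces
TARGET = 0.001      # 1 bar in 1000 may fall below the advertised weight

def required_mean(sigma, advertised=ADVERTISED, target=TARGET):
    """Mean mu such that P(weight < advertised) equals target."""
    z = NormalDist().inv_cdf(target)   # standard normal quantile (negative)
    return advertised - z * sigma      # solve (advertised - mu)/sigma = z

mu_06 = required_mean(0.06)
mu_04 = required_mean(0.04)

# Sanity check: with this mean, the lower-tail probability is the target.
check = NormalDist(mu_06, 0.06).cdf(ADVERTISED)

print(f"sigma=.06: mu={mu_06:.3f}; sigma=.04: mu={mu_04:.3f}; check={check:.4f}")
```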

 

c) (6 pts) In a random sample of 5 candy bars, what is the probability that at least one of the candy bars weighs less than the advertised weight?

 

The probability that one randomly selected candy bar weighs less than the advertised weight is .1210 (from part a), so the probability that one randomly selected candy bar does not weigh less than the advertised weight is 1-.1210 = .8790.  The probability that all five will not weigh less than the advertised weight is therefore (.8790)^5, so the probability that at least one of the five candy bars will weigh less than the advertised amount is 1-(.8790)^5, or about .4753.

 

When the standard deviation is sigma = .04 ounces, this becomes 1-(1-.0401)^5, or about .1851.
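The complement argument in part c) amounts to a one-line formula. A small sketch, assuming the five candy bars are independent and using the table-based probabilities from part a); the function name is illustrative.

```python
def p_at_least_one(p_single, n):
    """P(at least one of n independent items has the property)."""
    return 1 - (1 - p_single) ** n

p_06 = p_at_least_one(0.1210, 5)   # sigma = .06 case
p_04 = p_at_least_one(0.0401, 5)   # sigma = .04 case

print(f"{p_06:.4f} {p_04:.4f}")
```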

 

2. (14 pts) Suppose that three firms (A, B, C) are competing for two construction contracts.  Let the random variable X be the number of contracts awarded to firm A, and let the random variable Y be the number of contracts awarded to firm B.  The joint probability mass function (pmf) of X and Y is given in the following table:

p(x,y)   y=0      y=1      y=2

x=0      1/9      2/9      1/9

x=1      2/9      2/9      0

x=2      1/9      0        0

a) (4 pts) Determine the probability that firm A and firm B receive the same number of contracts.

 

P(X=Y) = p(0,0) + p(1,1) + p(2,2) = 1/9 + 2/9 + 0 = 3/9 = 1/3

 

b) (4 pts) Do you expect the covariance between X and Y to be positive, negative, or zero?  Provide a brief intuitive explanation, without performing any calculations.

 

Because a large value of X makes a large value of Y less likely (for example, knowing that X=2 tells us for sure that Y=0), the covariance between X and Y should be negative.

 

c) (6 pts) Calculate the covariance between X and Y.

 

E(XY) = (0)(0)(1/9) + (0)(1)(2/9) + (0)(2)(1/9) + (1)(0)(2/9) + (1)(1)(2/9) + (1)(2)(0) + (2)(0)(1/9) + (2)(1)(0) + (2)(2)(0) = 2/9.

To find E(X), note that P(X=0)=4/9, P(X=1)=4/9, and P(X=2)=1/9.  Thus, E(X) = (0)(4/9) + (1)(4/9) + (2)(1/9) = 2/3.  Similarly, E(Y) = 2/3.

Thus, Cov(X,Y) = E(XY)-E(X)E(Y) = 2/9 - (2/3)(2/3) = -2/9.
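The covariance calculation can be verified with exact fractions. This sketch encodes the joint pmf table from the problem as a dictionary (the dictionary layout is a choice made here, not something the exam prescribes) and applies Cov(X,Y) = E(XY) - E(X)E(Y).

```python
from fractions import Fraction

# Joint pmf p(x, y), keyed by (x, y), copied from the table in the problem.
pmf = {
    (0, 0): Fraction(1, 9), (0, 1): Fraction(2, 9), (0, 2): Fraction(1, 9),
    (1, 0): Fraction(2, 9), (1, 1): Fraction(2, 9), (1, 2): Fraction(0),
    (2, 0): Fraction(1, 9), (2, 1): Fraction(0),    (2, 2): Fraction(0),
}
assert sum(pmf.values()) == 1  # a valid pmf sums to 1

e_xy = sum(x * y * p for (x, y), p in pmf.items())
e_x = sum(x * p for (x, _), p in pmf.items())
e_y = sum(y * p for (_, y), p in pmf.items())
cov = e_xy - e_x * e_y

# Part a): probability that both firms receive the same number of contracts
p_same = sum(p for (x, y), p in pmf.items() if x == y)

print(e_xy, e_x, e_y, cov, p_same)
```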

 

3. (12 pts) Suppose that scores on a midterm exam follow a normal distribution with mean 70 and standard deviation 8, while scores on a final exam follow an independent normal distribution with mean 60 and standard deviation 12.  Suppose also that the instructor weights the final to count twice as much as the midterm.  Course grades are therefore based on the random variable T = M + 2F, where T represents the total course score, M represents the midterm exam score, and F represents the final exam score. 

a) (6 pts) Specify the probability distribution of T.

 

Because T is a linear combination of two normal random variables, T will have a normal distribution.  Its mean is E(T) = E(M)+2E(F) = 70+2(60) = 190.  Its variance is V(T) = V(M) + 2^2 V(F) = 8^2 + 4*12^2 = 640, so SD(T) = sqrt(640), which is about 25.30.

 

When E(F)=55 and SD(M)=6, we get E(T) = E(M)+2E(F) = 70+2(55) = 180 and V(T) = V(M) + 2^2 V(F) = 6^2 + 4*12^2 = 612, so SD(T) = sqrt(612), which is about 24.74.

 

b) (6 pts) Determine the probability that a student scores above 200 for his/her total course score.

 

P(T>200) = P(Z>(200-190)/25.30) = P(Z>0.40) = 1-.6554 = .3446.

 

When E(F)=55 and SD(M)=6, we get P(T>200) = P(Z>(200-180)/24.74) = P(Z>0.81) = 1-.7910 = .2090.
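Both parts of this problem can be checked with `statistics.NormalDist`, which supports adding independent normal variables and scaling by a constant. This is a sketch under the problem's independence assumption; the helper name `total_score` is hypothetical. The probabilities differ slightly from the table-based answers because no z-score rounding is applied.

```python
from statistics import NormalDist

def total_score(mid_mean, mid_sd, fin_mean, fin_sd):
    """Distribution of T = M + 2F for independent normal M and F."""
    M = NormalDist(mid_mean, mid_sd)
    F = NormalDist(fin_mean, fin_sd)
    return M + F * 2   # mean adds; variance is V(M) + 4 V(F)

T1 = total_score(70, 8, 60, 12)   # original version
T2 = total_score(70, 6, 55, 12)   # version with E(F)=55, SD(M)=6

p1 = 1 - T1.cdf(200)
p2 = 1 - T2.cdf(200)

print(f"T1: mean={T1.mean:.0f}, var={T1.variance:.0f}, P(T>200)={p1:.4f}")
print(f"T2: mean={T2.mean:.0f}, var={T2.variance:.0f}, P(T>200)={p2:.4f}")
```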

 

4. (18 pts) Suppose that the lifetime of a light bulb follows an exponential distribution with mean 3000 hours.

a) (6 pts) What is the probability that a light bulb will last for less than 2750 hours?

 

The cdf of an exponential distribution with parameter lambda is F(x) = 1-exp(-x*lambda), so P(X<2750) = F(2750) = 1-exp(-2750/3000) = 1-exp(-11/12), or about .6001.  Note that we know that lambda = 1/3000 because E(X)=1/lambda.
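The exponential cdf is simple enough to evaluate directly. A minimal sketch; the function name `exp_cdf` is illustrative.

```python
import math

def exp_cdf(x, mean):
    """cdf of an exponential distribution with the given mean (lambda = 1/mean)."""
    lam = 1 / mean
    return 1 - math.exp(-lam * x)

p_less_2750 = exp_cdf(2750, mean=3000)
print(f"{p_less_2750:.4f}")
```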

 

b) (8 pts) In a random sample of 60 light bulbs, what is the (approximate) probability that the sample mean lifetime will be less than 2750 hours?

 

With a sample size of n=60, the Central Limit Theorem establishes that the sample mean X-bar will follow (approximately) a normal distribution, with mean 3000 hours and standard deviation sigma/sqrt(n) = 3000/sqrt(60) = 387.3 hours.  Thus, P(X-bar<2750) = P(Z<(2750-3000)/387.3) = P(Z<-0.65) = .2578.  Note that we know that sigma=3000 because the variance of an exponential distribution is 1/lambda^2, so the std dev is 1/lambda, which is the same as its mean.

 

With a sample size of n=40, the Central Limit Theorem establishes that the sample mean X-bar will follow (approximately) a normal distribution, with mean 3000 hours and standard deviation sigma/sqrt(n) = 3000/sqrt(40) = 474.3 hours.  Thus, P(X-bar<2750) = P(Z<(2750-3000)/474.3) = P(Z<-0.53) = .2981.

 

c) (4 pts) Would the probability in b) be larger, smaller, or the same with a random sample of size 150?  Explain briefly, without performing any calculations.

 

The probability would be smaller.  The larger sample size would mean that X-bar would have less variation about mu (a smaller std dev), so there would be a greater probability that X-bar would fall close to mu (3000) and a smaller probability that X-bar would fall below 2750.
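The normal approximations in part b) and the sample-size reasoning in part c) can be illustrated together. This sketch uses the Central Limit Theorem approximation only (not the exact distribution of the sample mean), and the function name is hypothetical. Values differ slightly from the table-based answers because z is not rounded.

```python
import math
from statistics import NormalDist

MEAN = 3000    # hours; for an exponential distribution, sigma equals the mean
SIGMA = 3000

def p_mean_below(threshold, n):
    """CLT approximation to P(X-bar < threshold) for a sample of size n."""
    xbar = NormalDist(MEAN, SIGMA / math.sqrt(n))
    return xbar.cdf(threshold)

p40 = p_mean_below(2750, 40)
p60 = p_mean_below(2750, 60)
p150 = p_mean_below(2750, 150)

# Larger n shrinks the std dev of X-bar, so the lower-tail probability drops.
print(f"n=40: {p40:.4f}, n=60: {p60:.4f}, n=150: {p150:.4f}")
```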

 

5. (18 pts) The midrange of a set of data is defined to be the average (mean) of the minimum and maximum values in the data.  The midhinge is defined to be the average of the lower quartile (lower fourth) and the upper quartile (upper fourth).  Consider the following data, which are the calorie contents of 20 beef hot dogs:

111      131      132      135      139      141      148      149      149      152

153      157      158      175      176      181      184      186      190      190

a) (6 pts) Calculate the midrange and midhinge for these data.

 

The midrange is (111+190)/2 = 150.5.

The lower quartile is the average of the 5th and 6th ordered values: (139+141)/2 = 140, and the upper quartile is the average of the 15th and 16th ordered values: (176+181)/2 = 178.5, so the midhinge is (140+178.5)/2 = 159.25.
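The same calculation can be sketched in code. The quartiles here are taken as the medians of the lower and upper halves of the sorted data, which matches the solution for an even sample size; conventions for odd sample sizes vary and are not handled specially in this sketch. The function names are illustrative.

```python
from statistics import median

calories = [111, 131, 132, 135, 139, 141, 148, 149, 149, 152,
            153, 157, 158, 175, 176, 181, 184, 186, 190, 190]

def midrange(values):
    """Average of the minimum and maximum."""
    return (min(values) + max(values)) / 2

def midhinge(values):
    """Average of the lower and upper quartiles (medians of the two halves)."""
    data = sorted(values)
    half = len(data) // 2          # assumes an even sample size, as here
    lower_q = median(data[:half])
    upper_q = median(data[-half:])
    return (lower_q + upper_q) / 2

print(midrange(calories), midhinge(calories))
```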

 

b) (6 pts) Are these statistics (midhinge and midrange) measures of center or measures of spread?  [Indicate one or the other: center or spread.]  Explain your answer.

 

These are both measures of center, as they produce estimates of where the data are located, not how spread out the data are.

 

c) (6 pts) Which (or possibly both, or possibly neither) would you expect to be resistant to outliers?  Explain briefly.

 

The midhinge is resistant to outliers, because it is based on the quartiles.  A few extreme values do not typically affect quartiles, unless the sample size is quite small.  On the other hand, the midrange considers only the smallest and largest values, so it is extremely susceptible to outliers and therefore not resistant.
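A small experiment illustrates the difference in resistance. In this sketch the largest observation is replaced by 900, a made-up outlier that is not part of the exam data; the quartile convention (medians of the two halves) is the one used in part a).

```python
from statistics import median

calories = [111, 131, 132, 135, 139, 141, 148, 149, 149, 152,
            153, 157, 158, 175, 176, 181, 184, 186, 190, 190]

def midrange(values):
    return (min(values) + max(values)) / 2

def midhinge(values):
    data = sorted(values)
    half = len(data) // 2          # assumes an even sample size
    return (median(data[:half]) + median(data[-half:])) / 2

# Hypothetical contaminated data: one 190 replaced by an extreme value.
with_outlier = calories[:-1] + [900]

print("midrange:", midrange(calories), "->", midrange(with_outlier))
print("midhinge:", midhinge(calories), "->", midhinge(with_outlier))
```

The midrange moves with the single extreme value, while the midhinge is unchanged because neither quartile is affected.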

 

6. (20 pts) In a recent study, researchers purchased 40 food items in New York City and determined the actual calorie content of each through a laboratory analysis.  They then calculated the percentage difference between the actual calorie content and the calorie count listed on the item’s label.  (A positive percentage difference corresponds to a food item whose actual calorie content was higher than what the label claimed.)  Each food item was also classified according to whether it was marketed locally, nationally, or regionally.  The boxplots below were constructed to compare the distributions:

Indicate (with a simple “yes” or “no”) whether it is valid to conclude that the locally marketed food items:

a) (3 pts) have the largest median of percentage differences among the three groups of items?

 

Yes

 

b) (3 pts) have the largest IQR of percentage differences among the three groups of items?

 

Yes

 

c) (3 pts) have the largest lower quartile (lower fourth) of percentage differences among the three groups of items?

 

No (it’s hard to tell which has the larger lower quartile between the local and regional groups)

 

d) (3 pts) have the largest sample size among the three groups of items?

 

No (boxplots provide no information about sample size)

 

e) (8 pts) Write a paragraph summarizing what these boxplots reveal about the percentage differences between actual and advertised calorie content among the three groups of food items. 

 

The most striking aspect of these data is that locally marketed food items tend to have many more calories than advertised.  The median discrepancy in this group is over 50%.  There is also tremendous variability in these discrepancy percentages for the local items, ranging from close to zero to almost 250%.  On the other extreme, the nationally marketed items tend to have calorie amounts very close to what is advertised, with very little variation.  The median discrepancy percentage in this group is close to zero, and there is little variability except for an outlier that actually has fewer calories than advertised.  The regionally marketed items fall in between local and national, both in terms of center and spread.  The regional items do tend to have more calories than stated, but nowhere near as much as the local ones, and the variability is less than with the local items as well.