Investigation 5: Simpson's Paradox (due Thur, Sept 30)
You may work with one other person on this, handing in one report with both names. Word-processed reports are preferred to hand-written ones. Please start early so that you have time to ask questions.
The following two-way table classifies hypothetical hospital patients according to the hospital that treated them and whether they survived or died:
| | survived | died | total |
|---|---|---|---|
| hospital A | 800 | 200 | 1000 |
| hospital B | 900 | 100 | 1000 |
Suppose that one of these 2000 patients is chosen at random. Define the following events: A = {patient went to hospital A}, B = {patient went to hospital B}, S = {patient survived}, D = {patient died}.
a) Determine the conditional probability that the patient survived, given that he/she went to hospital A. Also determine the conditional probability that the patient survived, given that he/she went to hospital B. Be sure to express both probabilities using the event symbols defined above.
b) Which hospital saved the higher percentage of its patients?
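As a way to check hand calculations for parts (a) and (b), here is a minimal Python sketch that computes P(S|A) and P(S|B) directly from the table counts. The names `counts` and `survival_given_hospital` are illustrative choices, not part of the assignment.

```python
from fractions import Fraction

# Counts from the aggregate two-way table: (survived, died) for each hospital.
counts = {"A": (800, 200), "B": (900, 100)}


def survival_given_hospital(hospital):
    """Return P(S | hospital) = (# survived at hospital) / (# treated at hospital)."""
    survived, died = counts[hospital]
    return Fraction(survived, survived + died)


p_s_given_a = survival_given_hospital("A")
p_s_given_b = survival_given_hospital("B")

# Print each conditional probability as an exact fraction and as a decimal.
print(f"P(S|A) = {p_s_given_a} = {float(p_s_given_a):.2f}")
print(f"P(S|B) = {p_s_given_b} = {float(p_s_given_b):.2f}")
```

Using `Fraction` keeps the results exact, which makes it easier to compare them with probabilities worked out by hand.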
Suppose that, when we further categorize each patient according to whether they were in fair condition or poor condition prior to treatment, we obtain the following two-way tables:
fair condition:

| | survived | died | total |
|---|---|---|---|
| hospital A | 490 | 10 | 500 |
| hospital B | 850 | 50 | 900 |

poor condition:

| | survived | died | total |
|---|---|---|---|
| hospital A | 310 | 190 | 500 |
| hospital B | 50 | 50 | 100 |
Before proceeding, convince yourself that when the “fair” and “poor” condition patients are combined, the totals match those given in the first table. Also define the following events: F = {patient was in fair condition}, P = {patient was in poor condition}.
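One way to carry out this check is to add the two condition tables cell by cell and compare the result with the original table. The sketch below does this in Python; the helper name `combine` and the dictionary layout are assumptions made for illustration.

```python
# (survived, died) counts for each hospital, taken from the tables above.
fair = {"A": (490, 10), "B": (850, 50)}
poor = {"A": (310, 190), "B": (50, 50)}
aggregate = {"A": (800, 200), "B": (900, 100)}


def combine(*tables):
    """Add the (survived, died) counts of several tables, hospital by hospital."""
    combined = {}
    for table in tables:
        for hospital, (survived, died) in table.items():
            s, d = combined.get(hospital, (0, 0))
            combined[hospital] = (s + survived, d + died)
    return combined


combined = combine(fair, poor)
matches = combined == aggregate

print("combined counts:", combined)
print("matches the original table:", matches)
```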
c) Among those who were in fair condition, compare the survival rates for the two hospitals. Also express these as conditional probabilities using the event symbols defined above. Which hospital saved the greater percentage of its patients who had been in fair condition?
d) Among those who were in poor condition, compare the survival rates for the two hospitals. Also express these as conditional probabilities using the event symbols defined above. Which hospital saved the greater percentage of its patients who had been in poor condition?
The phenomenon that you have just discovered is called Simpson’s paradox, which refers to the fact that aggregate (combined) proportions can reverse the direction of the relationship seen in the individual pieces.
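The reversal becomes easier to see once each hospital's overall survival rate is written as a weighted average of its fair-condition and poor-condition rates, where the weights are the fractions of that hospital's patients in each condition (the law of total probability). The following Python sketch computes that weighted average from the table counts; `strata` and `overall_rate` are hypothetical names chosen for this illustration.

```python
from fractions import Fraction

# For each hospital and condition: (number survived, number treated).
strata = {
    "A": {"fair": (490, 500), "poor": (310, 500)},
    "B": {"fair": (850, 900), "poor": (50, 100)},
}


def overall_rate(hospital):
    """P(S | hospital) as a weighted average of the condition-specific rates."""
    groups = strata[hospital]
    total = sum(n for _, n in groups.values())
    rate = Fraction(0)
    for condition, (survived, n) in groups.items():
        weight = Fraction(n, total)              # P(condition | hospital)
        stratum_rate = Fraction(survived, n)     # P(S | hospital and condition)
        rate += weight * stratum_rate
        print(f"hospital {hospital}, {condition}: "
              f"weight {float(weight):.2f}, survival rate {float(stratum_rate):.3f}")
    return rate


for hospital in strata:
    print(f"hospital {hospital} overall: {float(overall_rate(hospital)):.2f}")
```

Comparing the printed weights for the two hospitals shows how differently their patients are split between fair and poor condition.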
e) Write a few sentences explaining (arguing from the data given) how it happens that hospital B has the higher survival rate overall, yet hospital A has the higher survival rate for each type of patient.
f) Which hospital would you rather go to if you were ill? Explain.
g) Construct your own hypothetical data to illustrate Simpson's paradox in the following context. Show that it is possible for one softball player (Amy) to have a higher proportion of hits than another (Barb) in June and in July and yet to have a lower proportion of hits for the two months combined. [Hint: You will need to make the sample sizes different for the two players in the two months. One possibility is to give Amy a sample size of 10 trials in June and 40 trials in July, with Barb getting a sample size of 40 trials in June and 10 in July.]
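To test candidate data for part (g), a small checker like the Python sketch below may help. It reports whether one player has the higher hit proportion in every month yet the lower proportion overall. The function name `shows_reversal` is an illustrative choice, and the hit counts shown are only one hypothetical possibility using the sample sizes from the hint; replace them with your own numbers.

```python
from fractions import Fraction


def shows_reversal(first, second):
    """Return True if `first` has the higher hit proportion in every month
    but the lower hit proportion for all months combined.

    Each argument maps a month name to a (hits, trials) pair.
    """
    def combined(player):
        hits = sum(h for h, _ in player.values())
        trials = sum(t for _, t in player.values())
        return Fraction(hits, trials)

    wins_each_month = all(
        Fraction(*first[month]) > Fraction(*second[month]) for month in first
    )
    return wins_each_month and combined(first) < combined(second)


# Hypothetical (hits, trials) data; trial counts follow the hint.
amy = {"June": (8, 10), "July": (10, 40)}
barb = {"June": (30, 40), "July": (2, 10)}

print("Simpson's paradox present:", shows_reversal(amy, barb))
```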