Question

DataBasics

At the Plate, a Statistical Puzzler: Understanding Simpson's Paradox

By Arthur Smith
August 20, 2010

In 2007 Red Sox rookie phenom Jacoby Ellsbury batted .353 while teammate Mike Lowell batted .324. In 2008, in what would be his first full season in the majors, Ellsbury again outperformed Lowell, batting .280 to Lowell's .274. So Ellsbury clearly outperformed Lowell at the plate over the two-year stretch, right?

Wrong. Over the course of the two years, Lowell was superior at the plate, outbatting Ellsbury .304 to .293.

What at first glance may seem confusing comes down to a simple problem of aggregation -- and a classic example of the statistical phenomenon known as Simpson's Paradox.

Simpson's Paradox occurs when a relationship between two variables -- in this case, the two players' batting averages -- is reversed when an additional variable is taken into account. The additional variable to consider here is the number of at-bats each player had in each season. Leaving it out results in "omitted variable bias," or a change in how the relationship between the two batting averages is understood.

And it matters -- not only for the important business of sizing up your favorite baseball players, but also for such pursuits as assessing the nation's schools or understanding state-by-state obesity rates.

[Photo: Red Sox Third Baseman Mike Lowell was outperformed at the plate by teammate Jacoby Ellsbury over a two-year stretch. Or was he? (http://www.flickr.com/photos/keithallison/ / CC BY-SA 2.0)]

How it Works

You weigh the importance of additional factors, or variables, implicitly in almost every evaluation you make. Your friend might tell you that his team is better because it has a better record, to which you might counter, "OK, but my team plays a harder schedule." You understand intuitively that an accurate assessment of the relationship between the teams cannot be made by assessing the record alone.
Therefore, when you include a third variable (beyond just wins and losses), or condition your argument on it, you can get a picture that changes what you initially thought.

Simpson's Paradox goes one step further. It says that omitting an important variable can not only change a relationship but completely reverse how the relationship is perceived. Ellsbury may seem to have had the hotter bat across the two years, but the facts show Lowell actually had a better two-year average.

Here's how it works:

Player             2007              2008              2007 and 2008
Jacoby Ellsbury    41/116 (.353)     155/554 (.280)    196/670 (.293)
Mike Lowell        191/589 (.324)    115/419 (.274)    306/1008 (.304)

Yearly stats from ESPN.com

Yes, Ellsbury had the better average both seasons. But he had far fewer at-bats than Lowell in 2007, having joined the team late in the season. Ellsbury had only 116 at-bats in 2007, while Lowell had 589. Therefore, for Ellsbury's combined average, the second year is weighted much more heavily, while Lowell's average in his better season -- 2007 -- carries greater weight than his average in his worse season. The result is Lowell's superior average in aggregate, an instance of Simpson's Paradox.

Beyond Baseball

Simpson's Paradox actually occurs with some frequency, and not just in baseball. Every year students across the country take the National Assessment of Educational Progress exams, with results mapped to a variety of factors, including whether students are eligible for the national school lunch program.

A comparison between school-lunch-eligible eighth-graders in New York City and California for 2007 finds that a lower percentage of the New Yorkers scored below basic in math. The same was true for New Yorkers not eligible for school lunch compared to Californians not eligible. However, in aggregate, fewer California eighth-graders were below basic in math than were NYC eighth-graders. Why?
A significantly higher percentage of NYC students are eligible for the school lunch program:

Jurisdiction (% below basic)    School Lunch Eligible    School Lunch Ineligible    Combined
New York City                   45.9                     17.2                       42.5
California                      54.2                     27.8                       40.9

2007 NAEP math

In health, at least four examples of Simpson's Paradox are found in state-level obesity data among black, white and Hispanic adults reporting a Body Mass Index over 30. Read the rows in consecutive pairs (Louisiana appears twice): in each pair, the second state has the higher rate in every group but the lower rate among all adults.

State                   Black Adults (%)    White Adults (%)    Hispanic Adults (%)    All Adults (%)
Mississippi             40                  28                  26                     32
Alabama                 42                  29                  28                     31
Louisiana               36                  25                  20                     28
Kansas                  39                  26                  29                     27
Louisiana               36                  25                  20                     28
Iowa                    38                  26                  25                     26
District of Columbia    35                  10                  17                     24
Massachusetts           36                  20                  29                     21

BRFSS 2006-2008

Deciding What's Relevant

One of the most important aspects of reading these or any such statistics is deciding which ones matter. Simpson's Paradox might be used to demonstrate that, in fact, New York City's schools were outperforming California's, insofar as New York City has a smaller percentage of children performing below basic in both categories of lunch eligibility. However, one might also use it to try to demonstrate that, contrary to the common interpretation of baseball statistics, Lowell actually had a better two years than Ellsbury.

Recognizing Simpson's Paradox, and omitted variable bias in general, has very important policy implications. In the case of New York's schools, one could argue that rather than investing more in education, you need to address the underlying social conditions that are leaving such a high percentage of students eligible for the national school lunch program. Suddenly a policy question about education can become a policy question about poverty when looking at the system as a whole.

It is an important aspect of research to determine which other variables matter for a question and which ones don't, and there is yet another layer for policy in determining which ones can be most effectively addressed to produce desired outcomes.
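The aggregation behind both reversals can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the hit and at-bat counts are the ones in the batting table, and the eligibility shares it computes for the NAEP example are not reported figures. They are inferred from the rounded percentages in the table, on the assumption that each combined rate is a weighted average of the two subgroup rates.

```python
# Hits and at-bats copied from the article's batting table (ESPN.com yearly stats).
BATTING = {
    "Jacoby Ellsbury": {2007: (41, 116), 2008: (155, 554)},
    "Mike Lowell": {2007: (191, 589), 2008: (115, 419)},
}

# Percent below basic from the 2007 NAEP math table:
# (school lunch eligible, school lunch ineligible, combined).
NAEP = {
    "New York City": (45.9, 17.2, 42.5),
    "California": (54.2, 27.8, 40.9),
}


def season_average(hits, at_bats):
    """Batting average for a single season."""
    return hits / at_bats


def combined_average(seasons):
    """Pooled average: total hits over total at-bats, not a mean of the averages."""
    total_hits = sum(hits for hits, _ in seasons.values())
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return total_hits / total_at_bats


def season_weights(seasons):
    """Share of a player's at-bats that falls in each season."""
    total_at_bats = sum(at_bats for _, at_bats in seasons.values())
    return {year: at_bats / total_at_bats for year, (_, at_bats) in seasons.items()}


def implied_share(rate_a, rate_b, combined):
    """Share p of group A that solves p * rate_a + (1 - p) * rate_b = combined.

    Assumption: the combined rate is a weighted average of the two group rates.
    The result is an inference from rounded inputs, not a published statistic.
    """
    return (combined - rate_b) / (rate_a - rate_b)


if __name__ == "__main__":
    for player, seasons in BATTING.items():
        weights = season_weights(seasons)
        yearly = ", ".join(
            f"{year}: {season_average(*stats):.3f} (weight {weights[year]:.2f})"
            for year, stats in seasons.items()
        )
        print(f"{player}: {yearly}; combined {combined_average(seasons):.3f}")

    for place, (eligible, ineligible, combined) in NAEP.items():
        share = implied_share(eligible, ineligible, combined)
        print(f"{place}: implied school-lunch-eligible share {share:.0%}")
```

Run as a script, this reproduces the combined averages of .293 for Ellsbury and .304 for Lowell, and shows that most of Ellsbury's at-bats fall in his weaker 2008 season. Under the stated assumption, the NAEP figures imply that roughly 88% of New York City eighth-graders were school lunch eligible, against roughly 50% in California, which is why the combined rates reverse.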
Not only is it important to be aware of the pitfalls of Simpson's Paradox, but it is also necessary to determine whether the paradox is relevant to the question being asked. Simpson's Paradox, then, serves as a pertinent reminder that there is often more to a statistic than meets the eye.

Arthur Smith is a recent graduate of Georgetown University, with a degree in international economics. He spent his junior year studying at the London School of Economics. At the State of the USA, he has worked on education and economy projects, including the data selection, collection and presentation processes. He has also worked as a research assistant on projects analyzing returns on education and perceptions of AIDS in Kenya.

Posted by Arthur Smith at 2:43 PM on August 20, 2010
