
Question



1. Read the article "How to Display Data Badly," by Howard Wainer, listed above. It was published in The American Statistician in 1984, volume 38, pages 137-147. It is a quick, informative, and (dare I say?) enjoyable read. Scour the internet (or, if you are feeling particularly plucky, look through your company's annual report!) to find an example of a poorly made or misleading graph and post it to the conference, pointing out where you think the graph went awry. Reproduce the data in a better (more illuminating, more honest, etc.) format; you may have to do some guesswork on what the underlying data actually are. http://www.jstor.org.ezproxy.umuc.edu/stable/2683253

2. Watch some TV and pay particular attention to the commercials. Post at least three examples of statistics presented in commercials (they can come from different commercials) and say what message you think the advertisers were trying to get across with the statistics presented. Then, take a look at some of the examples presented by other students and critique the statistics from the advertisements. Do the statistics presented actually demonstrate the message they are trying to get across?

As an example, one of my favorite commercials (for its misleading use of statistics!) is an Allstate Insurance ad featuring Dennis Haysbert. At one point, he states that people who switch from Geico to Allstate saved an average of approximately $400 per year on car insurance. The message the advertisers are trying to get across is that Allstate Insurance is less expensive than Geico. Does this statistic actually demonstrate that Allstate Insurance is in fact cheaper, though? I have two criticisms of this statistic. The first concerns the sample: they are looking exclusively at people who switch to Allstate from Geico. Only people who expect to save money would actually switch. If you were a Geico customer, called Allstate for a quote, and were quoted a higher number, you wouldn't switch!
Thus, they are clearly biasing their statistic by being very selective in their sample group. My other criticism is that they are using the average (mean), when the median may be a more appropriate measure. Let's use a few ad hoc numbers to put both of these criticisms together. Say that 100 Geico customers call Allstate to get an insurance quote. Of those 100, 97 find that Geico is cheaper by $200, while three find that Allstate is cheaper and decide to switch. Two of the three find that they will save $150 per year, while the third finds that he will save $900 per year. It's true, of course, that people who switch save an average of $400 ((150 + 150 + 900)/3). However, saving $400 is certainly not typical of switchers. Saving $150 (the median) is far more representative of the benefit from switching than $400 (the mean). Furthermore, if we look at the price difference across all 100 drivers, not just those who switched, we find that Geico is cheaper on average by $182 ((97 × $200 - $1,200)/100).

How to Display Data Badly
Author(s): Howard Wainer
Source: The American Statistician, Vol. 38, No. 2 (May, 1984), pp. 137-147
Published by: Taylor & Francis, Ltd. on behalf of the American Statistical Association
Stable URL: http://www.jstor.org/stable/2683253
Accessed: 07-10-2015 13:30 UTC
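The ad hoc numbers in the Allstate example can be checked directly. This Python sketch encodes the question's hypothetical 100 callers. These are illustrative figures from the question, not real Allstate or Geico data. The sketch compares the mean and median saving among switchers with the average difference across everyone who asked for a quote.

```python
from statistics import mean, median

# The question's hypothetical callers. Positive = Allstate cheaper by that many
# dollars per year; negative = Geico cheaper.
differences = [-200] * 97 + [150, 150, 900]

# Self-selection: only callers quoted a lower price actually switch.
switchers = [d for d in differences if d > 0]

print("mean saving among switchers:  ", mean(switchers))    # 400
print("median saving among switchers:", median(switchers))  # 150
print("mean over all 100 callers:    ", mean(differences))  # -182 (Geico cheaper by $182)
```

The filter `d > 0` is the self-selection step. Dropping it turns an apparent $400 saving into a $182 average disadvantage.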
This content downloaded from 54.84.104.155 on Wed, 07 Oct 2015 13:30:22 UTC. All use subject to JSTOR Terms and Conditions.

Commentaries are informative essays dealing with viewpoints of statistical practice, statistical education, and other topics considered to be of general interest to the broad readership of The American Statistician. Commentaries are similar in spirit to Letters to the Editor, but they involve longer discussions of background, issues, and perspectives. All commentaries will be refereed for their merit and compatibility with these criteria.

How to Display Data Badly
HOWARD WAINER*

Methods for displaying data badly have been developing for many years, and a wide variety of interesting and inventive schemes have emerged. Presented here is a synthesis yielding the 12 most powerful techniques that seem to underlie many of the realizations found in practice. These 12 (the dirty dozen) are identified and illustrated.

KEY WORDS: Graphics; Data display; Data density; Data-ink ratio.

*Howard Wainer is Senior Research Scientist, Educational Testing Service, Princeton, NJ 08541. This is the text of an invited address to the American Statistical Association. It was supported in part by the Program Statistics Research Project of the Educational Testing Service. The author would like to express his gratitude to the numerous friends and colleagues who read or heard this article and offered valuable suggestions for its improvement. Especially helpful were David Andrews, Paul Holland, Bruce Kaplan, James O. Ramsay, Edward Tufte, the participants in the Stanford Workshop on Advanced Graphical Presentation, two anonymous referees, the long-suffering associate editor, and Gary Koch.

© The American Statistician, May 1984, Vol. 38, No. 2

1. INTRODUCTION

The display of data is a topic of substantial contemporary interest and one that has occupied the thoughts of many scholars for almost 200 years. During this time there have been a number of attempts to codify standards of good practice (e.g., ASME Standards 1915; Cox 1978; Ehrenberg 1977) as well as a number of books that have illustrated them (i.e., Bertin 1973, 1977, 1981; Schmid 1954; Schmid and Schmid 1979; Tufte 1983). The last decade or so has seen a tremendous increase in the development of new display techniques and tools that have been reviewed recently (Macdonald-Ross 1977; Fienberg 1979; Cox 1978; Wainer and Thissen 1981). We wish to concentrate on methods of data display that leave the viewers as uninformed as they were before seeing the display or, worse, those that induce confusion. Although such techniques are broadly practiced, to my knowledge they have not as yet been gathered into a single source or carefully categorized. This article is the beginning of such a compendium.

The aim of good data graphics is to display data accurately and clearly. Let us use this definition as a starting point for categorizing methods of bad data display. The definition has three parts. These are (a) showing data, (b) showing data accurately, and (c) showing data clearly. Thus, if we wish to display data badly, we have three avenues to follow. Let us examine them in sequence, parse them into some of their component parts, and see if we can identify means for measuring the success of each strategy.

2. SHOWING DATA

Obviously, if the aim of a good display is to convey information, the less information carried in the display, the worse it is. Tufte (1983) has devised a scheme for measuring the amount of information in displays, called the data density index (ddi), which is "the number of numbers plotted per square inch." This easily calculated index is often surprisingly informative. In popular and technical media we have found a range from .1 to 362. This provides us with the first rule of bad data display.

Rule 1. Show as Few Data as Possible (Minimize the Data Density)

What does a data graphic with a ddi of .3 look like? Shown in Figure 1 is a graphic from the book Social Indicators III (S13). It was originally done in four colors (original size 7" by 9") and contains 18 numbers (18/63 = .3). The median data graph in S13 has a data density of .6 numbers/in²; this one is not an unusual choice. Shown in Figure 2 is a plot from the article by Friedman and Rafsky (1981) with a ddi of .5 (it shows 4 numbers in 8 in²). This is unusual for JASA, where the median data graph has a ddi of 27. In defense of the producers of this plot, the point of the graph is to show that a method of analysis suggested by a critic of their paper was not fruitful. I suspect that prose would have worked pretty well also.

[Figure 1. An example of a low density graph (from S13 [ddi = .3]). "Change in Science Achievement of 9-, 13-, and 17-Year-Olds, by Type of Exercise: 1969-1977"; one panel per age group; y-axis: change in percent correct; series: biological science and physical science.]

[Figure 2. A low density graph (from Friedman and Rafsky 1981 [ddi = .5]).]

[Figure 3. A low density graph (© 1978, The Washington Post) with chart-junk to fill the space (ddi = .2). Labor productivity, U.S. vs. Japan (output per man-hour in manufacturing).]

[Figure 4. Hiding the data in the scale (from S13). "Public and Private Elementary Schools, Selected Years: 1929-1970," in thousands of schools.]

[Figure 5. Expanding the scale and showing the data in Figure 4 (from S13). "Number of Private Elementary Schools from 1930-1970."]

[Figure 6. Ignoring the visual metaphor (© 1978, The New York Times). "A New Set of Projections for the U.S. Supply of Energy": two projections of United States energy supply in the year 2000 compared with the actual 1977 supply, in quads (one quadrillion British thermal units).]

Although arguments can be made that high data density does not imply that a graphic will be good, nor one with low density bad, it does reflect on the efficiency of the transmission of information. Obviously, if we hold clarity and accuracy constant, more information is better than less. One of the great assets of graphical techniques is that they can convey large amounts of information in a small space. We note that when a graph contains little or no information the plot can look quite empty (Figure 2) and thus raise suspicions in the viewer that there is nothing to be communicated.
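Tufte's data density index is simple enough to compute by hand. A tiny Python sketch makes the two worked examples above explicit. The function name `data_density_index` is our own label, not Tufte's. The counts and areas are the ones quoted for Figures 1 and 2.

```python
def data_density_index(n_numbers: int, area_sq_in: float) -> float:
    """Tufte's data density index: numbers plotted per square inch."""
    return n_numbers / area_sq_in

fig1_ddi = data_density_index(18, 7 * 9)  # 18 numbers on a 7" x 9" original
fig2_ddi = data_density_index(4, 8)       # 4 numbers in 8 square inches

print(f"Figure 1: ddi = {fig1_ddi:.2f}")  # 0.29, reported in the text as .3
print(f"Figure 2: ddi = {fig2_ddi:.2f}")  # 0.50
```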
A way to avoid these suspicions is to fill up the plot with nondata figurations, what Tufte has termed "chartjunk." Figure 3 shows a plot of the labor productivity of Japan relative to that of the United States. It contains one number for each of three years. Obviously, a graph of such sparse information would have a lot of blank space, so filling the space hides the paucity of information from the reader. A convenient measure of the extent to which this practice is in use is Tufte's "data-ink ratio." This measure is the ratio of the amount of ink used in graphing the data to the total amount of ink in the graph. The closer to zero this ratio gets, the worse the graph. The notion of the data-ink ratio brings us to the second principle of bad data display.

Rule 2. Hide What Data You Do Show (Minimize the Data-Ink Ratio)

One can hide data in a variety of ways. One method that occurs with some regularity is hiding the data in the grid. The grid is useful for plotting the points, but only rarely afterwards. Thus to display data badly, use a fine grid and plot the points dimly (see Tufte 1983, pp. 94-95 for one repeated version of this).

A second way to hide the data is in the scale. This corresponds to blowing up the scale (i.e., looking at the data from far away) so that any variation in the data is obscured by the magnitude of the scale. One can justify this practice by appealing to "honesty requires that we start the scale at zero," or other sorts of sophistry. In Figure 4 is a plot (from S13) that effectively hides the growth of private schools in the scale. A redrawing of the number of private schools on a different scale conveys the growth that took place during the mid-1950's (Figure 5). The relationship between this rise and Brown vs. Topeka School Board becomes an immediate question.

To conclude this section, we have seen that we can display data badly either by not including them (Rule 1) or by hiding them (Rule 2). We can measure the extent to which we are successful in excluding the data through the data density; we can sometimes convince viewers that we have included the data through the incorporation of chartjunk. Hiding the data can be done either by using an overabundance of chartjunk or by cleverly choosing the scale so that the data disappear. A measure of the success we have achieved in hiding the data is through the data-ink ratio.

3. SHOWING DATA ACCURATELY

The essence of a graphic display is that a set of numbers having both magnitudes and an order are represented by an appropriate visual metaphor: the magnitude and order of the metaphorical representation match the numbers. We can display data badly by ignoring or distorting this concept.

Rule 3. Ignore the Visual Metaphor Altogether

If the data are ordered and if the visual metaphor has a natural order, a bad display will surely emerge if you shuffle the relationship. In Figure 6 note that the bar labeled 14.1 is longer than the bar labeled 18. Another method is to change the meaning of the metaphor in the middle of the plot. In Figure 7 the dark shading represents imports on one side and exports on the other. This is but one of the problems of this graph; more serious still is the change of scale. There is also a difference in the time scale, but that is minor. A common theme in Playfair's (1786) work was the difference between imports and exports. In Figure 8, a 200-year-old graph tells the story clearly. Two such plots would have illustrated the story surrounding this graph quite clearly.

[Figure 7. Reversing the metaphor in mid-graph while changing scales on both axes (© June 14, 1981, The New York Times). U.S. exports to and imports from China and Taiwan, in millions of U.S. dollars, 1970-1980; source: Department of Commerce.]

Rule 4. Only Order Matters

One frequent trick is to use length as the visual metaphor when area is what is perceived. This was used quite effectively by The Washington Post in Figure 9. Note that this graph also has a low data density (.1), and its data-ink ratio is close to zero.
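Rule 4's point about length versus area can be made concrete. The Python sketch below assumes a hypothetical drawing rule in which an icon's width and height are both scaled by the data value. The values 1.0 and 0.5 are invented for illustration. For comparison, it also scales the side by the square root of the value, so that area tracks the data.

```python
import math

def icon_area(value: float) -> float:
    """Area of a square icon whose side is drawn proportional to `value`
    (hypothetical rule: both dimensions scale with the data)."""
    side = value
    return side * side

def area_true_icon_area(value: float) -> float:
    """Area of a square icon whose side is sqrt(value), so area tracks the data."""
    side = math.sqrt(value)
    return side * side

old, new = 1.0, 0.5  # hypothetical data: the quantity halves

data_ratio = new / old                                              # 0.5
length_scaled = icon_area(new) / icon_area(old)                     # 0.25
area_scaled = area_true_icon_area(new) / area_true_icon_area(old)   # about 0.5

print(f"data ratio:         {data_ratio:.2f}")
print(f"length-scaled icon: {length_scaled:.2f}")  # reads as a 75% drop
print(f"area-scaled icon:   {area_scaled:.2f}")
```

Only the area-proportional version keeps the drawn ratio equal to the data ratio. A plain bar or line chart sidesteps the question entirely.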
We can also calculate Tufte's (1983) measure of perceptual distortion (PD) for this graph. The PD in this instance is the perceived change in the value of the dollar from Eisenhower to Carter divided by the actual change. I read and measure thus:

               Measured   Actual
Eisenhower      22.00      1.00
Carter           2.06       .44

$$\text{PD} = \frac{(22.00 - 2.06)/2.06}{(1.00 - .44)/.44} = \frac{9.68}{1.27} = 7.62$$

This distortion of over 700% is substantial but by no means a record. A less distorted view of these data is provided in Figure 10. In addition, the spacing suggested by the presidential faces is made explicit on the time scale.

[Figure 8. A plot on the same topic done well two centuries earlier (from Playfair 1786).]

[Figure 9. An example of how to goose up the effect by squaring the eyeball (© 1978, The Washington Post). The value of the dollar by presidency, shown with presidential faces: 1958 Eisenhower, $1.00; 1963 Kennedy; 1968 Johnson; 1978 Carter (August), 44¢; source: Labor Department.]

[Figure 10. The data in Figure 9 as an unadorned line chart (from Wainer 1980). x-axis: year, 1958-1978.]

Rule 5. Graph Data Out of Context

Often we can modify the perception of the graph (particularly for time series data) by choosing carefully the interval displayed. A precipitous drop can disappear if we choose a starting date just after the drop. Similarly, we can turn slight meanders into sharp changes by focusing on a single meander and expanding the scale.
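The perceptual-distortion arithmetic for Figure 9 (Rule 4) can be reproduced in a few lines. This sketch uses the measured and actual values quoted for that figure. `relative_change` is a hypothetical helper that divides by the later value, as Wainer's formula does.

```python
def relative_change(start: float, end: float) -> float:
    """Change from start to end, relative to the end value,
    matching the form (22.00 - 2.06) / 2.06 used for Figure 9."""
    return (start - end) / end

shown = relative_change(22.00, 2.06)  # change as drawn (measured)
actual = relative_change(1.00, 0.44)  # change in the dollar's value

pd = shown / actual

print(f"shown change  = {shown:.2f}")   # 9.68
print(f"actual change = {actual:.2f}")  # 1.27
print(f"PD            = {pd:.2f}")      # 7.61 at full precision
```

At full precision the ratio is about 7.61. The 7.62 in the text comes from dividing the already-rounded 9.68 by 1.27. Either way the distortion exceeds 700%.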
Often the choice of scale is arbitrary but can have profound effects on the perception of the display. Figure 11 shows a famous example in which President Reagan gives an out-of-context view of the effects of his tax cut. The Times' alternative provides the context for a deeper understanding. Simultaneously omitting the context as well as any quantitative scale is the key to the practice of Ordinal Graphics (see also Rule 4). Automatic rules do not always work, and wisdom is always required.

[Figure 11. The White House showing neither scale nor context (© 1981, The New York Times, reprinted with permission). "Your Taxes": payments under the President's proposal ("Our Bill") and under the Ways and Means Committee plan ("Their Bill"), 1982-1986, for an average family income of $20,000.]

In Section 3 we discussed three rules for the accurate display of data. One can compromise accuracy by ignoring visual metaphors (Rule 3), by only paying attention to the order of the numbers and not their magnitude (Rule 4), or by showing data out of context (Rule 5). We advocated the use of Tufte's measure of perceptual distortion as a way of measuring the extent to which the accuracy of the data has been compromised by the display. One can think of modifications that would allow it to be applied in other situations, but we leave such expansion to other accounts.

4. SHOWING DATA CLEARLY

In this section we discuss methods for badly displaying data that do not seem as serious as those described previously; that is, the data are displayed, and they might even be accurate in their portrayal. Yet subtle (and not so subtle) techniques can be used to effectively obscure the most meaningful or interesting aspects of the data. It is more difficult to provide objective measures of presentational clarity, but we rely on the reader to judge from the examples presented.

Rule 6. Change Scales in Mid-Axis

This is a powerful technique that can make large differences look small and make exponential changes look linear. In Figure 12 is a graph that supports the associated story about the skyrocketing circulation of The New York Post compared to the plummeting Daily News circulation. The reason given is that New Yorkers "trust" the Post. It takes a careful look to note the 700,000 jump that the scale makes between the two lines.

[Figure 12. Changing scale in mid-axis to make large differences small. "The soaraway Post: the daily paper New Yorkers trust."]

In Figure 13 is a plot of physicians' incomes over time. It appears to be linear, with a slight tapering off in recent years. A careful look at the scale shows that it starts out plotting every eight years and ends up plotting yearly. A more regular scale (in Figure 14) tells quite a different story.

[Figure 13. Changing scale in mid-axis to make exponential growth linear (© The Washington Post). "Incomes of Doctors vs. Other Professionals (Median Net Incomes)"; source: Council on Wage and Price Stability; years from 1939 to 1976 at uneven intervals.]

Rule 7. Emphasize the Trivial (Ignore the Important)

Sometimes the data that are to be displayed have one important aspect and others that are trivial. The graph can be made worse by emphasizing the trivial part.
[Figure 14. Data from Figure 13 redone with linear scale (from Wainer 1980). Incomes of doctors vs. other professionals by year.]

[Figure 15. Emphasizing the trivial: Hiding the main effect of sex differences in income through vertical placement of plots (from S13). "Median Income of Year-Round, Full-Time Workers 25 to 34 Years Old, by Sex and Educational Attainment: 1968-1977," in constant 1977 dollars.]

[Figure 16. Median income of year-round, full-time workers 25-34 years old, by sex and educational attainment, 1968-1977 (in 1977 constant dollars), with males and females plotted together.]

In Figure 15 we have a page from S13 that compares the income levels of men and women by educational levels. It reveals the not surprising result that better educated individuals are paid better than more poorly educated ones and that changes across time expressed in constant dollars are reasonably constant. The comparison of greatest interest and current concern, comparing salaries between sexes within education level, must be made clumsily by transposing vertically from one graph to another. It seems clear that Rule 7 must have been operating here, for it would have been easy to place the graphs side by side and allow the comparison of interest to be made more directly. Looking at the problem from a strictly data-analytic point of view, we note that there are two large main effects (education and sex) and a small time effect. This would have implied a plot that showed the large effects clearly and placed the smallish time trend into the background (Figure 16).
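Wainer's data-analytic reading (two large main effects, a small time effect) amounts to comparing how far apart the level means of each factor are. Here is a minimal Python sketch of that comparison. The incomes are invented, in $1,000s, and are not the S13 values.

```python
from statistics import mean

# Hypothetical median incomes in $1,000s (invented, not the S13 data),
# keyed by (sex, education, year).
income = {
    ("male", "high school", 1968): 14,   ("male", "high school", 1977): 15,
    ("male", "college", 1968): 20,       ("male", "college", 1977): 21,
    ("female", "high school", 1968): 8,  ("female", "high school", 1977): 9,
    ("female", "college", 1968): 12,     ("female", "college", 1977): 13,
}
FACTORS = {"sex": 0, "education": 1, "year": 2}  # position of each factor in the key

def effect_spread(position):
    """Max minus min of the level means for the factor at `position`;
    in an additive main-effects model this is the spread of that factor's effects."""
    levels = {key[position] for key in income}
    level_means = [
        mean(v for k, v in income.items() if k[position] == level) for level in levels
    ]
    return max(level_means) - min(level_means)

spreads = {name: effect_spread(pos) for name, pos in FACTORS.items()}

# Largest effects first: these are what the plot should make easiest to see.
for name, spread in sorted(spreads.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<10} {spread:4.1f}")
```

With these made-up numbers, sex and education dominate and year barely matters. In that situation the time trend belongs in the background.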
Rule 8. Jiggle the Baseline

Making comparisons is always aided when the quantities being compared start from a common base. Thus we can always make the graph worse by starting from different bases. Such schemes as the hanging or suspended rootogram and the residual plot are meant to facilitate comparisons. In Figure 17 is a plot of U.S. imports of red meat taken from the Handbook of Agricultural Charts published by the U.S. Department of Agriculture. Shading beneath each line is a convention that indicates summation, telling us that the amount of each kind of meat is added to the amounts below it.

[Figure 17. Jiggling the baseline makes comparisons more difficult (from Handbook of Agricultural Charts). "U.S. Imports of Red Meats," in billions of pounds, 1960-1978; stacked bands for beef and veal, pork, and lamb, mutton, and goat meat.]
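The summation convention is easy to see numerically. In this Python sketch the import figures are invented (millions of lb, not USDA data). `stacked_edges` is a hypothetical helper that returns where each band's upper edge would be drawn in a stacked chart.

```python
# Hypothetical imports in millions of lb (NOT the USDA data), listed bottom to top
# in the order a stacked chart would draw them.
series = {
    "beef and veal":   [800, 1400, 1100],
    "pork":            [200, 200, 200],
    "lamb and mutton": [100, 100, 100],
}

def stacked_edges(series):
    """Upper edge of each band in a stacked chart: the running total of that
    series plus every series drawn beneath it."""
    edges = {}
    running = None
    for name, values in series.items():
        running = list(values) if running is None else [r + v for r, v in zip(running, values)]
        edges[name] = running
    return edges

edges = stacked_edges(series)
total = edges[list(series)[-1]]  # the top band's edge is the TOTAL

print("pork band edge:", edges["pork"])   # [1000, 1600, 1300]: moves with beef
print("pork itself:   ", series["pork"])  # [200, 200, 200]: flat
print("TOTAL:         ", total)           # [1100, 1700, 1400]
```

Pork is constant, yet its band jiggles with beef. Plotting each series from a zero baseline, with a separate TOTAL line, removes that illusion.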
Because of the dominance of and the fluctuations in the importation of beef and veal, it is hard to see what the changes are in the other kinds of meat: Is the importation of pork increasing? Decreasing? Staying constant? The only purpose for stacking is to indicate graphically the total summation. This is easily done through the addition of another line for the TOTAL. Note that the TOTAL will always be clear and will never intersect the other lines on the plot. A version of these data is shown in Figure 18 with the separate amounts of each meat, as well as a summation line, shown clearly. Note how easily one can see the structure of import of each kind of meat now that the standard of comparison is a straight line (the time axis) and no longer the import amount of those meats with greater volume.

[Figure 18. An alternative version of Figure 17 with straight line used as the basis of comparison. "U.S. Imports of Red Meats"; original chart source: U.S. Department of Agriculture, Handbook of Agricultural Charts, 1976, p. 93.]

[Figure 19. Austria First! Obscuring the data structure by alphabetizing the plot (from S13). "Life Expectancy at Birth, by Sex, Selected Countries, Most Recent Available Year"; x-axis: years of life expectancy, 50-90.]

Rule 9. Austria First!

Ordering graphs and tables alphabetically can obscure structure in the data that would have been obvious had the display been ordered by some aspect of the data. One can defend oneself against criticisms by pointing out that alphabetizing "aids in finding entries of interest." Of course, with lists of modest length such aids are unnecessary; with longer lists the indexing schemes common in 19th century statistical atlases provide easy lookup capability.

Figure 19 is another graph from S13 showing life expectancies, divided by sex, in 10 industrialized nations. The order of presentation is alphabetical (with the USSR positioned as Russia). The message we get is that there is little variation and that women live longer than men.
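Ordering by the data rather than by name takes one `sorted` call with a key. The sketch below uses made-up labels and life expectancies, not the S13 values. It also prints one row per whole year, so that empty rows expose gaps, much as spacing does in a stem-and-leaf display.

```python
# Made-up labels and values (years); NOT the S13 life expectancies.
life_expectancy = {"Alpha": 74.2, "Bravo": 77.8, "Charlie": 71.4, "Delta": 76.5, "Echo": 75.0}

alphabetical = sorted(life_expectancy)
by_value = sorted(life_expectancy, key=life_expectancy.get, reverse=True)

def rows_by_whole_year(data):
    """One row per whole year, highest first; empty rows mark gaps, so the
    vertical spacing reflects the numerical differences."""
    top, bottom = int(max(data.values())), int(min(data.values()))
    ordered = sorted(data, key=data.get, reverse=True)
    return [(year, [k for k in ordered if int(data[k]) == year])
            for year in range(top, bottom - 1, -1)]

print("alphabetical:", alphabetical)
print("by value:    ", by_value)
for year, names in rows_by_whole_year(life_expectancy):
    print(year, ", ".join(names))
```

In the value-ordered layout, the two empty rows make "Charlie" stand out as an outlier. That point is invisible in the alphabetical list.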
Redone as a stem-and-leaf diagram (Figure 20 is simply a reordering of the data with spacing proportional to the numerical differences), the magnitude of the sex difference leaps out at us. We also note that the USSR is an outlier for men.

[Figure 20. Ordering and spacing the data from Figure 19 as a stem-and-leaf diagram provides insights previously difficult to extract (from S13). "Life Expectancy at Birth, by Sex, Most Recent Available Year"; countries for women and for men placed on a common scale of years.]

Rule 10. Label (a) Illegibly, (b) Incompletely, (c) Incorrectly, and (d) Ambiguously

There are many instances of labels that either do not tell the whole story, tell the wrong story, tell two or more stories, or are so small that one cannot figure out what story they are telling. One of my favorite examples of small labels is from The New York Times (August 1978), in which the article complains that fare cuts lower commission payments to travel agents. The graph (Figure 21) supports this view until one notices the tiny label indicating that the small bar showing the decline is for just the first half of 1978. This omits such heavy travel periods as Labor Day, Thanksgiving, Christmas, and so on, so that merely doubling the first-half data is probably not enough. Nevertheless, when this bar is doubled (Figure 22), we see that the agents are doing very well indeed compared to earlier years.

[Figure 22. Figure 21 redrawn with 1978 data placed on a comparable basis (from Wainer 1980). Commission payments to travel agents by airline (United, TWA, Eastern, Delta), 1976-1978 (1978 estimated).]
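Placing a partial-year figure on a comparable basis is a one-line adjustment. This sketch assumes hypothetical commission totals, not the Times' numbers. `annualize` scales linearly by months covered. As the text warns, that probably understates a year whose heavy travel periods fall in the second half.

```python
def annualize(partial_total: float, months_covered: int) -> float:
    """Scale a partial-year total to a 12-month basis, assuming (simplistically)
    that activity is spread evenly through the year."""
    return partial_total * 12 / months_covered

# Hypothetical commission payments in $ millions (NOT the Times' data).
full_years = {1976: 400, 1977: 480}
first_half_1978 = 300

estimate_1978 = annualize(first_half_1978, 6)  # equivalent to doubling the first half

print("1978 bar as drawn:       ", first_half_1978)  # looks like a drop from 1977
print("1978 on a 12-month basis:", estimate_1978)    # 600.0, above 1977
```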
ToTravel Agents In lkosoldofdlAer $57~~~~~~~~~~~~~~~'6 Rule 11-More Is Murkier:(a) More Decimal Places and (b) More Dimensions O web ~~E,ASTIEEN discount of d and airlines' telephone unITE=D s areras fars volume. (ravel agents'overhead, offsetting revenuegainsfrom higher Figure Mixing changedmetaphor a tiny 21. a with labelreverses of Times). themeaning thedata (? 1978,The NewYork 144 We oftensee tables in whichthe numberof decimal places presentedis farbeyondthe numberthatcan be perceived by a reader. They are also commonly presentedto show more accuracythan is justified.A less. In Table displaycan be made clearerbypresenting 1 is a sectionof a table fromDhariyal and Dudewicz's (1981) JASA paper. The table entriesare presentedto five decimal places! In Table 2 is a heavilyrounded versionthatshowswhatthe authorsintendedclearly.It also shows thatthe variouscolumnsmighthave a substantialredundancyin them (the maximumexpected gain withb/c = 10 is about 1/10th of b/c = 100 and that 1/100th of b/c = 1,000). If theydo, theentiretable that could have been reduced substantially. Justas increasing the numberof decimal places can make a table harderto understand,so can increasing the number of dimensionsmake a graph more con- (C The AmericanStatistician, May 1984, Vol. 38, No. 2 This content downloaded from 54.84.104.155 on Wed, 07 Oct 2015 13:30:22 UTC All use subject to JSTOR Terms and Conditions Table 1. OptimalSelection Froma Finite Sequence With Sampling Cost b/c = 10.0 N r* 3 4 5 6 7 8 9 10 2 2 2 3 3 3 3 4 100.0 (GN(r*) - a)/c 1,000.0 (GN(r*) - a)/c r* (GN(r*) - a)/c 2 2 3 3 3 4 4 4 .20000 .26333 .32333 .38267 .44600 .50743 .56743 .62948 r* 2.22500 2.88833 3.54167 4.23767 4.90100 5.57650 6.26025 6.92358 2 2 3 3 3 4 4 4 22.47499 29.13832 35.79166 42.78764 49.45097 56.33005 63.20129 69.86462 NOTE: g(Xs + r - 1) = bR(Xs + r - 1) + a, ifS =s, and g(Xs +r - 1)= 0, otherwise. Source: Dhariyaland Dudewicz (1981). We fusing. 
Table 2. Optimal Selection From a Finite Sequence With Sampling Cost (revised)

         b/c = 10      b/c = 100      b/c = 1,000
  N    r*    G       r*    G        r*    G
  3    2    .2       2    2.2       2    22
  4    2    .3       2    2.9       2    29
  5    2    .3       3    3.5       3    36
  6    3    .4       3    4.2       3    43
  7    3    .4       3    4.9       3    49
  8    3    .5       4    5.6       4    56
  9    3    .6       4    6.3       4    63
 10    4    .6       4    6.9       4    70

NOTE: g(X_{s+r-1}) = bR(X_{s+r-1}) + a if S = s, and g(X_{s+r-1}) = 0 otherwise.

We have already seen how extra dimensions can cause ambiguity (is it length or area or volume?). In addition, human perception of areas is inconsistent. Just what is confusing and what is not is sometimes only a conjecture, yet a hint that a particular configuration will be confusing is obtained if the display confused the grapher. Figure 23 shows a plot of per-share earnings and dividends over a six-year period. We note (with some amusement) that 1975 is the side of a bar: the third dimension of this bar (rectangular parallelepiped?) chart has confused the artist! I suspect that 1975 is really what is labeled 1976, and that the unlabeled bar at the end is probably 1977. A simple line chart with this interpretation is shown in Figure 24.

[Figure 23. An extra dimension confuses the grapher (© 1979, The Washington Post). Three-dimensional bar chart titled "Earnings Per Share and Dividends (Dollars)".]

[Figure 24. Data from Figure 23 redrawn simply (from Wainer 1980). Line chart of earnings and dividends per share, in dollars, by year.]

In Section 4 we illustrate six more rules for displaying data badly. These rules fall broadly under the heading of how to obscure the data. The techniques mentioned were to change the scale in mid-axis, emphasize the trivial, jiggle the baseline, order the chart by a characteristic unrelated to the data, label poorly, and include more dimensions or decimal places than are justified or needed. These methods will work separately or in combination with others to produce graphs and tables of little use. Their common effect will usually be to leave the reader uninformed about the points of interest in the data, although sometimes they will misinform us; the physicians' income plot in Figure 13 is a prime example of misinformation.

Finally, the availability of color usually means that there are additional parameters that can be misused. The U.S. Census' two-variable color map is a wonderful example of how using color in a graph can seduce us into thinking that we are communicating more than we are (see Fienberg 1979; Wainer and Francolini 1980; Wainer 1981). This leads us to the last rule.

Rule 12 - If It Has Been Done Well in the Past, Think of Another Way to Do It

The two-variable color map was done rather well by Mayr (1874), 100 years before the U.S. Census version. He used bars of varying width and frequency to accomplish gracefully what the U.S. Census used varying saturations to do clumsily.

A particularly enlightening experience is to look carefully through the six books of graphs that William Playfair published during the period 1786-1822. One discovers clear, accurate, and data-laden graphs containing many ideas that are useful and too rarely applied today.

In the course of preparing this article, I spent many hours looking at a variety of attempts to display data. Some of the horrors that I have presented were the fruits of that search. In addition, jewels sometimes emerged. I saved the best for last, and will conclude with one of those jewels: my nominee for the title of "World's Champion Graph." It was produced by Minard in 1861 and portrays the devastating losses suffered by the French army during the course of Napoleon's ill-fated Russian campaign of 1812. This graph (originally in color) appears in Figure 25 and is reproduced from Tufte's book (1983, p. 40). His narrative follows.

"Beginning at the left on the Polish-Russian border near the Nieman River, the thick band shows the size of the army (422,000 men) as it invaded Russia in June 1812. The width of the band indicates the size of the army at each place on the map. In September, the army reached Moscow, which was by then sacked and deserted, with 100,000 men. The path of Napoleon's retreat from Moscow is depicted by the darker, lower band, which is linked to a temperature scale and dates at the bottom of the chart. It was a bitterly cold winter, and many froze on the march out of Russia. As the graphic shows, the crossing of the Berezina River was a disaster, and the army finally struggled back to Poland with only 10,000 men remaining. Also shown are the movements of auxiliary troops, as they sought to protect the rear and flank of the advancing army. Minard's graphic tells a rich, coherent story with its multivariate data, far more enlightening than just a single number bouncing along over time. Six variables are plotted: the size of the army, its location on a two-dimensional surface, direction of the army's movement, and temperature on various dates during the retreat from Moscow. It may well be the best statistical graphic ever drawn."

[Figure 25. Minard's (1861) graph of the French army's ill-fated Russian campaign of 1812, nominee for "World's Champion Graph" (see Tufte 1983, p. 176, for a superb reproduction of this in its original color). The map's French title reads, in translation: "Figurative map of the successive losses in men of the French Army in the Russian campaign, 1812-1813," drawn up by M. Minard, Inspector General of Bridges and Roads (retired).]

5. SUMMING UP

Although the tone of this presentation tended to be light and pointed in the wrong direction, the aim is serious. There are many paths that one can follow that will cause deteriorating quality of our data displays; the 12 rules that we described were only the beginning. Nevertheless, they point clearly toward an outlook that provides many hints for good display. The measures of display described are interlocking. The data density cannot be high if the graph is cluttered with chartjunk; the data-ink ratio grows with the amount of data displayed; perceptual distortion manifests itself most frequently when additional dimensions or worthless metaphors are included.

Thus, the rules for good display are quite simple. Examine the data carefully enough to know what they have to say, and then let them say it with a minimum of adornment. Do this while following reasonable regularity practices in the depiction of scale, and label clearly and fully. Last, and perhaps most important, spend some time looking at the work of the masters of the craft. An hour spent with Playfair or Minard will not only benefit your graphical expertise but will also be enjoyable. Tukey (1977) offers 236 graphs and little chartjunk. The work of Francis Walker (1894) concerning statistical maps is clear and concise, and it is truly a mystery that their current counterparts do not make better use of the schema developed a century and more ago.

[Received September 1982. Revised September 1983.]

REFERENCES

BERTIN, J. (1973), Sémiologie Graphique (2nd ed.), The Hague: Mouton-Gautier.
BERTIN, J. (1977), La Graphique et le Traitement Graphique de l'Information, France: Flammarion.
BERTIN, J. (1981), Graphics and the Graphical Analysis of Data, translation: W. Berg; tech. ed.: H. Wainer; Berlin: DeGruyter.
COX, D.R. (1978), "Some Remarks on the Role in Statistics of Graphical Methods," Applied Statistics, 27, 4-9.
DHARIYAL, I.D., and DUDEWICZ, E.J. (1981), "Optimal Selection From a Finite Sequence With Sampling Cost," Journal of the American Statistical Association, 76, 952-959.
EHRENBERG, A.S.C. (1977), "Rudiments of Numeracy," Journal of the Royal Statistical Society, Ser. A, 140, 277-297.
FIENBERG, S.E. (1979), "Graphical Methods in Statistics," The American Statistician, 33, 165-178.
FRIEDMAN, J.H., and RAFSKY, L.C. (1981), "Graphics for the Multivariate Two-Sample Problem," Journal of the American Statistical Association, 76, 277-287.
JOINT COMMITTEE ON STANDARDS FOR GRAPHIC PRESENTATION, PRELIMINARY REPORT (1915), Journal of the American Statistical Association, 14, 790-797.
MACDONALD-ROSS, M. (1977), "How Numbers Are Shown: A Review of Research on the Presentation of Quantitative Data in Texts," Audiovisual Communications Review, 25, 359-409.
MAYR, G. VON (1874), "Gutachten über die Anwendung der graphischen und geographischen Methode in der Statistik," Munich.
MINARD, C.J. (1845-1869), Tableaux Graphiques et Cartes Figuratives de M. Minard, Bibliothèque de l'École Nationale des Ponts et Chaussées, Paris.
PLAYFAIR, W. (1786), The Commercial and Political Atlas, London: Corry.
SCHMID, C.F. (1954), Handbook of Graphic Presentation, New York: Ronald Press.
SCHMID, C.F., and SCHMID, S.E. (1979), Handbook of Graphic Presentation (2nd ed.), New York: John Wiley.
TUFTE, E.R. (1977), "Improving Data Display," University of Chicago, Dept. of Statistics.
TUFTE, E.R. (1983), The Visual Display of Quantitative Information, Cheshire, Conn.: Graphics Press.
TUKEY, J.W. (1977), Exploratory Data Analysis, Reading, Mass.: Addison-Wesley.
WAINER, H. (1980), "Making Newspaper Graphs Fit to Print," in Processing of Visible Language, Vol. 2, eds. H. Bouma, P.A. Kolers, and M.E. Wrolstad, New York: Plenum, 125-142.
WAINER, H. (1981), "Reply" to Meyer and Abt, The American Statistician, 57.
WAINER, H. (1983), "How Are We Doing? A Review of Social Indicators III," Journal of the American Statistical Association, 78, 492-496.
WAINER, H., and FRANCOLINI, C. (1980), "An Empirical Inquiry Concerning Human Understanding of Two-Variable Color Maps," The American Statistician, 34, 81-93.
WAINER, H., and THISSEN, D. (1981), "Graphical Data Analysis," Annual Review of Psychology, 32, 191-241.
WALKER, F.A. (1894), Statistical Atlas of the United States Based on the Results of the Ninth Census, Washington, D.C.: U.S. Bureau of the Census.

© The American Statistician, May 1984, Vol. 38, No. 2. This content downloaded from 54.84.104.155 on Wed, 07 Oct 2015 13:30:22 UTC. All use subject to JSTOR Terms and Conditions.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image
