[DATA AND ANALYTICS]

How 'Big Data' Is Different

These days, lots of people in business are talking about "big data." But how do the potential insights from big data differ from what managers generate from traditional analytics?

BY THOMAS H. DAVENPORT, PAUL BARTH AND RANDY BEAN

These days, many people in the information technology world and in corporate boardrooms are talking about "big data." Many believe that, for companies that get it right, big data will be able to unleash new organizational capabilities and value. But what does the term "big data" actually entail, and how will the insights it yields differ from what managers might generate from traditional analytics?

There is no question that organizations are swimming in an expanding sea of data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. Among its burgeoning sources are the clickstream data from the Web, social media content (tweets, blogs, Facebook wall postings, etc.) and video data from retail and other settings and from video entertainment. But big data also encompasses everything from call center voice data to genomic and proteomic data from biological research and medicine. Every day, Google alone processes about 24 petabytes (or 24,000 terabytes) of data. Yet very little of the information is formatted in the traditional rows and columns of conventional databases.

Many IT vendors and solutions providers use the term "big data" as a buzzword for smarter, more insightful data analysis. But big data is really much more than that. Indeed, companies that learn to take advantage of big data will use real-time information from sensors, radio frequency identification and other identifying devices to understand their business environments at a more granular level, to create new products and services, and to respond to changes in usage patterns as they occur. In the life sciences, such capabilities may pave the way to treatments and cures for threatening diseases.

Organizations that capitalize on big data stand apart from traditional data analysis environments in three key ways:

- They pay attention to data flows as opposed to stocks.
- They rely on data scientists and product and process developers rather than data analysts.
- They are moving analytics away from the IT function and into core business, operational and production functions.

1. Paying attention to flows as opposed to stocks

There are several types of big data applications. The first type supports customer-facing processes to do things like identify fraud in real time or score medical patients for health risk. A second type involves continuous process monitoring to detect such things as changes in consumer sentiment or the need for service on a jet engine. Yet another type uses big data to explore network relationships like suggested friends on LinkedIn and Facebook. In all these applications, the data is not the "stock" in a data warehouse but a continuous flow. This represents a substantial change from the past, when data analysts performed multiple analyses to find meaning in a fixed supply of data.

Today, rather than looking at data to assess what occurred in the past, organizations need to think in terms of continuous flows and processes. "Streaming analytics allows you to process data during an event to improve the outcome," notes Tom Deutsch, program director for big data technologies and applied analytics at IBM. This capability is becoming increasingly important in fields such as health care. At Toronto's Hospital for Sick Children, for example, machine learning algorithms are able to discover patterns that anticipate infections in premature babies before they occur.

The increased volume and velocity of data in production settings means that organizations will need to develop continuous processes for gathering, analyzing and interpreting data. The insights from these efforts can be linked with production applications and processes to enable continuous processing. Although small "stocks" of data located in warehouses or data marts may continue to be useful for developing and refining the analytical models used on big data, once the models have been developed, they need to process continuing data streams quickly and accurately.

The behavior of credit card companies offers a good illustration of this dynamic. In the past, direct marketing groups at credit card companies created models to select the most likely customer prospects from a large data warehouse. The process of data extraction, preparation and analysis took weeks to prepare - and weeks more to execute. However, credit card companies, frustrated by their inability to act quickly, determined that there was a much faster way to meet most of their requirements. In fact, they were able to create a "ready-to-market" database and system that allows a marketer to analyze, select and issue offers in a single day. Through frequent iterations and monitoring of website and call-center activities, companies can make personalized offers in milliseconds, then optimize the offers over time by tracking responses.

Some big data environments, such as consumer sentiment analysis, are not designed for automating decisions but are better suited for real-time monitoring of the environment. Given the volume and velocity of big data, conventional, high-certitude approaches to decision-making are often not appropriate in such settings; by the time the organization has the information it needs to make a decision, new data is often available that renders the decision obsolete. In real-time monitoring contexts, organizations need to adopt a more continuous approach to analysis and decision-making based on a series of hunches and hypotheses. Social media analytics, for example, capture fast-breaking trends on customer sentiments about products, brands and companies. Although companies might be interested in knowing whether an hour's or a day's changes in online sentiment correlate with sales changes, by the time a traditional analysis is completed there would be a raft of new data to analyze. Therefore, in big data environments it's important to analyze, decide, and act quickly and often.

However, it isn't enough to be able to monitor a continuing stream of information. You also have to be prepared to make decisions and take action. Organizations need to establish processes for determining when specific decisions and actions are necessary - when, for example, data values fall outside certain limits. This helps to determine decision stakeholders, decision processes and the criteria and timeframes for which decisions need to be made.

2. Relying on data scientists and product and process developers as opposed to data analysts

Although there has always been a need for analytical professionals to support the organization's analytical capabilities, the requirements for support personnel are different with big data. Because interacting with the data itself - obtaining, extracting, manipulating and structuring it - is critical to any analysis, the people who work with big data need substantial and creative IT skills. They also need to be close to products and processes within organizations, which means they need to be organized differently than analytical staff were in the past.

"Data scientists," as these professionals are known, understand analytics, but they also are well versed in IT, often having advanced degrees in computer science, computational physics or biology- or network-oriented social sciences. Their upgraded data management skill set - including programming, mathematical and statistical skills, as well as business acumen and the ability to communicate effectively with decision-makers - goes well beyond what was necessary for data analysts in the past. This combination of skills, valuable as it is, is in very short supply.

As a result, some early adopters of big data are working to develop their own talent. EMC Corporation, for example, traditionally a provider of data storage technologies, acquired Greenplum, a big data technology company, in 2010 to expand its capabilities in data science and promptly started an educational offering for data scientists. Other companies are working with universities to train data scientists.

Early users of big data are also rethinking their organizational structures for data scientists. Traditionally, analytical professionals were often part of internal consulting organizations advising managers or executives on internal decisions. However, in some industries, such as online social networks, gaming and pharmaceuticals, data scientists are part of the product development organization, developing new products and product features. At Merck, for example, data scientists (whom the company calls statistical genetics scientists) are members of the drug discovery and development organization.

3. Moving analytics from IT into core business and operational functions

Surging volumes of data require major improvements in database and analytics technologies. Capturing, filtering, storing and analyzing big data flows can swamp traditional networks, storage arrays and relational database platforms. Attempts to replicate and scale the existing technologies will not keep up with big data demands, and big data is changing the technology, skills and processes of the IT function.

The market has responded with a broad array of new products designed to deal with big data. They include open source platforms such as Hadoop, invented by Internet pioneers to support the massive scale of data they generate and manage. Hadoop allows organizations to load, store and query massive data sets on a large grid of inexpensive servers, as well as execute advanced analytics in parallel. Relational databases have also been transformed: New products have increased query performance by a factor of 1,000 and are capable of managing the wide variety of big data sources. Statistical analysis packages are similarly evolving to work with these new data platforms, data types and algorithms.

Another disruptive force is the delivery of big data capabilities through "the cloud." Although not yet broadly adopted in large corporations, cloud-based computing is well suited to big data. Many big-data applications use external information that is not proprietary, such as social network modeling and sentiment analysis. Moreover, big data analytics are dependent on [...] power, requiring a flexible grid that can be reconfigured for different needs. Cloud-based service providers offer on-demand pricing with fast reconfiguration.

Another approach to managing big data is leaving the data where it is. So-called "virtual data marts" allow data scientists to share existing data without replicating it. eBay, for example, used to have an enormous data replication problem, with between 20- and 50-fold versions of the same data scattered throughout its various data marts. Now, thanks to its virtual data marts, the company's replication problem has been dramatically reduced. eBay has also established a "data hub" - an internal website to make it easier for managers and analysts to serve themselves and share data and analyses across the organization. In effect, eBay has built a social network around analytics and data.

Coming to terms with big data is prompting organizations to rethink their basic assumptions about the relationship between business and IT - and their respective roles. The traditional role of IT - automating business processes - imposes precise requirements, adherence to standards and controls on changes. Analytics has been more of an afterthought for monitoring processes and notifying management about the anomalies. Big data flips this approach on its head. A key tenet of big data is that the world and the data that describe it are constantly changing, and organizations that can recognize the changes and react quickly and intelligently will have the upper hand. Whereas the most vaunted business and IT capabilities used to be stability and scale, the new advantages are based on discovery and agility - the ability to mine existing and new [...] events and opportunities.

This requires a sea change in IT activity within organizations. As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms and user interfaces they employ will need to facilitate interactions with the people who work with the tools. Successful IT organizations will train and recruit people with a new set of skills who can integrate these new analytic capabilities into their production environments.

A further way that big data disrupts the traditional roles of business and IT is that it presents discovery and analysis as the first order of business. Next-generation IT processes and systems need to be designed for insight, not just automation. Traditional IT architecture is accustomed to having applications (or services) as "black boxes" that perform tasks without exposing internal data and procedures. But big data environments must make sense of new data, and summary reporting is not enough. This means that IT applications need to measure and report transparently on a wide variety of dimensions, including customer interactions, product usage, service actions and other dynamic measures. As big data evolves, the architecture will develop into an information ecosystem: a network of internal and external services continuously sharing information, optimizing decisions, communicating results and generating new insights for businesses.

Thomas H. Davenport is a visiting professor at Harvard Business School and President's Distinguished Professor of Information Technology and Management at Babson College in Wellesley, Massachusetts. Paul Barth and Randy Bean are the cofounders and managing partners of NewVantage Partners, a consulting firm. Comment on this article at http://sloanreview.mit.edu/x/54104, or contact [...]. [...] 2012. All rights reserved.

You will usually follow the following sequence in actually writing your report, but note that the abstract (if you include it) will come first among these elements in the final report: Abstract, Introduction, Materials & Methods, Results, Discussion.