How 'Big Data' is Different

These days, lots of people in business are talking about "big data." But how do the potential insights from big data differ from what managers generate from traditional analytics?

BY THOMAS H. DAVENPORT, PAUL BARTH AND RANDY BEAN

These days, many people in the information technology world and in corporate boardrooms are talking about "big data." Many believe that, for companies that get it right, big data will be able to unleash new organizational capabilities and value. But what does the term "big data" actually entail, and how will the insights it yields differ from what managers might generate from traditional analytics?

There is no question that organizations are swimming in an expanding sea of data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. Among its burgeoning sources are the clickstream data from the Web, social media content (tweets, blogs, Facebook wall postings, etc.) and video data from retail and other settings and from video entertainment. But big data also encompasses everything from call center voice data to genomic and proteomic data from biological research and medicine. Every day, Google alone processes about 24 petabytes (or 24,000 terabytes) of data. Yet very little of the information is formatted in the traditional rows and columns of conventional databases.

Many IT vendors and solutions providers use the term "big data" as a buzzword for smarter, more insightful data analysis. But big data is really much more than that. Indeed, companies that learn to take advantage of big data will use real-time information from sensors, radio frequency identification and other identifying devices to understand their business environments at a more granular level, to create new products and services, and to respond to changes in usage patterns as they occur. In the life sciences, such capabilities may pave the way to treatments and cures for threatening diseases.

Organizations that capitalize on big data stand apart from traditional data analysis environments in three key ways:

- They pay attention to data flows as opposed to stocks.
- They rely on data scientists and product and process developers rather than data analysts.
- They are moving analytics away from the IT function and into core business, operational and production functions.

1. Paying attention to flows as opposed to stocks

There are several types of big data applications. The first type supports customer-facing processes to do things like identify fraud in real time or score medical patients for health risk. A second type involves continuous process monitoring to detect such things as changes in consumer sentiment or the need for service on a jet engine. Yet another type uses big data to explore network relationships like suggested friends on LinkedIn and Facebook. In all these applications, the data is not the "stock" in a data warehouse but a continuous flow. This represents a substantial change from the past, when data analysts performed multiple analyses to find meaning in a fixed supply of data.

Today, rather than looking at data to assess what occurred in the past, organizations need to think in terms of continuous flows and processes. "Streaming analytics allows you to process data during an event to improve the outcome," notes Tom Deutsch, program director for big data technologies and applied analytics at IBM. This capability is becoming increasingly important in fields such as health care. At Toronto's Hospital for Sick Children, for example, machine learning algorithms are able to discover patterns that anticipate infections in premature babies before they occur.

The increased volume and velocity of data in production settings means that organizations will need to develop continuous processes for gathering, analyzing and interpreting data. The insights from these efforts can be linked with production applications and processes to enable continuous processing. Although small "stocks" of data located in warehouses or data marts may continue to be useful for developing and refining the analytical models used on big data, once the models have been developed, they need to process continuing data streams quickly and accurately.

The behavior of credit card companies offers a good illustration of this dynamic. In the past, direct marketing groups at credit card companies created models to select the most likely customer prospects from a large data warehouse. The process of data extraction, preparation and analysis took weeks to prepare - and weeks more to execute. However, credit card companies, frustrated by their inability to act quickly, determined that there was a much faster way to meet most of their requirements. In fact, they were able to create a "ready-to-market" database and system that allows a marketer to analyze, select and issue offers in a single day. Through frequent iterations and monitoring of website and call-center activities, companies can make personalized offers in milliseconds, then optimize the offers over time.

Some big data environments, such as consumer sentiment analysis, are not designed for automating decisions but are better suited for real-time monitoring of the environment. Given the volume and velocity of big data, conventional, high-certitude approaches to decision-making are often not appropriate in such settings. By the time the organization has the information it needs to make a decision, new data is often available that renders the decision obsolete. In real-time monitoring contexts, organizations need to adopt a more continuous approach to analysis and decision-making based on a series of hunches and hypotheses. Social media analytics, for example, capture fast-breaking trends on customer sentiments about products, brands and companies. Although companies might be interested in knowing whether an hour's or a day's changes in online sentiment correlate with sales changes, by the time a traditional analysis is completed there would be a raft of new data to analyze. Therefore, in big data environments it's important to analyze, decide, and act quickly and often.

However, it isn't enough to be able to monitor a continuing stream of information. You also have to be prepared to make decisions and take action. Organizations need to establish processes for determining when specific decisions and actions are necessary - when, for example, data values fall outside certain limits. This helps to determine decision stakeholders, decision processes and the criteria and timeframes for which decisions need to be made.

2. Relying on data scientists and product and process developers as opposed to data analysts

Although there has always been a need for analytical professionals to support the organization's analytical capabilities, the requirements for support personnel are different with big data. Because interacting with the data itself - obtaining, extracting, manipulating and structuring it - is critical to any analysis, the people who work with big data need substantial and creative IT skills. They also need to be close to products and processes within organizations, which means they need to be organized differently than analytical staff were in the past.

"Data scientists," as these professionals are known, understand analytics, but they also are well versed in IT, often having advanced degrees in computer science, computational physics or biology- or network-oriented social sciences. Their upgraded data management skill set - including programming, mathematical and statistical skills as well as business acumen and the ability to communicate effectively with decision-makers - goes well beyond what was necessary for data analysts in the past. This combination of skills, valuable as it is, is in very short supply.

As a result, some early adopters of big data are working to develop their own talent. EMC Corporation, for example, traditionally a provider of data storage technologies, acquired Greenplum, a big data technology company, in 2010 to expand its capabilities in data science and promptly started an educational offering for data scientists. Other companies are working with universities to train data scientists.

Early users of big data are also rethinking their organizational structures for data scientists. Traditionally, analytical professionals were often part of internal consulting organizations advising managers or executives on internal decisions. However, in some industries, such as online social networks, gaming and pharmaceuticals, data scientists are part of the product development organization, developing new products and product features. At Merck, for example, data scientists (whom the company calls statistical genetics scientists) are members of the drug discovery and development organization.

3. Moving analytics from IT into core business and operational functions

Surging volumes of data require major improvements in database and analytics technologies. Capturing, filtering, storing and analyzing big data flows can swamp traditional networks, storage arrays and relational database platforms. Attempts to replicate and scale the existing technologies will not keep up with big data demands, and big data is changing the technology, skills and processes of the IT function.

The market has responded with a broad array of new products designed to deal with big data. They include open source platforms such as Hadoop, invented by Internet pioneers to support the massive scale of data they generate and manage. Hadoop allows organizations to load, store and query massive data sets on a large grid of inexpensive servers, as well as execute advanced analytics in parallel. Relational databases have also been transformed. New products have increased query performance by a factor of 1,000 and are capable of managing the wide variety of big data sources. Statistical analysis packages are similarly evolving to work with these new data platforms, data types and algorithms.

Another disruptive force is the delivery of big data capabilities through "the cloud." Although not yet broadly adopted in corporations, cloud-based computing is well suited to big data. Many big-data applications use external information that is not proprietary, such as social network data. Moreover, big data analytics are dependent on extensive storage capacity and processing power, requiring a flexible grid that can be reconfigured for different needs. Cloud-based service providers offer on-demand pricing with fast reconfiguration.

Another approach to managing big data is leaving the data where it is. So-called "virtual data marts" allow data scientists to share existing data without replicating it. eBay, for example, used to have an enormous data replication problem, with between 20- and 50-fold versions of the same data scattered throughout its various data marts. Now, thanks to its virtual data marts, the company's replication problem has been dramatically reduced. eBay has also established a "data hub" - an internal website to make it easier for managers and analysts to serve themselves and share data and analyses across the organization. In effect, eBay has built a social network around analytics and data.

Coming to terms with big data is prompting organizations to rethink their basic assumptions about the relationship between business and IT - and their respective roles. The traditional role of IT - automating business processes - imposes precise requirements, adherence to standards and controls on changes. Analytics has been more of an afterthought for monitoring processes and notifying management about anomalies. Big data flips this approach on its head. A key tenet of big data is that the world and the data that describe it are constantly changing, and organizations that can recognize the changes and react quickly and intelligently will have the upper hand. Whereas the most vaunted business and IT capabilities used to be stability and scale, the new advantages are based on discovery and agility - the ability to mine existing and new data sources continuously for patterns, events and opportunities.

This requires a sea change in IT activity within organizations. As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms and user interfaces they employ will need to facilitate interactions with the people who work with the tools. Successful IT organizations will train and recruit people with a new set of skills who can integrate these new analytic capabilities into their production environments.

A further way that big data disrupts the traditional roles of business and IT is that it presents discovery and analysis as the first order of business. Next-generation IT processes and systems need to be designed for insight, not just automation. Traditional IT architecture is accustomed to having applications (or services) as "black boxes" that perform tasks without exposing internal data and procedures. But big data environments must make sense of new data, and summary reporting is not enough. This means that IT applications need to measure and report transparently on a wide variety of dimensions, including customer interactions, product usage, service actions and other dynamic measures. As big data evolves, the architecture will develop into an information ecosystem: a network of internal and external services continuously sharing information, optimizing decisions, communicating results and generating new insights for businesses.

Thomas H. Davenport is a visiting professor at Harvard Business School and President's Distinguished Professor of Information Technology and Management at Babson College. Paul Barth and Randy Bean are the cofounders and managing partners of NewVantage Partners, a Boston-based management consulting firm. Comment on this article at http://sloanreview.mit.edu/x/54104, or contact the authors at smrfeedback@mit.edu.

Reprint 54104. Copyright Massachusetts Institute of Technology, 2012. All rights reserved.