How 'Big Data' Is Different

These days, lots of people in business are talking about "big data." But how do the potential insights from big data differ from what managers generate from traditional analytics?

BY THOMAS H. DAVENPORT, PAUL BARTH AND RANDY BEAN

These days, many people in the information technology world and in corporate boardrooms are talking about "big data." Many believe that, for companies that get it right, big data will be able to unleash new organizational capabilities and value. But what does the term "big data" actually entail, and how will the insights it yields differ from what managers might generate from traditional analytics?

There is no question that organizations are swimming in an expanding sea of data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. Among its burgeoning sources are the clickstream data from the Web, social media content (tweets, blogs, Facebook wall postings, etc.) and video data from retail and other settings and from video entertainment. But big data also encompasses everything from call center voice data to genomic and proteomic data from biological research and medicine. Every day, Google alone processes about 24 petabytes (or 24,000 terabytes) of data. Yet very little of the information is formatted in the traditional rows and columns of conventional databases.

Many IT vendors and solutions providers use the term "big data" as a buzzword for smarter, more insightful data analysis. But big data is really much more than that. Indeed, companies that learn to take advantage of big data will use real-time information from sensors, radio frequency identification and other identifying devices to understand their business environments at a more granular level, to create new products and services, and to respond to changes in usage patterns as they occur. In the life sciences, such capabilities may pave the way to treatments and cures for threatening diseases.

Organizations that capitalize on big data stand apart from traditional data analysis environments in three key ways:
- They pay attention to data flows as opposed to stocks.
- They rely on data scientists and product and process developers rather than data analysts.
- They are moving analytics away from the IT function and into core business, operational and production functions.

1. Paying attention to flows as opposed to stocks

There are several types of big data applications. The first type supports customer-facing processes to do things like identify fraud in real time or score medical patients for health risk. A second type involves continuous process monitoring to detect such things as changes in consumer sentiment or the need for service on a jet engine. Yet another type uses big data to explore network relationships like suggested friends on LinkedIn and Facebook. In all these applications, the data is not the "stock" in a data warehouse but a continuous flow. This represents a substantial change from the past, when data analysts performed multiple analyses to find meaning in a fixed supply of data.

Today, rather than looking at data to assess what occurred in the past, organizations need to think in terms of continuous flows and processes. "Streaming analytics allows you to process data during an event to improve the outcome," notes Tom Deutsch, program director for big data technologies and applied analytics at IBM. This capability is becoming increasingly important in fields such as health care. At Toronto's Hospital for Sick Children, for example, machine learning algorithms are able to discover patterns that anticipate infections in premature babies before they occur.

The increased volume and velocity of data in production settings means that organizations will need to develop continuous processes for gathering, analyzing and interpreting data. The insights from these efforts can be linked with production applications and processes to enable continuous processing. Although small "stocks" of data located in warehouses or data marts may continue to be useful for developing and refining the analytical models used on big data, once the models have been developed, they need to process continuing data streams quickly and accurately.

The behavior of credit card companies offers a good illustration of this dynamic. In the past, direct marketing groups at credit card companies created models to select the most likely customer prospects from a large data warehouse. The process of data extraction, preparation and analysis took weeks to prepare - and weeks more to execute. However, credit card companies, frustrated by their inability to act quickly, determined that there was a much faster way to meet most of their requirements. In fact, they were able to create a "ready-to-market" database and system that allows a marketer to analyze, select and issue offers in a single day. Through frequent iterations and monitoring of website and call-center activities, companies can make personalized offers in milliseconds, then optimize the offers over time.

Some big data environments, such as consumer sentiment analysis, are not designed for automating decisions but are better suited for real-time monitoring of the environment. Given the volume and velocity of big data, conventional, high-certitude approaches to decision-making are often not appropriate in such settings; by the time the organization has the information it needs to make a decision, new data is often available that renders the decision obsolete. In real-time monitoring contexts, organizations need to adopt a more continuous approach to analysis and decision-making based on a series of hunches and hypotheses. Social media analytics, for example, capture fast-breaking trends on customer sentiments about products, brands and companies. Although companies might be interested in knowing whether an hour's or a day's changes in online sentiment correlate with sales changes, by the time a traditional analysis is completed there would be a raft of new data to analyze. Therefore, in big data environments it's important to analyze, decide, and act quickly and often.

However, it isn't enough to be able to monitor a continuing stream of information. You also have to be prepared to make decisions and take action. Organizations need to establish processes for determining when specific decisions and actions are necessary - when, for example, data values fall outside certain limits. This helps to determine decision stakeholders, decision processes and the criteria and timeframes for which decisions need to be made.

2. Relying on data scientists and product and process developers as opposed to data analysts

Although there has always been a need for analytical professionals to support the organization's analytical capabilities, the requirements for support personnel are different with big data. Because interacting with the data itself (obtaining, extracting and manipulating it) is critical to any analysis, the people who work with big data need substantial and creative IT skills. They also need to be close to products and processes within organizations, which means they need to be organized differently than analytical staff were in the past.

"Data scientists," as these professionals are known, understand analytics, but they also are well versed in IT, often having advanced degrees in computer science, computational physics or biology- or network-oriented social sciences. Their upgraded data management skill set - including programming, mathematical and statistical skills as well as business acumen and the ability to communicate effectively with decision-makers - goes well beyond what was necessary for data analysts in the past. This combination of skills, valuable as it is, is in very short supply.

As a result, some early adopters of big data are working to develop their own talent. EMC Corporation, for example, traditionally a provider of data storage technologies, acquired Greenplum, a big data technology company, in 2010 to expand its capabilities in data science and promptly started an educational offering for data scientists. Other companies are working with universities to train data scientists.

Early users of big data are also rethinking their organizational structures for data scientists. Traditionally, analytical professionals were often part of internal consulting organizations advising managers or executives on internal decisions. However, in some industries, such as online social networks, gaming and pharmaceuticals, data scientists are part of the product development organization, developing new products and product features. At Merck, for example, data scientists (whom the company calls statistical genetics scientists) are members of the drug discovery and development organization.

3. Moving analytics from IT into core business and operational functions

Surging volumes of data require major improvements in database and analytics technologies. Capturing, filtering, storing and analyzing big data flows can swamp traditional networks, storage arrays and relational database platforms. Attempts to replicate and scale the existing technologies will not keep up with big data demands, and big data is changing the technology, skills and processes of the IT function.

The market has responded with a broad array of new products designed to deal with big data. They include open source platforms such as Hadoop, invented by Internet pioneers to support the massive scale of data they generate and manage. Hadoop allows organizations to load, store and query massive data sets on a large grid of inexpensive servers, as well as execute advanced analytics in parallel. Relational databases have also been transformed: New products have increased query performance by a factor of 1,000 and are capable of managing the wide variety of big data sources. Statistical analysis packages are similarly evolving to work with these new data platforms, data types and algorithms.

Another disruptive force is the delivery of big data capabilities through "the cloud." Although not yet broadly adopted in large corporations, cloud-based computing is well suited to big data. Many big-data applications use external information that is not proprietary, such as social network modeling and sentiment analysis. Moreover, big data analytics are dependent on extensive storage capacity and processing power, requiring a flexible grid that can be reconfigured for different needs. Cloud-based service providers offer on-demand pricing with fast reconfiguration.

Another approach to managing big data is leaving the data where it is. So-called "virtual data marts" allow data scientists to share existing data without replicating it. eBay, for example, used to have an enormous data replication problem, with between 20- and 50-fold versions of the same data scattered throughout its various data marts. Now, thanks to its virtual data marts, the company's replication problem has been dramatically reduced. eBay has also established a "data hub" - an internal website to make it easier for managers and analysts to serve themselves and share data and analyses across the organization. In effect, eBay has built a social network around analytics and data.

Coming to terms with big data is prompting organizations to rethink their basic assumptions about the relationship between business and IT - and their respective roles. The traditional role of IT - automating business processes - imposes precise requirements, adherence to standards and controls on changes. Analytics has been more of an afterthought for monitoring processes and notifying management about the anomalies. Big data flips this approach on its head. A key tenet of big data is that the world and the data that describe it are constantly changing, and organizations that can recognize the changes and react quickly and intelligently will have the upper hand. Whereas the most vaunted business and IT capabilities used to be stability and scale, the new advantages are based on discovery and agility - the ability to mine existing and new data sources continuously for patterns, events and opportunities.

This requires a sea change in IT activity within organizations. As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms and user interfaces they employ will need to facilitate interactions with the people who work with the tools. Successful IT organizations will train and recruit people with a new set of skills who can integrate these new analytic capabilities into their production environments.

A further way that big data disrupts the traditional roles of business and IT is that it presents discovery and analysis as the first order of business. Next-generation IT processes and systems need to be designed for insight, not just automation. Traditional IT is accustomed to having applications (or services) as "black boxes" that perform tasks without exposing internal data and procedures. But big data environments must make sense of new data, and summary reporting is not enough. This means that IT applications need to measure and report transparently on a wide variety of dimensions, including customer interactions, product usage, service actions and other dynamic measures. As big data evolves, the architecture will develop into an information ecosystem: a network of internal and external services continuously sharing information, optimizing decisions, communicating results and generating new insights for businesses.

Thomas H. Davenport is a visiting professor at Harvard Business School and President's Distinguished Professor of Information [...] College in Wellesley, Massachusetts. Paul Barth and Randy Bean are the cofounders and managing partners of NewVantage [...] consulting firm.

Reprint 54104. Copyright © Massachusetts Institute of Technology, 2012. All rights reserved.