Question

1 Approved Answer

Posted on Sep 30, 2024

According to the authors of "Business Intelligence and Analytics: From Big Data to Big Impact", what are the emerging analytics research opportunities? Please focus on one opportunity to discuss.

Source: Chen et al., "Introduction: Business Intelligence Research," MIS Quarterly Vol. 36 No. 4, December 2012, p. 1174.

Data analytics refers to the BI&A technologies that are grounded mostly in data mining and statistical analysis. As mentioned previously, most of these techniques rely on the mature commercial technologies of relational DBMS, data warehousing, ETL, OLAP, and BPM (Chaudhuri et al. 2011). Since the late 1980s, various data mining algorithms have been developed by researchers from the artificial intelligence, algorithm, and database communities. In the IEEE 2006 International Conference on Data Mining (ICDM), the 10 most influential data mining algorithms were identified based on expert nominations, citation counts, and a community survey. In ranked order, they are C4.5, k-means, SVM (support vector machine), Apriori, EM (expectation maximization), PageRank, AdaBoost, kNN (k-nearest neighbors), Naïve Bayes, and CART (Wu et al. 2007). These algorithms cover classification, clustering, regression, association analysis, and network analysis. Most of these popular data mining algorithms have been incorporated in commercial and open source data mining systems (Witten et al. 2011). Other advances such as neural networks for classification/prediction and clustering and genetic algorithms for optimization and machine learning have all contributed to the success of data mining in different applications. Two other data analytics approaches commonly taught in business schools are also critical for BI&A. Grounded in statistical theories and models, multivariate statistical analysis covers analytical techniques such as regression, factor analysis, clustering, and discriminant analysis that have been used successfully in various business applications.
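To make one of the listed algorithms concrete, here is a minimal sketch of k-means clustering (Lloyd's algorithm) on 2-D points. It is an illustration only, not code from the paper; the function names, the random initialization, and the fixed iteration cap are all assumptions chosen for brevity.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(cluster):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)


def assign(points, centroids):
    """Assignment step: attach each point to its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, iterations=20, seed=0):
    """Alternate assignment and update steps until the centroids stop moving."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points picked at random.
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        updated = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
    centers, groups = kmeans(pts, k=2)
    print(centers)
    print(groups)
```

On well-separated data such as the sample points, the two groups are recovered regardless of which points seed the centroids; on real data the result depends on initialization, which is why production libraries typically restart several times.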

Developed in the management science community, optimization techniques and heuristic search are also suitable for selected BI&A problems such as database feature selection and web crawling/spidering. Most of these techniques can be found in business school curricula. Due to the success achieved collectively by the data mining and statistical analysis community, data analytics continues to be an active area of research. Statistical machine learning techniques, often based on well-grounded mathematical models and powerful algorithms, such as Bayesian networks, hidden Markov models, support vector machines, reinforcement learning, and ensemble models, have been applied to data, text, and web analytics applications. Other new data analytics techniques explore and leverage unique data characteristics, from sequential/temporal mining and spatial mining, to data mining for high-speed data streams and sensor data. Increased privacy concerns in various e-commerce, e-government, and healthcare applications have caused privacy-preserving data mining to become an emerging area of research. Many of these methods are data-driven, relying on various anonymization techniques, while others are process-driven, defining how data can be accessed and used (Gelfand 2011/2012).

Over the past decade, process mining has also emerged as a new research field that focuses on the analysis of processes using event data. Process mining has become possible due to the availability of event logs in various industries (e.g., healthcare, supply chains) and new process discovery and conformance checking techniques (van der Aalst 2012). Furthermore, network data and web content have helped generate exciting research in network analytics and web analytics, which are presented below. In addition to active academic research on data analytics, industry research and development has also generated much excitement, especially with respect to big data analytics for semi-structured content. Unlike the structured data that can be handled repeatedly through an RDBMS, semi-structured data may call for ad hoc and one-time extraction, parsing, processing, indexing, and analytics in a scalable and distributed MapReduce or Hadoop environment. MapReduce has been hailed as a revolutionary new platform for large-scale, massively parallel data access (Patterson 2008). Inspired in part by MapReduce, Hadoop provides a Java-based software framework for distributed processing of data-intensive transformation and analytics. The top three commercial database suppliers (Oracle, IBM, and Microsoft) have all adopted Hadoop, some within a cloud infrastructure. The open source Apache Hadoop has also gained significant traction for business analytics, including Chukwa for data collection, HBase for distributed data storage, Hive for data summarization and ad hoc querying, and Mahout for data mining (Henschen 2011). In their perspective paper, Stonebraker et al. (2010) compared MapReduce with the parallel DBMS. The commercial parallel DBMS showed clear advantages in efficient query processing and high-level query language and interface, whereas MapReduce excelled in ETL and analytics for "read only" semi-structured data sets.
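The MapReduce programming model mentioned above can be sketched in a few lines. The example below simulates the map, shuffle, and reduce phases of a word count in a single Python process; it does not use Hadoop or any distributed runtime, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence; a real cluster would run them in parallel."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data big impact", "Big analytics"]))
```

Because each call to map_phase touches one record and each call to reduce_phase touches one key, both phases can be spread across many machines, which is what makes the model attractive for one-time processing of large semi-structured data sets.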
New Hadoop- and MapReduce-based systems have become another viable option for big data analytics in addition to the commercial systems developed for RDBMS, column-based DBMS, in-memory DBMS, and parallel DBMS (Chaudhuri et al. 2011).

Text Analytics

A significant portion of the unstructured content collected by an organization is in textual format, from e-mail communication and corporate documents to web pages and social media content.

Text analytics has its academic roots in information retrieval and computational linguistics. In information retrieval, document representation and query processing are the foundations for developing the vector-space model, Boolean retrieval model, and probabilistic retrieval model, which in turn, became the basis for the modern digital libraries, search engines, and enterprise search systems (Salton 1989). In computational linguistics, statistical natural language processing (NLP) techniques for lexical acquisition, word sense disambiguation, part-of-speech tagging (POST), and probabilistic context-free grammars have also become important for representing text (Manning and Schütze 1999). In addition to document and query representations, user models and relevance feedback are also important in enhancing search performance. Since the early 1990s, search engines have evolved into mature commercial systems, consisting of fast, distributed crawling; efficient inverted indexing; inlink-based page ranking; and search log analytics. Many of these foundational text processing and indexing techniques have been deployed in text-based enterprise search and document management systems in BI&A 1.0.

Leveraging the power of big data (for training) and statistical NLP (for building language models), text analytics techniques have been actively pursued in several emerging areas, including information extraction, topic models, question answering (Q/A), and opinion mining. Information extraction is an area of research that aims to automatically extract specific kinds of structured information from documents.
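As a rough illustration of the inverted indexing and vector-space retrieval described above, the sketch below builds an inverted index over a toy collection and ranks documents by cosine similarity of TF-IDF vectors. The whitespace tokenizer, the particular TF-IDF weighting, and all function names are simplifying assumptions, not the design of any specific search engine.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def tfidf_vector(text, index, n_docs):
    """Weight each known term by term frequency times log(N / document frequency)."""
    counts = Counter(tokenize(text))
    return {
        term: tf * math.log(n_docs / len(index[term]))
        for term, tf in counts.items()
        if term in index
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query, docs):
    """Return ids of matching documents, best match first."""
    index = build_inverted_index(docs)
    q = tfidf_vector(query, index, len(docs))
    # The inverted index limits scoring to documents sharing a query term.
    candidates = set().union(*(index[t] for t in q)) if q else set()
    scored = [(cosine(q, tfidf_vector(docs[d], index, len(docs))), d) for d in candidates]
    return [d for score, d in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    corpus = {
        "d1": "data mining algorithms",
        "d2": "text mining and search engines",
        "d3": "web search logs",
    }
    print(search("search engines", corpus))
```

In this toy corpus the document that contains both query terms ranks ahead of the one that contains only the more common term, which is the behavior the inverse document frequency weighting is meant to produce.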

As a building block of information extraction, NER (named entity recognition, also known as entity extraction) is a process that identifies atomic elements in text and classifies them into predefined categories (e.g., names, places, dates). NER techniques have been successfully developed for news analysis and biomedical applications. Topic models are algorithms for discovering the main themes that pervade a large and otherwise unstructured collection of documents. New topic modeling algorithms such as LDA (latent Dirichlet allocation) and other probabilistic models have attracted recent research (Blei 2012). Question answering (Q/A) systems rely on techniques from NLP, information retrieval, and human-computer interaction. Primarily designed to answer factual questions (i.e., who, what, when, and where kinds of questions), Q/A systems involve different techniques for question analysis, source retrieval, answer extraction, and answer presentation (Maybury 2004).
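The idea behind NER, finding spans of text and assigning each to a predefined category, can be shown with a deliberately simple rule-based tagger. This is only a sketch under strong assumptions: the gazetteer entries, the date pattern, and the sample sentence are invented for illustration, and practical NER systems usually rely on statistical models trained on annotated text rather than fixed lists.

```python
import re

# Hypothetical lookup lists (gazetteers), invented for this example.
GAZETTEER = {
    "ORG": {"Acme"},
    "PLACE": {"Tucson"},
}

# Assumed date format: "Month D, YYYY".
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}, \d{4}\b"
)


def tag_entities(text):
    """Return (span, label) pairs in the order the spans appear in the text."""
    found = []
    for match in DATE_PATTERN.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
                found.append((match.start(), match.group(), label))
    # Sort by starting offset so the output follows reading order.
    return [(span, label) for _, span, label in sorted(found)]


if __name__ == "__main__":
    print(tag_entities("Acme opened an office in Tucson on March 3, 2012."))
```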

The recent successes of IBM's Watson and Apple's Siri have highlighted Q/A research and commercialization opportunities. Many promising Q/A system application areas have been identified, including education, health, and defense. Opinion mining refers to the computational techniques for extracting, classifying, understanding, and assessing the opinions expressed in various online news sources, social media comments, and other user-generated content. Sentiment analysis is often used in opinion mining to identify sentiment, affect, subjectivity, and other emotional states in online text. Web 2.0 and social media content have created abundant and exciting opportunities for understanding the opinions of the general public and consumers regarding social events, political movements, company strategies, marketing campaigns, and product preferences (Pang and Lee 2008). In addition to the above research directions, text analytics also offers significant research opportunities and challenges in several more focused areas, including web stylometric analysis for authorship attribution, multilingual analysis for web documents, and large-scale text visualization. Multimedia information retrieval and mobile information retrieval are two other related areas that require support of text analytics techniques, in addition to the core multimedia and mobile technologies.
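A very small lexicon-based scorer gives a feel for how sentiment analysis can label online text. The word lists and the one-word negation rule below are assumptions made up for this sketch; real opinion mining systems use much larger lexicons or trained classifiers.

```python
import re

# Tiny hand-made lexicons, assumed for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATORS = {"not", "never", "no"}


def sentiment_score(text):
    """Count positive minus negative words, flipping polarity right after a negator."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    return score


def classify(text):
    """Map the numeric score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in [
        "Great phone, I love the fast camera",
        "The battery is not good and support is terrible",
        "It arrived on Tuesday",
    ]:
        print(classify(review), "|", review)
```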

Similar to big data analytics, text analytics using MapReduce, Hadoop, and cloud services will continue to foster active research directions in both academia and industry.

Web Analytics

Over the past decade, web analytics has emerged as an active field of research within BI&A. Building on the data mining and statistical analysis foundations of data analytics and on the information retrieval and NLP models in text analytics, web analytics offers unique analytical challenges and opportunities. HTTP/HTML-based hyperlinked web sites and associated web search engines and directory systems for locating web content have helped develop unique Internet-based technologies for web site crawling/spidering, web page updating, web site ranking, and search log analysis. Web log analysis based on customer transactions has subsequently turned into active research in recommender systems. However, web analytics has become even more exciting with the maturity and popularity of web services and Web 2.0 systems in the mid-2000s (O'Reilly 2005). Based on XML and Internet protocols (HTTP, SMTP), web services offer a new way of reusing and integrating third party or legacy systems. New types of web services and their associated APIs (application programming interfaces) allow developers to easily integrate diverse content from different web-enabled systems, for example, REST (representational state transfer) for invoking remote services, RSS (really simple syndication) for news "pushing," JSON (JavaScript object notation) for lightweight data-interchange, and AJAX (asynchronous JavaScript + XML) for data interchange and dynamic display. Such lightweight programming models support data syndication and notification and "mashups" of multimedia content (e.g., Flickr, YouTube, Google Maps) from different web sources, a process somewhat similar to ETL (extraction, transformation, and load) in BI&A 1.0.
Most of the e-commerce vendors have provided mature APIs for accessing their product and customer content (Schonfeld 2005). For example, through Amazon Web Services, developers can access the product catalog, customer reviews, site ranking, historical pricing, and the Amazon Elastic Compute Cloud (EC2) for computing capacity. Similarly, Google web APIs support AJAX search, Map API, GData API (for Calendar, Gmail, etc.), Google Translate, and Google App Engine for cloud computing resources.
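The "mashup" idea, combining JSON content from different web sources in an ETL-like way, can be sketched as follows. The two payloads are hypothetical stand-ins for what two REST endpoints might return; they do not reflect the actual response format of Amazon, Google, or any other vendor API, and no network call is made.

```python
import json

# Hypothetical JSON bodies, invented for this example.
CATALOG_JSON = (
    '[{"sku": "A1", "title": "USB cable", "price": 5.99},'
    ' {"sku": "B2", "title": "Keyboard", "price": 24.50}]'
)
REVIEWS_JSON = (
    '[{"sku": "A1", "stars": 4}, {"sku": "A1", "stars": 5},'
    ' {"sku": "B2", "stars": 3}]'
)


def mashup(catalog_json, reviews_json):
    """Join product records with the average rating from a second source."""
    # Extract: parse the raw JSON text from each source.
    catalog = json.loads(catalog_json)
    reviews = json.loads(reviews_json)
    # Transform: group star ratings by product identifier.
    ratings = {}
    for review in reviews:
        ratings.setdefault(review["sku"], []).append(review["stars"])
    # Load: build one merged record per product.
    merged = []
    for product in catalog:
        stars = ratings.get(product["sku"], [])
        average = sum(stars) / len(stars) if stars else None
        merged.append({**product, "avg_stars": average})
    return merged


if __name__ == "__main__":
    for row in mashup(CATALOG_JSON, REVIEWS_JSON):
        print(row)
```

In practice the two strings would come from HTTP responses, and the join key (here the assumed field name sku) would be whatever identifier the two services share.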