
Question


Study Guide: Midterm Exam
DATA 3300

The following is a list of topics that will be covered on the midterm exam. Please see the Canvas page "Module 7: Midterm Exam" for details about the exam and study materials.

Data Analytics & Data Science

  What data science is and how it's used
  Common sources of data:
    a. Social networks
    b. Traditional business systems
    c. Internet of Things
  Different types of data analytics and their application(s):
    a. Diagnostic
    b. Descriptive
    c. Predictive
    d. Prescriptive
  The sequence of steps in the CRISP-DM process (and the importance of each):
    a. Business Understanding
    b. Data Understanding
    c. Data Preparation
    d. Modeling
    e. Evaluation
    f. Deployment

Data Quality & Preparation

  What ETL is and why it is important in data analytics
  Five data quality characteristics:
    a. Accuracy
    b. Uniqueness
    c. Completeness
    d. Consistency
    e. Time-Appropriateness
  Common forms of dirty data (and the threats they pose to data analysis):
    a. Errors (typos, misspellings)
    b. Inconsistent Data
    c. Absence of Data
    d. Contradicting Data
    e. Reused Primary Keys
  Common steps in data cleansing (and how long data cleansing takes as part of the overall data mining process):
    a. Parsing
    b. Correcting
    c. Standardizing
    d. Matching
    e. Consolidating

Data Understanding

  Familiarize yourself with the differences between qualitative and quantitative variable types:
    a. Qualitative:
      i. Nominal
      ii. Ordinal
    b. Quantitative:
      i. Ratio
      ii. Interval
  Understand how each of the metrics below relates to data exploration (data distribution, central tendency, and data dispersion):
    a. Understand each of the following descriptive statistics and their purpose:
      i. Mean
      ii. Median
      iii. Mode
      iv. Variance
      v. Standard deviation
      vi. Interquartile range
      vii. Outliers
    b. Understand and identify:
      i. Skewness
      ii. Kurtosis
  Basic principles of visualization (and how to interpret visualizations), including the most common chart types and their purpose (and how to create visualizations that clearly communicate the data, not just reinforce prior beliefs):
    a. Histograms
    b. Line charts
    c. Box plots
    d. Pie charts
    e. Stacked column charts

Modeling Foundations

  What data mining is, its appropriate applications, and common data mining tasks
  What is meant by the terms:
    a. Data instance/Record/Case/Observation
    b. Attributes/Variables
      i. Target attribute/Dependent variable
  The difference between supervised and unsupervised data mining
  The difference between classification and regression types of supervised data mining:
    a. Classification: when the dependent variable (DV) is categorical
    b. Regression: when the DV is numerical

Association Rules Analysis

  What association analysis is, the type of data it requires, and the types of business questions it can answer
  What an association rule looks like:
    a. Itemsets and their role in association rules
    b. Antecedents and consequents
  Understand how to calculate and interpret:
    a. Support
    b. Confidence
    c. Lift
  Know the tradeoffs between adjusting the minimum support and confidence thresholds and the resulting association rules generated

Clustering Analysis

  What clustering analysis is, the type of data it requires, and the types of business questions it can answer
  What k-means clustering is:
    a. Understand the steps involved
      i. Why we sometimes normalize data in cluster analysis
        1. z-score normalization
      ii. Impact of potentially over-weighting variables which are measuring the same thing in different ways
    b. Know what k stands for and how it is determined
      i. Post-hoc evaluation
      ii. Elbow rule
      iii. Tractability
    c. Understand the basics of how the algorithm works
      i. How are clusters determined?
      ii. How is a centroid value determined?
      iii. What is the relationship between a cluster and a centroid?
        1. How do you interpret results with centroid values?
  Ways to calculate similarity/dissimilarity between cases and how to interpret the distance:
    a. Euclidean distance (know and interpret formula)
    b. Intra-class similarity
    c. Inter-class similarity
  Understand how to interpret a cluster analysis through a centroid table and plot.
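As a study aid for the Euclidean distance and z-score normalization items above, here is a minimal sketch in plain Python. The three customers, their ages, and their incomes are made-up illustration data, and the function names are hypothetical. The sketch uses the population standard deviation for the z-scores; the course materials or software may use the sample standard deviation instead, which changes the numbers slightly but not the idea.

```python
import math
import statistics


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1 (z-score normalization)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD; some tools use the sample SD
    return [(v - mean) / sd for v in values]


def euclidean_distance(a, b):
    """Square root of the summed squared differences across all attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical customers: (age in years, annual income in dollars)
customers = [(25, 40_000), (45, 42_000), (27, 90_000)]

# On raw values, income dominates because its scale is thousands of times
# larger than age: customer 0 looks roughly 25 times closer to customer 1
# than to customer 2, even though customer 1 is 20 years older.
raw_01 = euclidean_distance(customers[0], customers[1])
raw_02 = euclidean_distance(customers[0], customers[2])

# Normalize each attribute separately, then recompute the distances.
ages = z_scores([age for age, _ in customers])
incomes = z_scores([income for _, income in customers])
scaled = list(zip(ages, incomes))

# After normalization both attributes contribute on a comparable scale,
# and the two distances come out nearly equal.
scaled_01 = euclidean_distance(scaled[0], scaled[1])
scaled_02 = euclidean_distance(scaled[0], scaled[2])

print(f"raw:    d(0,1)={raw_01:.1f}  d(0,2)={raw_02:.1f}")
print(f"scaled: d(0,1)={scaled_01:.3f}  d(0,2)={scaled_02:.3f}")
```

A smaller distance means two cases are more similar. Comparing the raw and scaled results shows why an unnormalized variable with a large range can be over-weighted when k-means assigns cases to the nearest centroid.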
Statistical Correlation

  What correlation analysis is, the type of data it requires, and the types of business questions it can answer
  Know how to identify and interpret:
    a. Correlation coefficient
    b. Correlation analysis results
    c. Convergent validity (and when to use it)
    d. Coefficient of determination (know how to calculate it)
  What scatter plots look like for strong versus no relationship between two variables
  Assumptions and limitations of correlation analysis:
    a. Homoscedasticity
    b. Normal distribution of data
    c. Impact of outliers

Good Luck!
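For the correlation coefficient and coefficient of determination items in the Statistical Correlation section, the following is a minimal sketch assuming the Pearson correlation coefficient is the one intended (the guide does not name it). The advertising and sales figures are made-up illustration data, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y divided by their combined spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_dev / math.sqrt(ss_x * ss_y)


# Hypothetical data: advertising spend and sales for five periods
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(ad_spend, sales)

# The coefficient of determination is the correlation coefficient squared:
# the share of variance in one variable associated with the other.
r_squared = r ** 2

print(f"r = {r:.4f}, r squared = {r_squared:.2f}")  # r = 0.7746, r squared = 0.60
```

Here r is about 0.77, a fairly strong positive relationship, and r squared is 0.60, so about 60% of the variation in sales is associated with advertising spend. Because r is built from deviations from the mean, a single extreme outlier can shift it substantially, which is why the outlier limitation is on the list.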
