What is the difference between a supervised and an unsupervised approach?
What is the difference between training data sets and test (or testing) data sets?
Using Exhibit 3-5 as a guide, what are three data approaches associated with the supervised approach?
Using Exhibit 3-5 as a guide, what are three data approaches associated with the unsupervised approach?
Problem 2
An auditor is trying to figure out if the inventory at an electronics store chain is obsolete. What characteristics might be used to help establish a model predicting inventory obsolescence?
Chapter 3 Modeling and Evaluation

EXHIBIT 3-5 Flowchart to Help Choose an Appropriate Data Model
[Flowchart. It starts at "Identify your question" and proceeds through yes/no decisions: Are you predicting an event or class? Do you have a specific target in mind? Do you have too much data? Do you have lots of money? Can you identify obvious groups? Are you trying to rank your data? Are you looking for common events? Are you looking for hidden links? Its two branches are labeled Supervised Method (training data exist) and Unsupervised Method (exploratory). The data approaches it names are classification (whether or not), causal modeling (event influences other), regression (how much?), data reduction (aggregate data), similarity matching (natural grouping), clustering (undiscovered groups), link prediction (social networks), profiling (typical behavior), and co-occurrence grouping (events that happen together).]

where you attempt to identify causation (which can be expensive), identify a series of characteristics that predict a model, or attempt to identify other relationships, respectively.

Ultimately, the model you use comes down to the questions you are trying to answer. The flowchart in Exhibit 3-5 shows several decisions that will help you select an appropriate model, or data approach. By evaluating your data, the question that needs to be addressed, and the desired outcomes, you can determine an appropriate data approach. Once you've selected an approach, your analysis can begin.

We highlighted the data analytics approaches in Chapter 1 and provide them again here for reference:

• Classification: A data approach used to assign each unit (or individual) in a population into a few categories. An example classification might be, of all of the loans this bank has offered, which are most likely to default? Or which loan applications are expected to be approved? In this case, classification would classify loan requests as either approved or denied. A second example would be a credit card company flagging a transaction as being approved or as potentially being fraudulent and denying payment.
• Regression: A data approach used to estimate or predict, for each unit, the numerical value of some variable using some type of statistical model. An example of regression analysis might be, given a balance of total accounts receivable held by a firm, what is the appropriate level of allowance for doubtful accounts for bad debts