Question
You are attempting to identify the patrons who frequently purchase coffee at your cafe chain, Coffee Time. Your staff calls people who purchase more than 3 cups a week "coffee regulars". These coffee regulars drive the profits for Coffee Time. Because of this, you are interested in finding out what characteristics separate coffee regulars from non-regulars.
You have observed 540 customers from multiple Coffee Time locations over the past 4 weeks. Your goal was to observe customers and determine which are "coffee regulars". You then surveyed these customers to collect additional information on each individual. While this is not feasible for all of your customers, it has given you enough information to perform some machine learning. Your goal is to determine a system that will help you to classify coffee regulars and non-regulars. In doing this, you can begin to target your advertising at non-regulars in an effort to increase their coffee expenditures at Coffee Time.
The information you collected can be found in this Excel-formatted (.xls) file or this comma-separated-values (.csv) file. Please note that you will need to save the file locally and then access it with Orange.
Below you will find descriptions for each of the columns. Create an Orange project and load this data for analysis.
Please make sure that you input these variables in Orange with the correct data type. For example, the "Location" variable should be categorical (nominal) and not continuous (numeric), which is the default option when Orange loads the data. The data types for all the variables are listed below. Also, please make sure that Coffee Regular is defined as the target variable.
- Coffee Regular : 1=yes, 0=no (Nominal, this is your target value)
- Age : Continuous/Numeric (yrs)
- Sex : 1=male, 0=female (Nominal)
- Weight : Continuous/Numeric (lbs)
- Location : 1=rural, 2=suburban, 3=urban (Nominal)
- Occupation : Categorical/Nominal (see below)
- Income : Continuous/Numeric ($/year)
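Outside of Orange, the same typing decision can be sketched in plain Python. This is only an illustration: the header names and the three sample rows below are made up, and the real file may be laid out differently. The point it shows is that coded columns such as Location stay as labels while Age, Weight and Income are converted to numbers.

```python
import csv
import io

# Hypothetical sample rows; the real file's header names and values may differ.
SAMPLE_CSV = """Coffee Regular,Age,Sex,Weight,Location,Occupation,Income
1,34,0,150,3,2,62000
0,61,1,190,1,14,41000
1,29,1,172,2,3,88000
"""

NUMERIC = {"Age", "Weight", "Income"}  # continuous columns
TARGET = "Coffee Regular"              # nominal target column


def load_rows(handle):
    """Parse rows: numeric columns become floats, all others stay category labels."""
    rows = []
    for record in csv.DictReader(handle):
        typed = {}
        for name, value in record.items():
            typed[name] = float(value) if name in NUMERIC else value
        rows.append(typed)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows")
print("Location is stored as", type(rows[0]["Location"]).__name__)
print("Age is stored as", type(rows[0]["Age"]).__name__)
```

Keeping Location as a string means no tool can accidentally average the codes 1, 2 and 3 as if they were quantities.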
For the Occupation column, the following job categories were pulled from the US Bureau of Labor Statistics:
- Management occupations
- Business, financial, sales, operations and administration occupations
- Computer and mathematical occupations
- Architecture and engineering occupations
- Life, physical, and social science occupations
- Community and social service occupations
- Legal occupations
- Education, training, and library occupations
- Arts, design, entertainment, sports, and media occupations
- Healthcare practitioners and technical occupations
- Healthcare support occupations
- Protective service occupations
- Food preparation, groundskeeping and service related occupations
- Farming, fishing and forestry occupations
- Transportation and material moving occupations
- Production, installation, maintenance and construction occupations
Part 1
Using the Data Table tool, how many rows does this data table have?
Enter the number below.
Part 2
Use the distribution visualization tool in Orange to manually identify patterns.
What do you notice about the distribution of each variable when grouping by the category "coffee regulars"?
Select all correct answers.
- Weight: Regular customers tend to weigh about the same as or slightly less than non-regular customers
- Occupation: Many regular customers tend to be in business, financial, sales, operations and administration occupations
- Location: Customers in suburban areas are not likely to be regular customers
- Age: Most regular coffee drinkers tend to be over the age of 60
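Grouping a distribution by the target amounts to asking, for each category, what share of its customers are regulars. A minimal sketch of that calculation follows; the observations are invented for illustration and say nothing about the actual Coffee Time data.

```python
from collections import Counter, defaultdict

# Made-up (location, coffee_regular) pairs, for illustration only.
observations = [
    ("urban", 1), ("urban", 1), ("urban", 0),
    ("suburban", 0), ("suburban", 1),
    ("rural", 0), ("rural", 0),
]


def share_of_regulars(pairs):
    """Return, per category, the fraction of customers flagged as regulars."""
    counts = defaultdict(Counter)
    for category, is_regular in pairs:
        counts[category][is_regular] += 1
    return {c: tally[1] / sum(tally.values()) for c, tally in counts.items()}


shares = share_of_regulars(observations)
for category, share in sorted(shares.items()):
    print(f"{category}: {share:.2f}")
```

Comparing shares rather than raw counts matters when the categories have very different sizes.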
Part 3
You have collected the following information about one of your customers:
Age: 30
Location: Urban
Occupation: 13
Given what you see with the distribution tool, in which category would you place this customer?
Select the best answer.
- Regular Customer
- Non Regular Customer
Part 4
Use the Test and Score tool in Orange to compare different classification procedures. Compare the following procedures (with the preset Orange values): Logistic Regression, K Nearest Neighbors and Random Forest Classification.
Using ROC analysis, which classification procedure yields the most accurate model?
Select the correct answer.
- Logistic Regression
- K Nearest Neighbors
- Random Forest Classification
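For readers unfamiliar with the metric, ROC analysis usually summarizes a model by the area under its ROC curve (AUC). The sketch below is not Orange's implementation; it is a small illustration, on made-up labels and scores, of how the curve is traced by sweeping a threshold and how the area is obtained with the trapezoid rule.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points, high threshold to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle all examples that share this score in one step.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the piecewise-linear ROC curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example: 1 = regular, 0 = non-regular; scores are model outputs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 3))
```

An AUC of 1.0 means every regular is scored above every non-regular, while 0.5 is what random scoring would achieve.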
Part 5
Imagine that your IT department provides you with new, more recent data on Coffee Time customers. This data is formatted in exactly the same way as the previous data (i.e. the same attributes were measured under the same names). Will the same machine learning model applied to these two Coffee Time data sets necessarily result in the same AUC scores?
- Yes
- No
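One way to reason about this question is to remember that AUC is computed from a particular sample of customers. The sketch below simulates this under loudly stated assumptions: the scoring rule `fixed_model`, the age distributions and the sample size are all hypothetical and are not derived from the Coffee Time data. It scores two independently drawn samples with the same fixed rule and prints the AUC for each.

```python
import random


def pairwise_auc(labels, scores):
    """Fraction of (regular, non-regular) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fixed_model(age):
    """Hypothetical, already-trained rule: younger customers score higher."""
    return 1.0 - age / 100.0


def draw_sample(seed, size=200):
    """Simulate customers: regulars skew younger, but the two groups overlap."""
    rng = random.Random(seed)
    labels, ages = [], []
    for _ in range(size):
        y = 1 if rng.random() < 0.5 else 0
        mean_age = 35 if y == 1 else 50
        labels.append(y)
        ages.append(rng.gauss(mean_age, 12))
    return labels, ages


results = {}
for seed in (1, 2):
    sample_labels, ages = draw_sample(seed)
    sample_scores = [fixed_model(a) for a in ages]
    results[seed] = pairwise_auc(sample_labels, sample_scores)
    print(f"sample {seed}: AUC = {results[seed]:.3f}")
```

Because the two samples contain different customers, there is no reason to expect the two printed values to match exactly, even though the model and the data format are unchanged.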