Question

Posted on Jun 25, 2024

You are attempting to identify the patrons who frequently purchase coffee at your cafe chain, Coffee Time. Your staff calls people who purchase more than 3 cups a week "coffee regulars". These coffee regulars drive the profits for Coffee Time. Because of this, you are interested in finding out what characteristics separate coffee regulars from non-regulars.

You have observed 540 customers from multiple Coffee Time locations over the past 4 weeks. Your goal was to observe customers and determine which are "coffee regulars". You then surveyed these customers to collect additional information on each individual. While not feasible for all of your customers, this has allowed you to get enough information to perform some machine learning. Your goal is to develop a system that will help you to distinguish coffee regulars from non-regulars. In doing this, you can begin to target your advertising at non-regulars in an effort to increase their coffee expenditures at Coffee Time.

The information you collected can be found in this Excel-formatted (.xls) file OR this comma-separated-values (.csv) file. Please note that you will need to save the file locally and then open it with Orange.

Below you will find descriptions of each of the columns. Create an Orange project and load this data for analysis.

Please make sure that these variables are assigned the correct data types in Orange. For example, the "Location" variable should be categorical (nominal) and not continuous (numeric), which is the default when Orange loads the data. The data types for all the variables are listed below. Also, please make sure that Coffee Regular is defined as the target variable.

Coffee Regular : 1=yes, 0=no (Nominal, this is your target value)
Age : Continuous/Numeric (yrs)
Sex : 1=male, 0=female (Nominal)
Weight : Continuous/Numeric (lbs)
Location : 1=rural, 2=suburban, 3=urban (Nominal)
Occupation : Categorical/Nominal (see below)
Income : Continuous/Numeric ($/year)
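
The typing rule above (coded categories stay nominal, measurements become numeric) can also be sketched outside Orange. The following is a minimal illustration using only Python's standard library. The column headers are assumed to match the names in the list above, and the three sample rows are invented for illustration; they are not taken from the Coffee Time file.

```python
import csv
import io

# Hypothetical rows, invented for illustration only (NOT from the real file).
# Header names are assumed to match the column list in the question.
SAMPLE_CSV = """Coffee Regular,Age,Sex,Weight,Location,Occupation,Income
1,34,0,150,3,2,52000
0,61,1,190,1,14,31000
1,45,1,172,2,1,88000
"""

NOMINAL = {"Coffee Regular", "Sex", "Location", "Occupation"}
NUMERIC = {"Age", "Weight", "Income"}


def load_typed(handle):
    """Read rows, keeping nominal codes as text and parsing numeric columns as floats."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {}
        for name, value in raw.items():
            if name in NUMERIC:
                row[name] = float(value)
            elif name in NOMINAL:
                row[name] = value  # a code such as "3" is a label, not a quantity
            else:
                raise ValueError(f"unexpected column: {name}")
        rows.append(row)
    return rows


rows = load_typed(io.StringIO(SAMPLE_CSV))
print(len(rows))            # number of data rows, header excluded
print(rows[0]["Location"])  # nominal code kept as text
```

Keeping Location as text prevents it from being averaged or ordered as if "urban" were three times "rural", which is the same reason the question asks for it to be nominal in Orange.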

For the Occupation column, the following job categories were pulled from the US Bureau of Labor Statistics:

Management occupations
Business, financial, sales, operations and administration occupations
Computer and mathematical occupations
Architecture and engineering occupations
Life, physical, and social science occupations
Community and social service occupations
Legal occupations
Education, training, and library occupations
Arts, design, entertainment, sports, and media occupations
Healthcare practitioners and technical occupations
Healthcare support occupations
Protective service occupations
Food preparation, groundskeeping and service related occupations
Farming, fishing and forestry occupations
Transportation and material moving occupations
Production, installation, maintenance and construction occupations
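
The question refers to occupations by number (for example "Occupation: 13") but does not state the coding. A plausible reading, assumed here and not confirmed by the source, is that the codes 1 to 16 follow the order of the list above. Under that assumption a lookup could be sketched as:

```python
# ASSUMPTION: occupation codes 1..16 follow the order of the list in the question.
OCCUPATIONS = [
    "Management occupations",
    "Business, financial, sales, operations and administration occupations",
    "Computer and mathematical occupations",
    "Architecture and engineering occupations",
    "Life, physical, and social science occupations",
    "Community and social service occupations",
    "Legal occupations",
    "Education, training, and library occupations",
    "Arts, design, entertainment, sports, and media occupations",
    "Healthcare practitioners and technical occupations",
    "Healthcare support occupations",
    "Protective service occupations",
    "Food preparation, groundskeeping and service related occupations",
    "Farming, fishing and forestry occupations",
    "Transportation and material moving occupations",
    "Production, installation, maintenance and construction occupations",
]


def occupation_name(code):
    """Map a 1-based occupation code to its category name (hypothetical helper)."""
    if not 1 <= code <= len(OCCUPATIONS):
        raise ValueError(f"occupation code out of range: {code}")
    return OCCUPATIONS[code - 1]


print(occupation_name(13))
```

If the data file documents a different coding, that documentation takes precedence over this list-order assumption.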

Part 1

Using the Data Table tool, how many rows does this data table have?

Enter the number below.

Part 2

Use the distribution visualization tool in Orange to manually identify patterns.

What do you notice about the distribution of each variable when grouping by the category "coffee regulars"?

Select all correct answers.

Weight: Regular customers tend to weigh about the same as, or slightly less than, non-regular customers
Occupation: Many regular customers tend to be in business, financial, sales, operations and administration occupations
Location: Customers in suburban areas are not likely to be regular customers
Age: Most regular coffee drinkers tend to be over the age of 60
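
Grouping a distribution by the target amounts to asking, for each value of a variable, what share of customers with that value are regulars. A minimal sketch of that calculation follows. The (location, target) pairs are invented for illustration and say nothing about the real data, and `regular_share_by_group` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (location_code, coffee_regular) pairs, invented for illustration.
observations = [
    ("1", "0"), ("1", "0"),
    ("2", "1"), ("2", "0"), ("2", "1"),
    ("3", "1"), ("3", "1"), ("3", "0"),
]


def regular_share_by_group(pairs):
    """Return, for each group value, the fraction of rows whose target is "1"."""
    totals = Counter()
    regulars = Counter()
    for group, target in pairs:
        totals[group] += 1
        if target == "1":
            regulars[group] += 1
    return {group: regulars[group] / totals[group] for group in totals}


shares = regular_share_by_group(observations)
for group in sorted(shares):
    print(group, round(shares[group], 2))
```

A group whose share sits well above or below the overall share of regulars is the kind of pattern the distribution plot makes visible.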

Part 3

You have collected the following information about one of your customers:

Age: 30

Location: Urban

Occupation: 13

Given what you see with the distribution tool, in which category would you place this customer?

Select the best answer.

Regular Customer
Non-Regular Customer

Part 4

Use the Test and Score tool in Orange to compare different classification procedures. Compare the following procedures (with the default Orange settings): Logistic Regression, k-Nearest Neighbors and Random Forest Classification.

Using an ROC analysis, which classification procedure yields the most accurate model?

Select the correct answer.

Logistic Regression
K Nearest Neighbors
Random Forest Classification
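
An ROC analysis is usually summarized by the area under the curve (AUC), which can be read as the probability that a randomly chosen regular receives a higher score than a randomly chosen non-regular. The sketch below computes that quantity directly from pairs and picks the model with the larger value. The labels and scores are invented, and `model_a` and `model_b` are placeholders rather than output from any of the three procedures named above.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical true labels and predicted probabilities from two made-up models.
labels = [1, 0, 1, 1, 0, 0]
model_a = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1]
model_b = [0.6, 0.5, 0.4, 0.8, 0.7, 0.2]

results = {"model_a": auc(labels, model_a), "model_b": auc(labels, model_b)}
best = max(results, key=results.get)
print(results, best)
```

The pairwise loop is quadratic in the number of rows, which is fine for a few hundred customers; in practice the AUC column reported by the evaluation tool would be used instead.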

Part 5

Imagine that your IT department provides you with new, more recent data on Coffee Time customers. This data is formatted in exactly the same way as the previous data (i.e. the same attributes were measured under the same names). Will the same machine learning model applied to these two Coffee Time data sets necessarily result in the same AUC scores?

1. YES

2. NO