Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

You are attempting to identify the patrons that frequently purchase coffee atyour cafe chain, Coffee Time. Your staff calls people that purchase more than 3cups

You are attempting to identify the patrons that frequently purchase coffee atyour cafe chain, Coffee Time. Your staff calls people that purchase more than 3cups a week, "coffee regulars". These coffee regulars drive the profits forCoffee Time. Because of this, you are interested in finding out what characteristics separate coffee regulars from non regulars.

You have observed 540 customers from multiple Coffee Time locationsover the past 4 weeks. Your goal was to observe customers and determine whichare "coffee regulars". You then surveyed these customers to collect additional information on each individual. While not feasible for all of your customers, this has allowed you to get enough information to perform some machine learning. Your goal is to determine a system that will help you to classify coffee regulars from non regulars. In doing this, you can begin to target your advertising at non regulars in an effort to increase their coffee expenditures at Coffee Time.

The information you collected can be found inthis Excel-formatted file (.xls) fileORthis comma-seperated-values (.csv) file..Please note that you will need to save the file locally and then access it with Orange.

Below you will find descriptions for each of the columns. Use Orange project and load this data for analysis.

Please make sure that you input these variables in Orange in the correct data type. For examplethe "Location" variable should be categorical (nominal) and not continuous (numeric), which is the default option when Orange loads the data. The data types for all the variables arementioned below. Also, please make sure that Coffee Regular is defined as the targetvariable.

  • Coffee Regular : 1=yes, 0=no (Nominal, this is your target value)
  • Age : Continuous/Numeric (yrs)
  • Sex : 1=male, 0=female (Nominal)
  • Weight : Continuous/Numeric (lbs)
  • Location : 1=rural, 2=suburban, 3=urban (Nominal)
  • Occupation : Categorical/Nominal (see below)
  • Income : Continuous/Numeric ($/year)

For the Occupation column, the following job categories were pulled from the US Bureau of Labor Statistics:

  1. Management occupations
  2. Business, financial, sales, operations and administration occupations
  3. Computer and mathematical occupations
  4. Architecture and engineering occupations
  5. Life, physical, and social science occupations
  6. Community and social service occupations
  7. Legal occupations
  8. Education, training, and library occupations
  9. Arts, design, entertainment, sports, and media occupations
  10. Healthcare practitioners and technical occupations
  11. Healthcare support occupations
  12. Protective service occupations
  13. Food preparation, groundskeeping andservice related occupations
  14. Farming, fishing and forestry occupations
  15. Transportation and material moving occupations
  16. Production, installation, maintenance and construction occupations

Part 1

Using the Data Table tool, how many rows does this data table have?

Enter the number below.

unanswered

Part 2

Use the distribution visualization tool in Orange to manually identify patterns.

What do you notice about the distribution of each variable when grouping by the category "coffee regulars"?

Select all correct answers.

  1. Weight: Regular customers tend to weigh similar or slightly less than non regular customers
  2. Occupation: Many regular customers tend to be in business, financial, sales, operations and administration occupations
  3. Location: Customers in suburban areas are not likely to be regular customers
  4. Age: Most regular coffee drinkers tend to be over the age of 60

unanswered

You have collected the following information about one of your customers:

Age: 30

Location: Urban

Occupation: 13

Given what you see with the distribution tool, in which category would you place this customer?

Select the best answer.

  1. Regular Customer
  2. Non Regular Customer

unanswered

Part 4

Use the Test and Score tool in Orange to compare different classification procedures. Compare the following procedures (with the preset Orange values): Logistic Regression, K Nearest Neighbors and Random Forest Classification.

Using a ROC Analysis, which classification procedure yields the most accurate model?

Select the correct answer.

  1. Logistic Regression
  2. K Nearest Neighbors
  3. Random Forest Classification

unanswered

Part 5

0.0/1.0 point (graded)

Imagine that your IT department provides you with new, more recent data on Coffee Time customers. This data is formatted in exactly the same way as the previous data (i.e. the same attributes were measured under the same names). Will the same machine learning model applied to these two Coffee Time data sets necessarily result in the same AUC scores?

1.YES

2.NO

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Precalculus With Limits

Authors: Ron Larson

3rd Edition

1285607163, 9781285607160

More Books

Students also viewed these Mathematics questions

Question

Do not come to the conclusion too quickly

Answered: 1 week ago

Question

Engage everyone in the dialogue

Answered: 1 week ago