Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 24, 2024

The dataset you will analyze in this HW is the Wine dataset. The dataset consists of the following variables. You will cluster the wines into

The dataset you will analyze in this HW is the Wine dataset. The dataset consists of the following variables. You will cluster the wines into several clusters based on the following attributes. Description of attributes: 1 - fixed acidity: most acids involved with wine or fixed or nonvolatile (do not evaporate readily) 2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste 3 - citric acid: found in small quantities, citric acid can add 'freshness' and flavor to wines 4 - residual sugar: the amount of sugar remaining after fermentation stops, it's rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet 5 - chlorides: the amount of salt in the wine 6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine 7 - total sulfur dioxide: amount of free and bound forms of S02; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine 8 - density: the density of water is close to that of water depending on the percent alcohol and sugar content 9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale 10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (S02) levels, wich acts as an antimicrobial and antioxidant 11 - alcohol: the percent alcohol content of the wine 12 - quality (score between 0 and 10) Dataset is posted on Blackboard. Problem 1. Read the dataset into a dataframe. Be sure to import the header. (2) 2. Drop Wine from the dataframe. (1) 3. Extract Quality and store it in a separate variable. (1) 4. Drop Quality from dataframe. (1) 5. Print the dataframe and Quality. (1) 6. Normalize all columns of the dataframe. Use the Normalizer class from sklearn.preprocessing. (2) 7. Print the normalized dataframe. (1) 8. Create a range of k values from 1:11 for KMeans clustering. Iterate on the k values and store the inertia for each clustering in a list. (2) 9. Plot the chart of inertia vs number of clusters. (2) 10. What K (number of clusters) would you pick for KMeans? (1) 11. Now cluster the wines into K clusters. Use random_state = 2023 when you instantiate the KMeans model. Assign the respective cluster number to each wine. Print the dataframe showing the cluster number for each wine. (2) 12. Add the quality back to the dataframe. (1) 13. Now print a crosstab (from Pandas) of cluster number vs quality. Comment if the clusters represent the quality of wine. (3)

Figure 1 Number of Clusters, k [1599Clusterquality345678rowsx021032473721310925596122columns]22811916438331711912424244988106437511068101452

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

SQL Server T-SQL Recipes

Authors: David Dye, Jason Brimhall

4th Edition

1484200616, 9781484200612

More Books

Students also viewed these Databases questions

Question

What could be some potential problems with the peer recognition program at APTN?

Answered: 1 week ago

Question

★★★★★

Audits of financial statements are designed determine whether account balances are materially correct. Assume that your client is a construction company that has the following assets on its balance...

Answered: 1 week ago

Question

★★★★★

The dataset you will analyze in this HW is the Wine dataset. The dataset consists of the following variables. You will cluster the wines into several clusters based on the following attributes....

Answered: 1 week ago

Question

★★★★★

what are the viscosities of sorbitol at 296 degrees, manntiol at 290 degrees, water at 100 degrees

Answered: 1 week ago

Question

★★★★★

You built a dashboard with several views exploring your product inventory data set. One of the views should display only products that have a quantity below 50. Which type of filter should you use in...

Answered: 1 week ago

Question

★★★★★

Suppose the accompanying data are from a paper on a community of Antarctic penguins. They represent the body mass (in grams) for a sample of 10 female chinstrap penguins. 3,350 3,250 3,800 3,550...

Answered: 1 week ago

Question

★★★★★

In the mass production of engine blocks for a single cylinder, the tolerance in the diameter (D) of the cylinders is normally distributed. The average diameter of the cylinders has been measured to...

Answered: 1 week ago

Question

★★★★★

The average weight of a particular box of crackers is 24.5 ounces with a standard deviation of 0.8 ounce. The weights of the boxes are normally distributed. a. What percent of the boxes weigh more...

Answered: 1 week ago

Question

★★★★★

Cain Company reports net cash provided by operating activities of $33,000. It also reports the following information under "Adjustments to reconcile net income to net cash provided by operating...

Answered: 1 week ago

Question

★★★★★

Recognize the important role customer loyalty plays in driving a service firms profitability.

Answered: 1 week ago

Question

★★★★★

Explain the qualities of effective leaders in service organizations.

Answered: 1 week ago

Question

★★★★★

Understand why customers are loyal to a particular service firm.

Answered: 1 week ago

Previous Question Next Question