Question

Data Science Task using Python: Wine Quality Data Set

This is one of the most popular datasets among data science beginners. It is divided into two datasets.

There are 4898 rows and 12 columns in this dataset. Read the data from

https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/.

For details: https://archive.ics.uci.edu/ml/datasets/wine
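A minimal sketch of reading the data with pandas. The question gives only the directory URL, so the file name `winequality-white.csv` and the semicolon separator are assumptions to check against the repository; `load_wine` is a hypothetical helper, and the in-memory sample only imitates the assumed layout so that the sketch runs offline.

```python
import io

import pandas as pd

# Directory URL from the question; the file name is an assumption, not stated there
BASE_URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/"
WHITE_FILE = "winequality-white.csv"


def load_wine(source):
    """Read a wine-quality CSV from a URL, a path, or a file-like object.

    Assumes a header row and semicolon-separated fields.
    """
    return pd.read_csv(source, sep=";")


# Tiny made-up sample in the assumed layout, so nothing is downloaded here
sample = io.StringIO(
    '"fixed acidity";"volatile acidity";"quality"\n'
    "7;0.27;6\n"
    "6.3;0.3;6\n"
)
sample_df = load_wine(sample)
print(sample_df.shape)
```

For the real data, `df = load_wine(BASE_URL + WHITE_FILE)` followed by `df.shape` would be the check against the 4898 rows and 12 columns quoted above, provided the assumed file name is right.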

Append the following five rows to your data frame. The columns, with the values of the first row, are:

Column 1: Fixed acidity -> 7.8+.X
Column 2: volatile acidity -> .88+.X
Column 3: Citric acid -> 0.00+.X
Column 4: Residual sugar -> 1.9
Column 5: chlorides -> 0.09+.X
Column 6: Free sulfur dioxide -> 25.0+.X
Column 7: Total sulfur dioxide -> 67.0+.X
Column 8: density -> .991+.X
Column 9: pH -> 3.22
Column 10: sulphates -> 0.68+.X
Column 11: alcohol -> 9.8+.X
Column 12: quality -> 5

Rows, one per line, in column order:

7.8+.X   .88+.X   0.00+.X   1.9   0.09+.X   25.0+.X   67.0+.X   .991+.X   3.22   0.68+.X   9.8+.X   5
7.2+.X   .83+.X   0.01+.X   2.2   0.19+.X   15.0+.X   60.0+.X   .996+.X   3.52   0.55+.X   9.6+.X   6
7.9+.X   .89+.X   0.01+.X   1.7   0.08+.X   22.0+.X   57.0+.X   .997+.X   3.26   0.64+.X   9.8+.X   2
7.7+.X   .86+.X   0.02+.X   2.3   0.07+.X   11.0+.X   38.0+.X   .994+.X   3.12   0.08+.X   9.4+.X   3

.X is the last two digits of your ID (#98) written after a decimal point.
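One way to build these rows is sketched below. It rests on two assumptions: that `.X` means a decimal offset such as 0.98 for an ID ending in 98, and that the data frame uses the lowercase column names listed in `COLUMNS`. `append_rows` is a hypothetical helper, and the single starting row in `demo` is illustrative only.

```python
import pandas as pd

# Assumed column names; adjust them to match the header of your data frame
COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]

# Residual sugar, pH and quality carry no +.X in the question; all others do
OFFSET_COLUMNS = [c for c in COLUMNS if c not in ("residual sugar", "pH", "quality")]

# Base values of the rows listed in the question, before adding .X
BASE_ROWS = [
    [7.8, 0.88, 0.00, 1.9, 0.09, 25.0, 67.0, 0.991, 3.22, 0.68, 9.8, 5],
    [7.2, 0.83, 0.01, 2.2, 0.19, 15.0, 60.0, 0.996, 3.52, 0.55, 9.6, 6],
    [7.9, 0.89, 0.01, 1.7, 0.08, 22.0, 57.0, 0.997, 3.26, 0.64, 9.8, 2],
    [7.7, 0.86, 0.02, 2.3, 0.07, 11.0, 38.0, 0.994, 3.12, 0.08, 9.4, 3],
]


def append_rows(df, x):
    """Return df with the listed rows appended, adding x to the +.X columns."""
    new_rows = pd.DataFrame(BASE_ROWS, columns=COLUMNS)
    new_rows[OFFSET_COLUMNS] = new_rows[OFFSET_COLUMNS] + x
    # Reorder to df's column order, then stack below the existing rows
    return pd.concat([df, new_rows[df.columns]], ignore_index=True)


# Illustrative one-row frame standing in for the real data
demo = pd.DataFrame(
    [[7.0, 0.27, 0.36, 20.7, 0.045, 45.0, 170.0, 1.001, 3.0, 0.45, 8.8, 6]],
    columns=COLUMNS,
)
extended = append_rows(demo, x=0.98)  # x=0.98 assumes an ID ending in 98
print(extended.tail(4))
```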

Hints: If your DataFrame is df, use the following code to append the first two rows to your DataFrame. The lists below hold 14 values each, while this dataset has 12 columns, so replace them with one value per column of df.

import pandas as pd

# List of data series
datarowsSeries = [
    pd.Series([0.069,10,2.3,0,0.53,6.5,65.2,4.01,1,290,15,395,4.9,24], index=df.columns),
    pd.Series([0.069,11,2.3,0,0.6,6.6,65.3,4.2,1,290,15,395,4.9,24], index=df.columns),
]

# DataFrame.append() was removed in pandas 2.0, so concatenate the new rows instead
new_data = pd.concat([df, pd.DataFrame(datarowsSeries)], ignore_index=True)

The wine quality is rated on a scale that ranges from 0 (very bad) to 10 (excellent). Now you need to recode the quality of the wine as 0: 0 to 5 (Average quality) and 1: 6 to 10 (Good quality). Use the following code to do so.

Hints:

# Keep scores above 5 and set everything else (0 to 5) to 0
df['quality'] = df['quality'].where(df['quality'] > 5, 0)

# Keep the zeros and set the remaining scores (6 to 10) to 1
df['quality'] = df['quality'].where(df['quality'] <= 5, 1)

# Replace the 0/1 codes with readable labels
df['quality'] = df['quality'].map({0:'Average', 1:'Good'})
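The same rule (0 to 5 is Average, 6 to 10 is Good) can also be written in a single step. The sketch below uses `numpy.where` on a made-up Series; `recode_quality` is a hypothetical helper, not part of the original hints.

```python
import numpy as np
import pandas as pd


def recode_quality(quality):
    """Label scores above 5 as 'Good' and scores of 5 or below as 'Average'."""
    labels = np.where(quality > 5, "Good", "Average")
    return pd.Series(labels, index=quality.index, name=quality.name)


scores = pd.Series([3, 5, 6, 8], name="quality")  # made-up scores
recoded = recode_quality(scores)
print(recoded.tolist())  # ['Average', 'Average', 'Good', 'Good']
```

Checking `recoded.value_counts()` afterwards is a quick way to confirm that both classes are present.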

Please answer the following questions in detail (step by step):

1. A description of the data: what it is and where it came from.

2. The questions/objectives you are addressing.

3. Data Cleaning, if required.

4. Construct suitable plots of the data.

5. Find the correlation between the quality of wine with other variables.

6. Conduct appropriate mean and proportion tests.

7. Fit a suitable model to predict the quality of wine.

8. A brief discussion of the results.
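Tasks 5 and 6 in the list above could be approached along the following lines. This is a sketch on synthetic data, not on the wine data itself: the column names, the 0/1 flag `good`, the choice of Welch's t-test, and the null proportion of 0.5 are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the wine data (made-up numbers)
rng = np.random.default_rng(0)
n = 200
alcohol = rng.normal(10.5, 1.2, n)
acidity = rng.normal(0.3, 0.1, n)
score = np.clip(np.round(5.5 + 0.8 * (alcohol - 10.5) + rng.normal(0, 0.5, n)), 0, 10)
demo = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": acidity,
    "good": (score > 5).astype(int),  # 1 = Good (6 to 10), 0 = Average (0 to 5)
})

# Task 5: Pearson correlation of each variable with the 0/1 quality flag
corr = demo.corr()["good"].drop("good")

# Task 6, mean test: Welch's t-test comparing mean alcohol of the two groups
good = demo.loc[demo["good"] == 1, "alcohol"]
average = demo.loc[demo["good"] == 0, "alcohol"]
t_stat, t_p = stats.ttest_ind(good, average, equal_var=False)

# Task 6, proportion test: one-sample z-test of H0 "share of Good wines = p0"
p0 = 0.5  # assumed null value, chosen only for illustration
p_hat = demo["good"].mean()
z_stat = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)
z_p = 2 * stats.norm.sf(abs(z_stat))

print(corr)
print(f"t = {t_stat:.2f}, p = {t_p:.3g}")
print(f"z = {z_stat:.2f}, p = {z_p:.3g}")
```

On the real data, the correlation step needs the numeric 0/1 coding of quality rather than the 'Average'/'Good' labels.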

The detailed question in its original format is available at https://drive.google.com/file/d/1pHQXl0_4QU_2FTUkGAE7B5PsKeUYdm89/view?usp=sharing
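For task 7 in the list above, logistic regression is one reasonable candidate for a two-class target such as Average versus Good. The sketch below assumes scikit-learn is available and again uses synthetic data with made-up column names; the question does not prescribe this model, and other classifiers may suit the data equally well.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the features and the recoded quality labels
rng = np.random.default_rng(1)
n = 300
X = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, n),
    "volatile acidity": rng.normal(0.3, 0.1, n),
})
y = np.where(X["alcohol"] + rng.normal(0, 0.8, n) > 10.5, "Good", "Average")

# Hold out a quarter of the rows, keeping the class balance in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standardise the features, then fit a logistic regression classifier
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Accuracy on held-out rows is only one possible summary; a confusion matrix would show which class the model tends to get wrong.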
