Question
Data Science Task using Python: Wine Quality Data Set
This is one of the most popular datasets among data science beginners. It is divided into two datasets (red wine and white wine); the white-wine dataset has 4898 rows and 12 columns. Read the data from
https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/.
For details: https://archive.ics.uci.edu/ml/datasets/wine
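A minimal loading sketch follows. It assumes the white-wine file in the directory above is named `winequality-white.csv`, and that its fields are separated by `;` rather than commas, which is how the UCI wine-quality files are laid out. `load_wine` and `WHITE_URL` are hypothetical names chosen for illustration. So that the demo runs offline, it parses two rows built from the base values of the table below (without the `+.X` offset).

```python
import io

import pandas as pd

# Assumption: the 4898-row white-wine file sits in the UCI directory given above
WHITE_URL = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
             "wine-quality/winequality-white.csv")


def load_wine(source=WHITE_URL):
    """Read a UCI wine-quality CSV (hypothetical helper); fields are ';'-separated."""
    return pd.read_csv(source, sep=";")


# Offline demo: same header layout as the UCI file, two rows of base values
sample = io.StringIO(
    '"fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";'
    '"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";'
    '"alcohol";"quality"\n'
    "7.8;0.88;0.00;1.9;0.09;25.0;67.0;0.991;3.22;0.68;9.8;5\n"
    "7.2;0.83;0.01;2.2;0.19;15.0;60.0;0.996;3.52;0.55;9.6;6\n"
)
df = load_wine(sample)
print(df.shape)  # (2, 12)
```

With network access, `df = load_wine()` should read the full file, which is described above as 4898 rows by 12 columns.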
Append the following five rows to your data frame. The 12 columns, in order, are: 1 fixed acidity, 2 volatile acidity, 3 citric acid, 4 residual sugar, 5 chlorides, 6 free sulfur dioxide, 7 total sulfur dioxide, 8 density, 9 pH, 10 sulphates, 11 alcohol, 12 quality.
Row 1: 7.8+.X  .88+.X  0.00+.X  1.9  0.09+.X  25.0+.X  67.0+.X  .991+.X  3.22  0.68+.X  9.8+.X  5
Row 2: 7.8+.X  .88+.X  0.00+.X  1.9  0.09+.X  25.0+.X  67.0+.X  .991+.X  3.22  0.68+.X  9.8+.X  5
Row 3: 7.2+.X  .83+.X  0.01+.X  2.2  0.19+.X  15.0+.X  60.0+.X  .996+.X  3.52  0.55+.X  9.6+.X  6
Row 4: 7.9+.X  .89+.X  0.01+.X  1.7  0.08+.X  22.0+.X  57.0+.X  .997+.X  3.26  0.64+.X  9.8+.X  2
Row 5: 7.7+.X  .86+.X  0.02+.X  2.3  0.07+.X  11.0+.X  38.0+.X  .994+.X  3.12  0.08+.X  9.4+.X  3
Here .X is the last two digits of the ID (#98) written after a decimal point, i.e. .X = .98.
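Hints: If your DataFrame is df, you can use the following code to append the first two rows to your data frame (DataFrame.append() was removed in pandas 2.0, so pd.concat() is used instead).
import pandas as pd
X = 0.98  # .X: last two digits of the ID (#98) after a decimal point
# List of data series, one per new row, labelled with the data frame's columns
datarowsSeries = [
    pd.Series([7.8+X, .88+X, 0.00+X, 1.9, 0.09+X, 25.0+X, 67.0+X, .991+X, 3.22, 0.68+X, 9.8+X, 5], index=df.columns),
    pd.Series([7.8+X, .88+X, 0.00+X, 1.9, 0.09+X, 25.0+X, 67.0+X, .991+X, 3.22, 0.68+X, 9.8+X, 5], index=df.columns),
]
# Turn the list of series into a DataFrame and concatenate it to df to add multiple rows
new_data = pd.concat([df, pd.DataFrame(datarowsSeries)], ignore_index=True)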
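The hint covers the first two rows. A minimal sketch that appends all five listed rows is shown below: it stores the base values once and adds the `.X` offset only to the columns that carry it (every column except residual sugar, pH and quality). The column names are assumed to match the UCI CSV header, and `append_assignment_rows`, `BASE_ROWS` and `OFFSET_COLS` are hypothetical names. The one-row `df` is only a stand-in for the real data frame.

```python
import pandas as pd

# Column order as listed in the assignment (names assumed to match the UCI CSV header)
COLUMNS = ["fixed acidity", "volatile acidity", "citric acid", "residual sugar",
           "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
           "pH", "sulphates", "alcohol", "quality"]

# Base values of the five listed rows, before the +.X offset is added
BASE_ROWS = [
    [7.8, .88, 0.00, 1.9, 0.09, 25.0, 67.0, .991, 3.22, 0.68, 9.8, 5],
    [7.8, .88, 0.00, 1.9, 0.09, 25.0, 67.0, .991, 3.22, 0.68, 9.8, 5],
    [7.2, .83, 0.01, 2.2, 0.19, 15.0, 60.0, .996, 3.52, 0.55, 9.6, 6],
    [7.9, .89, 0.01, 1.7, 0.08, 22.0, 57.0, .997, 3.26, 0.64, 9.8, 2],
    [7.7, .86, 0.02, 2.3, 0.07, 11.0, 38.0, .994, 3.12, 0.08, 9.4, 3],
]

# Residual sugar, pH and quality are the only columns without +.X
OFFSET_COLS = [c for c in COLUMNS if c not in ("residual sugar", "pH", "quality")]


def append_assignment_rows(df, x):
    """Return a new frame: df plus the five rows; x is the .X offset (e.g. 0.98)."""
    new_rows = pd.DataFrame(BASE_ROWS, columns=COLUMNS)
    new_rows[OFFSET_COLS] += x
    # Reorder the new rows to df's column order, then stack the two frames
    return pd.concat([df, new_rows[df.columns]], ignore_index=True)


# Stand-in for the real data frame: one row in the same column layout
df = pd.DataFrame([BASE_ROWS[2]], columns=COLUMNS)
new_data = append_assignment_rows(df, 0.98)
print(new_data.shape)  # (6, 12)
```

Keeping the offset in a single place means a different ID only changes the `x` argument, not twelve hand-edited numbers per row.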
Wine quality is rated on a scale that ranges from 0 (very bad) to 10 (excellent). Now recode the quality
of wine as 0 for scores 0 to 5 (Average quality) and 1 for scores 6 to 10 (Good quality). Use the following code
to do so:
Hints:
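# Series.where keeps values where the condition is True: scores <= 5 become 0
df['quality'] = df['quality'].where(df['quality'] > 5, 0)
# Keep the zeros; every remaining score (6 to 10) becomes 1
df['quality'] = df['quality'].where(df['quality'] == 0, 1)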
df['quality'] = df['quality'].map({0:'Average', 1:'Good'})
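Because `Series.where` replaces values where the condition is False, the order and direction of the two `where` calls matter. A quick way to check a recoding is to run it on a handful of known scores. The sketch below wraps the steps in a hypothetical helper, `recode_quality`, and applies it to a small toy series.

```python
import pandas as pd


def recode_quality(quality):
    """Map scores 0-5 to 'Average' and 6-10 to 'Good' (hypothetical helper)."""
    binary = quality.where(quality > 5, 0)   # scores <= 5 -> 0, others kept
    binary = binary.where(binary == 0, 1)    # remaining scores (6-10) -> 1
    return binary.map({0: "Average", 1: "Good"})


scores = pd.Series([3, 5, 6, 8, 2])
print(recode_quality(scores).tolist())
# ['Average', 'Average', 'Good', 'Good', 'Average']
```

A one-line alternative with the same effect is `(quality > 5).astype(int).map({0: "Average", 1: "Good"})`.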
Please answer the following questions in detail (step by step):
1. A description of the data: what it is and where it came from.
2. What questions/objectives you are addressing.
3. Data cleaning, if required.
4. Construct suitable plots of the data.
5. Find the correlation between the quality of wine and the other variables.
6. Conduct appropriate mean and proportion tests.
7. Fit a suitable model to predict the quality of wine.
8. A brief discussion of the results.
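A minimal sketch of steps 3, 5 and 6 is shown below, assuming `quality` has already been recoded to 'Average'/'Good' as in the hint. The specific choices are suggestions, not requirements of the assignment:

- Cleaning drops duplicate rows and missing values.
- Correlation is computed against a 0/1 "Good" indicator. This is the point-biserial correlation.
- The mean test is a Welch t-test of alcohol, using `scipy.stats.ttest_ind` with `equal_var=False`.
- The proportion test is a pooled two-proportion z-test comparing the share of Good wines above and below the median alcohol level.

All helper names are hypothetical. The demo frame is synthetic random data, not the UCI data, so its printed numbers carry no meaning.

```python
import math

import numpy as np
import pandas as pd
from scipy import stats


def clean(df):
    """Step 3: drop exact duplicate rows and rows with missing values."""
    return df.drop_duplicates().dropna().reset_index(drop=True)


def quality_correlations(df):
    """Step 5: correlation of each numeric column with a 0/1 'Good' indicator.

    Pearson correlation against a binary variable is the point-biserial correlation.
    """
    good = (df["quality"] == "Good").astype(int)
    return df.drop(columns="quality").corrwith(good).sort_values()


def alcohol_mean_test(df):
    """Step 6a: Welch two-sample t-test of mean alcohol, Good vs Average."""
    good = df.loc[df["quality"] == "Good", "alcohol"]
    average = df.loc[df["quality"] == "Average", "alcohol"]
    return stats.ttest_ind(good, average, equal_var=False)


def two_proportion_ztest(x1, n1, x2, n2):
    """Step 6b: pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


# Synthetic stand-in data (NOT the UCI data), only to show that the calls run
rng = np.random.default_rng(0)
alcohol = rng.normal(10.5, 1.2, 300)
is_good = rng.random(300) < 1 / (1 + np.exp(-(alcohol - 10.5)))
demo = pd.DataFrame({
    "alcohol": alcohol,
    "density": rng.normal(0.994, 0.003, 300),
    "quality": np.where(is_good, "Good", "Average"),
})

demo = clean(demo)
print(quality_correlations(demo))
print(alcohol_mean_test(demo))

# Share of Good wines above vs at/below the median alcohol level
high = demo["alcohol"] > demo["alcohol"].median()
x1, n1 = int((demo.loc[high, "quality"] == "Good").sum()), int(high.sum())
x2, n2 = int((demo.loc[~high, "quality"] == "Good").sum()), int((~high).sum())
print(two_proportion_ztest(x1, n1, x2, n2))
```

On the real data, the same calls apply to the recoded data frame; `alcohol_mean_test` can be repeated for any other numeric column by swapping the column name.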
The detailed question in its original format is available at https://drive.google.com/file/d/1pHQXl0_4QU_2FTUkGAE7B5PsKeUYdm89/view?usp=sharing
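For step 7, the assignment does not prescribe a model. Because the recoded target is binary, logistic regression is one reasonable choice. The minimal sketch below uses scikit-learn:

- Features are standardized inside a pipeline.
- A stratified 75/25 train/test split is used.
- Performance is reported as test-set accuracy.

`fit_quality_model` is a hypothetical helper. The demo frame is synthetic random data, not the UCI data, so its accuracy says nothing about real wines.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def fit_quality_model(df, seed=0):
    """Fit a standardized logistic regression predicting Good (1) vs Average (0)."""
    X = df.drop(columns="quality")
    y = (df["quality"] == "Good").astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    # For classifiers, score() returns accuracy on the held-out test set
    return model, model.score(X_test, y_test)


# Synthetic stand-in data (NOT the UCI data), only to show that the calls run
rng = np.random.default_rng(1)
alcohol = rng.normal(10.5, 1.2, 400)
demo = pd.DataFrame({
    "alcohol": alcohol,
    "volatile acidity": rng.normal(0.3, 0.1, 400),
    "quality": np.where(alcohol + rng.normal(0, 1, 400) > 10.5, "Good", "Average"),
})

model, accuracy = fit_quality_model(demo)
print(f"test accuracy: {accuracy:.3f}")
# Standardized coefficients, one per feature, useful for the discussion in step 8
print(model.named_steps["logisticregression"].coef_)
```

Standardizing first puts the coefficients on a comparable scale. On the real data, the fitted coefficients can be read alongside the correlations from step 5.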