Question
Dimension Reduction Assignment
Your goal is to conduct PCA on the dataframe numeirc_df_without_shelf_rating created below. Please include only the code that is necessary to complete the given task, and avoid copy-pasting everything from the class notebook. Submissions containing unrelated code will be penalized by 20%.
In [2]:
import pandas as pd
df = pd.read_excel('Cereals.xlsx', sheet_name='Data from DASL')
# Keep the numeric columns (calories onward), then drop the two that must not enter PCA
numeirc_df_without_shelf_rating = df.iloc[:, 3:].drop(columns=['shelf', 'rating'])
numeirc_df_without_shelf_rating.head()
Out[2]:
| | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | weight | cups |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6.0 | 280.0 | 25 | 1.0 | 0.33 |
| 1 | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8.0 | 135.0 | 0 | 1.0 | 1.00 |
| 2 | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5.0 | 320.0 | 25 | 1.0 | 0.33 |
| 3 | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0.0 | 330.0 | 25 | 1.0 | 0.50 |
| 4 | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8.0 | NaN | 25 | 1.0 | 0.75 |
Rating is excluded because it is our dependent variable and we don't want to mix it with independent variables.
Why do you think we excluded shelf? Answer below.
Normalize the dataframe.
In [ ]:
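A minimal sketch of this cell, assuming "normalize" means z-score standardization with scikit-learn's StandardScaler (the course may use a different scaler). The preview above already shows a NaN in potass, and PCA cannot handle missing values, so the sketch drops incomplete rows first; imputing them instead would be an equally valid choice. To run without Cereals.xlsx it rebuilds the five preview rows as stand-in data, and the names complete_df and scaled_df are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in for the real dataframe: the five preview rows shown above.
# In the notebook, use numeirc_df_without_shelf_rating loaded from Cereals.xlsx.
cols = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
        "sugars", "potass", "vitamins", "weight", "cups"]
numeirc_df_without_shelf_rating = pd.DataFrame([
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
], columns=cols)

# PCA cannot handle NaN, so drop incomplete rows (assumption: dropping, not imputing).
complete_df = numeirc_df_without_shelf_rating.dropna()

# Rescale every column to mean 0 and standard deviation 1 so that
# large-scale columns such as sodium do not dominate the components.
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(complete_df),
                         columns=complete_df.columns, index=complete_df.index)
```

StandardScaler divides by the population standard deviation (ddof=0), so a manual (x - mean) / std with pandas' default ddof=1 differs only by a constant factor, which leaves the PCA directions and variance ratios unchanged.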
Run PCA.
In [ ]:
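Running PCA can be sketched with scikit-learn's PCA class (an assumption about which library the course uses). Leaving n_components unset keeps every component, so the whole explained-variance curve is available before a cut-off is chosen. The block repeats the stand-in preview rows and the standardization so that it runs on its own; pca and scores are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: the five preview rows (use the real dataframe in the notebook).
cols = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
        "sugars", "potass", "vitamins", "weight", "cups"]
numeirc_df_without_shelf_rating = pd.DataFrame([
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
], columns=cols)

# Same preparation as the normalization step: drop NaN rows, then standardize.
X = StandardScaler().fit_transform(numeirc_df_without_shelf_rating.dropna())

# Keep every component for now; the 90% cut-off is chosen afterwards.
pca = PCA()
scores = pca.fit_transform(X)  # each row is a cereal expressed in the new dimensions
```

pca.components_ holds the coefficients that the Excel part of the assignment works with, and pca.explained_variance_ratio_ is the input for the cumulative-variance step.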
Calculate cumulative explained variance ratio and determine how many of the new dimensions you need to explain 90% of the variance.
In [ ]:
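The cumulative explained variance is a running sum of the per-component ratios, and the number of dimensions needed is the first position where that sum reaches 0.90. A minimal sketch with NumPy; the helper name components_for_threshold and the example ratios are hypothetical, and in the notebook you would pass pca.explained_variance_ratio_ instead.

```python
import numpy as np


def components_for_threshold(explained_variance_ratio, threshold=0.90):
    """Return the cumulative explained variance and the smallest number of
    components whose cumulative ratio reaches `threshold`."""
    cumulative = np.cumsum(explained_variance_ratio)
    if cumulative[-1] < threshold:
        raise ValueError("threshold is never reached; pass all explained variance ratios")
    # argmax gives the first index where the condition holds; +1 turns it into a count
    n_needed = int(np.argmax(cumulative >= threshold)) + 1
    return cumulative, n_needed


# Hypothetical ratios for illustration only (not results from the cereal data).
example_ratios = np.array([0.45, 0.25, 0.15, 0.10, 0.05])
cumulative, n_needed = components_for_threshold(example_ratios)
print(cumulative)
print(n_needed)  # → 4
```

scikit-learn can also choose this number itself when n_components is a float between 0 and 1, for example PCA(n_components=0.90, svd_solver='full'); the explicit cumulative sum is shown here because the assignment asks you to calculate it.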
Save the PCA components to a CSV file.
In [ ]:
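A sketch of saving the coefficients with pandas, assuming a fitted scikit-learn PCA (rebuilt here from the stand-in preview rows so the block runs alone). pca.components_ has one row per new dimension and one column per original variable, which matches the layout described in the Excel steps. The file name pca_components.csv and the PC1, PC2, ... labels are hypothetical choices.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in data: the five preview rows (use the real dataframe in the notebook).
cols = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
        "sugars", "potass", "vitamins", "weight", "cups"]
numeirc_df_without_shelf_rating = pd.DataFrame([
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
], columns=cols)

complete_df = numeirc_df_without_shelf_rating.dropna()
pca = PCA().fit(StandardScaler().fit_transform(complete_df))

# Rows are the new dimensions, columns are the original variables.
components_df = pd.DataFrame(
    pca.components_,
    index=[f"PC{i + 1}" for i in range(pca.n_components_)],
    columns=complete_df.columns,
)
components_df.to_csv("pca_components.csv")  # hypothetical file name
```

Because the labels are written into the file, the header step in Excel reduces to checking them; if your instructor expects only the raw numbers, call to_csv with index=False, header=False instead.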
The rest of the assignment happens in Excel:
Use Excel to open the CSV file.
Save the CSV file as an Excel workbook (.xlsx).
Add the correct row and column headers. Remember that in the generated file, the new dimensions are the rows and the old dimensions are the columns.
Keep only as many of the new dimensions as you need to explain 90% of the variance, and delete the rest.
Use conditional formatting (a color scale) to turn the coefficients into a heatmap.
Examine the coefficients of the new dimensions and label each new dimension based on its relationship with the old dimensions.
What to submit: submit this notebook and your Excel file.
In [ ]: