
Question



Dimension Reduction Assignment

Your goal is to conduct PCA on the dataframe named numeirc_df_without_shelf_rating, shown below. Include only the code that is necessary to complete the given task. Avoid copy-pasting everything from the class notebook. Submissions containing unrelated code will be penalized by 20%.

In [2]:

 
import pandas as pd
df = pd.read_excel('Cereals.xlsx', sheet_name='Data from DASL')
numeirc_df_without_shelf_rating = df.iloc[:, 3:].drop(columns=['shelf', 'rating'])
numeirc_df_without_shelf_rating.head()

Out[2]:

   calories  protein  fat  sodium  fiber  carbo  sugars  potass  vitamins  weight  cups
0        70        4    1     130   10.0    5.0     6.0   280.0        25     1.0  0.33
1       120        3    5      15    2.0    8.0     8.0   135.0         0     1.0  1.00
2        70        4    1     260    9.0    7.0     5.0   320.0        25     1.0  0.33
3        50        4    0     140   14.0    8.0     0.0   330.0        25     1.0  0.50
4       110        2    2     200    1.0   14.0     8.0     NaN        25     1.0  0.75

Rating is excluded because it is our dependent variable and we don't want to mix it with independent variables.

Why do you think we excluded shelf? Answer below.


Normalize the dataframe.
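One possible reading of "normalize" is z-score standardization, sketched here with scikit-learn's StandardScaler. The sketch rebuilds only the five rows shown in Out[2] so that it runs without Cereals.xlsx. The names sample_df, complete_df and normalized_df are illustrative, and dropping the row with the missing potass value is an assumption (imputation would be another option), since PCA in scikit-learn does not accept NaN.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Stand-in for the assignment dataframe: the five rows shown in Out[2].
columns = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
           "sugars", "potass", "vitamins", "weight", "cups"]
rows = [
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
]
sample_df = pd.DataFrame(rows, columns=columns)

# Assumption: rows with missing values are dropped before scaling.
complete_df = sample_df.dropna()

# Rescale every column to mean 0 and unit variance (z-scores).
# A constant column (weight in this sample) is mapped to 0 by StandardScaler.
scaler = StandardScaler()
normalized_df = pd.DataFrame(
    scaler.fit_transform(complete_df),
    columns=complete_df.columns,
    index=complete_df.index,
)
print(normalized_df.round(2))
```

With the real data, the same calls would be applied to numeirc_df_without_shelf_rating in place of sample_df.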

In [ ]:

 

Run PCA.
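A minimal sketch of this step, assuming scikit-learn's PCA is fitted on standardized data. It again uses only the five rows shown in Out[2] as a stand-in for the full dataset; sample_df, scaled, pca and scores are illustrative names, and dropping the incomplete row is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for the assignment dataframe: the five rows shown in Out[2].
columns = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
           "sugars", "potass", "vitamins", "weight", "cups"]
rows = [
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
]
sample_df = pd.DataFrame(rows, columns=columns)

# Assumption: drop incomplete rows, then standardize each column.
scaled = StandardScaler().fit_transform(sample_df.dropna())

# With no n_components argument, PCA keeps min(n_samples, n_features) components.
pca = PCA()
scores = pca.fit_transform(scaled)

# components_ has one row per new dimension and one column per old dimension.
print(pca.components_.shape)
print(pca.explained_variance_ratio_)
```

On this four-row sample only four components can be extracted; on the full dataset there would be one component per original column.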

In [ ]:

 
 

Calculate the cumulative explained variance ratio and determine how many of the new dimensions you need to explain 90% of the variance.
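The cumulative ratio is the running sum of explained_variance_ratio_, and the number of dimensions needed is the position of the first entry that reaches 0.90. The sketch below shows one way to compute it; components_needed is a hypothetical helper, and the data is again the five-row sample from Out[2] with the incomplete row dropped (an assumption).

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def components_needed(ratios, threshold=0.90):
    """Smallest number of leading components whose cumulative ratio reaches threshold."""
    cumulative = np.cumsum(ratios)
    # searchsorted returns the first index where cumulative >= threshold.
    first_index = int(np.searchsorted(cumulative, threshold))
    # Cap at the number of components in case rounding keeps the total below threshold.
    return min(first_index + 1, len(cumulative))


# Stand-in for the assignment dataframe: the five rows shown in Out[2].
columns = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
           "sugars", "potass", "vitamins", "weight", "cups"]
rows = [
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
]
sample_df = pd.DataFrame(rows, columns=columns)

scaled = StandardScaler().fit_transform(sample_df.dropna())
pca = PCA().fit(scaled)

cumulative = np.cumsum(pca.explained_variance_ratio_)
n_needed = components_needed(pca.explained_variance_ratio_, threshold=0.90)
print(cumulative)
print(n_needed)
```

The count obtained from the sample rows is not the answer for the full dataset; it only demonstrates the calculation.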

In [ ]:

 
 

Save the PCA components to a CSV file.
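A sketch of the export, assuming the components are wrapped in a pandas DataFrame and written with to_csv. The file name pca_components.csv is a placeholder, and writing without headers is an assumption made so that the row and column headers can be added afterwards; the data is the five-row sample from Out[2] with the incomplete row dropped.

```python
from pathlib import Path

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Stand-in for the assignment dataframe: the five rows shown in Out[2].
columns = ["calories", "protein", "fat", "sodium", "fiber", "carbo",
           "sugars", "potass", "vitamins", "weight", "cups"]
rows = [
    [70, 4, 1, 130, 10.0, 5.0, 6.0, 280.0, 25, 1.0, 0.33],
    [120, 3, 5, 15, 2.0, 8.0, 8.0, 135.0, 0, 1.0, 1.00],
    [70, 4, 1, 260, 9.0, 7.0, 5.0, 320.0, 25, 1.0, 0.33],
    [50, 4, 0, 140, 14.0, 8.0, 0.0, 330.0, 25, 1.0, 0.50],
    [110, 2, 2, 200, 1.0, 14.0, 8.0, np.nan, 25, 1.0, 0.75],
]
sample_df = pd.DataFrame(rows, columns=columns)

scaled = StandardScaler().fit_transform(sample_df.dropna())
pca = PCA().fit(scaled)

# Rows are the new dimensions, columns are the old dimensions (same order as `columns`).
components_df = pd.DataFrame(pca.components_)

# Placeholder file name; headers and index are left out on purpose.
output_path = Path("pca_components.csv")
components_df.to_csv(output_path, index=False, header=False)
```

Passing columns=columns and a labeled index to the DataFrame would write the headers directly, if that is allowed for the submission.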

In [ ]:

 
 

The rest of the assignment happens in Excel:

Use Excel to open the CSV file.

Save the CSV file as Excel.

Add the correct row and column headers. Remember that in the file that is generated, new dimensions are the rows and old dimensions are the columns.

In the saved file, keep only as many of the new dimensions as you need to explain 90% of the variance and delete the rest.

Using conditional formatting, run a heatmap on your data.

Examine the coefficients of the new dimensions and label the new dimensions based on their relationship with the old dimensions.

What to submit: submit this notebook and your Excel file.

In [ ]:

 
                        
