Question



Question 3

PCA really shines on data where you have reason to believe that the data is relatively low in rank. In this final question of the homework, we'll look at how states voted in presidential elections between 1972 and 2020. Our ultimate goal in question 3 is to show how 2D PCA scatterplots can allow us to identify clusters in a high-dimensional dataset. For this example, that means finding groups of states that vote similarly by plotting their 1st and 2nd principal components.

    import pandas as pd

    df = pd.read_csv("presidential_elections.csv")
    df.head(5)

            State 1789 1792 1796 1800 + Unnamed: 5 1804 1808 1812 1816  ...  1992 1996 2000 * Unnamed: 60 2004 2008 2012 2016 $ 2020     State.1
    0     Alabama  NaN  NaN  NaN    NaN        NaN  NaN  NaN  NaN  NaN  ...     R    R      R         NaN    R    R    R      R    R     Alabama
    1      Alaska  NaN  NaN  NaN    NaN        NaN  NaN  NaN  NaN  NaN  ...     R    R      R         NaN    R    R    R      R    R      Alaska
    2     Arizona  NaN  NaN  NaN    NaN        NaN  NaN  NaN  NaN  NaN  ...     R    D      R         NaN    R    R    R      R    D     Arizona
    3    Arkansas  NaN  NaN  NaN    NaN        NaN  NaN  NaN  NaN  NaN  ...     D    D      R         NaN    R    R    R      R    R    Arkansas
    4  California  NaN  NaN  NaN    NaN        NaN  NaN  NaN  NaN  NaN  ...     D    D      D         NaN    D    D    D      D    D  California

The data in this table is pretty messy, so let's create a clean version. The clean table should contain exactly 51 rows (corresponding to the 50 states plus Washington DC) and 13 columns (one for each of the election years from 1972 to 2020). The index of this dataframe should be the state name.

Note: In your personal projects, it is sometimes more convenient to do your data cleaning manually in Excel or Google Sheets. The downside is that you have no record of what you did, and if you have to redownload the data, you have to redo the manual cleaning process.

    df_clean = (
        df.iloc[:, -15:]                  # keep the last 15 columns (1972 through State.1)
        .drop(["Unnamed: 60"], axis=1)    # drop the empty spacer column
        .rename(columns={"2000 ": "2000", "2016 +": "2016", "State.1": "State"})
        .drop([51])                       # drop the row with index label 51
        .set_index("State")               # index by state name
    )
    df_clean.head(5)

               1972 1976 1980 1984 1988 1992 1996 2000 2004 2008 2012 2016 2020
    State
    Alabama       R    D    R    R    R    R    R    R    R    R    R    R    R
    Alaska        R    R    R    R    R    R    R    R    R    R    R    R    R
    Arizona       R    R    R    R    R    R    D    R    R    R    R    R    D
    Arkansas      R    D    R    R    R    D    D    R    R    R    R    R    R
    California    R    R    R    R    R    D    D    D    D    D    D    D    D

Question 3a

What does each row in df_clean represent?

Type your answer here, replacing this text
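As a rough illustration of where the question is heading, the sketch below builds a small stand-in for df_clean from the five states shown in the head() output (an assumption, since the CSV is not reproduced here), encodes each vote as a number, and computes the first two principal components with an SVD. The 0/1 encoding and the names X, X_centered, and pcs are illustrative choices, not part of the assignment.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_clean: only the five states shown by head(5).
years = [str(y) for y in range(1972, 2024, 4)]  # 13 election years, 1972..2020
rows = {
    "Alabama":    list("RDRRRRRRRRRRR"),
    "Alaska":     list("R" * 13),
    "Arizona":    list("RRRRRRDRRRRRD"),
    "Arkansas":   list("RDRRRDDRRRRRR"),
    "California": list("RRRRRDDDDDDDD"),
}
df_clean = pd.DataFrame.from_dict(rows, orient="index", columns=years)
df_clean.index.name = "State"

# Assumed encoding: Republican -> 1.0, Democrat -> 0.0.
X = (df_clean == "R").astype(float)

# Center each election-year column before running PCA.
X_centered = X - X.mean(axis=0)

# PCA via SVD: the rows of Vt are the principal directions,
# and U * S gives each state's coordinates along them.
U, S, Vt = np.linalg.svd(X_centered.to_numpy(), full_matrices=False)
pcs = pd.DataFrame(U[:, :2] * S[:2], index=df_clean.index, columns=["pc1", "pc2"])

print(df_clean.shape)  # one row per state, one column per election year
print(pcs)
```

On the full 51-row table, a scatterplot of pc1 against pc2 would place states with similar voting histories near each other, which is the clustering the question refers to.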
