Question

I need help understanding and solving the question below; see the attached image for the complete question. It asks for an R or Python program using the Kaggle data set named in the image. I do not intend to submit the tutor's work as my own, but since I am completely new to this, I would like to understand the solution and then attempt the problem again on my own, referring to it as needed. Please help.

1. R/Python Project: The imdb-5000 data is a collection of information about 5000 movies made in the US and around the world. Information about this data set can be obtained from the Kaggle.com site. Go to this site and read the background and characteristics of this data set.

1a) Download the zip file that contains the data. Using R data frames or Python pandas, transform this dataset into a data frame. In doing so you need to perform the following filtering and cleaning of the data:
- Unzip the data and read it into a data frame called movieDat.
- For this homework we need only those features which are numerical. Create a new data frame called nmovieDat and copy into it only the columns of movieDat which are numerical.
Once you have prepared the nmovieDat data frame, print the head and tail to make sure you have read everything correctly.

1b) Find the mean, variance and standard deviation of each numerical feature and print your results. Comment on whether or not it is wise to scale the data before running any type of clustering.

1c) Now apply principal component analysis to the data using prcomp in R (or the equivalent in Python). Based on your response to part 1b) above, decide whether the option scale should be true or false. Save the output of prcomp in an object called usarrests.pca. To see the components of this object, use the names function and see what information is in it. Print the result. Extract the center and scale. Explain why these values are different from (or the same as) the means and variances above.

1d) Print the rotation matrix.

1e) Extract the standard deviation of the principal components; these are the singular values of the data matrix. Verify this by using matrix operations (svd, matrix multiplication, etc.).

1f) Print and graph the contribution of each principal component by graphing their variance (from the highest to lowest). You may use the screeplot function in R; see its documentation. Do you think it would be justified to keep all four features, or would it be wise to drop one or more? Justify your answer.

1g) Use the biplot function of R (see its documentation) to plot the "factor loadings" of each data point on the first two principal components. Observe the four vectors corresponding to the four features. Each one has a PC1 and a PC2 component. Observe the graph. Which features have roughly similar factor loadings (coordinates in each PCA direction), and which one(s) are visibly different from the others? Observe the UrbanPop vector. Look at the projection of each state on this vector. What type of states project to the end of the vector, and what types project in the opposite direction?
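
As a starting point for parts 1a) through 1e), here is a minimal Python sketch using pandas and NumPy rather than R's prcomp. It is one possible approach under stated assumptions, not the assignment's reference solution. The function names (`numeric_columns`, `summary_stats`, `pca_via_svd`) are made up for illustration, and dropping rows with missing values is an assumption the question does not specify. Note that the standard deviations of the principal components come out as the singular values of the centered (and optionally scaled) data matrix divided by sqrt(n - 1), which is the relationship part 1e) asks you to verify.

```python
import numpy as np
import pandas as pd


def numeric_columns(movie_dat: pd.DataFrame) -> pd.DataFrame:
    """Part 1a: keep only the numerical columns (the assignment's nmovieDat)."""
    return movie_dat.select_dtypes(include="number").copy()


def summary_stats(nmovie_dat: pd.DataFrame) -> pd.DataFrame:
    """Part 1b: per-column mean, variance and standard deviation.

    pandas uses the n - 1 denominator by default, matching R's var() and sd().
    """
    return pd.DataFrame({
        "mean": nmovie_dat.mean(),
        "variance": nmovie_dat.var(),
        "std_dev": nmovie_dat.std(),
    })


def pca_via_svd(nmovie_dat: pd.DataFrame, scale: bool = True) -> dict:
    """Parts 1c-1e: PCA computed from the SVD of the centered data matrix."""
    # Assumption: rows with missing values are dropped before the analysis.
    x = nmovie_dat.dropna()
    center = x.mean()
    # With scale=True each column is divided by its standard deviation.
    scale_by = x.std() if scale else pd.Series(1.0, index=x.columns)
    z = ((x - center) / scale_by).to_numpy()

    # z = U * diag(s) * Vt; the columns of V are the principal directions.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    sdev = s / np.sqrt(len(z) - 1)  # standard deviation of each component
    pc_names = [f"PC{i + 1}" for i in range(len(s))]
    rotation = pd.DataFrame(vt.T, index=x.columns, columns=pc_names)
    return {
        "sdev": sdev,
        "rotation": rotation,
        "center": center,
        "scale": scale_by,
        "z": z,
        "var_ratio": sdev**2 / np.sum(sdev**2),  # share of variance per PC
    }
```

To try it on the real data, something like `movieDat = pd.read_csv("movie_metadata.csv")` followed by `nmovieDat = numeric_columns(movieDat)` should work, assuming the unzipped Kaggle file has that name. Then `nmovieDat.head()`, `nmovieDat.tail()`, `summary_stats(nmovieDat)` and `pca_via_svd(nmovieDat)["rotation"]` cover the printing steps. The `var_ratio` entry holds the values a scree plot for part 1f) would display. A column with zero variance would cause a division by zero when `scale=True`, so such columns would need to be removed first.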


