Question

Consider the attached dataset on 1302 American colleges and universities offering an undergraduate program and answer the following questions by applying the Python code bits used in class. Feel free (in fact you are encouraged) to consult me for any guidance.
You must demonstrate how you answered each question (in the order asked below) by submitting your Jupyter notebook file along with a Word file.
1) How many variables are there in the dataset?
2) Which variables are categorical, which are numerical?
3) Clean the dataset by removing all missing/incomplete observations. How many complete observations are left?
4) Set row names (indices) to the College Name column.
5) Clean up the variable names by
a) Replacing the following 7 characters and spaces with "" (for removal): (, ), %, #, ', -, $
b) Replacing the following 2 characters with "_": /, .
6) Compute the summary statistics of the numerical variables in the dataset.
7) Plot a histogram for each of the numerical variables by setting the axis labels in plain English (to make it easy to understand).
8) Construct a heatmap between all numerical variables and comment on the relationships among them.
9) By observing the heatmap, select three numerical variables (that you think would be interesting to include) and draw a matrix scatter diagram between the three.
10) Convert the categorical variables into integer (binary) dummy variables.
a) Explain in words, for one observation, the values in the derived binary dummies.
11) Conduct a principal components analysis (PCA) using only the original numerical variables.
a) Make sure to display the 'Standard Deviation', 'Proportion of Variance' and 'Cumulative Proportion' info.
b) Comment on the results: How many principal components appear to be significant? Should the data be normalized beforehand?
12) Normalize the numerical variables using the standard scaler and redo question 11. Comment on the difference in the PCA results after normalization.
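
The name-cleaning rule in question 5 is specific enough to sketch in code. The following is a minimal illustration, not the course's own code: it assumes the commas in the question are list separators rather than characters to remove, and the helper name `clean_name` and the sample column names are hypothetical, since the dataset itself is not shown here.

```python
# Characters to delete (question 5a): ( ) % # ' - $ and spaces.
# Assumption: commas in the question only separate the listed characters.
REMOVE = "()%#'-$ "

# Characters to replace with an underscore (question 5b): / and .
TO_UNDERSCORE = "/."

# str.maketrans(x, y, z) maps x[i] -> y[i] and deletes every character in z.
_TABLE = str.maketrans(TO_UNDERSCORE, "_" * len(TO_UNDERSCORE), REMOVE)


def clean_name(name: str) -> str:
    """Return a variable name with the question-5 substitutions applied."""
    return name.translate(_TABLE)


# Hypothetical column names, invented for illustration only.
examples = ["% new stud. from top 10%", "stud./fac. ratio", "in-state tuition ($)"]
cleaned = [clean_name(n) for n in examples]
print(cleaned)
```

If the data were loaded into a pandas DataFrame (called `df` here as an assumption), the same function could be passed to `df.rename(columns=clean_name)` to rename every column in one step.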
