Question

I tend to overthink things, but I really need help on this assignment.
3. Data Visualization
a) Create a visualization to observe the missing values and their patterns in df_Hi, and explain your observations.
b) Create a countplot of all three categorical variables in the data and explain your observations.
c) Create a distribution plot of all numerical variables. Briefly explain what you observe from those plots.
d) Create a comparative boxplot of all the numerical variables. Explain your observations.
4. Data Exploration
a) Calculate the statistical summary of the data df_Hi. Explain your observations.
b) Observe the unique number of values, most repeated values, and least repeated values of the variables.
c) Check whether there are any outliers in the dataset.
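
For step 4, a minimal sketch of the summary, the unique-value counts, and an outlier check. The real df_Hi is not shown in the question, so the small DataFrame and its values below are hypothetical stand-ins, and the 1.5 * IQR rule is just one common outlier convention, not necessarily the one your course expects.

```python
import pandas as pd

# Hypothetical stand-in for df_Hi; replace with the real DataFrame.
df_Hi = pd.DataFrame({
    "Hits": [66, 81, 130, 141, 87, 169],
    "Salary": [475.0, 480.0, 500.0, 91.5, 750.0, 5000.0],
    "League": ["A", "N", "A", "N", "A", "N"],
})

# 4a) Count, mean, std, min, quartiles, and max of the numeric columns.
summary = df_Hi.describe()

# 4b) Number of distinct values per column, and value frequencies
#     (sorted from most to least repeated) for one column.
n_unique = df_Hi.nunique()
league_counts = df_Hi["League"].value_counts()

# 4c) IQR rule: flag values more than 1.5 * IQR outside the quartiles.
numeric = df_Hi.select_dtypes(include="number")
q1 = numeric.quantile(0.25)
q3 = numeric.quantile(0.75)
iqr = q3 - q1
outlier_mask = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
outlier_counts = outlier_mask.sum()  # number of flagged values per column

print(summary)
print(n_unique)
print(league_counts)
print(outlier_counts)
```

The comparative boxplot from 3d) should show the same flagged points visually, so the two checks can be compared against each other.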
5. Imputing Missing Values
Check whether there are any missing values in the Salary column of the DataFrame df_Hi. What did you notice?
How should the missing values in df_Hi be imputed? Explain why you chose that method.
Verify that you imputed the missing values correctly.
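
For step 5, one possible approach is median imputation, sketched below. Treat the choice as an assumption: the median is often preferred over the mean when a variable is right-skewed or has outliers, which salary data frequently is, but the distribution plot from step 3 should guide the decision. The sample values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_Hi with two missing salaries.
df_Hi = pd.DataFrame({"Salary": [475.0, np.nan, 500.0, 91.5, np.nan, 750.0]})

# Count the missing values before imputing.
n_missing_before = df_Hi["Salary"].isna().sum()

# The median ignores NaN by default, so it is computed on observed values only.
median_salary = df_Hi["Salary"].median()
df_Hi["Salary"] = df_Hi["Salary"].fillna(median_salary)

# Verify: no missing values should remain.
n_missing_after = df_Hi["Salary"].isna().sum()

print(n_missing_before, median_salary, n_missing_after)
```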
6. Create the Dummy Variables
a) Create the dummies of the variable NewLeague. What is the count of each category?
b) Create the dummies of the variable League. What is the count of each category?
c) Create the dummies of the variable Division. What is the count of each category?
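
For step 6, a minimal sketch using `pd.get_dummies`. The category labels ("A", "N", "E", "W") and the rows are assumptions made for illustration; the real levels in df_Hi may differ.

```python
import pandas as pd

# Hypothetical stand-in for the three categorical columns of df_Hi.
df_Hi = pd.DataFrame({
    "League": ["A", "N", "A", "N", "A", "N"],
    "Division": ["E", "W", "W", "E", "W", "W"],
    "NewLeague": ["A", "N", "A", "N", "N", "N"],
})

# One 0/1 column per category, named "<variable>_<category>".
dummies = pd.get_dummies(df_Hi[["NewLeague", "League", "Division"]], dtype=int)

# Summing each 0/1 column gives the count of rows in that category.
category_counts = dummies.sum()

print(dummies.head())
print(category_counts)
```

The same counts can be cross-checked with `df_Hi["League"].value_counts()` on the original column.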
7. Merge the Data and Perform the Correlation Analysis
a) Merge the dummies created with the DataFrame df_Hi.
b) Look at the info of the data. How many variables are there now?
c) Create a heatmap that shows the pairwise correlation between all the variables.
d) Observe the correlation coefficients and identify the pairs that have a correlation of more than 0.8 in absolute value.
e) Successively drop those variables until no pair of variables has a pairwise correlation higher than 0.8 in absolute value.
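
For step 7, a sketch of merging the dummies and then dropping one variable at a time until no pair exceeds the threshold. The helper `drop_correlated` is a hypothetical name introduced here, and the rule it uses (drop the later column of the most correlated pair) is only one reasonable choice; the assignment may expect you to pick which variable to drop by judgment. The data is made up.

```python
import numpy as np
import pandas as pd

def drop_correlated(df, threshold=0.8):
    """Drop one column at a time until no pair has |r| above threshold."""
    df = df.copy()
    while True:
        corr = df.corr().abs()
        # Keep only the upper triangle so each pair is considered once.
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        col_max = upper.max()  # per column: largest |r| with an earlier column
        if not (col_max.max() > threshold):
            return df
        # Drop the later column of the currently most correlated pair.
        df = df.drop(columns=col_max.idxmax())

# Hypothetical numeric data and dummies standing in for df_Hi.
df_Hi = pd.DataFrame({
    "Hits": [1, 2, 3, 4, 5, 6],
    "Runs": [3, 5, 7, 9, 11, 13],   # exact linear function of Hits
    "Walks": [3, 1, 4, 1, 5, 9],
})
dummies = pd.DataFrame({
    "League_A": [1, 0, 1, 0, 1, 0],
    "League_N": [0, 1, 0, 1, 0, 1],  # mirror image of League_A
})

# 7a) Merge column-wise; both frames share the same row index.
merged = pd.concat([df_Hi, dummies], axis=1)

# 7e) Assumes all columns are numeric, as they are after dummy encoding.
reduced = drop_correlated(merged, threshold=0.8)
print(list(reduced.columns))
```

Note that the two dummies of a two-level variable are always perfectly negatively correlated, so one of each pair will be dropped by this procedure.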
8. Transform the data.
a) Transform the column Salary into a binary numerical variable as follows: if the Salary is above the median salary, assign the value 1; otherwise, assign the value 0.
b) Verify that you transformed the data correctly. Observe the counts of 1s and 0s in this column after the transformation.
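
For step 8, a minimal sketch of the median split. The salaries are hypothetical; the rule itself (1 if strictly above the median, 0 otherwise) follows the wording of the task.

```python
import pandas as pd

# Hypothetical stand-in for the (already imputed) Salary column of df_Hi.
df_Hi = pd.DataFrame({"Salary": [475.0, 480.0, 500.0, 91.5, 750.0, 70.0]})

# Compute the median first, before the column is overwritten.
median_salary = df_Hi["Salary"].median()

# True/False comparison cast to 1/0; values equal to the median become 0.
df_Hi["Salary"] = (df_Hi["Salary"] > median_salary).astype(int)

# 8b) Verify by counting the 1s and 0s.
counts = df_Hi["Salary"].value_counts()
print(counts)
```

With no ties at the median and an even number of rows, the two counts should be equal; a large imbalance in the real data would suggest many values tied at the median, for example from median imputation in step 5.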
