Question
This problem is an example of data preprocessing needed in a data mining process.
Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following results:

Age    26    26    29    29    40    45    50    55    60
%fat   10.5  30.5  8.8   20.8  32.4  26.9  30.4  30.2  33.2

Age    55    45    60    55    61    62    63    75    66
%fat   36.6  44.5  30.8  35.4  33.2  36.1  37.9  43.2  37.7
a. Draw the box-plots for age and %fat. Interpret the distribution of the data.
b. Normalize the two attributes based on z-score normalization.
c. Regardless of the original ranges of the variables, normalization techniques transform the data into new ranges so that variables can be compared and used on the same scale. What are the value ranges of the following normalization methods (for this data set and in general)? Explain and back up your answer.
i. Min-max normalization
ii. Z-score normalization
iii. Normalization by decimal scaling
d. Draw a scatterplot based on the two variables and interpret the relationship between the two variables.
e. Calculate the correlation matrix. Are these two attributes positively or negatively correlated? Calculate the covariance matrix. How is the correlation matrix different from the covariance matrix?
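
One way to approach parts b, c and e numerically is sketched below in plain Python. It is a minimal sketch under stated assumptions, not a reference solution: the helper names (z_score, min_max, decimal_scaling, covariance, correlation) are made up for illustration, the z-scores use the population standard deviation (divide by n), and the covariance matrix uses the sample form (divide by n - 1). A course may expect the other convention, which changes the z-scores and covariances slightly but leaves the correlation unchanged.

```python
import math

# Data copied from the question (18 adults).
age = [26, 26, 29, 29, 40, 45, 50, 55, 60,
       55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def mean(xs):
    return sum(xs) / len(xs)


def std(xs, ddof=0):
    """Standard deviation: ddof=0 is the population form, ddof=1 the sample form."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - ddof))


def z_score(xs):
    """Part b: v' = (v - mean) / std, here with the population std (an assumption)."""
    m, s = mean(xs), std(xs)
    return [(x - m) / s for x in xs]


def min_max(xs, new_min=0.0, new_max=1.0):
    """Part c.i: linear map of [min, max] onto [new_min, new_max]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) * (new_max - new_min) + new_min for x in xs]


def decimal_scaling(xs):
    """Part c.iii: v' = v / 10**j, with j the smallest integer giving max|v'| < 1."""
    max_abs = max(abs(x) for x in xs)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in xs]


def covariance(xs, ys, ddof=1):
    """Part e: sample covariance by default (divide by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - ddof)


def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / (std(xs, ddof=1) * std(ys, ddof=1))


# 2 x 2 matrices, ordered as (age, %fat).
cov_matrix = [[covariance(age, age), covariance(age, fat)],
              [covariance(fat, age), covariance(fat, fat)]]
corr_matrix = [[1.0, correlation(age, fat)],
               [correlation(fat, age), 1.0]]

# Observed range of each normalized attribute for this data set.
for name, xs in (("age", age), ("%fat", fat)):
    for label, f in (("z-score", z_score), ("min-max", min_max),
                     ("decimal scaling", decimal_scaling)):
        ys = f(xs)
        print(f"{name:5s} {label:16s} range: [{min(ys):.3f}, {max(ys):.3f}]")

print("covariance matrix: ", cov_matrix)
print("correlation matrix:", corr_matrix)
```

In general, min-max normalization maps onto exactly [new_min, new_max], decimal scaling lands strictly inside (-1, 1), and z-scores have no fixed bounds because they depend on how far the extreme values sit from the mean. The correlation matrix is the covariance matrix after dividing each entry by the two standard deviations involved, so it is unit-free, has ones on the diagonal, and its off-diagonal entries lie in [-1, 1], whereas covariance values depend on the units of age and %fat.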