
Question


This problem is an example of data preprocessing needed in a data mining process.

Suppose that a hospital tested the age and body fat data for 18 randomly selected adults with the following results:

Age    26    26    29    29    40    45    50    55    60
%fat   10.5  30.5  8.8   20.8  32.4  26.9  30.4  30.2  33.2

Age    55    45    60    55    61    62    63    75    66
%fat   36.6  44.5  30.8  35.4  33.2  36.1  37.9  43.2  37.7

a. Draw the box-plots for age and %fat. Interpret the distribution of the data.
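A box plot is built from the five-number summary (minimum, first quartile, median, third quartile, maximum), so computing that summary is a reasonable first step for part a. The sketch below uses only the Python standard library. The names `age`, `fat`, and `five_number_summary` are illustrative, and the "inclusive" quartile convention is an assumption; other conventions give slightly different quartiles.

```python
import statistics

# Data as listed in the question (18 adults).
age = [26, 26, 29, 29, 40, 45, 50, 55, 60,
       55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Assumption: quartiles use the 'inclusive' method of
    statistics.quantiles; a textbook may use a different convention.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


for name, values in (("age", age), ("%fat", fat)):
    print(name, five_number_summary(values))
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum, so the printed tuples are enough to sketch both plots by hand.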

b. Normalize the two attributes based on z-score normalization.
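Z-score normalization replaces each value v with (v - mean) / standard deviation. A minimal sketch for part b follows, assuming the population standard deviation is intended (divide by n); if the sample standard deviation is wanted, `statistics.stdev` would be used instead. The function name `z_scores` is illustrative.

```python
import statistics

# Data as listed in the question (18 adults).
age = [26, 26, 29, 29, 40, 45, 50, 55, 60,
       55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def z_scores(values):
    """Return the z-score of every value: (v - mean) / std."""
    mean = statistics.fmean(values)
    # Assumption: population standard deviation (divides by n).
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


age_z = z_scores(age)
fat_z = z_scores(fat)

# Print original and normalized values side by side, rounded for reading.
for a, az, f, fz in zip(age, age_z, fat, fat_z):
    print(f"{a:>3} {az:>7.3f} {f:>5} {fz:>7.3f}")
```

After this transformation each attribute has mean 0 and standard deviation 1, which is what makes the two columns comparable.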

c. Regardless of the original ranges of the variables, normalization techniques transform the data into new ranges that allow variables to be compared and used on the same scale. What are the value ranges of the following normalization methods (for this data set and in general)? Explain and back up your answer.

i. Min-max normalization
ii. Z-score normalization
iii. Normalization by decimal scaling.
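In general, min-max normalization maps values onto a chosen interval [new_min, new_max], z-scores are not confined to a fixed interval, and decimal scaling divides by the smallest power of ten that brings every absolute value below 1, so results lie strictly between -1 and 1. The sketch below checks the first and third methods on this data set. The function names and the default target interval [0, 1] are assumptions for illustration.

```python
# Data as listed in the question (18 adults).
age = [26, 26, 29, 29, 40, 45, 50, 55, 60,
       55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min
            for v in values]


def decimal_scaling(values):
    """Divide by 10**j, with j the smallest integer so that max|v| / 10**j < 1.

    Returns the scaled values together with j.
    """
    largest = max(abs(v) for v in values)
    j = 0
    while largest / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values], j


for name, values in (("age", age), ("%fat", fat)):
    mm = min_max(values)
    scaled, j = decimal_scaling(values)
    print(name, "min-max range:", (min(mm), max(mm)))
    print(name, "decimal scaling (j = %d) range:" % j,
          (min(scaled), max(scaled)))
```

The observed range for z-scores on this data set can be read off the same way by taking the minimum and maximum of the normalized columns from part b.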

d. Draw a scatterplot based on the two variables and interpret the relationship between the two variables.

e. Calculate the correlation matrix. Are these two attributes positively or negatively correlated? Calculate the covariance matrix. How is the correlation matrix different from the covariance matrix?
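For part e, the covariance of two attributes is the average product of their deviations from the mean, and the correlation is that covariance divided by the product of the two standard deviations. A minimal sketch follows, assuming the sample (n - 1) form of the covariance; the helper names are illustrative. The correlation value does not depend on whether n or n - 1 is used, because the factor cancels.

```python
import math
import statistics

# Data as listed in the question (18 adults).
age = [26, 26, 29, 29, 40, 45, 50, 55, 60,
       55, 45, 60, 55, 61, 62, 63, 75, 66]
fat = [10.5, 30.5, 8.8, 20.8, 32.4, 26.9, 30.4, 30.2, 33.2,
       36.6, 44.5, 30.8, 35.4, 33.2, 36.1, 37.9, 43.2, 37.7]


def covariance(x, y):
    """Sample covariance of x and y (assumption: divides by n - 1)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def correlation(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


columns = (age, fat)
cov_matrix = [[covariance(a, b) for b in columns] for a in columns]
corr_matrix = [[correlation(a, b) for b in columns] for a in columns]

print("covariance matrix:")
for row in cov_matrix:
    print([round(v, 3) for v in row])
print("correlation matrix:")
for row in corr_matrix:
    print([round(v, 3) for v in row])
```

The diagonal of the covariance matrix holds each attribute's variance in its original units, while the correlation matrix is unit-free with ones on the diagonal and off-diagonal entries between -1 and 1. A positive off-diagonal entry indicates that the two attributes tend to increase together.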
