Question

Use the adult data set from the book series web site for the following exercises. The target variable is income, and the goal is to classify income based on the other variables.

2. Which variables are categorical, and which are continuous?

3. Using software, construct a table of the first 10 records of the data set, in order to get a feel for the data.

4. Investigate whether we have any correlated variables.

5. For each of the categorical variables, construct a bar chart of the variable, with an overlay of the target variable. Normalize if necessary. a. Discuss the relationship, if any, each of these variables has with the target variable. b. Which variables would you expect to make a significant appearance in any data mining classification model we work with?

6. For each pair of categorical variables, construct a cross tabulation. Discuss your salient results.

7. Report on whether anomalous fields exist in this data set, based on your EDA, which fields these are, and what we should do about them.

8. Report the mean, median, minimum, maximum, and standard deviation for each of the numerical variables.

9. Construct a histogram of each numerical variable, with an overlay of the target variable income. Normalize if necessary. a. Discuss the relationship, if any, each of these variables has with the target variable. b. Which variables would you expect to make a significant appearance in any data mining classification model we work with?

10. For each pair of numerical variables, construct a scatter plot of the variables. Discuss your salient results.

11. Based on your EDA so far, identify interesting sub-groups of records within the data set that would be worth further investigation.

12. Apply binning to one of the numerical variables. Do it in such a way as to maximize the effect of the classes thus created (following the suggestions in the text). Now do it in such a way as to minimize the effect of the classes so that the difference between the classes is diminished. Comment.

13. Refer to the previous exercise. Apply the other two binning methods (equal width, and equal number of records) to this same variable. Compare the results and discuss the differences. Which method do you prefer?

14. Summarize your salient EDA findings from the above exercises, just as if you were writing a report.
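
As a starting point for exercises 8, 12, and 13, the following is a minimal sketch of summary statistics, equal-width binning, and equal-frequency (equal number of records) binning, with the proportion of high-income records per bin serving as a normalized overlay. It rests on several assumptions: the ten records are invented for illustration and are not taken from the adult data set, the income labels "<=50K" and ">50K" are assumed, and the function names are hypothetical helpers rather than anything from the textbook. In practice a library such as pandas (cut, qcut, crosstab) would do the same work on the full data.

```python
from collections import Counter
from statistics import mean, median, stdev

# Toy records, invented for illustration only (NOT from the adult data set).
records = [
    {"age": 22, "income": "<=50K"}, {"age": 25, "income": "<=50K"},
    {"age": 28, "income": "<=50K"}, {"age": 31, "income": "<=50K"},
    {"age": 35, "income": ">50K"},  {"age": 38, "income": "<=50K"},
    {"age": 42, "income": ">50K"},  {"age": 47, "income": ">50K"},
    {"age": 53, "income": ">50K"},  {"age": 62, "income": "<=50K"},
]
ages = [r["age"] for r in records]
incomes = [r["income"] for r in records]


def summarize(values):
    """Mean, median, minimum, maximum, and sample standard deviation."""
    return {
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
        "std": stdev(values),
    }


def equal_width_bins(values, k):
    """Assign each value a bin index 0..k-1 using k intervals of equal width.

    Assumes the values are not all identical (width would be zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would compute to index k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign bin indices so each bin holds (nearly) the same number of records.

    Ties are split by sorted position, so equal values can land in
    different bins; that is a simplification of this sketch.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def target_rate_by_bin(bins, targets, positive=">50K"):
    """Proportion of records in each bin whose target equals `positive`."""
    totals = Counter(bins)
    hits = Counter(b for b, t in zip(bins, targets) if t == positive)
    return {b: hits[b] / totals[b] for b in sorted(totals)}


if __name__ == "__main__":
    print(summarize(ages))
    for name, binner in [("equal width", equal_width_bins),
                         ("equal frequency", equal_frequency_bins)]:
        bins = binner(ages, 2)
        print(name, bins, target_rate_by_bin(bins, incomes))
```

Comparing the per-bin proportions from the two methods shows how the choice of bin boundaries changes the apparent strength of the relationship with income, which is the point of exercises 12 and 13.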
