Question

Write an R script (in RStudio) to do the following with the given dataset.

Country   Age   Salary   Purchased
France    44    72000    NO
Spain     27    NA       Yes
Germany   30    54000    NO
Spain     38    61000    NO
Germany   40    NA       Yes
France    35    58000    Yes
Spain     NA    52000    no
France    48    79000    Yes
Germany   50    83000    NO
France    NA    67000    Y

1. Generate a summary of missing values and inconsistent values for each of the features. Your script should generate a table similar to the one shown below. [20 points]

Features    Missing Values (MV)   % of MV   Inconsistency Values (IV)   % of IV
Country
Age
Salary
Purchased

n = number of records.

An example of a missing value is the NA in record 2 for the feature Salary. Another example of a missing value is in record 7 for the feature Age. An example of an inconsistency is in record 10 for the feature Purchased (Y for Yes).

2. Handle the missing values. Specifically, estimate missing values of Age by computing the feature mean grouped by Country. Similarly, estimate missing values of Salary by computing the feature mean grouped by Country. You may also use the target feature for the missing value estimation. Please state in your script which strategy you use to handle the missing values. [20 points]

3. Correct the data inconsistency issue. The target feature is a binary class (Yes and No). However, in the current state of the data, it has 5 class labels (Yes, Y, No, no, and NO). Correct this problem by converting the values into the appropriate class labels (Yes and No). [20 points]

4. Store the clean data in a CSV file. [10 points]
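
The four tasks above can be sketched in base R as follows. This is one possible approach, not the page's hidden solution. It rests on a few assumptions: record 3 is read as Germany, 30, 54000, NO; an inconsistent value is taken to mean any non-missing Purchased entry that is not exactly "Yes" or "No"; missing values are estimated with the mean of the same Country only (the target feature is not used); and the output file name clean_data.csv and the helper impute_group_mean are illustrative choices.

```r
# Dataset typed in from the question (record 3 assumed to be Age 30, Salary 54000).
df <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"),
  Age       = c(44, 27, 30, 38, 40, 35, NA, 48, 50, NA),
  Salary    = c(72000, NA, 54000, 61000, NA, 58000, 52000, 79000, 83000, 67000),
  Purchased = c("NO", "Yes", "NO", "NO", "Yes", "Yes", "no", "Yes", "NO", "Y"),
  stringsAsFactors = FALSE
)
n <- nrow(df)  # n = number of records

# Task 1: count missing and inconsistent values per feature.
# Assumption: only Purchased can be inconsistent, and only the exact
# labels "Yes" and "No" count as consistent.
valid_labels <- c("Yes", "No")
missing_count <- sapply(df, function(x) sum(is.na(x)))
inconsistent_count <- sapply(names(df), function(col) {
  if (col == "Purchased") {
    sum(!is.na(df[[col]]) & !(df[[col]] %in% valid_labels))
  } else {
    0
  }
})
summary_table <- data.frame(
  Features = names(df),
  MV       = unname(missing_count),
  Pct_MV   = unname(100 * missing_count / n),
  IV       = unname(inconsistent_count),
  Pct_IV   = unname(100 * inconsistent_count / n)
)
print(summary_table)

# Task 2: strategy = replace each NA with the mean of that feature
# within the same Country (hypothetical helper, not a library function).
impute_group_mean <- function(x, group) {
  group_mean <- ave(x, group, FUN = function(v) mean(v, na.rm = TRUE))
  ifelse(is.na(x), group_mean, x)
}
clean <- df
clean$Age    <- impute_group_mean(df$Age, df$Country)
clean$Salary <- impute_group_mean(df$Salary, df$Country)

# Task 3: map every spelling variant onto the two class labels.
label <- tolower(trimws(clean$Purchased))
clean$Purchased <- ifelse(label %in% c("y", "yes"), "Yes",
                   ifelse(label %in% c("n", "no"), "No", NA_character_))

# Task 4: store the clean data (file name is an illustrative choice).
write.csv(clean, "clean_data.csv", row.names = FALSE)
print(clean)
```

If the target feature should also inform the estimate, the grouping passed to ave could be extended to both Country and Purchased, although with only ten records some of those groups would contain no observed value to average.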
