Question
Write an R script (for example, in RStudio) to do the following with the given dataset.
Country   Age   Salary   Purchased
France    44    72000    NO
Spain     27    NA       Yes
Germany   30    54000    NO
Spain     38    61000    NO
Germany   40    NA       Yes
France    35    58000    Yes
Spain     NA    52000    no
France    48    79000    Yes
Germany   50    83000    NO
France    NA    67000    Y

Generate a summary of missing values and inconsistent values for each of the features. Your script should generate a table similar to the one shown below:

Features    Missing Values (MV)    % of MV    Inconsistent Values (IV)    % of IV
Country
Age
Salary
Purchased

n = number of records.

An example of a missing value is NA in record 2 for the feature Salary. Another example of a missing value is in record 7 for the feature Age. An example of an inconsistency is in record 10 for the feature Purchased (Y for Yes). [20 points]

Handle the missing values. Specifically, estimate missing values of Age by computing the feature mean grouped by Country. Similarly, estimate missing values of Salary by computing the feature mean grouped by Country. You may also use the target feature for the missing value estimation. Please state in your script which strategy you use to handle the missing values. [20 points]

Correct the data inconsistency issue. The target feature is a binary class (Yes and No). However, in the current state of the data, it has 5 class labels (Yes, Y, No, no, and NO). Correct this problem by converting the values into the appropriate class labels (Yes and No). [20 points]

Store the clean data in a CSV file. [10 points]
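The four tasks above could be approached along the following lines. This is only a minimal base-R sketch, not the expert solution referenced on the page. It rests on several assumptions: record 3 is read as Germany, 30, 54000, NO; only the exact labels "Yes" and "No" count as consistent values of Purchased; and the names `valid_labels`, `summary_tbl`, `impute_group_mean`, and `out_file` (including the file name `clean_data.csv`) are hypothetical choices made for illustration.

```r
# Dataset from the question (assumption: record 3 is Germany, 30, 54000, NO)
df <- data.frame(
  Country   = c("France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"),
  Age       = c(44, 27, 30, 38, 40, 35, NA, 48, 50, NA),
  Salary    = c(72000, NA, 54000, 61000, NA, 58000, 52000, 79000, 83000, 67000),
  Purchased = c("NO", "Yes", "NO", "NO", "Yes", "Yes", "no", "Yes", "NO", "Y"),
  stringsAsFactors = FALSE
)
n <- nrow(df)  # n = number of records

# --- Task 1: summary of missing values (MV) and inconsistent values (IV) ---
# Assumption: only the exact labels "Yes" and "No" are consistent for Purchased;
# the other features are not checked for inconsistencies in this sketch.
valid_labels <- c("Yes", "No")
mv <- sapply(df, function(x) sum(is.na(x)))
iv <- sapply(names(df), function(col) {
  if (col == "Purchased") {
    sum(!is.na(df[[col]]) & !(df[[col]] %in% valid_labels))
  } else {
    0L
  }
})
summary_tbl <- data.frame(
  Features = names(df),
  MV       = unname(mv),
  Pct_MV   = unname(100 * mv / n),
  IV       = unname(iv),
  Pct_IV   = unname(100 * iv / n)
)
print(summary_tbl)

# --- Task 2: handle missing values ---
# Strategy: replace each NA with the mean of that feature within the same Country.
impute_group_mean <- function(x, group) {
  ave(x, group, FUN = function(v) ifelse(is.na(v), mean(v, na.rm = TRUE), v))
}
df$Age    <- impute_group_mean(df$Age, df$Country)
df$Salary <- impute_group_mean(df$Salary, df$Country)

# --- Task 3: correct inconsistent class labels ---
# Map every spelling variant to one of the two target labels, "Yes" or "No".
p <- tolower(trimws(df$Purchased))
df$Purchased <- ifelse(p %in% c("yes", "y"), "Yes",
                       ifelse(p == "no", "No", NA))

# --- Task 4: store the clean data in a CSV file ---
# The path is a placeholder; replace it with any writable location.
out_file <- file.path(tempdir(), "clean_data.csv")
write.csv(df, out_file, row.names = FALSE)
```

Under these assumptions the summary reports 2 missing values (20%) each for Age and Salary, and 6 inconsistent values (60%) for Purchased, because the four "NO" entries, the "no", and the "Y" all differ from the target labels. If the assignment treats "NO" as already correct, `valid_labels` would need to be adjusted accordingly.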