Question
The following assignment is done in RapidMiner. Please post a screenshot of the RapidMiner process as well.
Chegg does not allow Excel file imports, so I exported the data to a URL where you can download the Excel file: https://sheet.zoho.com/sheet/editor.do?doc=6832ab0661e0a2dabfefe689d67ea16940033876c12b5a5bf3f32e4b0e903eb29c17b85f9f101753b84cf74e98da0b76537a2a653989422c484afcf95470afdc
The data file (uva1.csv) contains more than 19,000 Internet users and their demographic information. Your job is to predict gender using the predictors listed below. This time we will organize our work into subprocesses:
1. Data cleaning sub-process:
Remove NAs from Age and Marital Status.
Select data from three states: Michigan, New York and Alabama.
Make sure the Age column is numeric.
Dummy-code Newbie.
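The cleaning steps above can be illustrated outside RapidMiner with a short plain-Python sketch. It is not the RapidMiner process itself (inside RapidMiner you would use operators such as Filter Examples and Nominal to Numerical); it only shows what each step does to the rows. The column names ("Age", "Marital Status", "State", "Newbie"), the spellings of the three states, and the way missing values are written ("" or "NA") are assumptions; check them against the header and values in uva1.csv.

```python
import csv

# Hypothetical column names; check them against the header row of uva1.csv.
AGE, MARITAL, STATE = "Age", "Marital Status", "State"
KEEP_STATES = {"Michigan", "New York", "Alabama"}
MISSING = {"", "NA"}  # assumed spellings of a missing value


def load(path):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def clean(rows):
    """Drop missing Age/Marital Status, keep three states, make Age numeric."""
    cleaned = []
    for row in rows:
        if row[AGE].strip() in MISSING or row[MARITAL].strip() in MISSING:
            continue  # remove NAs from Age and Marital Status
        if row[STATE] not in KEEP_STATES:
            continue  # keep Michigan, New York and Alabama only
        try:
            age = float(row[AGE])  # make Age numeric
        except ValueError:
            continue  # assumption: drop ages that are not numbers at all
        cleaned.append({**row, AGE: age})
    return cleaned


def dummy_code(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct value."""
    levels = sorted({r[column] for r in rows})
    coded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for level in levels:
            out[f"{column} = {level}"] = 1 if r[column] == level else 0
        coded.append(out)
    return coded


# Usage (assumes the file is in the working directory):
# rows = dummy_code(clean(load("uva1.csv")), "Newbie")
```

Dropping rows whose Age cannot be parsed is a design choice made for the sketch; in RapidMiner you would decide this explicitly in the process.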
2. Data transformation sub-process:
Recode Divorced, Widowed, Separated, and NA of Marital Status into Other using the MAP operator.
Recode Some College in Education Attainment into College using the MAP operator.
Recode Masters and Doctoral in Education Attainment into Graduate Degree using the MAP operator.
Recode Special, Grammar and Professional in Education Attainment into Other using the MAP operator.
Dummy-code Education Attainment, Marital Status and Newbie.
Use the variables listed at the end of this question for analysis.
Again, we will predict Gender.
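The recoding and dummy coding in step 2 amount to a lookup table per attribute followed by one indicator column per remaining category. A minimal plain-Python sketch, assuming the category spellings in uva1.csv match the ones in the assignment text. Dropping "Other" as the reference level is an assumption chosen because the variable list appears to contain no "Other" dummies; Newbie can be dummy-coded the same way.

```python
# Hypothetical sketch; category spellings are assumed to match uva1.csv.
EDUCATION_MAP = {
    "Some College": "College",
    "Masters": "Graduate Degree",
    "Doctoral": "Graduate Degree",
    "Special": "Other",
    "Grammar": "Other",
    "Professional": "Other",
}
MARITAL_MAP = {"Divorced": "Other", "Widowed": "Other", "Separated": "Other", "NA": "Other"}


def recode(value, mapping):
    """Return the mapped value; values not in the mapping are left unchanged."""
    return mapping.get(value, value)


def dummy_code(rows, column, reference):
    """One 0/1 column per level of `column`, except the `reference` level."""
    levels = sorted({r[column] for r in rows} - {reference})
    coded = []
    for r in rows:
        out = {k: v for k, v in r.items() if k != column}
        for level in levels:
            out[f"{column} {level}"] = 1 if r[column] == level else 0
        coded.append(out)
    return coded


def transform(rows):
    """Apply the recodings, then dummy-code both attributes against 'Other'."""
    recoded = [
        {
            **r,
            "Education Attainment": recode(r["Education Attainment"], EDUCATION_MAP),
            "Marital Status": recode(r["Marital Status"], MARITAL_MAP),
        }
        for r in rows
    ]
    recoded = dummy_code(recoded, "Education Attainment", reference="Other")
    return dummy_code(recoded, "Marital Status", reference="Other")
```

A row coded 0 on every dummy of an attribute belongs to the reference level, so a divorced respondent ends up with Married = 0 and Single = 0.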
3. Use Split Validation with a 70/30 split to build a logistic regression model. If you use Split Data instead, points will be taken off. Since there are many more males than females, use stratified sampling so that the proportion of males and females is preserved in each sample.
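In RapidMiner the stratified 70/30 split is configured on the Split Validation operator (a split ratio of 0.7 with stratified sampling) rather than written as code. The plain-Python sketch below only illustrates what stratification does: each Gender group is shuffled and split 70/30 on its own, so both partitions keep the original male/female ratio. The function name and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(rows, label, train_fraction=0.7, seed=1992):
    """Split rows so each label value keeps roughly the same share in both parts."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[label]].append(r)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for value in sorted(groups):
        members = groups[value][:]
        rng.shuffle(members)
        cut = round(len(members) * train_fraction)  # 70% of this group
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test
```

For example, 70 males and 30 females give a training set of 49 males and 21 females and a test set of 21 males and 9 females, so both sets are 30% female, like the full data.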
4. Answer the following questions:
Q0: Show the following screenshots
Rapidminer design view
Result logistic regression
Result confusion matrix
Result resulting dataset
Result predicted dataset (see Q5 below)
Q1: Write your complete logistic regression equation.
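For reference, the general shape of the equation asked for in Q1, with the predictors from the variable list abbreviated (EduCollege = Education Attainment College, Income = Household Income, and so on). The β values are placeholders for the coefficients in your own RapidMiner output, and p is the probability of whichever Gender class the model treats as the positive class.

```latex
\ln\!\left(\frac{p}{1-p}\right) = \beta_0
  + \beta_1\,\mathit{Age}
  + \beta_2\,\mathit{EduCollege}
  + \beta_3\,\mathit{EduGraduate}
  + \beta_4\,\mathit{EduHighSchool}
  + \beta_5\,\mathit{Income}
  + \beta_6\,\mathit{Married}
  + \beta_7\,\mathit{Single}
  + \beta_8\,\mathit{YearsOnInternet},
\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1\,\mathit{Age} + \cdots + \beta_8\,\mathit{YearsOnInternet})}}
```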
Q2: What are the top three predictors that differentiate the two genders? Explain your reason.
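Raw coefficients are not directly comparable for Q2, because the predictors are on different scales (Age in years versus 0/1 dummies). One common justification is the size of the Wald z statistic (coefficient divided by its standard error); standardized coefficients are another option if your output reports them. A minimal sketch of the z-based ranking, assuming you copy the coefficients and standard errors into two dicts keyed by attribute name; the "Intercept" key name is hypothetical.

```python
def rank_predictors(coefficients, std_errors, top=3, intercept="Intercept"):
    """Return the `top` predictor names with the largest |coefficient / std. error|."""
    z = {
        name: coef / std_errors[name]
        for name, coef in coefficients.items()
        if name != intercept  # the intercept is not a predictor
    }
    return sorted(z, key=lambda name: abs(z[name]), reverse=True)[:top]
```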
Q3: Does the model predict male better or female better? How do you know? (Hint: precision and recall in confusion matrix)
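For Q3, class precision is "of the cases predicted as this class, how many really are this class", and class recall is "of the cases that really are this class, how many were predicted as this class". RapidMiner's performance view typically lays predictions out as rows and true labels as columns, with class precision and class recall in the margins. The sketch below computes the same two numbers from the raw counts; the dict layout keyed by (predicted, actual) is an assumption made for illustration.

```python
def per_class_metrics(matrix, classes):
    """matrix[(predicted, actual)] -> count. Returns {class: (precision, recall)}."""
    metrics = {}
    for c in classes:
        tp = matrix.get((c, c), 0)
        predicted_c = sum(matrix.get((c, a), 0) for a in classes)  # row total
        actual_c = sum(matrix.get((p, c), 0) for p in classes)  # column total
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        metrics[c] = (precision, recall)
    return metrics
```

The class with the higher precision and recall is the one the model predicts better.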
Q4 Simple Prediction: A 30-year-old divorced person with a master's degree, a Household Income value of 6, and a Years on Internet value of 5 did not answer the gender question in the questionnaire. How likely is this person to be female? Show all calculations to earn points.
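The Q4 calculation plugs the person's values into the fitted equation and applies the logistic function. Following the recoding in step 2, "divorced" becomes Marital Status Other (so both marital dummies are 0) and "masters" becomes Graduate Degree. The dictionary keys below are hypothetical attribute names, and the coefficients must come from your own model. If the model's positive class is male, the probability of being female is 1 − p.

```python
import math

# The Q4 respondent, coded like the training data (hypothetical attribute names).
PERSON = {
    "Age": 30,
    "Education Attainment College": 0,
    "Education Attainment Graduate Degree": 1,  # masters -> Graduate Degree
    "Education Attainment High School": 0,
    "Household Income": 6,
    "Marital Status Married": 0,  # divorced -> Other (reference level)
    "Marital Status Single": 0,
    "Years on Internet": 5,
}


def predict_probability(intercept, coefficients, features):
    """Return (z, p) with z = intercept + sum(b_i * x_i) and p = 1 / (1 + e^-z)."""
    z = intercept + sum(coefficients[name] * x for name, x in features.items())
    return z, 1.0 / (1.0 + math.exp(-z))
```

Show z, then p, then (if needed) 1 − p in your answer so every step of the calculation is visible.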
Variables for analysis:
Age
Education Attainment College
Education Attainment Graduate Degree
Education Attainment High School
Gender
Household Income (numeric)
Marital Status Married
Marital Status Single
Years on Internet (numeric)