
Question



The following assignment is done in RapidMiner. Please post a screenshot of the RapidMiner process as well.

Chegg does not allow Excel file imports, so I exported the data to a URL where you can download the Excel file: https://sheet.zoho.com/sheet/editor.do?doc=6832ab0661e0a2dabfefe689d67ea16940033876c12b5a5bf3f32e4b0e903eb29c17b85f9f101753b84cf74e98da0b76537a2a653989422c484afcf95470afdc

The data file (uva1.csv) contains more than 19,000 Internet users and their demographic information. Your job is to predict gender using the predictors listed below. We will organize our work into subprocesses this time:

1. Data cleaning sub-process:

Remove NAs from Age and Marital Status.

Select data from three states: Michigan, New York and Alabama.

Make sure the Age column is numeric.

Dummy-code Newbie.
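
The assignment itself must be built with RapidMiner operators, but the logic of the cleaning steps above can be sketched in plain Python for reference. This is only an illustration under assumptions: the column name "State", the missing-value marker "NA", and the Newbie values "Yes"/"No" are guesses, not taken from uva1.csv.

```python
# Toy sketch of the data cleaning sub-process (not the RapidMiner process).
# ASSUMPTIONS: the state column is called "State", missing values appear
# as "NA" or an empty string, and Newbie is coded "Yes"/"No".
STATES = {"Michigan", "New York", "Alabama"}
MISSING = {"", "NA", None}

def clean(rows):
    cleaned = []
    for row in rows:
        # Remove rows with a missing Age or Marital Status.
        if row["Age"] in MISSING or row["Marital Status"] in MISSING:
            continue
        # Keep only the three states of interest.
        if row["State"] not in STATES:
            continue
        out = dict(row)
        out["Age"] = float(row["Age"])                       # make Age numeric
        out["Newbie"] = 1 if row["Newbie"] == "Yes" else 0   # dummy-code Newbie
        cleaned.append(out)
    return cleaned

# Made-up example rows, for illustration only.
rows = [
    {"Age": "30", "Marital Status": "Married", "State": "Michigan", "Newbie": "Yes"},
    {"Age": "NA", "Marital Status": "Single", "State": "New York", "Newbie": "No"},
    {"Age": "41", "Marital Status": "Single", "State": "Texas", "Newbie": "No"},
    {"Age": "25", "Marital Status": "NA", "State": "Alabama", "Newbie": "No"},
]
result = clean(rows)
```

In RapidMiner the same steps would typically map to filtering, type conversion, and dummy-coding operators inside one subprocess.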

2. Data transformation sub-process:

Recode Divorced, Widowed, Separated, and NA of Marital Status into Other using the MAP operator.

Recode Some College of Educational Attainment into College using the MAP operator.

Recode Masters and Doctoral into Graduate Degree using the MAP operator.

Recode Special, Grammar and Professional into Other using the MAP operator.

Dummy-code Education Attainment, Marital Status and Newbie.

Use the following variables for analysis (the list was provided as an image; see the transcribed variable list after the questions):

Again, we will predict Gender.
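
The recoding rules of the transformation sub-process can be written down as lookup tables, which is essentially what the MAP operator applies. The sketch below is a plain Python illustration, not RapidMiner code. The helper names `recode` and `dummy_code` are hypothetical, and treating Other as the omitted reference level is an assumption based on the variable list, which contains no "Other" dummy.

```python
# Lookup tables mirroring the MAP recodes described above.
MARITAL_MAP = {
    "Divorced": "Other",
    "Widowed": "Other",
    "Separated": "Other",
    "NA": "Other",
}
EDUCATION_MAP = {
    "Some College": "College",
    "Masters": "Graduate Degree",
    "Doctoral": "Graduate Degree",
    "Special": "Other",
    "Grammar": "Other",
    "Professional": "Other",
}

def recode(value, mapping):
    # Values without a rule are left unchanged.
    return mapping.get(value, value)

def dummy_code(value, attribute, levels):
    # One 0/1 column per listed level; a value outside `levels`
    # (here: the assumed reference level "Other") gets all zeros.
    return {f"{attribute} {level}": int(value == level) for level in levels}

education = recode("Masters", EDUCATION_MAP)
marital = recode("Divorced", MARITAL_MAP)
education_dummies = dummy_code(
    education, "Education Attainment", ["College", "Graduate Degree", "High School"]
)
marital_dummies = dummy_code(marital, "Marital Status", ["Married", "Single"])
```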

3. Use Split Validation with a 70/30 split to build a logistic regression. If you use Split Data, points will be taken off. Since there are a lot more males than females, let's use stratified sampling. This way the proportion of males and females is preserved in the sample.
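
To see why stratified sampling preserves the male/female proportion, here is a minimal Python sketch of a stratified 70/30 split. It is an illustration of the idea only, not what Split Validation does internally; the function name `stratified_split`, the seed, and the toy data are assumptions.

```python
import random
from collections import defaultdict

def stratified_split(rows, label, train_ratio=0.7, seed=42):
    """Split rows so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[label]].append(row)
    train, test = [], []
    for members in by_class.values():
        members = members[:]          # copy so the input order is untouched
        rng.shuffle(members)
        cut = round(len(members) * train_ratio)
        train.extend(members[:cut])   # 70% of this class goes to training
        test.extend(members[cut:])    # the remaining 30% goes to testing
    return train, test

# Made-up imbalanced data: 70 males and 30 females.
rows = [{"Gender": "Male"}] * 70 + [{"Gender": "Female"}] * 30
train, test = stratified_split(rows, "Gender")
```

Because the split is done within each class, the 70:30 male-to-female ratio of the toy data is the same in the training part and the testing part.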

4. Answer the following questions:

Q0: Show the following screenshots

RapidMiner design view

Result logistic regression

Result confusion matrix

Result resulting dataset

Result predicted dataset (see Q5 below)

Q1: Write your complete logistic regression equation.
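
For orientation, the general form of the equation asked for in Q1 is shown below, using the predictors from the variable list. The coefficients $\beta_0, \dots, \beta_8$ are symbols only; their values must be read from the fitted model. Writing the left-hand side in terms of Female is an assumption, since which class the model treats as the positive class depends on the RapidMiner output.

```latex
\ln\!\left(\frac{p}{1-p}\right)
  = \beta_0
  + \beta_1\,\text{Age}
  + \beta_2\,\text{EducationCollege}
  + \beta_3\,\text{EducationGraduateDegree}
  + \beta_4\,\text{EducationHighSchool}
  + \beta_5\,\text{HouseholdIncome}
  + \beta_6\,\text{MaritalMarried}
  + \beta_7\,\text{MaritalSingle}
  + \beta_8\,\text{YearsOnInternet},
\qquad p = P(\text{Gender} = \text{Female})
```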

Q2: What are the top three predictors that differentiate the two genders? Explain your reason.

Q3: Does the model predict male better or female better? How do you know? (Hint: precision and recall in confusion matrix)
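
As a reminder of how the hint in Q3 works, precision and recall for each class can be computed directly from the four cells of the confusion matrix. The counts below are invented for illustration and are not results from this dataset; the function name `precision_recall` is hypothetical.

```python
# Confusion matrix as {(true class, predicted class): count}.
# ILLUSTRATIVE counts only; replace with the values from your own result.
matrix = {
    ("Male", "Male"): 50,
    ("Male", "Female"): 10,
    ("Female", "Male"): 20,
    ("Female", "Female"): 20,
}

def precision_recall(matrix, cls):
    classes = {true for true, _ in matrix}
    correct = matrix[(cls, cls)]
    predicted_as_cls = sum(matrix[(true, cls)] for true in classes)
    actually_cls = sum(matrix[(cls, pred)] for pred in classes)
    # Precision: of everything predicted as cls, how much was right.
    # Recall: of everything that truly is cls, how much was found.
    return correct / predicted_as_cls, correct / actually_cls

male_precision, male_recall = precision_recall(matrix, "Male")
female_precision, female_recall = precision_recall(matrix, "Female")
```

The class with the higher precision and recall is the one the model predicts better.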

Q4 (Simple Prediction): A 30-year-old divorced person with a master's degree has a value of 6 for income and a value of 5 for Years on Internet, but did not answer the gender question in the questionnaire. How likely is it that this person is female? Show all calculations to earn points.
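
The calculation in Q4 amounts to plugging the person's values into the logistic regression equation and converting the result to a probability. The sketch below shows the mechanics in Python. Every coefficient is a made-up placeholder, not a model result. The feature coding follows the recodes in step 2: divorced maps to Other (both marital dummies are 0) and a masters degree maps to Graduate Degree. Whether the resulting probability refers to Female or Male depends on which class the fitted model treats as positive.

```python
import math

# PLACEHOLDER coefficients for illustration only; replace them with the
# values reported by your own fitted logistic regression.
coefficients = {
    "Intercept": 0.5,
    "Age": -0.01,
    "Education Attainment College": 0.1,
    "Education Attainment Graduate Degree": 0.2,
    "Education Attainment- High School": 0.05,
    "Household Income (Numeric)": -0.05,
    "Marital Status Married": 0.1,
    "Marital Status Single": -0.1,
    "Years on Internet (numeric)": -0.1,
}

# The person described in Q4, coded with the dummy variables.
person = {
    "Age": 30,
    "Education Attainment College": 0,
    "Education Attainment Graduate Degree": 1,   # masters -> Graduate Degree
    "Education Attainment- High School": 0,
    "Household Income (Numeric)": 6,
    "Marital Status Married": 0,                 # divorced -> Other
    "Marital Status Single": 0,
    "Years on Internet (numeric)": 5,
}

def predicted_probability(coefs, features):
    # Linear predictor (log-odds), then the logistic function.
    z = coefs["Intercept"] + sum(coefs[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

probability = predicted_probability(coefficients, person)
```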

Variables for analysis (transcribed from the image): Age; Education Attainment College; Education Attainment Graduate Degree; Education Attainment- High School; Gender; Household Income (Numeric); Marital Status Married; Marital Status Single; Years on Internet (numeric)
