Question

Posted on Jul 08, 2024

Please use R and RStudio for this question.

Link to the file of the data for this question: https://drive.google.com/file/d/18kGNrHUfgcVv2hMKqL5E05L40xCl6e1M/view?usp=sharing

Problem 1 (12 points): For this question, we will use the US census dataset from 1994, which is in adult.csv.

a. Show the descriptive and summary statistics for this dataset. Based on those metrics, what can we say about the distribution of age and education.num? Hint: Create a histogram. [4 points]

b. Create a scatterplot matrix of the numerical variables. Are there any strong correlations between any two variables? If so, what are they? Hint: Refer to the scatterplot section of Tutorial 2. [2 points]

c. Based on descriptive and summary statistics and box plots for age, education.num and hours.per.week, are there any differences between males and females? Hint: Use the filter() function from the Tidyverse to create male and female subsets, and create box plots following Tutorial 2. [5 points]

Question 2: We will use SVM in this problem, showing how it often gets used even when the data are not suitable, by first engineering the numerical features we need. There is a Star Wars dataset in the dplyr library. Load that library and you will be able to see it (head(starwars)).

a. There are some variables we will not use, so first remove films, vehicles, starships and name. Also remove rows with missing values.

b. Several variables are categorical. We will use dummy variables to make it possible for SVM to use these. Show the resulting head of the dummy variables, including the target column gender.

c. Use SVM to predict gender and report the accuracy. First, create the dataset with 66% for training and 34% for testing, using a seed of 94 for the random partitioning.

d. Given that we have so many variables, it makes sense to consider using PCA. Run PCA on the data and determine an appropriate number of components to use from the graph. Create a reduced version of the data with that number of principal components by first finding and removing near zero variance predictors using the following code: nzv <- nearZeroVar(numeric_train) ... filtered ...
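A minimal R sketch for Problem 1, assuming adult.csv has been downloaded into the working directory and uses the column names age, education.num, hours.per.week and sex (check names(adult) and adjust if your copy differs). It uses base graphics plus dplyr::filter() as the hint suggests; explore_adult() is a hypothetical helper name, not something from the course material.

```r
# Sketch for Problem 1. Assumptions (not stated in the question): adult.csv uses
# the column names age, education.num, hours.per.week and sex, and
# explore_adult() is a hypothetical helper name.
library(dplyr)

explore_adult <- function(adult) {
  vars <- c("age", "education.num", "hours.per.week")

  # (a) summary statistics for every column, standard deviations for the
  # key numeric ones, and histograms of age and education.num
  print(summary(adult))
  print(sapply(adult[vars], sd))
  hist(adult$age, main = "age", xlab = "age")
  hist(adult$education.num, main = "education.num", xlab = "education.num")

  # (b) scatterplot matrix and correlation matrix of all numeric columns
  num <- adult[sapply(adult, is.numeric)]
  pairs(num)
  cors <- cor(num)
  print(round(cors, 2))

  # (c) male and female subsets with dplyr::filter(); trimws() guards against
  # copies of the data that keep a leading space in the sex values
  males   <- filter(adult, trimws(sex) == "Male")
  females <- filter(adult, trimws(sex) == "Female")
  for (v in vars) {
    cat("\n", v, "\n")
    print(rbind(Male = summary(males[[v]]), Female = summary(females[[v]])))
    boxplot(males[[v]], females[[v]], names = c("Male", "Female"), main = v)
  }

  invisible(list(cor = cors, n_male = nrow(males), n_female = nrow(females)))
}

# Usage, assuming adult.csv sits in the working directory
if (file.exists("adult.csv")) {
  adult <- read.csv("adult.csv")
  explore_adult(adult)
}
```

For part b, the off-diagonal entries of the printed correlation matrix are the numbers to compare against the pairs() panels when judging which relationships are strong.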
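One possible R sketch for Question 2, assuming the dplyr, e1071 and caret packages are installed. It swaps in base model.matrix() for the dummy variables and base sample() for the 66/34 split (the course may expect caret::dummyVars() and createDataPartition() instead). Because the question's nearZeroVar code is cut off after "filtered", the names keep, filtered_train and filtered_test are hypothetical, and the 90% cumulative-variance rule in part d is an assumed stand-in for reading the elbow off the scree plot by eye.

```r
# Sketch for Question 2. Assumptions: dplyr, e1071 and caret are installed;
# model.matrix() and sample() stand in for the course's own helpers; the 90%
# variance cut-off in (d) is illustrative, not prescribed by the question.
library(dplyr)   # starwars data, select(), %>%
library(e1071)   # svm()
library(caret)   # nearZeroVar(), as named in part d

# (a) drop the list columns and name, then every row with a missing value
sw <- starwars %>%
  select(-films, -vehicles, -starships, -name) %>%
  na.omit()

# (b) dummy-code the categorical predictors (character columns become
# factors); drop the intercept column and keep gender as the target
x <- model.matrix(gender ~ ., data = sw)[, -1]
y <- factor(sw$gender)
print(head(data.frame(x, gender = y, check.names = FALSE)))

# (c) 66% / 34% split with seed 94, then an SVM classifier on the dummies
set.seed(94)
train_idx <- sample(nrow(x), size = round(0.66 * nrow(x)))
numeric_train <- x[train_idx, ]
numeric_test  <- x[-train_idx, ]
svm_fit <- svm(numeric_train, y[train_idx])
svm_acc <- mean(predict(svm_fit, numeric_test) == y[-train_idx])
cat("SVM accuracy:", round(svm_acc, 3), "\n")

# (d) drop near-zero-variance predictors found on the training set; setdiff()
# avoids the x[, -integer(0)] trap, which drops every column when none is flagged
nzv <- nearZeroVar(numeric_train)
keep <- setdiff(seq_len(ncol(numeric_train)), nzv)
filtered_train <- numeric_train[, keep, drop = FALSE]
filtered_test  <- numeric_test[, keep, drop = FALSE]

pca <- prcomp(filtered_train, center = TRUE, scale. = TRUE)
screeplot(pca, type = "lines")

# pick k from the plot; as an assumed rule, the smallest k explaining >= 90%
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(cum_var >= 0.9)[1]

# refit the SVM on the first k principal-component scores
train_pc <- pca$x[, 1:k, drop = FALSE]
test_pc  <- predict(pca, filtered_test)[, 1:k, drop = FALSE]
pca_fit  <- svm(train_pc, y[train_idx])
pca_acc  <- mean(predict(pca_fit, test_pc) == y[-train_idx])
cat("Components kept:", k, "  SVM accuracy on PCs:", round(pca_acc, 3), "\n")
```

Expect svm() to warn in part c ("Cannot scale data") when a dummy column is constant within the training rows; removing near-zero-variance predictors before PCA in part d sidesteps that for the second model.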
