
Question


Please use R Programming for this question and take screenshots of R Code once finished.

Data to use for question 1: https://drive.google.com/file/d/18kGNrHUfgcVv2hMKqL5E05L40xCl6e1M/view?usp=sharing

For question 2, use the starwars data set from the dplyr package.

Question 1: For this problem, you will load and perform some cleaning steps on a dataset in the provided BankData.csv, which is data about loan approvals from a bank in Japan (it has been modified from the original for our purposes in class, so use the provided version). Specifically, you will use visualization to examine the variables, and normalization, binning and smoothing to change them in particular ways.

a. Visualize the distributions of the variables in this data. You can choose bar graphs, histograms and density plots. Make appropriate choices given each type of variable, and be careful when selecting parameters like the number of bins for the histograms. Note there are some numerical variables and some categorical ones. The ones labeled as "bool" are Boolean variables, meaning they are only true or false and are thus a special type of categorical. Checking all the distributions with visualization and summary statistics is a typical step when beginning to work with new data.

b. Now apply normalization to some of these numerical distributions. Specifically, choose to apply z-score to one, min-max to another, and decimal scaling to a third. Explain your choices of which normalization applies to which variable in terms of what the variable means, what distribution it starts with, and how the normalization will affect it.

c. Visualize the new distributions for the variables that have been normalized. What has changed from the previous visualization in step a?

d. For cont1, create a new variable called cont1_bins that is a binned version of that variable. This cont1_bins will have a new set of values like low, medium, high. Low ranges from -Inf to 25, Medium ranges from 25 to 40, and High ranges from 40 to Inf. Show this binned version cont1_bins along with the other data from the dataset. Assign numerical values to the bins using the bin mean and show the result.

e. Building on (d), use cont1_bins to create a smoothed version of cont1 and display the new distribution. How is this new distribution different from the previous distribution for cont1?

Question 2: We will use SVM in this problem, showing how it often gets used even when the data are not suitable, by first engineering the numerical features we need. There is a Star Wars dataset in the dplyr library. Load that library and you will be able to see it (head(starwars)).

a. There are some variables we will not use, so first remove films, vehicles, starships and name. Also remove rows with missing values.

b. Several variables are categorical. We will use dummy variables to make it possible for SVM to use these. Show the resulting head of the dummy variables including the target column gender.

c. Use SVM to predict gender and report the accuracy. First, create the dataset for 65% training and 34% testing and a seed of 94 for the random partitioning.

d. Given that we have so many variables, it makes sense to consider using PCA. Run PCA on the data and determine an appropriate number of components to use from the graph. Create a reduced version of the data with that number of principal components by first finding and removing near zero variance predictors using the following code: ... nearZeroVar(numeric_train) filtered ...
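
The normalization, binning and smoothing steps in Question 1 (parts b, d and e) can be sketched in base R as follows. This is a minimal illustration, not the expected solution: the vector x is made-up data standing in for a numeric column such as cont1, and the variable names (z_score, min_max, decimal, bins, smoothed) are hypothetical. The real assignment would read the column from BankData.csv instead.

```r
# Made-up values standing in for a numeric column such as cont1 (assumption).
x <- c(18.5, 22.0, 25.0, 31.2, 39.9, 40.0, 47.3, 58.1)

# z-score: subtract the mean and divide by the standard deviation.
z_score <- (x - mean(x)) / sd(x)

# min-max: rescale linearly onto the interval [0, 1].
min_max <- (x - min(x)) / (max(x) - min(x))

# Decimal scaling: divide by the smallest power of 10 that
# brings the largest absolute value below 1.
j <- floor(log10(max(abs(x)))) + 1
decimal <- x / 10^j

# Binning with the cut points from part d. cut() closes intervals on the
# right by default, so 25 falls in "low" and 40 falls in "medium"; the
# question does not say which side the boundaries belong to.
bins <- cut(x,
            breaks = c(-Inf, 25, 40, Inf),
            labels = c("low", "medium", "high"))

# Smoothing by bin means: replace each value with the mean of its bin.
bin_means <- tapply(x, bins, mean)
smoothed <- as.numeric(bin_means[as.character(bins)])

result <- data.frame(x, z_score, min_max, decimal, bins, smoothed)
print(result)
```

After smoothing, the column takes only as many distinct values as there are non-empty bins, which is the main difference a histogram of the smoothed variable would show compared with the original.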


