Question

Please use R Programming for these questions.

Here is the link to the data file for Question 1 (Bank Data): https://drive.google.com/file/d/18kGNrHUfgcVv2hMKqL5E05L40xCl6e1M/view?usp=sharing

For Question 2: Use the starwars dataset that ships with the dplyr package in R.

Question 1: For this problem, you will load and perform some cleaning steps on the dataset in the provided BankData.csv, which contains data about loan approvals from a bank in Japan (it has been modified from the original for our purposes in class, so use the provided version). Specifically, you will use visualization to examine the variables, and normalization, binning and smoothing to change them in particular ways.

a. Visualize the distributions of the variables in this data. You can choose bar graphs, histograms and density plots. Make appropriate choices given each type of variable, and be careful when selecting parameters like the number of bins for the histograms. Note that there are some numerical variables and some categorical ones. The variables in the BankData.csv dataset labeled 'bool' are Boolean variables, meaning they are only true or false and are thus a special type of categorical variable. Checking all the distributions with visualization and summary statistics is a typical step when beginning to work with new data.

b. Now apply normalization to some of these numerical distributions. Specifically, apply z-score normalization to one, min-max normalization to another, and decimal scaling to a third. Explain your choice of which normalization applies to which variable in terms of what the variable means, what distribution it starts with, and how the normalization will affect it.

c. Visualize the new distributions for the variables that have been normalized. What has changed from the previous visualization in step a?

d. For cont1, create a new variable called cont1_bins that is a binned version of that variable. cont1_bins will have a new set of values: low, medium and high. Low ranges from -Inf to 25, Medium ranges from 25 to 40, and High ranges from 40 to Inf. Show this binned version, cont1_bins, along with the other data from the dataset. Assign numerical values to the bins using the bin mean and show the result.

e. Building on (d), use cont1_bins to create a smoothed version of cont1 and display the new distribution. How is this new distribution different from the previous distribution for cont1?

Question 2: We will use SVM (support vector machine) in this problem, showing how it often gets used even when the data are not suitable, by first engineering the numerical features we need. There is a Star Wars dataset in the dplyr library; load that library and you will be able to see it (head(starwars)).

a. There are some variables we will not use, so first remove films, vehicles, starships and name. Also remove rows with missing values.

b. Several variables are categorical. We will use dummy variables to make it possible for SVM to use these. Show the head of the resulting dummy variables, including the target column gender.

c. Use SVM to predict gender and report the accuracy. First, create the dataset with 66% for training and 34% for testing, using a seed of 54 for the random partitioning.

d. Given that we have so many variables, it makes sense to consider using PCA (Principal Component Analysis). Run PCA on the data and determine an appropriate number of components to use from the graph. Create a reduced version of the data with that number of principal components by first finding and removing near-zero-variance predictors using the following code: nzv
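The transformations asked for in Question 1 (parts b, d and e) can be sketched in base R as below. This is a minimal illustration under stated assumptions, not a worked answer: the cont1 values are hypothetical stand-ins because BankData.csv is not reproduced here, the helper names z_score, min_max and decimal_scale are illustrative, and the choice of right-closed bins (so 25 falls in low and 40 in medium) is an assumption, since the question does not say which bin the boundary values belong to.

```r
# Sketch of Question 1 parts b, d and e in base R.
# ASSUMPTION: cont1 below is a hypothetical stand-in for the real column;
# with the real data you would use read.csv("BankData.csv")$cont1 instead.
cont1 <- c(12.5, 22.0, 25.0, 31.4, 39.9, 40.0, 47.3, 58.1)

# Part b: the three normalizations (helper names are illustrative)
# z-score: subtract the mean, divide by the standard deviation
z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# min-max: map the observed range linearly onto [0, 1]
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# decimal scaling: divide by 10^j, where j is the smallest integer
# that makes every |x / 10^j| < 1
decimal_scale <- function(x) {
  j <- floor(log10(max(abs(x), na.rm = TRUE))) + 1
  x / 10^j
}

normalized <- data.frame(
  z      = z_score(cont1),
  minmax = min_max(cont1),
  dec    = decimal_scale(cont1)
)
print(normalized)

# Part d: bin cont1. cut() builds right-closed intervals by default,
# so the bins are (-Inf, 25], (25, 40] and (40, Inf]
# (ASSUMPTION about which bin the boundary values 25 and 40 belong to)
cont1_bins <- cut(cont1,
                  breaks = c(-Inf, 25, 40, Inf),
                  labels = c("low", "medium", "high"))

# Bin means: the numeric value assigned to each bin
bin_means <- tapply(cont1, cont1_bins, mean)
print(bin_means)

# Part e: smoothing by bin means replaces every value with the mean
# of its bin; ave() returns a vector aligned with the original rows
cont1_smoothed <- ave(cont1, cont1_bins, FUN = mean)
print(data.frame(cont1, cont1_bins, cont1_smoothed))
```

Using ave() rather than indexing bin_means by hand keeps the smoothed values in the original row order, so the result can be added directly as a new column next to cont1 and cont1_bins. The smoothed variable takes only as many distinct values as there are bins, which is what makes its distribution coarser than the original.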
