Question

Question 1: K-Nearest Neighbors (KNN) (RStudio). In this question, you will use the K-Nearest Neighbors (KNN) algorithm to predict whether the S&P 500 will go up the next day or not. The dataset for this question is named Smarket and is part of the package ISLR. Install and load the package ISLR; you then have access to the Smarket dataset. Run the following command to view the dataset:

View(Smarket)

Pass the data frame Smarket to the function colnames. This gives the column names of the Smarket data frame. The result should be similar to the following:

"Year" "Lag1" "Lag2" "Lag3" "Lag4" "Lag5" "Volume" "Today" "Direction"

Pass the column Year of the data frame Smarket to the function unique. This gives the unique values in the column Year, which should be as follows:

[1] 2001 2002 2003 2004 2005

Read pages 12 and 13 of the ISLR package documentation, available at https://cran.r-project.org/web/packages/ISLR/ISLR.pdf, to familiarize yourself further with this data frame.

We would like to predict the Direction column using the columns Lag1, Lag2, Lag3, Lag4, Lag5, and Volume. Divide Smarket into two parts: 2001 through 2004 will be the training data, and 2005 will be the test data. Name the training dataset Smarket_train and the test dataset Smarket_test.
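A minimal R sketch of the setup steps above, assuming ISLR can be installed from CRAN (the mirror URL passed to install.packages is an assumption). View() is commented out because it only opens a viewer in an interactive RStudio session; the split uses Year <= 2004 for training and Year == 2005 for testing.

```r
# Install ISLR if it is missing (the CRAN mirror URL is an assumption), then load it
if (!requireNamespace("ISLR", quietly = TRUE)) {
  install.packages("ISLR", repos = "https://cloud.r-project.org")
}
library(ISLR)

# View(Smarket)  # opens the data viewer; only useful in an interactive session

# Column names and the distinct years covered by the data
print(colnames(Smarket))
print(unique(Smarket$Year))

# 2001-2004 -> training data, 2005 -> test data
Smarket_train <- Smarket[Smarket$Year <= 2004, ]
Smarket_test  <- Smarket[Smarket$Year == 2005, ]

cat("Training rows:", nrow(Smarket_train), " Test rows:", nrow(Smarket_test), "\n")
```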

Create a subset of the Smarket_train dataset that only includes columns 2 to 7 (Lag1, Lag2, Lag3, Lag4, Lag5, and Volume) and name it Smarket_train_x.
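The column subset can be taken by position, as the question asks, or by name, which gives the same six columns and does not depend on column order. A short sketch; Smarket_train is rebuilt here so the snippet runs on its own, and Smarket_train_x_by_name is a hypothetical name used only for the comparison.

```r
library(ISLR)

# Rebuild the training split so this snippet is self-contained
Smarket_train <- Smarket[Smarket$Year <= 2004, ]

# Columns 2 to 7 of Smarket are Lag1, Lag2, Lag3, Lag4, Lag5 and Volume
Smarket_train_x <- Smarket_train[, 2:7]

# Equivalent selection by name (hypothetical variable, for comparison only)
Smarket_train_x_by_name <- Smarket_train[, c("Lag1", "Lag2", "Lag3", "Lag4", "Lag5", "Volume")]

print(colnames(Smarket_train_x))
print(identical(Smarket_train_x, Smarket_train_x_by_name))
```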

Use the scale function to z-score standardize the Smarket_train_x data frame. The output of the scale function is a matrix, so convert it back to a data frame using the as.data.frame function. Save the standardized data on a variable named Smarket_train_z. Repeat the previous two steps on the Smarket_test dataset as well, and name the outputs Smarket_test_x and Smarket_test_z.

Create a subset of the Smarket_train dataset that only includes the Direction column and name it Smarket_train_label. Do the same for the Smarket_test dataset and name it Smarket_test_label.

Run the knn function with k = 1 to obtain predictions for the test data, and check the accuracy of your predictions using the CrossTable function. Repeat these two tasks with k = 3 and k = 5. Which k has the highest accuracy?
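One way to carry out the remaining steps, offered as a sketch rather than the official solution. It assumes knn comes from the class package and CrossTable from the gmodels package (the question names neither package), extracts Direction as a factor vector because the cl argument of knn expects a factor rather than a one-column data frame, and fixes a seed because knn breaks ties at random. Accuracy is computed as the share of correct predictions next to each CrossTable.

```r
# Install any missing packages (the CRAN mirror URL is an assumption)
for (pkg in c("ISLR", "class", "gmodels")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
}
library(ISLR)
library(class)    # knn()
library(gmodels)  # CrossTable()

Smarket_train <- Smarket[Smarket$Year <= 2004, ]
Smarket_test  <- Smarket[Smarket$Year == 2005, ]

# Predictors: columns 2 to 7 (Lag1-Lag5 and Volume)
Smarket_train_x <- Smarket_train[, 2:7]
Smarket_test_x  <- Smarket_test[, 2:7]

# z-score standardization; scale() returns a matrix, so convert back to a data frame
Smarket_train_z <- as.data.frame(scale(Smarket_train_x))
Smarket_test_z  <- as.data.frame(scale(Smarket_test_x))

# Class labels as factor vectors
Smarket_train_label <- Smarket_train$Direction
Smarket_test_label  <- Smarket_test$Direction

# knn() breaks ties at random, so fix the seed for reproducible results
set.seed(1)

accuracy <- numeric(0)
for (k in c(1, 3, 5)) {
  pred <- knn(train = Smarket_train_z, test = Smarket_test_z,
              cl = Smarket_train_label, k = k)
  # Actual vs predicted direction; chi-square contributions are not needed here
  CrossTable(x = Smarket_test_label, y = pred, prop.chisq = FALSE)
  accuracy[as.character(k)] <- mean(pred == Smarket_test_label)
}

print(accuracy)
cat("Highest accuracy at k =", names(which.max(accuracy)), "\n")
```

This follows the question in scaling the test predictors with their own mean and standard deviation. A common alternative is to reuse the training statistics, e.g. scale(Smarket_test_x, center = attr(train_scaled, "scaled:center"), scale = attr(train_scaled, "scaled:scale")), where train_scaled is a hypothetical name for the matrix returned by scale(Smarket_train_x); the best k may differ between the two choices and with the seed.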
