
Question


Last Name, First Name

Desc: This notebook serves as a template for a binary classification problem.

In [1]:
    # import packages
    import pandas as pd
    import seaborn as sns
    from sklearn import metrics
    from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score
    import matplotlib.pyplot as plt
    from IPython.display import Image
    from io import StringIO
    %matplotlib inline

1. Import dataset

In [2]:
    stocks = pd.read_csv('Weekly.csv', na_values='?').dropna()
    print(stocks.info())

    Int64Index: 1089 entries, 0 to 1088
    Data columns (total 10 columns):
     #   Column      Non-Null Count  Dtype
    ---  ------      --------------  -----
     0   Unnamed: 0  1089 non-null   int64
     1   Year        1089 non-null   int64
     2   Lag1        1089 non-null   float64
     3   Lag2        1089 non-null   float64
     4   Lag3        1089 non-null   float64
     5   Lag4        1089 non-null   float64
     6   Lag5        1089 non-null   float64
     7   Volume      1089 non-null   float64
     8   Today       1089 non-null   float64
     9   Direction   1089 non-null   object
    dtypes: float64(7), int64(2), object(1)
    memory usage: 93.6+ KB
    None

In [3]:
    stocks.tail()

Out[3]:
          Unnamed: 0  Year   Lag1   Lag2   Lag3   Lag4   Lag5    Volume  Today Direction
    1084        1085  2010 -0.861  0.043 -2.173  3.599  0.015  3.205160  2.969        Up
    1085        1086  2010  2.969 -0.861  0.043 -2.173  3.599  4.242568  1.281        Up
    1086        1087  2010  1.281  2.969 -0.861  0.043 -2.173  4.835082  0.283        Up
    1087        1088  2010  0.283  1.281  2.969 -0.861  0.043  4.454044  1.034        Up
    1088        1089  2010  1.034  0.283  1.281  2.969 -0.861  2.707105  0.069        Up

In [4]:
    # convert categorical variables to dummy variables (0/1)
    stocks_up = pd.get_dummies(stocks['Direction'])
    # Join the dummy variables to the main dataframe
    stocks_new = pd.concat([stocks, stocks_up], axis=1)
    stocks_new.head()

Out[4]:
       Unnamed: 0  Year   Lag1   Lag2   Lag3   Lag4   Lag5    Volume  Today Direction  Down  Up
    0           1  1990  0.816  1.572 -3.936 -0.229 -3.484  0.154976 -0.270      Down     1   0
    1           2  1990 -0.270  0.816  1.572 -3.936 -0.229  0.148574 -2.576      Down     1   0
    2           3  1990 -2.576 -0.270  0.816  1.572 -3.936  0.159837  3.514        Up     0   1
    3           4  1990  3.514 -2.576 -0.270  0.816  1.572  0.161630  0.712        Up     0   1
    4           5  1990  0.712  3.514 -2.576 -0.270  0.816  0.153728  1.178        Up     0   1

Q1 After dropping the missing values (if any), what percentage of observations in the sample has Direction = "Up"?

In [5]:
    # type your answer for Q1 here

Q2 Randomly split the dataset, so that the training dataset includes 60% of the original dataset.

In [6]:
    from sklearn.model_selection import train_test_split
    x_columns = ['Year', 'Lag1', 'Lag2', 'Lag3', 'Lag4', 'Lag5', 'Volume']
    X = stocks_new[x_columns]
    y = stocks_new['Up']
    # Q2: split the dataset

Q3 Fit a logistic regression model with Direction as the response variable and the five lag variables plus Volume as the predictors.

In [7]:
    from sklearn.linear_model import LogisticRegression
    # Q3: build the logit regression model, using the training dataset

Q4 Generate a predicted value for each week in the test data. Indicate the distribution of the predicted label.

In [ ]:

Q5 Compute the confusion matrix and the accuracy score for the test data, i.e., the remaining 40% of the full dataset.

In [8]:
    # Q5: confusion matrix, calculate accuracy

In [ ]:

Q6 Create a decision tree model with the same X and y variables as you did for the logistic model. Set max_depth to 3 in this tree model.

In [ ]:

Q7 Print out the image of the tree model. Briefly describe the tree according to the image.

In [ ]:

Q8 Print out the variable importance scores. Describe which variable is more important in this prediction.

In [ ]:

Q9 What percentage of all observations is being correctly predicted in the test data set by the decision tree?

In [ ]:

Q10 In the test data set, consider only those observations for which the actual value of the target variable equals 1 (Up = 1). What percentage of these observations is being correctly predicted by the decision tree?

In [ ]:
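
The steps in Q1 through Q10 can be sketched roughly as follows. This is one possible approach under stated assumptions, not the approved answer. The helper names `make_synthetic_weekly` and `evaluate` are hypothetical, the random seed is an arbitrary choice, and the synthetic data only mimics the column names of Weekly.csv so that the sketch runs without the file. The predictor list is copied from the template's In [6] cell, which includes Year even though Q3 mentions only the five lags and Volume. For Q7 the sketch prints the tree as text with `export_text` rather than rendering an image.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Predictor list copied from the template's In [6] cell.
X_COLUMNS = ["Year", "Lag1", "Lag2", "Lag3", "Lag4", "Lag5", "Volume"]


def make_synthetic_weekly(n_rows=500, seed=0):
    """Hypothetical stand-in for Weekly.csv: random values, same column names."""
    rng = np.random.default_rng(seed)
    data = {"Year": rng.integers(1990, 2011, size=n_rows)}
    for i in range(1, 6):
        data[f"Lag{i}"] = rng.normal(0.0, 2.0, size=n_rows)
    data["Volume"] = rng.uniform(0.1, 5.0, size=n_rows)
    data["Today"] = rng.normal(0.0, 2.0, size=n_rows)
    stocks = pd.DataFrame(data)
    stocks["Direction"] = np.where(stocks["Today"] > 0, "Up", "Down")
    return stocks


def evaluate(stocks, seed=0):
    """Run the Q1-Q10 steps and return the intermediate results in a dict."""
    # Q1: percentage of observations with Direction == "Up".
    pct_up = 100.0 * (stocks["Direction"] == "Up").mean()

    X = stocks[X_COLUMNS]
    y = (stocks["Direction"] == "Up").astype(int)  # same values as the "Up" dummy

    # Q2: random 60/40 split (the seed is an assumption, for reproducibility).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=0.6, random_state=seed
    )

    # Q3-Q5: logistic regression, predicted labels, confusion matrix.
    logit = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    logit_pred = logit.predict(X_test)
    logit_cm = confusion_matrix(y_test, logit_pred, labels=[0, 1])

    # Q6-Q10: depth-3 decision tree on the same X and y.
    tree = DecisionTreeClassifier(max_depth=3, random_state=seed)
    tree.fit(X_train, y_train)
    tree_pred = tree.predict(X_test)

    return {
        "pct_up": pct_up,
        "n_test": len(y_test),
        "pred_distribution": pd.Series(logit_pred).value_counts(),  # Q4
        "logit_cm": logit_cm,  # rows = actual, columns = predicted
        "logit_accuracy": accuracy_score(y_test, logit_pred),  # Q5
        "tree_text": export_text(tree, feature_names=X_COLUMNS),  # Q7, as text
        "importances": pd.Series(tree.feature_importances_, index=X_COLUMNS),  # Q8
        "tree_cm": confusion_matrix(y_test, tree_pred, labels=[0, 1]),
        "tree_accuracy": accuracy_score(y_test, tree_pred),  # Q9
        "tree_recall_up": recall_score(y_test, tree_pred, pos_label=1),  # Q10
    }


if __name__ == "__main__":
    results = evaluate(make_synthetic_weekly())
    for name, value in results.items():
        print(f"{name}:\n{value}\n")
```

To run this on the real data, pass the `stocks` DataFrame loaded in In [2] to `evaluate` instead of the synthetic frame. The Q10 figure is the recall for the Up class: the bottom-right cell of the tree's confusion matrix divided by the sum of its bottom row. Because the split is random, the numbers change with the seed.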
