Question

Support Vector Machine

1. Create a python file named mysvm.py. Import the following packages.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier
from sklearn import model_selection

2. We use a dataset that contains more than ten personal attributes to predict whether a person's income exceeds $50K per year. Download the dataset svm_income_data.txt and add the following lines. Answer the questions: (1) What is the printout? (2) What is the purpose of this code segment? (3) What is q_count for? (4) Is a line that contains '?' included in X?

input_file = 'svm_income_data.txt'
X = []
y = []
count_class1 = 0
count_class2 = 0
max_datapoints = 5000
q_count = 0

with open(input_file, 'r') as f:
    for line in f.readlines():
        if count_class1 >= max_datapoints and count_class2 >= max_datapoints:
            break
        if '?' in line:
            q_count += 1
            continue
        data = line[:-1].split(', ')
        if data[-1] == '<=50K' and count_class1 < max_datapoints:
            X.append(data)
            count_class1 += 1
        if data[-1] == '>50K' and count_class2 < max_datapoints:
            X.append(data)
            count_class2 += 1

print(count_class1, count_class2, q_count)

3. Add the following lines and run the program. (1) What error is displayed? (2) After commenting the error line, what is the printout?

print(X.shape)
X = np.array(X)
print(X.shape)
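As a small illustration of the behaviour item 3 asks about, the sketch below uses two made-up rows (hypothetical values, not taken from svm_income_data.txt) to show how a plain Python list and a NumPy array respond to `.shape`.

```python
import numpy as np

# Hypothetical sample rows standing in for the parsed lines of the data file.
rows = [['39', 'State-gov'], ['50', 'Private']]

try:
    print(rows.shape)  # a Python list has no .shape attribute
except AttributeError as err:
    error_message = str(err)
    print("AttributeError:", error_message)

arr = np.array(rows)  # converting to an ndarray adds .shape
print(arr.shape)
```
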

4. Add the following lines and run the program. (1) What is X[0] about? Add a print line to show X[0]. (2) What role does if item.isdigit(): play? (3) What is the return value of preprocessing.LabelEncoder()? (4) What does fit_transform(X[:, i]) do?

label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
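The two building blocks in item 4 can be tried in isolation. This is a minimal sketch on a made-up column (the values are illustrative only): `str.isdigit()` separates numeric-looking strings from text, and `LabelEncoder.fit_transform` maps each distinct string to an integer code based on the sorted order of the classes.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical categorical column, similar in spirit to a "workclass" attribute.
column = np.array(['Private', 'State-gov', 'Private', 'Self-emp-inc'])

encoder = preprocessing.LabelEncoder()
codes = encoder.fit_transform(column)  # learn the classes, then encode them

print(encoder.classes_)  # distinct labels in sorted order
print(codes)             # integer code for each entry of the column

# isdigit() is True only for strings made up entirely of digits,
# so signed or decimal values are not recognised as numeric.
digit_flags = [s.isdigit() for s in ['39', 'State-gov', '-5', '3.5']]
print(digit_flags)
```

Note that the original loop tests only the first row, `X[0]`, to decide how each whole column is treated.
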

5. Add a few lines in your program to print out the first 4 rows stored in X_encoded and X respectively. What is your printout?

6. Add the following lines and run your program. Please print out the first 4 rows stored in X. (1) What is your printout? (2) What is astype(int) for?

X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
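To see what the slicing and `astype(int)` in item 6 do, here is a small sketch with a hand-written float array standing in for `X_encoded` (the numbers are invented for illustration). `np.empty` creates a float64 array by default, so the encoded values are floats until they are cast.

```python
import numpy as np

# Hypothetical stand-in for X_encoded: two rows, last column is the label.
encoded = np.array([[39.0, 5.0, 0.0],
                    [50.0, 4.0, 1.0]])

features = encoded[:, :-1].astype(int)  # every column except the last
labels = encoded[:, -1].astype(int)     # only the last column

print(encoded.dtype, features.dtype)
print(features)
print(labels)
```
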

7. Add the following lines and run your program. (1) What is OneVsOneClassifier for? (2) What is one-vs-one strategy in SVM?

classifier = OneVsOneClassifier(LinearSVC(random_state=0))
classifier.fit(X, y)
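The one-vs-one strategy trains one binary classifier per pair of classes, so k classes give k(k-1)/2 classifiers. The sketch below checks this on synthetic data with four classes (the cluster centres and sizes are assumptions chosen only for the demonstration). With two classes, as in the income data, there is only a single pair.

```python
import numpy as np
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)

# Four synthetic, well-separated clusters, 20 points each (illustrative only).
centers = np.array([[0, 0], [5, 5], [0, 5], [5, 0]])
X_demo = np.vstack([c + 0.3 * rng.randn(20, 2) for c in centers])
y_demo = np.repeat(np.arange(4), 20)

ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X_demo, y_demo)

# One fitted binary estimator per pair of classes: 4 * 3 / 2 = 6.
n_pairs = len(ovo.estimators_)
print(n_pairs)
```
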

8. Add the following lines and run your program. (1) Can you use classifier = LinearSVC(random_state=0) to replace the corresponding line below? (2) Try it and see the difference between the results. (3) Explain why?

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=5)
classifier = OneVsOneClassifier(LinearSVC(random_state=0))
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
f1 = model_selection.cross_val_score(classifier, X, y, scoring='f1_weighted', cv=3)
print("F1 score: " + str(round(100*f1.mean(), 2)) + "%")
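For experimenting with item 8 without the data file, the following sketch runs the same 3-fold `f1_weighted` cross-validation on a synthetic two-class dataset (generated with `make_classification`, an assumption made purely for illustration) using both the wrapped and the bare classifier.

```python
from sklearn import model_selection
from sklearn.datasets import make_classification
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

# Synthetic binary problem standing in for the income data.
X_demo, y_demo = make_classification(n_samples=300, n_features=6, random_state=0)

wrapped = OneVsOneClassifier(LinearSVC(random_state=0))
bare = LinearSVC(random_state=0)

# cross_val_score clones and refits the estimator on each of the 3 folds.
f1_wrapped = model_selection.cross_val_score(
    wrapped, X_demo, y_demo, scoring='f1_weighted', cv=3)
f1_bare = model_selection.cross_val_score(
    bare, X_demo, y_demo, scoring='f1_weighted', cv=3)

print("wrapped:", round(100 * f1_wrapped.mean(), 2), "%")
print("bare:   ", round(100 * f1_bare.mean(), 2), "%")
```
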

9. What is the difference between LinearSVC() and SVC()?
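One way to explore item 9 is to fit both estimators and inspect what each exposes. In this sketch (synthetic data, assumed only for illustration) `LinearSVC` learns a single weight vector for a linear boundary, while `SVC` with its default RBF kernel stores the support vectors it selected.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC, SVC

# Synthetic binary problem with five features.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

linear_svc = LinearSVC(random_state=0).fit(X_demo, y_demo)
kernel_svc = SVC(kernel='rbf').fit(X_demo, y_demo)

# LinearSVC exposes linear weights, one per feature.
print(linear_svc.coef_.shape)

# SVC keeps the training points that define the boundary.
svc_has_support_vectors = hasattr(kernel_svc, 'support_vectors_')
linear_has_support_vectors = hasattr(linear_svc, 'support_vectors_')
print(svc_has_support_vectors, linear_has_support_vectors)
```
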

10. Add the following lines and run the program. What is the output?

input_data = [['52', 'Self-emp-not-inc', '209642', 'HS-grad', '9', 'Married-civ-spouse', 'Exec-managerial', 'Husband', 'White', 'Male', '0', '0', '45', 'United-States'],
              ['42', 'Private', '159449', 'Bachelors', '13', 'Married-civ-spouse', 'Exec-managerial', 'Husband', 'White', 'Male', '5178', '0', '40', 'United-States']]
input_data = np.array(input_data)
input_data_encoded = np.empty(input_data.shape)
count = 0
for i, item in enumerate(input_data[0]):
    if item.isdigit():
        input_data_encoded[:, i] = input_data[:, i]
    else:
        input_data_encoded[:, i] = label_encoder[count].transform(input_data[:, i])
        count += 1

input_data_encoded = input_data_encoded.astype(int)
predicted_class = classifier.predict(input_data_encoded)
print(predicted_class)
print(label_encoder[-1].inverse_transform(predicted_class))

11. Explain what the above code does.
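The code in item 10 reuses the already fitted encoders through `transform` and `inverse_transform` instead of fitting new ones. A minimal sketch of that round trip, using a standalone encoder fitted on the two income labels (the `'unknown'` value is a hypothetical unseen label):

```python
import numpy as np
from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
encoder.fit(np.array(['<=50K', '>50K']))  # learn the two income labels

# transform() applies the learned mapping without refitting.
codes = encoder.transform(np.array(['>50K', '<=50K']))
# inverse_transform() maps integer codes back to the original strings.
decoded = encoder.inverse_transform(codes)
print(codes, decoded)

# A label that was not seen during fit() cannot be encoded.
try:
    encoder.transform(np.array(['unknown']))
    unseen_accepted = True
except ValueError:
    unseen_accepted = False
print(unseen_accepted)
```

In the original program, `label_encoder[-1]` plays this role for the income column because that column is the last one to receive an encoder.
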

12. What is the purpose of input_data_encoded = np.array(input_data_encoded)?

13. Given a dataset and Support Vector Machine as the chosen tool, how would you design a Python program following the above example? Briefly list the main steps you plan to take.
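One possible outline for item 13, offered as a sketch rather than a reference answer: prepare numeric features, split the data, scale and train inside a pipeline, then score the held-out predictions (the example above computes `y_test_pred` but never scores it). The synthetic data and the choice of `StandardScaler` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Step 1: obtain numeric features and labels (synthetic stand-in here).
X_demo, y_demo = make_classification(n_samples=400, n_features=8, random_state=0)

# Step 2: hold out a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=5)

# Step 3: scale the features and train the SVM in one pipeline.
model = make_pipeline(StandardScaler(), LinearSVC(random_state=0))
model.fit(X_train, y_train)

# Step 4: evaluate on data the model has not seen.
y_pred = model.predict(X_test)
holdout_f1 = f1_score(y_test, y_pred, average='weighted')
print("Hold-out F1:", round(100 * holdout_f1, 2), "%")
```
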

14. Can you identify any drawbacks in the above example? Briefly write down your critique.
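As one possible point for item 14: label encoding turns unordered categories into integers, which a linear model then treats as ordered magnitudes. The sketch below shows one-hot encoding as an alternative, using a made-up column; `'Never-worked'` is used here as a hypothetical category absent from the fitted data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical categorical column shaped (n_samples, 1).
workclass = np.array([['Private'], ['State-gov'], ['Private'], ['Self-emp-inc']])

# handle_unknown='ignore' encodes unseen categories as all zeros
# instead of raising an error.
one_hot = OneHotEncoder(handle_unknown='ignore')
encoded = one_hot.fit_transform(workclass).toarray()
print(one_hot.categories_[0])
print(encoded)

unseen = one_hot.transform(np.array([['Never-worked']])).toarray()
print(unseen)
```

Each category gets its own 0/1 column, so no artificial ordering is introduced.
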
