Question

Posted on Sep 26, 2024

Support Vector Machine

1. Create a python file named mysvm.py. Import the following packages.

import numpy as np

import matplotlib.pyplot as plt

from sklearn import preprocessing

from sklearn.svm import LinearSVC

from sklearn.svm import SVC

from sklearn.multiclass import OneVsOneClassifier

from sklearn import model_selection

2. We use a dataset that contains more than ten personal attributes to predict whether a person's income exceeds $50K per year. Download the dataset svm_income_data.txt and add the following lines. Answer the questions: (1) What is the printout? (2) What is the purpose of this code segment? (3) What is q_count for? (4) Are lines that contain '?' included in X?

input_file = 'svm_income_data.txt'
X = []
y = []
count_class1 = 0
count_class2 = 0
max_datapoints = 5000
q_count = 0

with open(input_file, 'r') as f:
    for line in f.readlines():
        if count_class1 >= max_datapoints and count_class2 >= max_datapoints:
            break
        if '?' in line:
            q_count += 1
            continue
        data = line[:-1].split(', ')
        if data[-1] == '<=50K' and count_class1 < max_datapoints:
            X.append(data)
            count_class1 += 1
        if data[-1] == '>50K' and count_class2 < max_datapoints:
            X.append(data)
            count_class2 += 1

print(count_class1, count_class2, q_count)
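To see how the filtering and the per-class cap behave without the data file, here is a minimal sketch that runs the same logic over a few made-up lines. The helper name load_rows and the sample rows are assumptions for illustration only; they are not part of the assignment or the real dataset.

```python
def load_rows(lines, max_per_class):
    """Hypothetical helper mirroring the loop above on an in-memory list of lines."""
    rows = []
    counts = {'<=50K': 0, '>50K': 0}
    skipped = 0  # plays the role of q_count
    for line in lines:
        # Stop once both classes have reached the cap.
        if all(c >= max_per_class for c in counts.values()):
            break
        # Lines with a missing value (marked '?') are counted and skipped.
        if '?' in line:
            skipped += 1
            continue
        data = line.strip().split(', ')
        label = data[-1]
        if label in counts and counts[label] < max_per_class:
            rows.append(data)
            counts[label] += 1
    return rows, counts, skipped


# Made-up sample lines (not taken from svm_income_data.txt).
sample = [
    "39, State-gov, 13, <=50K\n",
    "50, ?, 13, <=50K\n",
    "31, Private, 14, >50K\n",
    "42, Private, 13, >50K\n",
    "37, Private, 10, <=50K\n",
]
rows, counts, skipped = load_rows(sample, max_per_class=1)
print(len(rows), counts, skipped)
```

With a cap of 1 per class, the loop stops as soon as one row of each class has been stored, so the last two sample lines are never read.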

3. Add the following lines and run the program. (1) What error is displayed? (2) After commenting out the line that causes the error, what is the printout?

print(X.shape)

X = np.array(X)

print(X.shape)
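As a hint for step 3, the sketch below contrasts a plain Python list with a NumPy array. The two rows are made up for illustration; the point is only that a list has no shape attribute, while the array built from it does.

```python
import numpy as np

# Made-up rows standing in for the parsed lines of the dataset.
rows = [['39', 'State-gov', '<=50K'], ['31', 'Private', '>50K']]

list_has_shape = hasattr(rows, 'shape')  # plain lists do not define .shape
arr = np.array(rows)                     # 2-D array of strings

print(list_has_shape)
print(arr.shape)
```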

4. Add the following lines and run the program. (1) What is X[0] about? Add a print line to show X[0]. (2) What role does if item.isdigit(): play? (3) What is the return value of preprocessing.LabelEncoder()? (4) What does fit_transform(X[:, i]) do?

label_encoder = []
X_encoded = np.empty(X.shape)

for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
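The two building blocks of this loop can be tried in isolation. The sketch below uses a made-up column of work-class strings (an assumption, not the real data) to show what str.isdigit() accepts and how LabelEncoder maps strings to integer codes.

```python
import numpy as np
from sklearn import preprocessing

# str.isdigit() is True only for strings made up entirely of digits,
# so signed or decimal numbers are treated as non-numeric by the loop above.
digit_checks = ['39'.isdigit(), 'Private'.isdigit(), '-5'.isdigit(), '3.5'.isdigit()]
print(digit_checks)

# Made-up categorical column.
column = np.array(['Private', 'State-gov', 'Private', 'Self-emp'])

encoder = preprocessing.LabelEncoder()
codes = encoder.fit_transform(column)  # learn the classes, then encode them

print(encoder.classes_)  # sorted unique labels
print(codes)             # index of each value in classes_
print(encoder.inverse_transform(codes))  # back to the original strings
```

Note that the integer codes follow the sorted order of the labels, so they carry no meaning beyond identity.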

5. Add a few lines in your program to print out the first 4 rows stored in X_encoded and X respectively. What is your printout?

6. Add the following lines and run your program. Please print out the first 4 rows stored in X. (1) What is your printout? (2) What is astype(int) for?

X = X_encoded[:, :-1].astype(int)

y = X_encoded[:, -1].astype(int)
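A small sketch of the split in step 6, on a made-up 2x3 array: the last column becomes the label vector, the remaining columns become the features, and astype(int) converts the floating-point values to integers.

```python
import numpy as np

# Made-up encoded data; the last column plays the role of the class label.
X_encoded = np.array([[39.0, 2.0, 0.0],
                      [50.0, 1.0, 1.0]])

features = X_encoded[:, :-1].astype(int)  # every column except the last
labels = X_encoded[:, -1].astype(int)     # only the last column

print(X_encoded.dtype, features.dtype)
print(features)
print(labels)
```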

7. Add the following lines and run your program. (1) What is OneVsOneClassifier for? (2) What is the one-vs-one strategy in SVM?

classifier = OneVsOneClassifier(LinearSVC(random_state=0))

classifier.fit(X, y)
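One way to explore the one-vs-one strategy is to count the binary classifiers it trains: one per pair of classes, that is k(k-1)/2 for k classes. The sketch below uses a made-up three-class dataset (an assumption; the income data has only two classes, hence a single pair) and inspects the fitted estimators_ attribute.

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.multiclass import OneVsOneClassifier

# Made-up data: three well-separated 2-D clusters, 20 points each.
rng = np.random.RandomState(0)
centers = np.array([[0, 0], [5, 5], [0, 5]])
X = np.vstack([c + 0.5 * rng.randn(20, 2) for c in centers])
y = np.repeat([0, 1, 2], 20)

ovo = OneVsOneClassifier(LinearSVC(random_state=0)).fit(X, y)

# One binary LinearSVC is trained for each pair of classes.
n_pairs = len(ovo.estimators_)
print(n_pairs)
```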

8. Add the following lines and run your program. (1) Can you use classifier = LinearSVC(random_state=0) to replace the corresponding line below? (2) Try it and compare the results. (3) Explain the difference, if any.

X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=5)

classifier = OneVsOneClassifier(LinearSVC(random_state=0))

classifier.fit(X_train, y_train)

y_test_pred = classifier.predict(X_test)

f1 = model_selection.cross_val_score(classifier, X, y, scoring='f1_weighted', cv=3)

print("F1 score: " + str(round(100*f1.mean(), 2)) + "%")
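The snippet above mixes two evaluation styles: a hold-out split (whose predictions y_test_pred are never scored) and 3-fold cross-validation on the full data. The sketch below shows both on made-up two-class data; the data and the use of sklearn.metrics.f1_score for the hold-out score are assumptions added for illustration.

```python
import numpy as np
from sklearn import model_selection
from sklearn.metrics import f1_score
from sklearn.svm import LinearSVC

# Made-up data: two Gaussian clusters, 30 points each.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(30, 2), rng.randn(30, 2) + [4, 4]])
y = np.array([0] * 30 + [1] * 30)

# Hold-out evaluation: fit on 80% of the data, score on the remaining 20%.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=5)
clf = LinearSVC(random_state=0).fit(X_train, y_train)
holdout_f1 = f1_score(y_test, clf.predict(X_test), average='weighted')

# Cross-validation: cross_val_score clones and refits the estimator per fold.
scores = model_selection.cross_val_score(
    LinearSVC(random_state=0), X, y, scoring='f1_weighted', cv=3)

print("Hold-out F1: " + str(round(100 * holdout_f1, 2)) + "%")
print("CV F1: " + str(round(100 * scores.mean(), 2)) + "%")
```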

9. What is the difference between LinearSVC() and SVC()?
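One practical difference can be observed directly: LinearSVC only fits a linear decision boundary, while SVC supports kernels (RBF by default). The sketch below compares the two on a made-up XOR-style dataset, which no single straight line can separate; the data and the comparison are illustrative assumptions, not part of the assignment.

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC

# Made-up XOR-style data: opposite corners share a class.
rng = np.random.RandomState(0)
centers = np.array([[1, 1], [-1, -1], [1, -1], [-1, 1]])
corner_labels = np.array([0, 0, 1, 1])
X = np.vstack([c + 0.1 * rng.randn(25, 2) for c in centers])
y = np.repeat(corner_labels, 25)

# Training accuracy of a linear model versus an RBF-kernel model.
linear_acc = LinearSVC(random_state=0).fit(X, y).score(X, y)
rbf_acc = SVC(kernel='rbf', random_state=0).fit(X, y).score(X, y)

print(linear_acc, rbf_acc)
```

On data that is close to linearly separable the two may perform similarly, and LinearSVC usually scales better to large sample counts.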

10. Add the following lines and run the program. What is the output?

input_data = [['52', 'Self-emp-not-inc', '209642', 'HS-grad', '9', 'Married-civ-spouse', 'Exec-managerial', 'Husband', 'White', 'Male', '0', '0', '45', 'United-States'],
              ['42', 'Private', '159449', 'Bachelors', '13', 'Married-civ-spouse', 'Exec-managerial', 'Husband', 'White', 'Male', '5178', '0', '40', 'United-States']]

input_data = np.array(input_data)
input_data_encoded = np.empty(input_data.shape)
count = 0

for i, item in enumerate(input_data[0]):
    if item.isdigit():
        input_data_encoded[:, i] = input_data[:, i]
    else:
        input_data_encoded[:, i] = label_encoder[count].transform(input_data[:, i])
        count += 1

input_data_encoded = input_data_encoded.astype(int)

predicted_class = classifier.predict(input_data_encoded)
print(predicted_class)
print(label_encoder[-1].inverse_transform(predicted_class))
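The key point when encoding new samples is that the already-fitted encoders are reused with transform(), not refitted. The sketch below, on a made-up training column, shows this and also what happens when a new sample contains a label the encoder has never seen.

```python
import numpy as np
from sklearn import preprocessing

# Made-up training column; the encoder is fitted once on the training data.
train_col = np.array(['Private', 'State-gov', 'Self-emp', 'Private'])
encoder = preprocessing.LabelEncoder().fit(train_col)

# New data is encoded with transform(), so codes match those used in training.
new_codes = encoder.transform(np.array(['State-gov', 'Private']))
print(new_codes)

# A label that was absent during fitting cannot be encoded.
try:
    encoder.transform(np.array(['Never-worked']))
    unseen_ok = True
except ValueError:
    unseen_ok = False
print(unseen_ok)
```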

11. Explain what the above code does.

12. What is the purpose of input_data_encoded = np.array(input_data_encoded)?

13. Given a dataset, if you choose a Support Vector Machine as your tool, how would you design a Python program following the above example? Briefly list the main steps you plan to take.

14. Can you identify any drawbacks in the above example? Briefly write down your critique.
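As one possible starting point for the critique, consider how the features are prepared: integer label codes impose an arbitrary order on categorical values, and the numeric columns are not scaled. The sketch below shows a common alternative, one-hot encoding plus standardization, on made-up data. OneHotEncoder and StandardScaler are not used in the original example; they are swapped in here purely as an illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Made-up data: two numeric columns and one categorical column.
numeric = np.array([[39, 40], [50, 13], [38, 40], [53, 45]], dtype=float)
categorical = np.array([['State-gov'], ['Private'], ['Private'], ['Self-emp']])

# One-hot encoding gives each category its own 0/1 column (no implied order).
onehot = OneHotEncoder(handle_unknown='ignore')
cat_encoded = onehot.fit_transform(categorical).toarray()

# Standardization rescales each numeric column to zero mean and unit variance.
scaler = StandardScaler()
num_scaled = scaler.fit_transform(numeric)

X = np.hstack([num_scaled, cat_encoded])
print(X.shape)

# With handle_unknown='ignore', an unseen category becomes an all-zero row.
unseen = onehot.transform(np.array([['Never-worked']])).toarray()
print(unseen)
```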