Question
Support Vector Machine
1. Create a python file named mysvm.py. Import the following packages.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.svm import LinearSVC
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier
from sklearn import model_selection
2. We use a dataset that contains more than ten personal attributes to predict whether a person's income exceeds $50K per year. Download the dataset svm_income_data.txt and add the following lines. Answer the questions: (1) What is the printout? (2) What is the purpose of this code segment? (3) What is q_count for? (4) Is a line that contains '?' included in X?
input_file = 'svm_income_data.txt'
X = []
y = []
count_class1 = 0
count_class2 = 0
max_datapoints = 5000
q_count = 0
with open(input_file, 'r') as f:
    for line in f:
        # stop once both classes have reached max_datapoints
        if count_class1 >= max_datapoints and count_class2 >= max_datapoints:
            break
        if '?' in line:
            q_count += 1
            continue
        # strip the trailing newline (also safe for '\r\n' files or a last line without '\n')
        data = line.strip().split(', ')
        if data[-1] == '<=50K' and count_class1 < max_datapoints:
            X.append(data)
            count_class1 += 1
        if data[-1] == '>50K' and count_class2 < max_datapoints:
            X.append(data)
            count_class2 += 1
print(count_class1, count_class2, q_count)
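The loading loop above can be tried on a few in-memory lines before running it on the real file. This is a minimal sketch, assuming the same ', '-separated layout with the income label in the last field; the three abbreviated sample rows and the helper name load_balanced are made up for illustration and are not taken from svm_income_data.txt.

```python
import io

# Made-up, abbreviated rows in the same ", "-separated format (label last)
sample = io.StringIO(
    "39, State-gov, Bachelors, <=50K\n"
    "54, ?, Some-college, >50K\n"
    "52, Self-emp-inc, HS-grad, >50K\n"
)


def load_balanced(f, max_datapoints):
    """Hypothetical helper: keep at most max_datapoints rows per class,
    skipping (but counting) rows that contain '?'."""
    X, counts, q_count = [], {'<=50K': 0, '>50K': 0}, 0
    for line in f:
        if all(c >= max_datapoints for c in counts.values()):
            break
        if '?' in line:
            q_count += 1
            continue
        data = line.strip().split(', ')
        label = data[-1]
        if label in counts and counts[label] < max_datapoints:
            X.append(data)
            counts[label] += 1
    return X, counts, q_count


X, counts, q_count = load_balanced(sample, max_datapoints=5000)
print(len(X), counts, q_count)
```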
3. Add the following lines and run the program. (1) What error is displayed? (2) After commenting out the line that causes the error, what is the printout?
print(X.shape)
X = np.array(X)
print(X.shape)
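The difference between a Python list and a NumPy array can be seen on a tiny stand-in for X. A minimal sketch; the two rows are made up and only mimic the structure built in step 2.

```python
import numpy as np

# Two made-up rows standing in for the list X built in step 2
rows = [['39', 'State-gov', '<=50K'],
        ['52', 'Self-emp-inc', '>50K']]

# A plain Python list has no .shape attribute
try:
    rows.shape
    has_shape = True
except AttributeError as err:
    has_shape = False
    print('AttributeError:', err)

# np.array turns equal-length rows into a 2-D array of strings
arr = np.array(rows)
print(arr.shape, arr.dtype)
```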
4. Add the following lines and run the program. (1) What does X[0] contain? Add a print line to show X[0]. (2) What role does if item.isdigit(): play? (3) What is the return value of preprocessing.LabelEncoder()? (4) What does fit_transform(X[:, i]) do?
label_encoder = []
X_encoded = np.empty(X.shape)
for i, item in enumerate(X[0]):
    if item.isdigit():
        X_encoded[:, i] = X[:, i]
    else:
        label_encoder.append(preprocessing.LabelEncoder())
        X_encoded[:, i] = label_encoder[-1].fit_transform(X[:, i])
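The pieces used in step 4 can be tried on a toy column. A minimal sketch with made-up values: str.isdigit() decides whether a column is treated as numeric (note that it is False for strings such as '-1' or '3.5'), and LabelEncoder maps each distinct string to an integer in sorted order.

```python
from sklearn import preprocessing

# isdigit() is True only for non-empty strings made entirely of digits
print('42'.isdigit(), 'Private'.isdigit(), '-1'.isdigit(), '3.5'.isdigit())

# A fresh, unfitted encoder; fit_transform learns the classes and encodes them
le = preprocessing.LabelEncoder()
codes = le.fit_transform(['Private', 'State-gov', 'Private', 'Self-emp-inc'])
print(le.classes_)                   # distinct values, sorted
print(codes)                         # one integer code per input value
print(le.inverse_transform(codes))   # back to the original strings
```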
5. Add a few lines in your program to print out the first 4 rows stored in X_encoded and X respectively. What is your printout?
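Printing the first rows is a slicing operation. A minimal sketch on small made-up arrays that only mimic X (strings) and X_encoded (floats); np.set_printoptions(suppress=True) is optional and just avoids scientific notation when small and large float codes share one array.

```python
import numpy as np

# Made-up stand-ins: X holds strings, X_encoded holds the float codes
X = np.array([['39', 'State-gov'], ['50', 'Self-emp-not-inc'],
              ['38', 'Private'], ['53', 'Private'], ['28', 'Private']])
X_encoded = np.array([[39, 2], [50, 1], [38, 0], [53, 0], [28, 0]], dtype=float)

np.set_printoptions(suppress=True)  # fixed-point instead of e-notation for floats
first_raw = X[:4]                   # rows 0..3, all columns
first_encoded = X_encoded[:4]
print(first_raw)
print(first_encoded)
```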
6. Add the following lines and run your program. Please print out the first 4 rows stored in X. (1) What is your printout? (2) What is astype(int) for?
X = X_encoded[:, :-1].astype(int)
y = X_encoded[:, -1].astype(int)
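The column split and the dtype conversion in step 6 can be checked on a two-row matrix. A minimal sketch with made-up values; it also shows that astype(int) truncates toward zero rather than rounding.

```python
import numpy as np

# Made-up encoded matrix: two feature columns plus the label column last
X_encoded = np.array([[39.0, 2.0, 0.0],
                      [52.0, 1.0, 1.0]])

X = X_encoded[:, :-1].astype(int)   # every column except the last -> features
y = X_encoded[:, -1].astype(int)    # last column -> class labels
print(X.dtype, X.tolist(), y.tolist())

# astype(int) truncates toward zero, it does not round
print(np.array([2.9, -2.9]).astype(int))
```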
7. Add the following lines and run your program. (1) What is OneVsOneClassifier for? (2) What is the one-vs-one strategy in SVM?
classifier = OneVsOneClassifier(LinearSVC(random_state=0))
classifier.fit(X, y)
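One-vs-one trains one binary classifier for every pair of classes, so k classes give k(k-1)/2 estimators. A minimal sketch on synthetic 4-class data from make_blobs (not the income dataset); max_iter=10000 is an assumption added only to reduce LinearSVC convergence warnings.

```python
from itertools import combinations

from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

# Synthetic 4-class data (not the income dataset)
X, y = make_blobs(n_samples=200, centers=4, random_state=0)

ovo = OneVsOneClassifier(LinearSVC(random_state=0, max_iter=10000))
ovo.fit(X, y)

# One binary LinearSVC per unordered pair of classes: 4 * 3 / 2 = 6
pairs = list(combinations(ovo.classes_, 2))
print(len(ovo.estimators_), len(pairs))
```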
8. Add the following lines and run your program. (1) Can you replace the corresponding line below with classifier = LinearSVC(random_state=0)? (2) Try it and compare the results. (3) Explain why.
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2, random_state=5)
classifier = OneVsOneClassifier(LinearSVC(random_state=0))
classifier.fit(X_train, y_train)
y_test_pred = classifier.predict(X_test)
# sklearn.cross_validation no longer exists; cross_val_score lives in model_selection
f1 = model_selection.cross_val_score(classifier, X, y, scoring='f1_weighted', cv=3)
print("F1 score: " + str(round(100*f1.mean(), 2)) + "%")
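With only two classes there is exactly one pair, so OneVsOneClassifier wraps a single LinearSVC. A minimal sketch on a synthetic binary problem (make_classification, not the income data) comparing the two variants; the scores should come out the same or very close, but the exact values depend on the data and library version, so none are quoted here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

# Synthetic binary problem standing in for the two income classes
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

plain = LinearSVC(random_state=0, max_iter=10000)
ovo = OneVsOneClassifier(LinearSVC(random_state=0, max_iter=10000))

# Two classes -> one pair -> a single underlying LinearSVC
ovo.fit(X, y)
print('OvO estimators:', len(ovo.estimators_))

scores = {}
for name, clf in [('LinearSVC', plain), ('OvO(LinearSVC)', ovo)]:
    f1 = cross_val_score(clf, X, y, scoring='f1_weighted', cv=3)
    scores[name] = f1.mean()
    print(name, 'F1:', round(100 * f1.mean(), 2), '%')
```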
9. What is the difference between LinearSVC() and SVC()?
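In scikit-learn, LinearSVC is built on liblinear, supports only a linear decision function, defaults to the squared hinge loss and handles multiclass as one-vs-rest; SVC is built on libsvm, supports kernels ('rbf' by default, also 'linear', 'poly', 'sigmoid'), uses one-vs-one internally for multiclass, and its fit time grows at least quadratically with the number of samples. A minimal sketch of the kernel difference on synthetic concentric rings (make_circles), which no straight line can separate:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC, LinearSVC

# Two concentric rings: not separable by a straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = LinearSVC(random_state=0, max_iter=10000).fit(X, y)  # linear boundary only
rbf = SVC(kernel='rbf').fit(X, y)                             # kernel trick

lin_acc = linear.score(X, y)
rbf_acc = rbf.score(X, y)
print('LinearSVC training accuracy:', lin_acc)
print('SVC(rbf) training accuracy:', rbf_acc)
```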
10. Add the following lines and run the program. What is the output?
input_data = [['52', 'Self-emp-not-inc', '209642', 'HS-grad', '9', 'Married-civ-spouse', 'Exec-managerial', 'Husband', 'White', 'Male', '0', '0', '45', 'United-States'],
              ['42', 'Private', '159449', 'Bachelors', '13', 'Married-civ-spouse', 'Exec-managerial', 'Husband', 'White', 'Male', '5178', '0', '40', 'United-States']]
input_data = np.array(input_data)
input_data_encoded = np.empty(input_data.shape)
count = 0
for i, item in enumerate(input_data[0]):
    if item.isdigit():
        input_data_encoded[:, i] = input_data[:, i]
    else:
        input_data_encoded[:, i] = label_encoder[count].transform(input_data[:, i])
        count += 1
input_data_encoded = input_data_encoded.astype(int)
predicted_class = classifier.predict(input_data_encoded)
print(predicted_class)
print(label_encoder[-1].inverse_transform(predicted_class))
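New inputs must be encoded with the encoders already fitted on the training data, by calling transform() rather than fit_transform(), so that each string maps to the same code the classifier was trained on. A minimal sketch on one made-up column; it also shows that a category never seen during fitting makes transform() raise a ValueError.

```python
from sklearn import preprocessing

# Encoder fitted on the training values of one column (made-up values)
le = preprocessing.LabelEncoder()
le.fit(['Private', 'Self-emp-not-inc', 'State-gov'])

# New rows reuse the SAME fitted encoder via transform()
new_codes = le.transform(['Self-emp-not-inc', 'Private'])
print(new_codes)

# A category never seen during fit cannot be encoded
try:
    le.transform(['Never-worked'])
    unseen_ok = True
except ValueError as err:
    unseen_ok = False
    print('ValueError:', err)
```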
11. Explain what the above code does.
12. What is the purpose of input_data = np.array(input_data)?
13. Given a dataset, if you choose a Support Vector Machine as your tool, how would you design a Python program following the above example? Briefly list the main steps you would take.
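One possible outline is: load and clean the data, encode categorical columns, split into train and test sets, scale the features, tune the SVM's kernel and C by cross-validation on the training set only, then evaluate once on the held-out test set. The sketch below is one such design, not the method of the example above: it assumes purely numeric synthetic features (make_classification), so the encoding step is omitted, and the parameter grid is illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic numeric data; real categorical columns would need encoding first
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=5, stratify=y)

# Scaling lives inside the pipeline, so it is fitted on training folds only
pipe = make_pipeline(StandardScaler(), SVC())
param_grid = {'svc__kernel': ['linear', 'rbf'], 'svc__C': [0.1, 1, 10]}
search = GridSearchCV(pipe, param_grid, scoring='f1_weighted', cv=3)
search.fit(X_train, y_train)

# Final, single evaluation on the untouched test set
y_pred = search.predict(X_test)
test_f1 = f1_score(y_test, y_pred, average='weighted')
print(search.best_params_, round(100 * test_f1, 2))
```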
14. Can you identify any drawbacks in the above example? Briefly write down your critique.
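One drawback that is easy to demonstrate: LabelEncoder gives nominal categories an arbitrary (alphabetical) integer order, which a linear SVM then treats as meaningful distances. One-hot encoding, a swapped-in technique that the example does not use, avoids this. A minimal sketch on a made-up column:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

race = np.array(['White', 'Black', 'Asian-Pac-Islander', 'White'])

# Alphabetical integer codes imply an order the categories do not have
codes = LabelEncoder().fit_transform(race)
print(codes)

# One column per category, no implied order (categories sorted alphabetically)
onehot = OneHotEncoder().fit_transform(race.reshape(-1, 1)).toarray()
print(onehot)
```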