Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 26, 2024

Problem 1 : Perceptron Learning ( 1 5 marks ) The dataset lab 0 2 _ dataset _ 1 . csv has a 3 -

Problem

1

: Perceptron Learning

(15

marks

)

The dataset lab

02_

dataset

_1 .

csv has a

3 -

dimensional input space and a class label of

Positive and Negative. For this task, you are not allowed to use any functionalities of the

sklearn module.

Write a function my

_

perceptron

()

which applies perceptron algorithm on the

dataset to create a linear separator. my

_

perceptron

()

should return a

3 -

dimensional

weight vector which can be used to create the linear separator. Use a classification

threshold of

99 %

.

.,

_

perceptron

()

will terminate once the misclassification

rate is less than

1 % . (10

marks

)

Create a

3

D plot which showcases the dataset in a

3

-

space alongwith the linear

separator you obtained from my

_

perceptron

() .

Use two different colors to represent

the data points belonging in the two classes for ease of viewing.

(5

marks

)

Problem

2

: Na

ve Bayes Learning

(25

marks

)

The dataset lab

02_

dataset

_2 .

xlsx contains

10, 302

observations on various vehicles. You

will use the observations in this dataset to train models that predict the usage of a vehicle.

Your models will use the following variables:

Output Label:

CAR

_

USE. Vehicle Usage. It has two categories, namely, Commercial and Private.

Input Features:

CAR

_

TYPE. Vehicle Type. It has six categories, namely, Minivan, Panel Truck,

Pickup, SUV, Sports Car, and Van.

OCCUPATION. Occupation of Vehicle Owner. It has nine categories, namely,

Clerical, Home Maker, Doctor, Lawyer, Manager, Professional, Blue Collar,

Student, and Unknown.

EDUCATION. Highest Education Level of Vehicle Owner. It has five categories

namely Below High Sc

,

High School, Bachelors, Masters, PhD

.

You will use only observations where there are no missing values in all the above four

variables. After dropping the missing values, you will use all the

100 %

complete

observations for training your Na

ve Bayes models using sklearn. For each observation,

you will calculate the predicted probabilities for CAR

_

USE

=

Commercial and CAR

_

USE

=

Private. You will classify the observation in the CAR

_

USE category that has the highest

predicted probability. In case of ties, choose Private category as the output.

You will train a Na

ve Bayes model with a Laplace smoothing of

0.01 . (5

marks

)

Output the Class counts and Probabilities

P (Y_{j}) .

Also display the probability of the

input variables, given each output label

P (x_{i} | Y_{j})

alongwith their counts.

(5

marks

)

Let us study a couple of fictitious persons

(

test cases

) .

One person works in a Blue

Collar occupation, has an education level of

P h D,

and owns an SUV. Another person

works in a Managert occupation, has a Below High Sc level of education, and owns

a Sports Car. What are the Car Usage probabilities of both these people?

(5

marks

)

Generate a histogram of the predicted probabilities of CAR

_

USE

=

Private. The

bin width is

0.05 .

The vertical axis is the proportion of observations.

(5

marks

)

Finally, what is the misclassification rate of the Na

ve Bayes model?

(5

marks

)

Step by Step Solution

There are 3 Steps involved in it

Step: 1

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

Step: 3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Database Concepts

Authors: David Kroenke, David Auer, Scott Vandenberg, Robert Yoder

9th Edition

★★★★★

What is the relationship between the Internal Staff Compensation Target Table and the Internal Staff Compensation Data Table?

Answered: 1 week ago

Previous Question Next Question