
Question


Q2. Is it a conference announcement?

The DBWorld e-mails data set contains 64 e-mails collected from the DBWorld mailing list, classified into two classes: "conference announcements" and "everything else". The data has 64 instances and 4702 attributes (note that the number of instances is much smaller than the number of attributes). Each word in the vocabulary of the e-mail collection defines an attribute.

a) Study the description of the data set and plan how to convert the data file into a plain CSV file that you can read into an ndarray in sklearn. You may want to consider the loadarff reader in scipy.

b) Learn a MultinomialNB classification model on the dataset. Use 3-fold cross validation to evaluate the performance of the classifier.

c) Read the overview of Ensemble methods and use a Bagging classifier built with the MultinomialNB classifier as the base estimator. The Bagging classifier is quite powerful, as it allows sampling both instances of the labelled data and features (attributes). Experiment with different numbers of base estimators, numbers of samples to draw to train each base estimator, and numbers of features to draw to train each base estimator. Evaluate your choices of hyperparameters using 3-fold cross validation. Use the default values for the remaining BaggingClassifier hyperparameters.

d) Summarize your findings from parts (b) and (c). Which classifier and hyperparameter values performed best?
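One possible way to approach parts (a) to (c) is sketched below. It is a rough outline under stated assumptions, not a reference solution. It assumes the ARFF file lists the class label as its last attribute and that every attribute is numeric or a nominal with numeric values such as {0,1}. The function names, the tiny TOY_ARFF string, and the synthetic data from make_toy_data are illustrative placeholders and are not taken from the DBWorld files. No accuracy figures are given, because they depend on the real data.

```python
from io import StringIO

import numpy as np
from scipy.io import arff
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

# Hypothetical miniature ARFF file, used only to exercise the loader.
TOY_ARFF = """@relation toy
@attribute w1 {0,1}
@attribute w2 {0,1}
@attribute CLASS {0,1}
@data
1,0,1
0,1,0
1,1,1
0,0,0
"""


def _to_number(value):
    # loadarff returns nominal values such as {0,1} as bytes (b"0", b"1").
    if isinstance(value, bytes):
        value = value.decode()
    return float(value)


def arff_to_arrays(source):
    """Part (a): read an ARFF file (path or file-like) into X and y.

    Assumption: the class label is the last attribute.
    """
    data, meta = arff.loadarff(source)
    names = meta.names()
    table = np.array([[_to_number(rec[n]) for n in names] for rec in data])
    # To keep a plain CSV copy, one could call:
    # np.savetxt("dbworld.csv", table, delimiter=",", fmt="%g")
    return table[:, :-1], table[:, -1].astype(int)


def evaluate_nb(X, y):
    """Part (b): mean accuracy of MultinomialNB under 3-fold cross validation."""
    return cross_val_score(MultinomialNB(), X, y, cv=3).mean()


def evaluate_bagging(X, y, n_estimators_grid, max_samples_grid, max_features_grid):
    """Part (c): 3-fold CV accuracy for each hyperparameter combination."""
    results = {}
    for n in n_estimators_grid:
        for s in max_samples_grid:
            for f in max_features_grid:
                # The base estimator is passed positionally so the call works
                # whether the keyword is named `estimator` or `base_estimator`.
                clf = BaggingClassifier(
                    MultinomialNB(),
                    n_estimators=n,
                    max_samples=s,
                    max_features=f,
                    random_state=0,
                )
                results[(n, s, f)] = cross_val_score(clf, X, y, cv=3).mean()
    return results


def make_toy_data(n_rows=64, n_words=200, seed=0):
    """Synthetic stand-in for the real data: binary word-presence features."""
    rng = np.random.default_rng(seed)
    y = np.array([0, 1] * (n_rows // 2))
    X = rng.integers(0, 2, size=(n_rows, n_words))
    X[y == 1, :10] = 1  # make the first ten "words" informative for class 1
    return X, y
```

With the real file, arff_to_arrays would be called on its path, and the dictionary returned by evaluate_bagging can be compared against the evaluate_nb score to answer part (d), for example by taking max(results, key=results.get).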
