Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Code This section needs to be completed using Python 3.6+. You will also require following packages: pandas numpy NLTK or SpaCy
Q2. Logistic Regression: Code [25] In this task, you will learn to build a Logistic Regression Classifier for the same "Financial Phrasebank" dataset. Bag of Words model will be used for this task. 1. Use 60% of the data selected randomly for training, 20% selected randomly for testing and the remaining 20% for validation set. Use classes 'positive' and 'negative' only. Perform the same cleaning tasks on the text data and build a vocabulary of the words. Link to the dataset. Link to the scikit-learn documentation for metrics. 2 2. Using CountVectorizer3, fit the cleaned train data. This will create the bag-of-words model for the train data. Transform test and validation sets using same CountVectorizer. 3. To implement the logistic regression using following equations, z = W.x = o(z) we need the weight vector W. Create an array of dimension equal to those of each x from the CountVectorizer. 4. Apply above equations over whole training dataset and calculate y and cross-entropy loss LCE which can be calculated as LCE = -y log y + (1 - y) log(1-) 5. Now, update the weights as follows: W+1 = W - (y - yi).xi Here, (y - y).x is the gradient of sigmoid function and a = 0.01 is the learning rate. 6. Repeat step 4 and step 5 for 500 iterations or epochs. For each iteration, calculate the cross-entropy loss on validation set. 7. Calculate the accuracy and macro-average precision, recall, and F1 score and provide the confusion matrix on the test set.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started