Question


Posted on Mar 13, 2024


Criteria	Points
Define the problem and perform Exploratory Data Analysis - Problem definition - Check shape, data types, and statistical summary - Univariate analysis - Multivariate analysis - Use appropriate visualizations to identify patterns and insights - Key meaningful observations on individual variables and the relationships between variables	6
Data Pre-processing: Prepare the data for modelling - Outlier detection (treat, if needed) - Encode the data - Data split - Scale the data (and state your reasons for scaling the features)	2
Model Building - Metrics of choice (justify the evaluation metrics) - Model building (KNN, Naive Bayes, Bagging, Boosting)	10
Model Performance Evaluation - Check the confusion matrix and classification metrics for all the models (for both the train and test datasets) - Compute the ROC-AUC score and plot the ROC curve - Comment on the performance of all the models	8
Model Performance Improvement - Improve the performance of the bagging and boosting models by tuning their hyperparameters - Comment on the improvement in performance on the training and test data	9
Final Model Selection - Compare all the models built so far - Select the final model with proper justification - Check the most important features in the final model and draw inferences	4
Actionable Insights & Recommendations - Compare all four models - Conclude with the key takeaways for the business	6
Problem 2 - Define the problem and perform Exploratory Data Analysis - Problem definition - Find the number of characters, words, and sentences in all three speeches	3
Problem 2 - Text cleaning - Stopword removal - Stemming - Find the 3 most common words used in all three speeches	3
Problem 2 - Plot word clouds of all three speeches - Show the most common words used in all three speeches in the form of word clouds	3
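For the Model Building and Model Performance Evaluation rows, a minimal scikit-learn sketch might look like the following. It assumes the features are already encoded and scaled and that the target is coded 0/1; `build_models` and `evaluate_models` are hypothetical helper names, the hyperparameters are untuned defaults, and `GradientBoostingClassifier` is just one possible choice of boosting model (AdaBoost would work equally well).

```python
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score)
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier


def build_models(random_state=42):
    """The four model families named in the rubric, with untuned settings."""
    return {
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "Naive Bayes": GaussianNB(),
        "Bagging": BaggingClassifier(n_estimators=100, random_state=random_state),
        "Boosting": GradientBoostingClassifier(random_state=random_state),
    }


def evaluate_models(X_train, X_test, y_train, y_test, random_state=42):
    """Fit each model; collect accuracy, ROC-AUC, confusion matrix and report per split."""
    results = {}
    for name, model in build_models(random_state).items():
        model.fit(X_train, y_train)
        scores = {}
        for split, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
            pred = model.predict(X)
            # Column 1 of predict_proba is the probability of label 1 for 0/1 targets.
            proba = model.predict_proba(X)[:, 1]
            scores[split] = {
                "accuracy": accuracy_score(y, pred),
                "roc_auc": roc_auc_score(y, proba),
                "confusion_matrix": confusion_matrix(y, pred),
                "report": classification_report(y, pred, output_dict=True),
            }
        results[name] = {"model": model, **scores}
    return results
```

Comparing the train and test entries for each model is what reveals over-fitting. The ROC curve for a fitted model can then be plotted with `sklearn.metrics.RocCurveDisplay.from_estimator(model, X_test, y_test)` (scikit-learn 1.0 or later, with matplotlib installed).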

Problem 1

Context

CNBE, a prominent news channel, is gearing up to provide insightful coverage of recent elections, recognizing the importance of data-driven analysis. A comprehensive survey has been conducted, capturing the perspectives of 1525 voters across various demographic and socio-economic factors. This dataset encompasses 9 variables, offering a rich source of information regarding voters' characteristics and preferences.
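A first pass at the shape, data-type, and statistical-summary checks could be wrapped in a small helper. This is only a sketch: `eda_summary` is a hypothetical name, and dropping columns whose names start with "Unnamed" assumes the file might carry an exported index column. With a clean file, the reported shape should correspond to the 1525 voters and 9 variables described above.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame) -> dict:
    """Basic EDA checks: shape, dtypes, summary statistics, nulls, duplicates."""
    # Assumption: drop an exported index column such as "Unnamed: 0" if present.
    df = df.loc[:, ~df.columns.astype(str).str.startswith("Unnamed")]
    return {
        "shape": df.shape,
        "dtypes": df.dtypes,
        "summary": df.describe(include="all"),
        "missing": df.isnull().sum(),
        "duplicates": int(df.duplicated().sum()),
        # Class balance of the target informs which evaluation metrics to justify.
        "vote_share": df["vote"].value_counts(normalize=True) if "vote" in df else None,
    }
```

Load the data from the Drive folder first, for example with `pd.read_excel(...)` or `pd.read_csv(...)` depending on the file format; the actual file name is not given here.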

Objective

The primary objective is to leverage machine learning to build a predictive model capable of forecasting which political party a voter is likely to support. This predictive model, developed based on the provided information, will serve as the foundation for creating an exit poll. The exit poll aims to contribute to the accurate prediction of the overall election outcomes, including determining which party is likely to secure the majority of seats.
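To improve the bagging and boosting models before selecting the final one, a grid search is a common approach. The sketch below assumes scikit-learn's `GridSearchCV` with ROC-AUC as the selection metric; the parameter grids and the helper names `tune_ensembles` and `feature_importance` are illustrative assumptions, not prescribed values. Note that `BaggingClassifier` does not expose `feature_importances_`, so the importance helper applies to the boosting model (or any tree-based model that does).

```python
import pandas as pd
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV


def tune_ensembles(X_train, y_train, cv=5, random_state=42):
    """Grid-search bagging and boosting; the grids are illustrative assumptions."""
    searches = {
        "Bagging": GridSearchCV(
            BaggingClassifier(random_state=random_state),
            {"n_estimators": [50, 100], "max_samples": [0.7, 1.0]},
            scoring="roc_auc", cv=cv),
        "Boosting": GridSearchCV(
            GradientBoostingClassifier(random_state=random_state),
            {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1],
             "max_depth": [2, 3]},
            scoring="roc_auc", cv=cv),
    }
    for search in searches.values():
        search.fit(X_train, y_train)
    # best_estimator_ is refit on the full training set with the best parameters.
    return {name: s.best_estimator_ for name, s in searches.items()}


def feature_importance(model, feature_names):
    """Rank features for models exposing feature_importances_ (e.g. boosting)."""
    return (pd.Series(model.feature_importances_, index=feature_names)
              .sort_values(ascending=False))
```

The tuned models can then be passed through the same train/test evaluation as the untuned ones, so the improvement (or lack of it) is measured on identical splits.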

Data Description

vote: Party choice: Conservative or Labour.
age: Age in years.
economic.cond.national: Assessment of current national economic conditions, 1 to 5.
economic.cond.household: Assessment of current household economic conditions, 1 to 5.
Blair: Assessment of the Labour leader, 1 to 5.
Hague: Assessment of the Conservative leader, 1 to 5.
Europe: An 11-point scale that measures respondents' attitudes toward European integration. High scores represent 'Eurosceptic' sentiment.
political.knowledge: Knowledge of parties' positions on European integration, 0 to 3.
gender: Female or male.
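Given these columns, the encode, split, and scale steps might be sketched as follows. Assumptions: Labour is coded as the positive class (1), gender is one-hot encoded into a single `gender_male` column, the split is 70/30 and stratified on `vote`, and `preprocess` is a hypothetical helper name. Scaling is included mainly for KNN, which is distance-based: `age` spans tens of years while the other features sit on 1 to 5, 0 to 3, or 11-point scales.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(df: pd.DataFrame, test_size=0.3, random_state=1):
    """Encode, split (stratified on vote) and scale the election survey data."""
    # Assumption: drop an exported index column such as "Unnamed: 0" if present.
    df = df.loc[:, ~df.columns.astype(str).str.startswith("Unnamed")].copy()
    # Assumption: Labour -> 1, Conservative -> 0.
    y = (df.pop("vote").str.strip() == "Labour").astype(int)
    # One-hot encode gender; drop_first leaves a single 0/1 column (gender_male).
    X = pd.get_dummies(df, columns=["gender"], drop_first=True, dtype=int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=y)
    # Fit the scaler on the training split only, to avoid leaking test information.
    scaler = StandardScaler().fit(X_train)
    X_train_s = pd.DataFrame(scaler.transform(X_train), columns=X.columns, index=X_train.index)
    X_test_s = pd.DataFrame(scaler.transform(X_test), columns=X.columns, index=X_test.index)
    return X_train_s, X_test_s, y_train, y_test, scaler
```

Tree-based bagging and boosting models do not need scaled inputs, so using the scaled matrices for every model is a convenience rather than a requirement.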

Problem 2

In this project, we will work with the inaugural corpus from NLTK in Python. We will look at the following inaugural speeches by Presidents of the United States of America:

President Franklin D. Roosevelt in 1941
President John F. Kennedy in 1961
President Richard Nixon in 1973
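Counting characters, words, and sentences for each speech can be sketched with the standard library alone. NLTK's `word_tokenize` and `sent_tokenize` are the more usual choice but require the `punkt` resource; this regex version is an approximation (it treats `.`, `!`, and `?` followed by whitespace as sentence ends), so its counts may differ slightly from a tokenizer-based count. `text_stats` is a hypothetical helper name.

```python
import re


def text_stats(text: str) -> dict:
    """Approximate character, word and sentence counts for one speech."""
    # Words: runs of word characters, keeping simple contractions like "don't".
    words = re.findall(r"\b\w+(?:'\w+)?\b", text)
    # Sentences: split after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return {"characters": len(text), "words": len(words), "sentences": len(sentences)}
```

Applying it to each raw speech string, e.g. `text_stats(inaugural.raw('1961-Kennedy.txt'))`, gives one row per president for comparison.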

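Code snippet to extract the three speeches:

```python
import nltk
from nltk.corpus import inaugural

# Download the inaugural-address corpus (only needed once per environment).
nltk.download('inaugural')

# List the available speech file IDs, e.g. '1941-Roosevelt.txt'.
print(inaugural.fileids())

# Keep each raw speech in a variable so it can be analysed later.
roosevelt = inaugural.raw('1941-Roosevelt.txt')
kennedy = inaugural.raw('1961-Kennedy.txt')
nixon = inaugural.raw('1973-Nixon.txt')
```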
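Once the raw texts are available, cleaning, stopword removal, stemming, and the top-3 word count could be sketched as below. Assumptions: tokens are lowercase alphabetic runs, single-letter tokens are dropped, the stopword set is passed in by the caller, and NLTK's `PorterStemmer` is used for stemming; `clean_and_stem` and `top_words` are hypothetical names. Because the rubric wording is ambiguous, the sketch reports the top words both per speech and for all speeches combined.

```python
import re
from collections import Counter

from nltk.stem import PorterStemmer


def clean_and_stem(text, stop_words):
    """Lowercase, keep alphabetic tokens, drop stopwords and 1-letter tokens, stem."""
    stemmer = PorterStemmer()
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in stop_words and len(t) > 1]


def top_words(speeches, stop_words, n=3):
    """Most common stems per speech plus a 'combined' entry across all speeches."""
    counts = {name: Counter(clean_and_stem(text, stop_words))
              for name, text in speeches.items()}
    result = {name: c.most_common(n) for name, c in counts.items()}
    result["combined"] = sum(counts.values(), Counter()).most_common(n)
    return result
```

A typical stopword set is `set(stopwords.words('english'))` after `nltk.download('stopwords')` and `from nltk.corpus import stopwords`. The resulting `Counter` objects can also feed a word cloud, for example via `WordCloud().generate_from_frequencies(...)` from the `wordcloud` package.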

If the above code doesn't work, use the provided data: Speeches
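If the provided "Speeches" data consists of plain-text files in one folder (an assumption; the actual format is not stated), the speeches could be loaded into the same name-to-text mapping without NLTK. `load_speeches_from_folder` is a hypothetical helper.

```python
from pathlib import Path


def load_speeches_from_folder(folder):
    """Read every .txt file in `folder` into a {file name: text} dict, sorted by name."""
    return {p.name: p.read_text(encoding="utf-8")
            for p in sorted(Path(folder).glob("*.txt"))}
```

If the data is instead a CSV or spreadsheet, `pandas.read_csv` or `pandas.read_excel` would be the natural replacement.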

Please help me with the full Python code for the above questions, using the exact data files in the Drive folder below.

Drive Link: https://drive.google.com/drive/u/3/folders/1nn8eYYidcP32H1WAHX3okbISMTKbTl3m