Question
Section 1 (20 Marks)
A data science team is working on a classification problem in which the dataset contains many correlated variables, most of which are categorical or continuous. Discuss the following questions in the context of this mini case study, writing at least two paragraphs per question (hint: 150 to 200 words per paragraph).
1. Which classifier should the team consider using when working with only categorical variables? Explain why you selected that specific classifier.
2. Which classifier should the team consider using when working with only continuous variables? Explain why you chose that specific classifier.

Section 2 (50 Marks)
Choose a topic of interest to you, such as a movie, a celebrity, or any buzzword, and collect 50 tweets related to it. Hand-tag each tweet as positive, neutral, or negative. Next, split the tweets into a training set of 40 and a testing set of the remaining 10. Run one or more classifiers over these tweets to perform sentiment analysis. What are the precision and recall of these classifiers? Which classifier performs better than the others?

Section 3 (30 Marks)
FNB wants to run an effective marketing campaign aimed at increasing its number of clients. In addition, the bank wants to analyse why people no longer want to join FNB and why others are closing their FNB accounts. FNB also needs to know what it can do to attract more customers. Finally, FNB wishes to build a data warehouse to support the marketing department and other related customer care groups. Based on this information, create an analytic plan that includes the components below.

Components of Analytic Plan
1. Discovery: Business Problem Framed
2. Initial Hypotheses
3. Data and Scope
4. Model Planning - Analytic Technique
5. Result and Key Findings
6. Business Impact

Marking Rubric BDA FA 2 (Examiner Marks / Moderator Marks)
Section 1 / Total Marks 20
1. Meaningful content related to the question, presented in 2 paragraphs: 10 marks; attempted: 4 marks; not attempted: 0
2. Meaningful content related to the question, presented in 2 paragraphs: 10 marks; attempted: 4 marks; not attempted: 0
Section 2 / Total Marks 50
1. Topic selection: 5 marks; no selection: 0
2. Collection of 50 different tweets related to the topic (marks awarded per tweet)
3. What are the precision and recall of these classifiers? Meaningful content: 10; attempted: 4; not attempted: 0
4. Which classifier performs better than the others? Meaningful content: 10; attempted: 4; not attempted: 0
Section 3 / Total Marks 30
1. Meaningful content related to the question, using the components of the analytic plan: 5 marks per component × 6 components
Total: 100
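Section 2 asks for the precision and recall of one or more classifiers trained on 40 hand-tagged tweets and tested on the remaining 10. The following is a minimal, standard-library-only Python sketch, assuming the tweets have already been collected and tagged as positive, neutral or negative. It compares a multinomial Naive Bayes classifier with a majority-class baseline and computes per-class precision = TP / (TP + FP) and recall = TP / (TP + FN). All names (`split_train_test`, `NaiveBayes`, `MajorityBaseline`, `precision_recall`, `macro_average`, `evaluate`) are hypothetical helpers written for illustration, not part of the assignment or of any library.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical label set matching the hand-tagging scheme in Section 2.
LABELS = ("positive", "neutral", "negative")


def tokenize(text):
    # Lower-case and split on whitespace only; a fuller pipeline would also
    # strip URLs, @mentions, hashtags and punctuation.
    return text.lower().split()


def split_train_test(texts, labels, n_train=40, seed=0):
    """Shuffle (text, label) pairs reproducibly, then split n_train / rest."""
    items = list(zip(texts, labels))
    random.Random(seed).shuffle(items)
    return items[:n_train], items[n_train:]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict_one(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior P(class)
            for tok in tokenize(text):
                # Smoothed log P(token | class); unseen tokens count as 0 + 1
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


class MajorityBaseline:
    """Always predicts the most frequent training label (a sanity-check baseline)."""

    def fit(self, texts, labels):
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        return [self.label for _ in texts]


def precision_recall(y_true, y_pred, labels=LABELS):
    """Per-class (precision, recall); an undefined ratio (0 / 0) is reported as 0.0."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (precision, recall)
    return scores


def macro_average(scores):
    """Unweighted mean of the per-class precision and recall values."""
    n = len(scores)
    return (sum(p for p, _ in scores.values()) / n,
            sum(r for _, r in scores.values()) / n)


def evaluate(model, train, test):
    """Fit on the training pairs, predict the test pairs, return per-class scores."""
    model.fit([t for t, _ in train], [y for _, y in train])
    y_pred = model.predict([t for t, _ in test])
    return precision_recall([y for _, y in test], y_pred)
```

To use it, pass the 50 hand-tagged tweets to `split_train_test`, call `evaluate` once per classifier on the same split, and compare the per-class and macro-averaged scores. With only 10 test tweets, a single misclassification moves precision or recall by a large step, so small differences between classifiers should be read cautiously.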