Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

You are applying for a position at the data science team of USDA and you are given data associated with determining appropriate parasite treatment of

You are applying for a position at the data science team of USDA and you are given data associated with determining appropriate parasite treatment of canines. The suggested treatment options are determined based on a logistic regression model that predicts if the canine is infected with a parasite. The data is given in the site: https://data.world/ehales/grls-parasite-study/workspace/file?filename=CBC_data.csv and more specifically in the CBC_data.csv file. Login using you University Google account to access the data and the description that includes a paper on the study (you dont need to read the paper to solve this problem). Your target variable column is titled parasite_status.

Question 1 - Feature Engineering (5 points) In this step you outline the following as potential features (this is a limited example - we can have many features as in your programming exercise below). Write the posterior probability expressions for logistic regression for the problem you are given to solve. (=1|,)= (=0|,)=

Question 2 - Decision Boundary (5 points) Write the expression for the decision boundary assuming that (=1)=(=0) . The decision boundary is the line that separates the two classes.

Type Markdown and LaTeX: 2

Question 3 - Loss function (5 points) Write the expression of the loss as a function of that makes sense for you to use in this problem. NOTE: The loss will be a function that will include this function: ()=11+ =

Question 4 - Gradient (5 points) Write the expression of the gradient of the loss with respect to the parameters - show all your work. =

Question 5 - Imbalanced dataset (10 points) You are now told that in the dataset (=0)>>(=1) Can you comment if the accuracy of Logistic Regression will be affected by such imbalance?

Type Markdown and LaTeX: 2

Question 6 - SGD (15 points) The interviewer was impressed with your answers and wants to test your programming skills. Use the dataset to train a logistic regressor that will predict the target variable . Report the harmonic mean of precision (p) and recall (r) i.e the metric called 1 score that is calculated as shown below using a test dataset that is 20% of each group. Plot the 1 score vs the iteration number . 1=21+1 Your code includes hyperparameter optimization of the learning rate and mini-batch size.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Fundamentals Of Database Systems

Authors: Ramez Elmasri, Sham Navathe

4th Edition

0321122267, 978-0321122261

More Books

Students also viewed these Databases questions