
Question

Q1. [5 points] The optimization problem of training logistic regression for the binary classification problem, i.e., $Y = \{0, 1\}$, uses the cross-entropy (CE) loss, which is defined as
$$w^*_{\mathrm{CE}} = \arg\min_w \ell_{\mathrm{CE}}(w) = \arg\min_w -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right],$$
where $\hat{y} = \frac{1}{1 + \exp(-w^T x)}$.
One may believe that we could have used mean squared error (MSE) as the loss function instead of the CE loss. In this case, the optimization problem for the logistic regression can be described as
$$w^*_{\mathrm{MSE}} = \arg\min_w \ell_{\mathrm{MSE}}(w) = \arg\min_w \frac{1}{n} \sum_{i=1}^{n} \left\| y_i - \hat{y}_i \right\|_2^2.$$
Here, $\hat{y} = \frac{1}{1 + \exp(-w^T x)}$.
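As a quick numerical sanity check, the two objectives above can be sketched in Python with NumPy. The toy data `X` and `y` below are made up for illustration and are not part of the problem; at $w = 0$ every prediction is $\hat{y} = 0.5$, so the losses have easily verified closed-form values.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def ce_loss(w, X, y):
    """Cross-entropy loss l_CE(w) from the problem statement."""
    y_hat = sigmoid(X @ w)
    return -np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))

def mse_loss(w, X, y):
    """Mean-squared-error loss l_MSE(w) from the problem statement."""
    y_hat = sigmoid(X @ w)
    return np.mean((y - y_hat) ** 2)

# Made-up toy data: first column is a bias feature.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])

# At w = 0, y_hat = 0.5 for every sample, so l_CE = log 2 and l_MSE = 0.25.
w0 = np.zeros(2)
print(ce_loss(w0, X, y))   # ~0.6931 (= log 2)
print(mse_loss(w0, X, y))  # 0.25
```

This makes it easy to check that your analytic derivatives in (a) and (b) match finite-difference estimates of these functions.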
The goal of this exercise is to compare the CE loss and the MSE loss when training a logistic regression model and to understand why we should use the CE over the MSE.
(a) [2 points] Compute the first-order and second-order derivatives of $\ell_{\mathrm{CE}}(w)$.
(b) [2 points] Compute the first-order and second-order derivatives of $\ell_{\mathrm{MSE}}(w)$.
(c) [1 point] Using the second-order derivatives computed in (a) and (b), explain why we should prefer the cross-entropy loss over the MSE loss for training a logistic regression model. (Hint: use the second derivative and check convexity.)
(d) [1 point] Given the gradients computed in (a), derive the GD algorithm for the optimization problems in equation (1).
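For part (d), a minimal gradient-descent sketch for the CE objective follows. It uses the standard CE gradient $\nabla \ell_{\mathrm{CE}}(w) = \frac{1}{n} X^T (\hat{y} - y)$, which is the result one obtains in part (a) and should be checked against your own derivation. The toy data, learning rate, and iteration count are made-up illustration choices; swapping in the MSE gradient lets you compare how the two losses behave under the same loop.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_ce(w, X, y):
    # Standard CE gradient: (1/n) X^T (y_hat - y); verify against part (a).
    return X.T @ (sigmoid(X @ w) - y) / len(y)

def gradient_descent(X, y, lr=0.5, iters=2000):
    """Plain GD: repeat w <- w - lr * grad l_CE(w), starting from w = 0."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w = w - lr * grad_ce(w, X, y)
    return w

# Made-up linearly separable toy data (bias feature in column 0).
X = np.array([[1.0, 2.0], [1.0, -2.0], [1.0, 1.5], [1.0, -1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

w = gradient_descent(X, y)
preds = (sigmoid(X @ w) > 0.5).astype(float)
# Because l_CE is convex in w, GD reliably finds weights that
# classify this separable toy data correctly.
```

Note that on separable data the CE-optimal $\|w\|$ grows without bound, so in practice one stops after a fixed number of iterations or adds regularization; the predicted labels stabilize long before that.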
