Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Jul 05, 2024

l. Imbalanced data refers to a classification problem where the number of observations per class is not equally distributed . In this question we subsarnple

image text in transcribed

l. Imbalanced data refers to a classification problem where the number of observations per class is not equally distributed . In this question we subsarnple the IMDB data to crcmte imbalanced data. To subsample the data, keep all the negative observations, but only keep the first 4000 (out of the 12500} of the positive observations. Do this separately for both train and test. We end with 12500 +4000 2 16500 observations separately for training and testing . Use the p = 2500 most frequent words as predictors. (a) Fit a LASSO logistic regression model to the training data. and use 10fold cross validation using the AUC as a measure of error to tune A. For the optimal A answer the following questions. i. What are the top 5 words associated with positive reviews ? (2 points ) ii. 'What are the top 5 words associated with negative reviews ? [2 points } iii. In the training set, for each observation, using logistic regression, calculate Pr[y = 1|X = m] for the model tted using the A found from 10fold CV. For a sequence of thresholds 3 = U,0.01,U.02, 0.03 , - - - ,1, calculate the the TPR and FPR, and using these plot the ROC curve and calculate the AUC. Note that to calculate the AUC you need the area under the ROC curve. Repeat the same for the test set. Plot the R00 for the train and the test on the same graph . Also in the graph report the train and test AUG . In other words , one figure should show the ROC of the train and test, and values of the AUC. Use color coding and make sure to label the horizontal and vertical axes. (2 points ) iv. For 3 = (1.5, what is the type [and type [I error ? (2 points } v. For what 3, the type Ierror is equal {as much as possible :1 to the type II error ? (2 points :1 (b) Fit a ridge logistic regression model to the training data and use 10fold cross - validation using the AUC as a measure of error to tune A. For the optimal A answer the same questions (i., ii., ...) as asked for LASSO (8 points }. (c) Fit an Elastic net logistic regression model to the training data and use 10fold cross validation using the AUC as a measure of error to tune A. For the optimal A answer the same questions (L, ii., ..) as asked for LASSO (8 points }

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Real Analysis Foundations And Functions Of One Variable

Real Analysis Foundations And Functions Of One Variable

Authors: Miklos Laczkovich, Vera T Sós

1st Edition

1493927663, 9781493927661

More Books

Students also viewed these Mathematics questions

Question

★★★★★

Enter the following cash payments transactions in a general journal: Sept. 5 Issued Check No. 318 to Clinton Corp. for merchandise purchased August 28, $6,000, terms 2/10, n/30. Payment is made...

Answered: 1 week ago

Question

★★★★★

Why might the term average cost be misleading?

Answered: 1 week ago

Question

★★★★★

describe Aronsson and Gustafssons (2005) model of sickness absence and sickness presence, and compare this model with other models;

Answered: 1 week ago

Question

★★★★★

Pure Spring Company produces premium bottled water. In the second department, the Bottling Department, conversion costs are incurred evenly throughout the bottling process, but packaging materials...

Answered: 1 week ago

Question

★★★★★

please solve [The following information applies to the questions displayed below] Trey Monson starts a merchandising business on December 1 and enters into the following three inventory purchases....

Answered: 1 week ago

Question

★★★★★

Please answer using excel: PROBLEM 1: (20 Marks) Use the dataset provided in the Excel file "Job Satisfaction": a) Build a regression model for overall job satisfaction as the dependent variable and...

Answered: 1 week ago

Question

★★★★★

YOUR TASK As a Marketing Consultant, advice John Samson by preparing a strategic marketing plan to the Board to position HHRE as the market leader and a viable venture in the real estate business....

Answered: 1 week ago

Question

★★★★★

Question No. 4. A 5-star hotel has 200 rooms that has historically charged one set of prices for its room, which was Rs. 5000 per night. The variable cost for one room includes cleaning,...

Answered: 1 week ago

Question

★★★★★

Linux Systems Administration In this module's class activities, you acted as a system administrator in order to troubleshoot a malfunctioning Linux server. The senior administrator was quite pleased...

Answered: 1 week ago

Question

★★★★★

Draw the kinematic diagram of the following linkages and find their mobility using Gruebler's equation. 1 B 2 3 -O- D 6 C 4 Point C is not a joint. It is a significant point on link 5.

Answered: 1 week ago

Question

★★★★★

The Canadian Tire trying to expand into the USA Company: The Canadian Tire What are the sustainable strategic goals of the company? What aspects about the country could be good for the company on the...

Answered: 1 week ago

Question

★★★★★

What is the purpose of the staffing practice called Two-in-aBox?

Answered: 1 week ago

Question

★★★★★

What would you do?

Answered: 1 week ago

Question

★★★★★

Define each of the following workplace flexibility practices: a. flextime b. compressed workweek c. job sharing d. telecommuting

Answered: 1 week ago

Previous Question Next Question