[Solved] Data Set Description: We'll be using the | SolutionInn

Question

Posted on Sep 21, 2024

Data Set Description:

We'll be using the "mammographic masses" public dataset from the UCI repository. This data contains 961 instances of masses detected in mammograms, and contains the following attributes:

1. BI-RADS assessment: 1 to 5 (ordinal)

2. Age: patient's age in years (integer)

3. Shape: mass shape: round=1 oval=2 lobular=3 irregular=4 (nominal)

4. Margin: mass margin: circumscribed=1 microlobulated=2 obscured=3 ill-defined=4 spiculated=5 (nominal)

5. Density: mass density: high=1 iso=2 low=3 fat-containing=4 (ordinal)

6. Severity: benign=0 or malignant=1 (binominal) (The dependent variable)

BI-RADS is an assessment of how confident the severity classification is; it is not a "predictive" attribute and so we will discard it. The age, shape, margin, and density attributes are the features that we will build our model with, and "severity" is the classification we will attempt to predict based on those attributes.

Although "shape" and "margin" are nominal data types, which sklearn typically doesn't deal with well, they are close enough to ordinal that we shouldn't just discard them. The "shape" for example is ordered increasingly from round to irregular.

Note: You may use pandas, numpy, sklearn or any essential python package you need.

[1.5 points] Read the given mammographic_masses.data.txt data set into masses_data

The output should be like this: (expected-output image not included)
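A minimal sketch of the reading step follows. It assumes the file is comma-separated, has no header row, and marks missing values with "?"; the column names in COLUMNS and the helper name load_masses are assumptions chosen for illustration, and a small made-up sample stands in for the real file.

```python
import io

import pandas as pd

# Assumed column names: the raw file is taken to have no header row.
COLUMNS = ["BI-RADS", "age", "shape", "margin", "density", "severity"]


def load_masses(source):
    """Read the data, treating '?' as a missing value (assumed file layout)."""
    return pd.read_csv(source, header=None, names=COLUMNS, na_values="?")


# Made-up rows in the assumed layout, used here instead of the real file.
sample = io.StringIO("5,67,3,5,3,1\n4,43,1,1,?,1\n5,58,4,5,3,1\n4,28,1,1,3,0\n")
masses_data = load_masses(sample)
print(masses_data.head())
```

With the real file, the call would be load_masses("mammographic_masses.data.txt").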

[1.5 points] Provide a plot to show the counts of both classes in severity dependent variable.

The output should be like this: (expected-output image not included)
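One possible way to produce the class-count plot is sketched here. The helper names severity_counts and plot_severity_counts are hypothetical, the column is assumed to be named "severity", and the small frame is made-up data standing in for masses_data.

```python
import pandas as pd


def severity_counts(df):
    """Return the number of rows per class, ordered by class label."""
    return df["severity"].value_counts().sort_index()


def plot_severity_counts(df):
    """Draw a bar chart of the class counts (matplotlib is imported lazily)."""
    import matplotlib.pyplot as plt

    ax = severity_counts(df).plot(kind="bar", rot=0)
    ax.set_xlabel("severity (0 = benign, 1 = malignant)")
    ax.set_ylabel("count")
    plt.show()
    return ax


# Made-up stand-in for masses_data.
demo = pd.DataFrame({"severity": [0, 1, 1, 0, 1]})
counts = severity_counts(demo)
print(counts)
```

Calling plot_severity_counts(masses_data) would then draw the chart for the real data.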

Question 2: [2 points]

For data preprocessing:

[1 point] List all rows that contain missing values in the data set:

The output should be like this: (expected-output image not included)
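Listing the rows with missing values can be sketched as a boolean row filter. This assumes missing entries were already parsed as NaN when the file was read; the helper name rows_with_missing and the demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def rows_with_missing(df):
    """Return only the rows that have at least one NaN in any column."""
    return df[df.isna().any(axis=1)]


# Made-up stand-in for masses_data.
demo = pd.DataFrame(
    {
        "age": [67, np.nan, 58],
        "density": [3, 3, np.nan],
        "severity": [1, 1, 0],
    }
)
missing_rows = rows_with_missing(demo)
print(missing_rows)
```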

[1 point] Impute the missing values in the data set using the median value of each column that contains missing values.
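Median imputation can be sketched with pandas alone, as below. The helper name impute_median and the demo values are assumptions; sklearn's SimpleImputer with strategy="median" would be an alternative.

```python
import numpy as np
import pandas as pd


def impute_median(df):
    """Fill each column's NaN entries with that column's median.

    DataFrame.median skips NaN by default, and fillna with a Series
    matches the Series index against the column names.
    """
    return df.fillna(df.median(numeric_only=True))


# Made-up stand-in for masses_data.
demo = pd.DataFrame(
    {
        "age": [60, np.nan, 40, 50],
        "density": [3, np.nan, 3, 2],
    }
)
filled = impute_median(demo)
print(filled)
```

Columns without missing values are left unchanged, so applying the fill to the whole frame is equivalent to filling only the affected columns.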

[.5 points] Separate the columns of masses_data into X and y, where X contains the columns BI-RADS, age, shape, margin, and density, and y contains severity only.

[.5 points] Use the train_test_split function to prepare X_train, X_test, y_train, and y_test.
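The separation and split steps can be sketched together. The column names, the helper name split_masses, and the test_size and random_state values are assumptions, since the question does not fix them. The feature list follows the wording of the separation step and so keeps BI-RADS, even though the data set description says BI-RADS will be discarded; remove it from FEATURES if that reading is preferred.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed column names; BI-RADS is kept because the separation step lists it.
FEATURES = ["BI-RADS", "age", "shape", "margin", "density"]


def split_masses(df, test_size=0.25, random_state=0):
    """Split the frame into features/target, then into train/test parts."""
    X = df[FEATURES]
    y = df["severity"]
    return train_test_split(X, y, test_size=test_size, random_state=random_state)


# Made-up stand-in for the imputed masses_data.
demo = pd.DataFrame(
    {
        "BI-RADS": [5, 4, 5, 4, 3, 4, 5, 4],
        "age": [67, 43, 58, 28, 74, 65, 70, 42],
        "shape": [3, 1, 4, 1, 1, 1, 4, 1],
        "margin": [5, 1, 5, 1, 5, 1, 4, 1],
        "density": [3, 3, 3, 3, 3, 3, 3, 3],
        "severity": [1, 1, 1, 0, 1, 0, 1, 0],
    }
)
X_train, X_test, y_train, y_test = split_masses(demo)
print(X_train.shape, X_test.shape)
```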

Use the logistic regression algorithm to fit a model to the prepared training data.

Use the fitted logistic regression model to predict the test data.

Use sklearn's accuracy_score to find the accuracy of your model on the test data.

Print the accuracy obtained in the previous step.
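The four modelling steps above can be sketched as follows. Synthetic data is used here in place of the real X_train, X_test, y_train, and y_test, and max_iter=1000 is an assumed setting to help the solver converge on unscaled features; neither comes from the question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: five features, binary target driven by one of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 1] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training part only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict the held-out test part and score the predictions.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
```

The accuracy printed for this synthetic data says nothing about the mammographic data; the same calls would be applied to the split produced from masses_data.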
