Question

Posted on Sep 25, 2024

Classification question (Segmentation via entropy information gain)

Use the example on pg 48 of our Data Science for Business text, Fig. 3.2 A set of people to be classified.

Use Scala to do your calculations (see the notes on doing this for the first two segmentations):

a. Calculate the overall entropy of this set.

b. Calculate the information gain (IG) from splitting on head shape.

c. Calculate the information gain (IG) from splitting on body shape.

d. Calculate the information gain (IG) from splitting on color.

e. Of these three, which is the best attribute to segment on, and why?
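Each part above reduces to two formulas. Entropy of a set is H(S) = -(sum over classes of p_i * log2(p_i)), where p_i is the proportion of class i. Information gain is the parent's entropy minus the size-weighted average entropy of the child segments. A minimal Scala sketch of both follows. The object name `EntropyIG`, the function names, and the counts in `main` are illustrative assumptions and are not values read from Fig. 3-2; substitute the Yes/No counts you tally from the figure.

```scala
object EntropyIG {
  // Log base 2, since entropy here is measured in bits.
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // Entropy of a segment given its class counts, e.g. Seq(yesCount, noCount).
  // Zero counts are skipped because p * log2(p) tends to 0 as p tends to 0.
  def entropy(counts: Seq[Int]): Double = {
    val total = counts.sum.toDouble
    if (total == 0.0) 0.0
    else counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * log2(p)
    }.sum
  }

  // Information gain: parent entropy minus the size-weighted child entropies.
  // Each child is the class-count vector of one segment produced by the split.
  def informationGain(parent: Seq[Int], children: Seq[Seq[Int]]): Double = {
    val n = parent.sum.toDouble
    val weighted = children.map(ch => (ch.sum / n) * entropy(ch)).sum
    entropy(parent) - weighted
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical counts for illustration only (NOT taken from Fig. 3-2).
    val parent = Seq(5, 5)               // 5 Yes, 5 No overall
    val split  = Seq(Seq(4, 1), Seq(1, 4)) // two segments after a split
    println(f"parent entropy   = ${entropy(parent)}%.4f")
    println(f"information gain = ${informationGain(parent, split)}%.4f")
  }
}
```

To answer the last part, call `informationGain` once per attribute (head shape, body shape, color) with the counts from the figure and compare the results: the attribute with the largest gain leaves the purest segments.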

[Figure labels as transcribed: Yes Yes Yes No Yes No No Yes No Yes No]

Figure 3-2. A set of people to be classified. The label over each head represents the value of the target variable (write-off or not). Colors and shapes represent different predictor attributes.

... the future churn rate of the population? Being a professional? Age? Place of residence? Income? Number of complaints to customer service? Amount of overage charges?

We now will look carefully into one useful way to select informative variables, and then later will show how this technique can be used repeatedly to build a supervised segmentation. While very useful and illustrative, please keep in mind that direct, multivariate supervised segmentation is just one application of this fundamental idea of selecting informative variables. This notion should become one of your conceptual tools when thinking about data science problems more generally. For example, as we go forward we will delve into other modeling approaches, ones that do not incorporate variable selection directly. When the world presents you with very large sets of attributes, it may be (extremely) useful to harken back to this early idea and to select a subset of informative attributes. Doing so can substantially reduce the size of an unwieldy dataset, and as we will see, often will improve the accuracy of the resultant model.

Selecting Informative Attributes

Given a large set of examples, how do we select an attribute to partition them in an informative way? Let's consider a binary (two class) classification problem, and think about what we would like to get out of it. To be concrete, Figure 3-2 shows a simple segmentation problem: twelve people represented as stick figures. There are two types of heads: square and circular; and two types of bodies: rectangular and oval; and two of the people have gray bodies while the rest are white. These are the attributes we will use to describe the people. Above each person is the binary target label, Yes or No, indicating (for example) whether the person becomes a loan write-off. We could describe the data on these people as
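The attributes named in the excerpt (head shape, body shape, body color, and the Yes/No target) map naturally onto a small record type. The sketch below is one possible encoding in Scala, not the textbook's own representation. The `Person` case class, the `classCountsBy` helper, and the sample rows are hypothetical and are not a transcription of the figure.

```scala
object PeopleData {
  // One record per stick figure; field names are illustrative assumptions.
  final case class Person(head: String, body: String, color: String, writeOff: Boolean)

  // For each value of the chosen attribute, return Seq(yesCount, noCount).
  def classCountsBy(people: Seq[Person])(attr: Person => String): Map[String, Seq[Int]] =
    people.groupBy(attr).map { case (value, group) =>
      val yes = group.count(_.writeOff)
      value -> Seq(yes, group.size - yes)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical rows for illustration only (NOT read from Fig. 3-2).
    val sample = Seq(
      Person("square",   "rectangular", "white", writeOff = true),
      Person("square",   "oval",        "gray",  writeOff = false),
      Person("circular", "oval",        "white", writeOff = true),
      Person("circular", "rectangular", "white", writeOff = false)
    )
    // Class counts per head shape; swap the selector for _.body or _.color.
    classCountsBy(sample)(_.head).foreach { case (value, counts) =>
      println(s"$value -> $counts")
    }
  }
}
```

The per-value count vectors this produces are the child segments of a split, which is the input an information gain calculation needs for each attribute.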

