Question

Posted on Sep 09, 2024

To begin, load the Titanic dataset from Seaborn and focus on the relevant variables. The target variable for this analysis is 'survived', which indicates whether a passenger survived (1) or died (0). The other variables will serve as input features.

In a Markdown cell, identify the categorical and numerical features by examining the possible values each feature can assume. Remember, there are four categorical and four numerical features.
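One way to examine the possible values of each feature is to tabulate its dtype and number of distinct values. The sketch below uses a small made-up DataFrame rather than the real dataset (which would normally come from `sns.load_dataset("titanic")`). The helper name `summarize_columns` and the `max_levels` threshold are assumptions for illustration, so the final call on which columns are categorical still needs a human look at the values.

```python
import pandas as pd

# Toy stand-in for a few Titanic-style columns (values are made up).
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male", "male", "female"],
    "pclass": [3, 1, 3, 1, 2, 3],
    "age": [22.0, 38.0, 26.0, 35.0, 54.5, 2.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.07],
})

def summarize_columns(frame, max_levels=3):
    """Label a column categorical when it is non-numeric or has few distinct values."""
    rows = []
    for col in frame.columns:
        n_unique = frame[col].nunique()
        is_numeric = pd.api.types.is_numeric_dtype(frame[col])
        kind = "numerical" if is_numeric and n_unique > max_levels else "categorical"
        rows.append({"feature": col, "dtype": str(frame[col].dtype),
                     "n_unique": n_unique, "kind": kind})
    return pd.DataFrame(rows).set_index("feature")

summary = summarize_columns(df)
print(summary)
```

A numeric column with only a handful of levels, such as `pclass` here, is flagged as categorical by this heuristic, which is the kind of case the Markdown cell should discuss explicitly.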

Next, partition the data into training and testing sets, stratifying on the target variable to maintain its distribution. Hold out 20% of the data for testing. The resulting DataFrames should be named df_train and df_test. Print the dimensions of these datasets.
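A stratified 80/20 split can be sketched with scikit-learn's `train_test_split`. The DataFrame below is made-up toy data and the `random_state` value is an assumption chosen only for reproducibility; neither comes from the assignment.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame: 100 rows, 40% positive class (made-up data).
df = pd.DataFrame({
    "survived": [1] * 40 + [0] * 60,
    "fare": [float(i) for i in range(100)],
})

df_train, df_test = train_test_split(
    df,
    test_size=0.2,             # hold out 20% for testing
    stratify=df["survived"],   # keep the class ratio in both parts
    random_state=42,           # assumed seed, only for reproducibility
)

print(df_train.shape, df_test.shape)
print(df_train["survived"].mean(), df_test["survived"].mean())
```

Comparing the mean of the target in both parts is a quick check that stratification preserved the class ratio.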

Using the training data, calculate the probability of each target label, $P(y_k)$.
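The class priors are just relative frequencies of the target in the training set. A minimal sketch on made-up labels (the real `df_train` would come from the split step):

```python
import pandas as pd

# Made-up training labels: six passengers died (0), four survived (1).
df_train = pd.DataFrame({"survived": [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]})

# Relative frequency of each label, indexed by the label value.
priors = df_train["survived"].value_counts(normalize=True).sort_index()
print(priors)
```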

For the categorical features in the training data, calculate and show $P(x_j \mid y_k)$, the probability of each feature value given the target label.
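For one categorical feature, the conditional probabilities can be read off a row-normalized contingency table. The sketch below uses made-up data and a single feature, `sex`, as an example; in practice the same call would be repeated for each categorical feature.

```python
import pandas as pd

# Made-up training data with one categorical feature.
df_train = pd.DataFrame({
    "survived": [0, 0, 0, 0, 1, 1, 1, 1],
    "sex": ["male", "male", "male", "female", "female", "female", "female", "male"],
})

# Rows are target labels, columns are feature values; each row sums to 1,
# so cond_sex.loc[k, v] estimates P(sex = v | survived = k).
cond_sex = pd.crosstab(df_train["survived"], df_train["sex"], normalize="index")
print(cond_sex)
```

A feature value that never occurs with a given label gets probability 0 in this table. The assignment does not ask for smoothing, so none is applied here.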

Identify and display the means and standard deviations of the continuous features. The probability of observing any value for these features is assumed to follow the Normal distribution:

$P(x_j \mid y_k) = \mathcal{N}(\mu_j, \sigma_j)$

where $\mu_j$ and $\sigma_j$ are calculated from each feature $j$ and target label $y_k$.
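The per-class means and standard deviations, and the Normal density they parameterize, can be sketched as follows. The data is made up, `gaussian_pdf` is a hypothetical helper name, and note that pandas' `std` is the sample standard deviation (ddof=1) by default; whether the assignment expects that or the population version is not stated.

```python
import numpy as np
import pandas as pd

# Made-up training data with one continuous feature.
df_train = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 1],
    "age": [20.0, 30.0, 40.0, 10.0, 20.0, 30.0],
})

# One row per target label, with the mean and sample std of the feature.
stats = df_train.groupby("survived")["age"].agg(["mean", "std"])
print(stats)

def gaussian_pdf(x, mean, std):
    """Density of a Normal(mean, std) distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Density of age = 30 given survived = 0.
density = gaussian_pdf(30.0, stats.loc[0, "mean"], stats.loc[0, "std"])
print(density)
```

The value returned is a density, not a probability, which is fine for the argmax comparison because both classes are scored the same way.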

Construct a prediction function that calculates:

$\hat{y}(x^{(i)}) = \arg\max_{k \in \{0, 1\}} \left[ P(y_k) \times \prod_{j}^{f} P(x_j^{(i)} \mid y_k) \right]$

where $y_k = 0$ means Died and $y_k = 1$ means Survived. This function should calculate the above expression for $k = 0$ and $k = 1$, and return both predictions for each instance considered.

    import numpy as np

    def calculate_class_predictions(data, return_normed_probs=False):
        # One row per instance: [score for Died, score for Survived]
        predicted_probs_per_instance = np.empty(shape=(0, 2))
        ## Construct function here
        # Append the pair of class scores for the current instance
        predicted_probs_per_instance = np.vstack(
            [predicted_probs_per_instance, np.array([prob_died, prob_survived])]
        )
        return predicted_probs_per_instance
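One possible way to fill in the skeleton is sketched below. It is not the reference solution: the toy data, the feature lists `CAT_FEATURES` and `NUM_FEATURES`, the helpers `gaussian_pdf` and `class_score`, and the choice to keep the fitted tables as module-level variables are all assumptions. Missing values (which the real Titanic data contains) are not handled here.

```python
import numpy as np
import pandas as pd

TARGET = "survived"
CAT_FEATURES = ["sex"]   # assumed feature lists for this toy example
NUM_FEATURES = ["age"]

df_train = pd.DataFrame({  # made-up training data
    "survived": [0, 0, 0, 0, 1, 1, 1, 1],
    "sex": ["male", "male", "male", "female", "female", "female", "female", "male"],
    "age": [30.0, 40.0, 50.0, 40.0, 20.0, 30.0, 10.0, 20.0],
})

# Fitted quantities, all estimated from the training data only.
priors = df_train[TARGET].value_counts(normalize=True)
cat_probs = {f: pd.crosstab(df_train[TARGET], df_train[f], normalize="index")
             for f in CAT_FEATURES}
num_stats = {f: df_train.groupby(TARGET)[f].agg(["mean", "std"])
             for f in NUM_FEATURES}

def gaussian_pdf(x, mean, std):
    """Density of a Normal(mean, std) distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def class_score(row, k):
    """P(y_k) times the product of P(x_j | y_k) over all features."""
    score = priors.loc[k]
    for f in CAT_FEATURES:
        table = cat_probs[f]
        # A value never seen in training gets probability 0 (no smoothing).
        score *= table.loc[k, row[f]] if row[f] in table.columns else 0.0
    for f in NUM_FEATURES:
        score *= gaussian_pdf(row[f], num_stats[f].loc[k, "mean"],
                              num_stats[f].loc[k, "std"])
    return score

def calculate_class_predictions(data, return_normed_probs=False):
    predicted_probs_per_instance = np.empty(shape=(0, 2))
    for _, row in data.iterrows():
        prob_died = class_score(row, 0)
        prob_survived = class_score(row, 1)
        if return_normed_probs:
            # Rescale the two scores so that they sum to 1.
            total = prob_died + prob_survived
            if total > 0:
                prob_died, prob_survived = prob_died / total, prob_survived / total
        predicted_probs_per_instance = np.vstack(
            [predicted_probs_per_instance, np.array([prob_died, prob_survived])]
        )
    return predicted_probs_per_instance

scores = calculate_class_predictions(df_train, return_normed_probs=True)
predicted_labels = scores.argmax(axis=1)  # 0 = Died, 1 = Survived
print(scores)
print(predicted_labels)
```

Taking the argmax over the two columns gives the predicted label. Normalizing the scores does not change the argmax, so `return_normed_probs` only affects how the numbers read.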

Using your function, calculate the predictions from the training data and then compute the confusion matrix and accuracy score.

Repeat the prediction using the test data and calculate the confusion matrix and accuracy score for this set as well.
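The evaluation step can be sketched with scikit-learn's metrics. The scores and true labels below are made up; in practice the scores would be the array returned by the prediction function and the labels would be the 'survived' column of the same DataFrame.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up scores in the layout returned by the prediction function:
# column 0 = Died, column 1 = Survived.
scores = np.array([[0.9, 0.1], [0.3, 0.7], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]])
y_true = np.array([0, 1, 1, 1, 0])

y_pred = scores.argmax(axis=1)          # predicted label per instance
cm = confusion_matrix(y_true, y_pred)   # rows = true label, columns = predicted label
acc = accuracy_score(y_true, y_pred)

print(cm)
print(acc)
```

Running the same three lines on the training and the test predictions gives the two confusion matrices and accuracy scores the task asks for.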
