Question

Posted on Sep 22, 2024

2. Practicum Problems

It is suggested that a Jupyter/IPython notebook be used for the programmatic components.

2.1 Problem 1

Load the iris sample dataset from sklearn (load_iris()) into Python using a Pandas dataframe. Induce a set of binary Decision Trees with a minimum of 2 instances in the leaves, no splits of subsets below 5, and a maximal tree depth from 1 to 5 (you can leave the majority parameter at 95%). Which depth values result in the highest Recall? Why? Which value resulted in the lowest Precision? Why? Which value results in the best F1 score? Explain the difference between the micro/macro/weighted methods of score calculation.
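
One possible starting point for Problem 1 is sketched below; it is not a reference solution. It assumes scikit-learn's DecisionTreeClassifier, mapping "minimum of 2 instances in the leaves" to min_samples_leaf=2 and "no splits of subsets below 5" to min_samples_split=5. The 70/30 stratified train/test split and the fixed random_state are assumptions that the problem does not specify, and scikit-learn has no direct counterpart to the "majority" parameter, so it is left out. For the averaging methods: micro pools the true/false positive and false negative counts over all classes before computing the score, macro takes the unweighted mean of the per-class scores, and weighted takes the mean of the per-class scores weighted by each class's support.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load iris as a Pandas dataframe (features plus a 'target' column).
df = load_iris(as_frame=True).frame
X = df.drop(columns="target")
y = df["target"]

# Assumption: a held-out test set; the problem does not prescribe one.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

rows = []
for depth in range(1, 6):
    clf = DecisionTreeClassifier(
        max_depth=depth,
        min_samples_leaf=2,   # at least 2 instances in each leaf
        min_samples_split=5,  # do not split subsets smaller than 5
        random_state=0,
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    for avg in ("micro", "macro", "weighted"):
        p, r, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average=avg, zero_division=0
        )
        rows.append(
            {"depth": depth, "average": avg, "precision": p, "recall": r, "f1": f1}
        )

scores = pd.DataFrame(rows)
# One row per depth, one column per averaging method.
print(scores.pivot(index="depth", columns="average", values="f1"))
```

With a single label per sample, micro-averaged precision and recall both equal accuracy, so differences between depths are easier to see in the macro and weighted columns.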

2.2 Problem 2

Load the Breast Cancer Wisconsin (Diagnostic) sample dataset from the UCI Machine Learning Repository (the discrete version at: breast-cancer-wisconsin.data) into Python using a Pandas dataframe. Induce a binary Decision Tree with a minimum of 2 instances in the leaves, no splits of subsets below 5, and a maximal tree depth of 2 (use the default Gini criterion). Calculate the Entropy, Gini, and Misclassification Error of the first split - what is the Information Gain? What is the feature selected for the first split, and what value determines the decision boundary?
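
The impurity calculations asked for in Problem 2 can be sketched as follows. To stay runnable offline, the example fits the tree on a small synthetic table of integer features in the range 1 to 10; this stand-in is an assumption and is not the UCI data. For class proportions p_i at a node, entropy is -sum(p_i * log2(p_i)), Gini is 1 - sum(p_i^2), and misclassification error is 1 - max(p_i). Information Gain is the root's entropy minus the sample-weighted entropy of its two children. The split feature and boundary are read from the fitted tree's tree_.feature[0] and tree_.threshold[0].

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def entropy(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return float(-(p * np.log2(p)).sum())


def gini(p):
    p = np.asarray(p, dtype=float)
    return float(1.0 - (p ** 2).sum())


def misclassification_error(p):
    return float(1.0 - np.max(p))


# Hypothetical stand-in data (NOT the UCI file). The real file could be read
# with something like pd.read_csv("breast-cancer-wisconsin.data", header=None,
# na_values="?"), assuming it is available locally.
rng = np.random.default_rng(0)
X = rng.integers(1, 11, size=(300, 3))
y = (X[:, 0] + rng.integers(0, 3, size=300) > 6).astype(int)

clf = DecisionTreeClassifier(
    criterion="gini", max_depth=2, min_samples_leaf=2,
    min_samples_split=5, random_state=0,
).fit(X, y)
t = clf.tree_


def class_proportions(node):
    # Normalising makes this work whether tree_.value stores counts or fractions.
    v = t.value[node][0]
    return v / v.sum()


root, left, right = 0, t.children_left[0], t.children_right[0]
n = t.n_node_samples
w_left, w_right = n[left] / n[root], n[right] / n[root]

for name, fn in (("entropy", entropy), ("gini", gini),
                 ("misclassification", misclassification_error)):
    parent = fn(class_proportions(root))
    children = (w_left * fn(class_proportions(left))
                + w_right * fn(class_proportions(right)))
    print(f"{name}: parent={parent:.4f} children={children:.4f} "
          f"decrease={parent - children:.4f}")

# Information Gain = entropy decrease of the first split.
info_gain = entropy(class_proportions(root)) - (
    w_left * entropy(class_proportions(left))
    + w_right * entropy(class_proportions(right))
)
print("information gain:", info_gain)
print("first split: feature index", t.feature[0], "threshold", t.threshold[0])
```

On the real data, the feature index would be mapped back to a column name of the dataframe used for fitting.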

2.3 Problem 3

Load the Breast Cancer Wisconsin (Diagnostic) sample dataset from the UCI Machine Learning Repository (the continuous version at: wdbc.data) into Python using a Pandas dataframe. Induce the same binary Decision Tree as above (now using the continuous data) but perform a PCA dimensionality reduction beforehand. Using only the first principal component of the data for a model fit, what are the F1, Precision, and Recall of the PCA-based single factor model compared to the original (continuous) data? Repeat using the first and second principal components. Using the Confusion Matrix, what are the values for FP and TP as well as FPR/TPR? Is using continuous data in this case beneficial within the model? How?
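
A minimal sketch of the Problem 3 comparison follows. It substitutes scikit-learn's bundled load_breast_cancer() for wdbc.data so that it runs without a download; that loader provides the Wisconsin Diagnostic data with 0 = malignant and 1 = benign, so the "positive" class in the metrics below is benign unless the labels are recoded. Standardising the features before PCA, the 70/30 stratified split, and the fixed random_state are assumptions that the problem does not state.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in for wdbc.data: 569 samples, 30 continuous features.
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def make_tree():
    # Same tree settings as in Problem 2.
    return DecisionTreeClassifier(
        max_depth=2, min_samples_leaf=2, min_samples_split=5, random_state=0
    )


models = {
    "original": make_tree(),
    "pca_1": make_pipeline(StandardScaler(), PCA(n_components=1), make_tree()),
    "pca_2": make_pipeline(StandardScaler(), PCA(n_components=2), make_tree()),
}

results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    p, r, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0
    )
    # Rows are true labels, columns are predictions, in label order [0, 1].
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
    results[name] = {
        "precision": p, "recall": r, "f1": f1,
        "TP": int(tp), "FP": int(fp),
        "TPR": tp / (tp + fn),  # true positive rate, identical to recall
        "FPR": fp / (fp + tn),  # false positive rate
    }
    print(name, results[name])
```

Wrapping the scaler and PCA in a pipeline means both are fitted on the training split only, which keeps information from the test split out of the principal components.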

2.4 Problem 4

Simulate a binary classification dataset with a single feature using a mixture of normal distributions with NumPy (Hint: Generate two data frames with the random number and a class label, and combine them together). The normal distribution parameters (np.random.normal) should be (5,2) and (-5,2) for the pair of samples. Induce a binary Decision Tree of maximum depth 2, and obtain the threshold value for the feature in the first split. How does this value compare to the empirical distribution of the feature?
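
The simulation in Problem 4 could be set up roughly as below. The sample size of 500 per class, the column names x and label, and the seed are assumptions; only the distribution parameters (5, 2) and (-5, 2) and the maximum depth of 2 come from the problem. Because the two components have equal spread and equal size, the first-split threshold would be expected to land close to the midpoint between the two class means, in the sparse region between the two modes of the empirical distribution.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

np.random.seed(0)  # assumption: fixed seed for reproducibility
n = 500            # assumption: samples per class

# Two data frames, each with the random feature and a class label.
pos = pd.DataFrame({"x": np.random.normal(5, 2, n), "label": 1})
neg = pd.DataFrame({"x": np.random.normal(-5, 2, n), "label": 0})
df = pd.concat([pos, neg], ignore_index=True)

clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(df[["x"]], df["label"])

# Threshold of the root node, i.e. the first split on the single feature.
threshold = clf.tree_.threshold[0]

# Compare with the empirical distribution of the feature.
class_means = df.groupby("label")["x"].mean()
midpoint = class_means.mean()
print("first-split threshold:", threshold)
print("class means:", class_means.to_dict())
print("midpoint between class means:", midpoint)
```

A histogram of df["x"] with a vertical line at the threshold is a quick way to inspect the comparison visually.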
