Question
Problem 2: Random Forest Classification
Assume that a dataset contains two columns that are to be used as features in a decision tree classifier. These
columns are named x and x. Both are continuous numerical features. The label is a categorical variable with
two possible values. The two features are combined, in order, by a VectorAssembler object to create a column
named features.
Suppose that three decision tree classifiers are trained on bootstrap samples drawn from this dataset. Each tree
uses the features column as its input. The contents of the toDebugString attribute of each tree are shown
below.
Tree Model
  If feature
    If feature
      Predict:
    Else feature
      Predict:
  Else feature
    If feature
      Predict:
    Else feature
      Predict:
Tree Model
  If feature
    If feature
      Predict:
    Else feature
      Predict:
  Else feature
    If feature
      Predict:
    Else feature
      Predict:
Tree Model
  If feature
    If feature
      Predict:
    Else feature
      Predict:
  Else feature
    If feature
      Predict:
    Else feature
      Predict:
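The nested If/Else rules above can be followed mechanically: at each split, compare one entry of the features vector to a threshold and descend until a Predict line is reached. The sketch below illustrates that traversal in plain Python. The thresholds, feature indices, and leaf labels in example_tree are hypothetical placeholders, not the values from the question, and the Leaf, Split, and predict names are introduced only for illustration. It assumes the usual convention that index 0 refers to the first column passed to the VectorAssembler and that the If branch is taken when the value is less than or equal to the threshold.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    prediction: float  # label emitted by a "Predict:" line


@dataclass
class Split:
    feature: int      # index into the assembled features vector
    threshold: float  # "If" branch is taken when value <= threshold
    left: "Node"      # subtree under the "If" line
    right: "Node"     # subtree under the "Else" line


Node = Union[Leaf, Split]


def predict(node: Node, features: Sequence[float]) -> float:
    """Walk from the root to a leaf and return that leaf's label."""
    while isinstance(node, Split):
        if features[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    return node.prediction


# Hypothetical tree with the same shape as the ones shown above
# (one root split, two second-level splits, four leaves).
example_tree = Split(
    feature=0, threshold=5.0,
    left=Split(feature=1, threshold=2.0, left=Leaf(0.0), right=Leaf(1.0)),
    right=Split(feature=1, threshold=7.0, left=Leaf(1.0), right=Leaf(0.0)),
)

# Hypothetical observation: first feature 6.0, second feature 3.0.
observation = [6.0, 3.0]
example_prediction = predict(example_tree, observation)
print(example_prediction)
```

With these placeholder values the observation goes to the Else branch at the root (6.0 > 5.0) and then to the If branch of the second split (3.0 <= 7.0). Substituting the real thresholds from each toDebugString output gives the three per-tree predictions the question asks for.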
Consider a new observation for which x and x. Use the rules above to determine which of the
two labels each tree model would assign to this observation. Then assume that a random forest is created from
these three trees. Determine the label that the random forest would assign to the new observation. Provide
your answers in the format shown below.
Tree Model Prediction: xxxx
Tree Model Prediction: xxxx
Tree Model Prediction: xxxx
Random Forest Prediction: xxx
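Once each tree's label is known, the forest label can be obtained by combining them. The sketch below assumes a simple majority vote, which is what the exercise appears to intend; library implementations may instead combine per-tree class probabilities, which can give a different result in close cases. The three values in tree_predictions are hypothetical placeholders rather than the answer to the question, and the tree numbering in the printed output is likewise an assumption about the expected format.

```python
from collections import Counter
from typing import Sequence


def majority_vote(tree_predictions: Sequence[float]) -> float:
    """Return the label predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical per-tree labels; replace with the labels found by
# tracing the new observation through each tree's rules.
tree_predictions = [1.0, 0.0, 1.0]
forest_prediction = majority_vote(tree_predictions)

for i, label in enumerate(tree_predictions, start=1):
    print(f"Tree Model {i} Prediction: {label}")
print(f"Random Forest Prediction: {forest_prediction}")
```

With three trees and two possible labels a tie cannot occur, so the majority label is always well defined here.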