Question

Posted on Jun 15, 2024

Which of the statements about the components of a classification tree are true? Tick all that apply.

O If we build a tree based on a sample of 500 observations, and run these same observations through the fitted tree, it is possible that one of the nodes ends up "empty" (i.e. none of the observations ends up in a given node).
O A classification tree can only have one root node, no matter how many candidate predictors there are.
O If a categorical predictor used in a classification tree has 5 levels, it is NOT possible that a parent node could be split into 5 child nodes in a single step.
O A particular node in a classification tree can only be a parent node OR a child node (but not both).

Suppose you have built a classification tree to predict which incoming emails are spam (i.e. junk emails), and you are particularly concerned about NOT having spam emails enter your inbox, even if that means some important (non-spam) emails might accidentally skip your inbox and go directly to your junk folder. You assume that if an email is important enough, the sender will resend it after a few days. Given your preferences, which of the following statements describes the classification tree you would prefer to use to decide which of your emails should go to your inbox (if predicted not to be spam) vs go straight to your junk folder (if predicted to be spam)? In this context, consider "predict spam" to be a positive outcome and "predict not spam" to be a negative outcome.

O A classification tree with high accuracy (even if the sensitivity and specificity were both lower)
O A classification tree with high sensitivity
O A classification tree with low type 2 error
O A classification tree with low type 1 error
O A classification tree with high specificity

Suppose you have built a classification tree to predict which incoming emails are spam (i.e. junk emails), and you want to assess the predictive performance of this tree by making predictions for 10 new emails, which were not used to fit the tree. Here are results for these 10 new observations:

Predicted             Actual
Predicted spam        Actually spam
Predicted not spam    Actually spam
Predicted not spam    Actually not spam
Predicted not spam    Actually not spam
Predicted spam        Actually spam
Predicted not spam    Actually spam
Predicted spam        Actually not spam
Predicted not spam    Actually not spam
Predicted not spam    Actually not spam
Predicted not spam    Actually not spam

Based on the table above, fill in the 2x2 contingency table below:

                      Actually spam    Actually not spam
Predicted spam        ____             ____
Predicted not spam    ____             ____

The package contains data about tropical storms from 1995-2005. There are four types of storms: Extratropical, Hurricane, Tropical Depression, and Tropical Storm. For each storm, several variables were recorded, including the pressure (in millibars), the maximum wind speed (in knots), and the type of storm (as already listed). Based on the tree above, make predictions for the 3 new observations below:

A. A storm with winds of 50 knots and 950 millibars?
B. A storm with winds of 30.5 knots and 900 millibars?
C. A storm with winds of 70 knots and 1,000 millibars?

Suppose we build a classification tree to predict whether a baby will be born premature (≤ 36 weeks). We then make predictions for a sample of new observations, which were not used to fit the tree and for which we also know the actual values:

                             Actual premature birth    Actual full term birth
Predicted premature birth    470                       25
Predicted full term birth    15                        453

Based on the confusion matrix above, which of the following is the sensitivity of our classification tree? (As we are motivated to make this prediction to identify premature births, consider premature your 'positive' outcome.)

O 0.9477
O 0.9495
O 0.9535
O 0.9691

The figure below shows the geometric interpretation of a classification tree built on data from 1000 pregnancies to predict whether a baby will be born premature (≤ 36 weeks), using each parent's age as predictors. The points correspond to a sample of 25 additional pregnancies (which weren't used to fit the model) and the lines correspond to the splits from the original tree. Based only on the figure below, how many terminal nodes are there in the corresponding classification tree?

[Figure: father's age (vertical axis) plotted against mother's age (horizontal axis), with labels "premie", "full term", "Premature", and "Full term".]

O 2
O 3
O There is not enough information to answer this question.
O 1

The figure below shows the geometric interpretation of a classification tree built on data from 1000 pregnancies to predict whether a baby will be born premature (≤ 36 weeks), using each parent's age as predictors. The points correspond to a sample of 25 additional pregnancies (which weren't used to fit the model) and the lines correspond to the splits from the original tree. Based only on the figure below, which of the statements below is most accurate?

[Figure: father's age (vertical axis) plotted against mother's age (horizontal axis), with labels "premie", "full term", "Premature", and "Full term".]

O The root node in the corresponding classification tree would be split based on the mother's age being less than 35.5 years old or greater than 35.5 years old.
O The root node in the corresponding classification tree would be split based on the father's age being less than 35.5 years old or greater than 35.5 years old.
O The root node in the corresponding classification tree would be split based on the father's age being less than 35.5 years old or greater than 30.5 years old.
O The root node in the corresponding classification tree would be split based on the mother's age being less than 35.5 years old or greater than 30.5 years old.
O The root node in the corresponding classification tree would be split based on the mother's age being less than 35.5 years old or greater than 30.5 years old AND the father's age being less than 35.5 years old.
O The root node in the corresponding classification tree would be split based on the mother's age being less than 35.5 years old or greater than 35.5 years old AND the father's age being less than 30.5 years old.

Based on the passage above, which of the following statements is most accurate?

O As long as we measure prediction accuracy, sensitivity, and specificity for a classification model based on new observations which weren't used for testing, we can always be very confident that these are good estimates of predictive performance.
O The biggest current issue with the tests is that too many people who don't have the virus are being told that they do.
O Even if we measure the prediction accuracy, sensitivity, and specificity of a classification model based on new data which was not used to build the model, it is difficult to get good estimates of these quantities if we aren't confident in our ability to determine the "true" class to which individuals belong (i.e. actually having / not having COVID-19).
O Ideally, the sensitivity and specificity are highly dependent on the diagnostic method being used.

What does it mean when we say that the sensitivity of a diagnostic test for COVID-19 could be as low as 70%?

O Among individuals who receive positive COVID-19 tests, as few as 70% are actually infected (at the time of testing). The remaining nearly 30% of individuals who receive positive test results were not actually infected at the time they were tested.
O Among individuals who are not actually infected with COVID-19 and get tested, as many as 30% receive positive results. These individuals will think that they have COVID-19 and thus isolate at home, even though in reality they were not infected at the time they were tested.
O Among individuals who are actually infected with COVID-19 and get tested, as many as 30% receive negative results. These individuals will think that they are free of COVID-19 and thus may unintentionally infect individuals they are in contact with.
O Among individuals who receive negative COVID-19 tests, as few as 70% are actually free of infection at the time of testing. The remaining nearly 30% of individuals who receive negative test results were actually infected at the time they were tested.

Suppose we build a classification tree based on a sample of 1000 observations, and continue splitting the tree until there are 1000 terminal nodes. In other words, we continue making splits until there is only one observation from the training sample in each of the terminal nodes. Which ONE of the statements below is valid?

O A classification tree constructed in this way would have perfect accuracy, sensitivity, and specificity if we estimate these quantities based on new observations which were not used for training.
O While the accuracy of a classification tree constructed in this way would most likely be quite good, we can't comment on what the sensitivity and specificity would be.
O A classification tree constructed in this way is very useful for making predictions because it has so many terminal nodes.
O A classification tree constructed in this way is very likely overfitting the training data.
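
The contingency-table and sensitivity questions above come down to the same arithmetic, which can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original quiz: the helper names confusion_counts and rates are made up for this example, the ten (predicted, actual) pairs are copied in the order they appear in the spam question, and the top-left cell of the premature-birth matrix is assumed to be 470, the value consistent with the listed answer options.

```python
from collections import Counter

# The ten (predicted, actual) pairs from the spam question, in listed order.
# "spam" is the positive class, as the question specifies.
emails = [
    ("spam", "spam"),
    ("not spam", "spam"),
    ("not spam", "not spam"),
    ("not spam", "not spam"),
    ("spam", "spam"),
    ("not spam", "spam"),
    ("spam", "not spam"),
    ("not spam", "not spam"),
    ("not spam", "not spam"),
    ("not spam", "not spam"),
]


def confusion_counts(pairs, positive):
    """Tally (predicted, actual) pairs into (TP, FP, FN, TN) counts."""
    counts = Counter()
    for predicted, actual in pairs:
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts["TP"], counts["FP"], counts["FN"], counts["TN"]


def rates(tp, fp, fn, tn):
    """Return sensitivity, specificity, and accuracy for a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # share of actual positives caught
        "specificity": tn / (tn + fp),  # share of actual negatives caught
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


spam_counts = confusion_counts(emails, positive="spam")
spam_rates = rates(*spam_counts)

# Premature-birth matrix, with "premature" as the positive class.
# ASSUMPTION: the top-left cell (true positives) is 470.
premie_rates = rates(tp=470, fp=25, fn=15, tn=453)

print("spam (TP, FP, FN, TN):", spam_counts)
print("spam rates:", spam_rates)
print("premature-birth rates:", premie_rates)
```

With spam as the positive outcome, the tally gives 2 true positives, 1 false positive, 2 false negatives, and 5 true negatives. Sensitivity uses only the "actually positive" column, so a tree tuned to keep spam out of the inbox is one where the false-negative count (spam predicted as not spam) is small relative to the true positives.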