Question
Consider the following dataset that contains 80 cereal products with their dietary characteristics.

Variable Name    Description
Name             Name of cereal
mfr              Manufacturer of cereal:
                   A = American Home Food Products
                   G = General Mills
                   K = Kelloggs
                   N = Nabisco
                   P = Post
                   Q = Quaker Oats
                   R = Ralston Purina
type             Type of the cereal (cold/hot)
calories         Calories per serving
protein          Grams of protein
fat              Grams of fat
sodium           Milligrams of sodium
fiber            Grams of dietary fiber
carbo            Grams of complex carbohydrates
sugars           Grams of sugars
potass           Milligrams of potassium
vitamins         Vitamins and minerals - 0, 25, or 100, indicating the typical percentage of FDA recommended
shelf            Display shelf (1, 2, or 3, counting from the floor)
weight           Weight in ounces of one serving
cups             Number of cups in one serving
rating           A rating of the cereals from Consumer Reports

1) Which variable(s) should be "dummified" (transformed into multiple binary variables)? Why?
2) Which variable(s) should be excluded from the analysis? Why?
3) If we were to use protein and fat as predictors with a model that is sensitive to scale (such as K-Nearest Neighbors), should we standardize them, given that both are already measured in the same unit (grams)? Why or why not?
4) If we were to use this dataset to predict the rating of each cereal, what additional variables should be collected to improve the predictive performance?
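
For readers unfamiliar with the two operations named in questions 1 and 3, here is a minimal Python sketch of what "dummifying" a categorical column and standardizing a numeric column mean mechanically. It is an illustration under stated assumptions, not a solution to the questions: the four rows are invented toy values (not taken from the cereal dataset), and the helper names dummify and standardize are hypothetical. Only the column names mfr, protein, and fat come from the table above.

```python
from statistics import mean, pstdev

# Hypothetical toy rows: the values are invented for illustration only
# and are NOT taken from the actual cereal dataset.
rows = [
    {"mfr": "K", "protein": 4.0, "fat": 1.0},
    {"mfr": "G", "protein": 2.0, "fat": 2.0},
    {"mfr": "K", "protein": 3.0, "fat": 0.0},
    {"mfr": "Q", "protein": 1.0, "fat": 5.0},
]


def dummify(rows, column):
    """Replace one categorical column with one 0/1 column per observed level."""
    levels = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        # Copy every field except the categorical one being expanded.
        new_row = {k: v for k, v in r.items() if k != column}
        for level in levels:
            new_row[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(new_row)
    return encoded


def standardize(values):
    """Return z-scores: subtract the mean, divide by the population std. dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


encoded = dummify(rows, "mfr")
protein_z = standardize([r["protein"] for r in rows])
fat_z = standardize([r["fat"] for r in rows])

# Same unit (grams) does not imply the same spread in these toy values.
print("protein std:", pstdev([r["protein"] for r in rows]))
print("fat std:", pstdev([r["fat"] for r in rows]))
print(encoded[0])
```

In this toy data the two gram-valued columns have different spreads before standardization, which is the kind of thing to check when reasoning about question 3. Whether to keep all dummy columns or drop one reference level depends on the model being fitted.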