Question
Customer Rating of Breakfast Cereals. The dataset Cereals.jmp includes nutritional information, store display, and consumer ratings for 77 breakfast cereals.
Data preprocessing. Note that some cereals have missing values. These will be automatically omitted from the analysis. Use Cols > Columns Viewer to identify which variables have missing values and how many values are missing.
- The variables [ Select ] ["calories, protein, and shelf", "carbo, sugars, and potass", "weight, cups, and rating"] have [ Select ] ["4", "12", "2"] missing values in total.
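To make the counting concrete, here is a minimal sketch of a per-variable missing-value count (roughly the figure the Columns Viewer summary reports) and of dropping every row that has any missing value, which is assumed here to be what "automatically omitted" means. The rows, values, and function names are hypothetical toy data, not Cereals.jmp. Note that the total number of missing values and the number of dropped rows can differ, because one cereal may be missing several values.

```python
# Hypothetical toy rows, NOT values from Cereals.jmp; None marks a missing cell.
rows = [
    {"protein": 2.0, "fiber": 1.0, "sodium": 180.0},
    {"protein": None, "fiber": 9.0, "sodium": 260.0},
    {"protein": 3.0, "fiber": None, "sodium": None},
    {"protein": 4.0, "fiber": 14.0, "sodium": 0.0},
]

def missing_counts(rows):
    """Number of missing (None) values per variable."""
    counts = {}
    for row in rows:
        for name, value in row.items():
            counts[name] = counts.get(name, 0) + (1 if value is None else 0)
    return counts

def complete_rows(rows):
    """Listwise deletion (assumed): keep only rows with no missing value."""
    return [row for row in rows if all(value is not None for value in row.values())]

counts = missing_counts(rows)
print(counts)                    # {'protein': 1, 'fiber': 1, 'sodium': 1}
print(sum(counts.values()))      # 3 missing values in total
print(len(complete_rows(rows)))  # 2 rows survive; 2 rows are dropped
```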
The Hierarchical platform dialog provides an option to standardize (Standardize Data). Should this be selected? Why?
- The scales used for measurement [ Select ] ["are vastly", "are not significantly"] different, so the distance measure [ Select ] ["would not", "would be"] dominated by variables with larger values. Hence, the Standardize Data option in the Hierarchical platform [ Select ] ["should not be", "should be"] selected.
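The effect of scale on a Euclidean distance can be checked directly. The sketch below assumes that "standardize" means converting each variable to z-scores (subtract the mean, divide by the sample standard deviation); the exact scaling JMP applies is not confirmed here. The `fat` and `potass` values and the helper names are hypothetical. It reports what fraction of the squared distance between two cereals comes from `potass` before and after standardizing.

```python
from statistics import mean, stdev

# Hypothetical toy values, NOT from Cereals.jmp: fat in grams, potass in mg.
data = {
    "fat": [1.0, 0.0, 2.0, 3.0, 1.0],
    "potass": [95.0, 320.0, 40.0, 60.0, 110.0],
}

def standardize(columns):
    """z-scores per variable: (x - mean) / sample standard deviation (assumed scaling)."""
    out = {}
    for name, values in columns.items():
        m, s = mean(values), stdev(values)
        out[name] = [(v - m) / s for v in values]
    return out

def share_of_distance(columns, i, j, var):
    """Fraction of the squared Euclidean distance between rows i and j due to `var`."""
    sq = {name: (vals[i] - vals[j]) ** 2 for name, vals in columns.items()}
    return sq[var] / sum(sq.values())

raw_share = share_of_distance(data, 0, 2, "potass")
z_share = share_of_distance(standardize(data), 0, 2, "potass")
print(f"potass share, raw: {raw_share:.4f}; standardized: {z_share:.4f}")
```

On the raw scale nearly all of the distance comes from the milligram-valued variable; after standardizing, the gram-valued `fat` contributes most of this particular distance.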
Apply hierarchical clustering to the data using single linkage and complete linkage (use only the continuous variables as Y, Columns, and assign the name variable to the Label role). Look at the dendrograms and the parallel plots. Comment on the structure of the clusters and on their stability.
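The two linkage rules differ only in how the distance between two clusters is defined: single linkage uses the closest pair of members, complete linkage the farthest pair. The naive sketch below (the function name `agglomerate` and the 1-D toy points are hypothetical, not JMP's implementation) records the joining distance of every merge, which is what a dendrogram draws. On points that form a chain, single linkage performs most merges at the same small distance (the "chaining" effect), while complete linkage spreads the merges over a wider range of distances.

```python
from itertools import combinations
from math import dist

def agglomerate(points, linkage):
    """Naive agglomerative clustering; returns merge heights in merge order.

    linkage="single": cluster distance = smallest pairwise member distance.
    linkage="complete": cluster distance = largest pairwise member distance.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    agg = min if linkage == "single" else max
    clusters = [[i] for i in range(len(points))]
    heights = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):  # always a < b
            d = agg(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        heights.append(d)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: b > a, so index a is unaffected
    return heights

# Hypothetical 1-D points: a chain 0..3 plus a separate pair 10, 11.
points = [(0,), (1,), (2,), (3,), (10,), (11,)]
print(agglomerate(points, "single"))    # [1.0, 1.0, 1.0, 1.0, 7.0]
print(agglomerate(points, "complete"))  # [1.0, 1.0, 1.0, 3.0, 11.0]
```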
With [ Select ] ["complete linkage", "single linkage"], small changes in the distance cause large changes in the number of clusters. For example, going from 55 to 30 clusters spans only a very narrow range of distances: clusters change very quickly over a short distance. So, [ Select ] ["complete linkage", "single linkage"] is more unstable. The change in clusters for [ Select ] ["complete linkage", "single linkage"] is more gradual.
- Hence, the [ Select ] ["single linkage", "complete linkage"] method leads to the most insightful or meaningful clusters.
- In the Distance Graph, there is a sharp upward bend at cluster number = [ Select ] ["2", "3", "5"]. This suggests the optimal number of clusters to use.
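Reading the "sharp upward bend" can be approximated numerically. The sketch below treats the bend as the largest increase between consecutive joining distances and stops just before that expensive merge; this is one simple heuristic assumed for illustration, not necessarily how JMP draws or interprets its Distance Graph. The function name and the height values are hypothetical.

```python
def elbow_clusters(heights):
    """Suggest a number of clusters from merge heights listed in merge order.

    With n items there are n - 1 merges; merge m (0-indexed) reduces the number
    of clusters from n - m to n - m - 1. The largest increase in height between
    consecutive merges is treated as the bend, and we stop just before it.
    Requires at least two merge heights.
    """
    n = len(heights) + 1
    jumps = {m: heights[m] - heights[m - 1] for m in range(1, len(heights))}
    m = max(jumps, key=jumps.get)
    return n - m

# Hypothetical joining distances for 9 items (not from the Cereals analysis).
heights = [0.4, 0.5, 0.7, 0.8, 1.0, 2.9, 3.3, 3.8]
print(elbow_clusters(heights))  # 4
```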
Public elementary schools would like to choose a set of cereals to include in their daily cafeteria menus. Every day a different cereal is offered, but all cereals should support a healthy diet. For this goal, you are asked to find a cluster of "healthy cereals."
Based on the variables at hand, how would you characterize "healthy cereals"?
[ Select ] ["Low", "High"] calories, [ Select ] ["Low", "High"] protein, [ Select ] ["High", "Low"] fat, [ Select ] ["High", "Low"] fiber, [ Select ] ["Low", "High"] carbo, [ Select ] ["Low", "High"] sugar, [ Select ] ["High", "Low"] potass, [ Select ] ["Low", "High"] vitamins.
Use the red triangle options Cluster Summary, Cluster Means, and Parallel Coord Plots to compare cluster means across the variables. Which cluster of cereals is the most "healthy"?
[ Select ] ["Cluster 2", "Cluster 1", "Cluster 4"] is the healthiest, with high protein, fiber, and potass and low calories, fat, and carbs. However, this cluster contains the high-bran, high-fiber cereals that students generally might not like. An alternative might be [ Select ] ["cluster 4", "cluster 5", "cluster 2"], which is moderately high in the "good" characteristics (protein, vitamins, potassium) and which students would be more likely to eat.
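Comparing clusters by their means can be turned into a simple ranking. The sketch below standardizes each variable over all cereals, averages the z-scores within each cluster (analogous to reading a cluster-means table), and scores each cluster with the directions stated in the answer above: high protein, fiber, and potass; low calories, fat, and carbo. The equal-weight signed sum is an assumption made for illustration, and the rows, cluster labels "A" to "C", and function names are hypothetical, not Cereals.jmp results.

```python
from statistics import mean, pstdev

VARS = ["calories", "protein", "fat", "fiber", "carbo", "potass"]
# +1 = higher is healthier, -1 = lower is healthier (directions from the answer text).
DIRECTION = {"calories": -1, "protein": +1, "fat": -1,
             "fiber": +1, "carbo": -1, "potass": +1}

# Hypothetical cereals (NOT Cereals.jmp data), values in VARS order, with made-up labels.
rows = [
    (110, 2, 1, 1, 14, 40), (120, 2, 2, 0, 13, 35),
    (70, 4, 1, 9, 7, 320), (50, 4, 0, 14, 8, 330),
    (100, 3, 1, 3, 15, 100), (110, 3, 0, 2, 16, 90),
]
labels = ["A", "A", "B", "B", "C", "C"]

def zscore_columns(rows):
    """Standardize each variable over all cereals (population SD)."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [tuple((v - m) / s for v, (m, s) in zip(row, stats)) for row in rows]

def cluster_means(rows, labels):
    """Mean of each variable within each cluster."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return {label: [mean(col) for col in zip(*members)] for label, members in groups.items()}

def healthiness(means):
    """Equal-weight signed sum of standardized cluster means; larger = healthier."""
    return sum(DIRECTION[name] * value for name, value in zip(VARS, means))

means = cluster_means(zscore_columns(rows), labels)
ranking = sorted(means, key=lambda label: healthiness(means[label]), reverse=True)
print(ranking)  # ['B', 'C', 'A']
```

Standardizing before averaging keeps milligram-scale `potass` from outweighing gram-scale variables in the score, for the same reason the Standardize Data option matters for the clustering itself.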