Question:
Use the data for the breakfast cereals example in Section to explore and summarize the data as follows:
a. Which attributes are quantitative/numerical? Which are nominal?
b. Compute the mean, median, min, max, and standard deviation for each of the quantitative attributes. This can be done using RapidMiner as shown in Figure 4.1. The median for each attribute can be computed using the Aggregate operator.
c. Plot a histogram for each of the quantitative attributes. Based on the histograms and summary statistics, answer the following questions:
i. Which attributes have the largest variability?
ii. Which attributes seem skewed?
iii. Are there any values that seem extreme?
d. Plot a side-by-side boxplot comparing the calories in hot vs. cold cereals. What does this plot show us?
e. Plot a side-by-side boxplot of consumer rating as a function of the shelf height. If we were to predict consumer rating from shelf height, does it appear that we need to keep all three categories of shelf height?
f. Compute the correlation table along with a heatmap for the quantitative attributes (Correlation Matrix operator).
i. Which pair of attributes is most strongly correlated?
ii. How can we reduce the number of attributes based on these correlations?
iii. How would the correlations change if we normalized the data first?
g. Consider the first principal component (PC) of the analysis of the 13 numerical attributes in Figure 4.11 (bottom). Describe briefly what this PC represents.
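The exercise is framed around RapidMiner operators, but the computations in parts (b) and (f) can also be sketched in Python with pandas. The following is a minimal illustration under stated assumptions: the data frame is a small set of made-up values, not the actual cereals data, and the column names (`calories`, `sugars`, `rating`, `type`) are assumed for the example. It computes the five summary statistics per numerical attribute, builds the correlation table, and checks whether z-score normalization changes the correlations.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values for illustration only; NOT the real cereals data.
# Column names are assumptions, not taken from the book's dataset.
df = pd.DataFrame({
    "calories": [70, 120, 110, 100, 150, 90],
    "sugars": [6, 8, 5, 0, 12, 3],
    "rating": [68.4, 34.0, 59.4, 64.5, 28.0, 55.0],
    "type": ["C", "C", "C", "H", "C", "H"],  # nominal attribute (cold/hot)
})

# Part (a), roughly: separate numerical attributes from nominal ones by dtype.
numeric = df.select_dtypes(include="number")

# Part (b): mean, median, min, max, and standard deviation per attribute.
# Transposing puts one attribute per row and one statistic per column.
summary = numeric.agg(["mean", "median", "min", "max", "std"]).T

# Part (f): Pearson correlation table for the numerical attributes.
corr_raw = numeric.corr()

# Part (f.iii): z-score normalize each attribute, then recompute correlations.
zscored = (numeric - numeric.mean()) / numeric.std()
corr_z = zscored.corr()

print(summary)
print(corr_raw.round(3))
print("Correlations unchanged by z-scoring:", np.allclose(corr_raw, corr_z))
```

Pearson correlation is invariant to shifting and positively rescaling each attribute, so the two correlation tables agree up to floating-point error. Normalization matters for scale-sensitive steps such as the principal components analysis in part (g), not for the correlation table itself.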
Machine Learning For Business Analytics
ISBN: 9781119828792
1st Edition
Authors: Galit Shmueli, Peter C. Bruce, Amit V. Deokar, Nitin R. Patel