\begin{tabular}{|l|r|r|}
\hline
\multicolumn{1}{|c|}{Company} & Profits (\$ millions) & Market Capitalization (\$ millions) \\ \hline
Alliant Techsystems & 313.2 & 1891.9 \\ \hline
Amazon.com & 631 & 81458.6 \\ \hline
AmerisourceBergen & 706.6 & 10087.6 \\ \hline
Avis Budget Group & $-29$ & 1175.8 \\ \hline
Boeing & 4,018.00 & 55188.8 \\ \hline
Cardinal Health & 959 & 14115.2 \\ \hline
Cisco Systems & 6,490.00 & 97376.2 \\ \hline
Coca-Cola & 8,572.00 & 157130.5 \\ \hline
ConocoPhillips & 12,436.00 & 95251.9 \\ \hline
Costco Wholesale & 1,462.00 & 36461.2 \\ \hline
CVS Caremark & 3,461.00 & 53575.7 \\ \hline
Delta Air Lines & 854 & 7082.1 \\ \hline
Fidelity National Financial & 369.5 & 3461.4 \\ \hline
FMC Technologies & 399.8 & 12520.3 \\ \hline
Foot Locker & 278 & 3547.6 \\ \hline
General Motors & 9,190.00 & 32382.4 \\ \hline
Harley-Davidson & 599.1 & 8925.3 \\ \hline
HCA Holdings & 2,465.00 & 9550.2 \\ \hline
Kraft Foods & 3,527.00 & 65917.4 \\ \hline
Kroger & 602 & 13819.5 \\ \hline
Lockheed Martin & 2,655.00 & 26651.1 \\ \hline
Medco Health Solutions & 1,455.70 & 21865.9 \\ \hline
Owens Corning & 276 & 3417.8 \\ \hline
Pitney Bowes & 617.5 & 3681.2 \\ \hline
Procter \& Gamble & 11,797.00 & 182109.9 \\ \hline
Ralph Lauren & 567.6 & 12522.8 \\ \hline
Rockwell Automation & 697.8 & 10514.8 \\ \hline
Rockwell Collins & 109 & 8560.5 \\ \hline
United Stationers & 5,979.00 & 1381.6 \\ \hline
United Technologies & 5,142.00 & 66606.5 \\ \hline
UnitedHealth Group & & 53469.4 \\ \hline
\end{tabular}

4. Compare and contrast hierarchical clustering and k-means clustering. (20 points)

5. Leggere, an Internet book retailer, is interested in better understanding the purchase decisions of its customers.
For a set of 2,000 customer transactions, it has categorized the individual book purchases comprising those transactions into one or more of the following categories: Novels, Willa Bean series, Cooking Books, Bob Villa Do-It-Yourself, Youth Fantasy, Art Books, Biography, Cooking Books by Mossimo Bottura, Harry Potter series, Florence Art Books, and Titian Art Books. Leggere has conducted association rules analysis on this data set and would like to analyze the output. The table below (file Leggere.xlsx) shows the top 10 rules with respect to lift ratio. (20 points)

a. For the rule ``If a customer buys a Youth Fantasy book, then they buy Novels and Cooking books,'' calculate the confidence and lift ratio. (10 points)

b. Interpret both the confidence and the lift ratio calculated in (a). (5 points)

c. Among the rules shown in the table, which one has the highest lift ratio? (5 points)

1. The file MutualFunds.xlsx contains a data set with information for 45 mutual funds that are part of the Morningstar Funds 500. The data set includes the following five variables: (20 points)
\begin{itemize}
\item Fund Type: The type of fund, labeled DE (Domestic Equity), IE (International Equity), and FI (Fixed Income)
\item Net Asset Value (\$): The closing price per share
\item Five-Year Average Return (\%): The average annual return for the fund over the past five years
\item Expense Ratio (\%): The percentage of assets deducted each fiscal year for fund expenses
\item Morningstar Rank: The risk-adjusted star rating for each fund; Morningstar ranks go from a low of 1 Star to a high of 5 Stars.
\end{itemize}

a. Prepare a PivotTable that gives the frequency count of the data by Fund Type (rows) and the five-year average annual return (columns). Use classes of 0--9.99, 10--19.99, 20--29.99, 30--39.99, 40--49.99, and 50--59.99 for the Five-Year Average Return (\%). (10 points)

b. What conclusions can you draw about the fund type and the average return over the past five years? (10 points)

2.
The file Fortune500.xlsx contains data for profits and market capitalizations from a recent sample of firms in the Fortune 500. Prepare a scatter diagram to show the relationship between the variables Market Capitalization and Profit in which Market Capitalization is on the vertical axis and Profit is on the horizontal axis. Comment on any relationship between the variables. (15 points)

3. What is text data? Is text usually classified as structured data? Explain in detail the different steps involved in the pre-processing of text data for text analytics. (20 points)

\begin{tabular}{|l|l|l|l|l|}
\hline
Antecedent (A) & Consequent (C) & Support for A & Support for C & Support for A \& C \\ \hline
BotturaCo & Cooking & 124 & 512 & 101 \\ \hline
Cooking, BobVilla & Art & 227 & 327 & 118 \\ \hline
Cooking, Art & Biography & 170 & 385 & 101 \\ \hline
Cooking, Biography & Art & 207 & 334 & 105 \\ \hline
Youth Fantasy & Novels, Cooking & 227 & 512 & 170 \\ \hline
Cooking, Art & BobVilla & 190 & 385 & 105 \\ \hline
Cooking, BobVilla & Biography & 144 & 512 & 105 \\ \hline
\begin{tabular}{l} Biography \\ Novels, \\ Cooking \end{tabular} & \begin{tabular}{l} Biography \\ Cooking \end{tabular} & 227 & 385 & 124 \\ \hline
Art & 204 & 385 & 110 \\ \hline
\end{tabular}
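
As a hedged illustration of how the confidence and lift ratio asked for in question 5 are usually defined, the sketch below assumes that the support columns in the table are transaction counts out of the $N = 2{,}000$ transactions (the table does not state this explicitly). The notation $\mathrm{supp}(\cdot)$ for those counts is introduced here only for illustration. The first row of the table, BotturaCo $\Rightarrow$ Cooking, serves as the example.
\[
\mathrm{confidence}(A \Rightarrow C) = \frac{\mathrm{supp}(A \,\&\, C)}{\mathrm{supp}(A)},
\qquad
\mathrm{lift}(A \Rightarrow C) = \frac{\mathrm{confidence}(A \Rightarrow C)}{\mathrm{supp}(C)/N}.
\]
\[
\mathrm{confidence} = \frac{101}{124} \approx 0.815,
\qquad
\mathrm{lift} \approx \frac{0.815}{512/2000} = \frac{0.815}{0.256} \approx 3.18.
\]
Under these assumptions, a lift ratio above 1 indicates that the consequent appears more often among transactions containing the antecedent than among all transactions.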