
Question


Step 1: Use the code from Week 7 as a Starting Point
In this assignment, we will not be doing all the analysis as before. But much of the code from week 6 can be used as a starting point for this assignment. For this assignment, do not be concerned with splitting data into training and test sets. In the real world, you would do that. But for this exercise, it would only be an unnecessary complication.
Step 2: PCA Analysis
Use only the input variables. Do not use either of the target variables.
Use only the continuous variables. Do not use any of the flag variables.
Select at least 4 of the continuous variables. It would be preferable if there were a theme to the variables selected.
Do a Principal Component Analysis (PCA) on the continuous variables.
Display the Scree Plot of the PCA analysis.
Using the Scree Plot, determine how many Principal Components you wish to use. Note, you must use at least two. You may decide to use more. Justify your decision. Note that there is no wrong answer. You will be graded on your reasoning, not your decision.
Print the weights of the Principal Components. Use the weights to tell a story on what the Principal Components represent.
Perform a scatter plot using the first two Principal Components. Do not color the dots. Leave them black.
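The PCA steps above can be sketched as follows. This is a minimal illustration, not the course's reference solution: the mention of flexclust later in the assignment suggests the course uses R, while this sketch uses Python with NumPy. The data is synthetic, and the variable names var_1 to var_4 are hypothetical placeholders for whichever continuous inputs are selected. The printed variance proportions are the values a scree plot would display, and pc_scores holds the coordinates for the scatter plot of the first two components.

```python
import numpy as np


def pca(X):
    """PCA on standardized columns; returns (variance ratios, weights, scores)."""
    X = np.asarray(X, dtype=float)
    # Standardize each variable so differences in units do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal directions (the component weights).
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    variance = S ** 2 / (len(Z) - 1)
    ratio = variance / variance.sum()
    scores = Z @ Vt.T  # each record expressed in principal-component coordinates
    return ratio, Vt, scores


# Synthetic stand-in for four continuous input variables (hypothetical names).
rng = np.random.default_rng(0)
n = 200
shared = rng.normal(size=n)
names = ["var_1", "var_2", "var_3", "var_4"]
X = np.column_stack([
    shared + 0.3 * rng.normal(size=n),
    shared + 0.3 * rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
])

ratio, weights, scores = pca(X)

# These proportions are the bar heights of a scree plot.
for i, r in enumerate(ratio, start=1):
    print(f"PC{i}: {r:.3f} of variance (cumulative {ratio[:i].sum():.3f})")

# Weights of the first two components, one line per original variable.
for name, w1, w2 in zip(names, weights[0], weights[1]):
    print(f"{name}: PC1 = {w1:+.3f}, PC2 = {w2:+.3f}")

pc_scores = scores[:, :2]  # coordinates for the PC1 vs PC2 scatter plot
```

The sign of each component is arbitrary, so the story told from the weights should rest on which variables load together and which load in opposite directions, not on whether a weight is positive or negative.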
Step 3: Cluster Analysis - Find the Number of Clusters
Use the principal components from Step 2 for this step.
Using the methods presented in the lectures, complete a KMeans cluster analysis for N=1 to at least N=10. Feel free to take the number higher.
Print a scree plot of the clusters and determine how many clusters would be optimum. Justify your decision.
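One way to produce the numbers behind the cluster scree plot is to record the total within-cluster sum of squares for each N. The sketch below is an assumption-laden illustration: it uses a plain NumPy implementation of Lloyd's algorithm in place of whatever KMeans routine the lectures present, and pc_scores is a synthetic stand-in for the principal component scores from Step 2.

```python
import numpy as np


def kmeans(X, k, n_init=5, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with restarts; returns (labels, centers, wss)."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        # Start from k distinct records chosen at random.
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Move each center to the mean of its members (keep it if empty).
            updated = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(updated, centers):
                break
            centers = updated
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        wss = dist[np.arange(len(X)), labels].sum()
        if best is None or wss < best[2]:
            best = (labels, centers, wss)
    return best


# Synthetic stand-in for the first two principal component scores.
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.5]])
pc_scores = np.vstack([c + 0.6 * rng.normal(size=(100, 2)) for c in blob_centers])

# Within-cluster sum of squares for N = 1..10: the values a scree plot shows.
wss_by_k = {k: kmeans(pc_scores, k)[2] for k in range(1, 11)}
for k, wss in wss_by_k.items():
    print(f"k={k:2d}  within-cluster SS = {wss:.1f}")
```

The usual reading is to look for the "elbow", the value of N after which adding another cluster reduces the within-cluster sum of squares only slightly.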
Step 4: Cluster Analysis
Using the number of clusters from Step 3, perform a cluster analysis using the principal components from Step 2.
Print the number of records in each cluster.
Print the cluster center points for each cluster.
Convert the KMeans clusters into "flexclust" clusters.
Print the barplot of the cluster. Describe the clusters from the barplot.
Score the training data using the flexclust clusters. In other words, determine which cluster they are in.
Perform a scatter plot using the first two Principal Components. Color the plot by the cluster membership.
Add a legend to the plot.
Determine if the clusters predict loan default.
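The counting, scoring, and default-rate steps above can be sketched as below. This is only an illustration under stated assumptions: flexclust is an R package, so the sketch does not reproduce its conversion or barplot; it uses NumPy, where "scoring" a record means assigning it to the nearest cluster center. The pc_scores array and the default_flag target are synthetic stand-ins, and the flag is generated at random, so no relationship with the clusters should be expected here.

```python
import numpy as np


def assign(X, centers):
    """Score records: index of the nearest center for each row of X."""
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Single-start Lloyd's algorithm; returns the final centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centers)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return centers


# Synthetic stand-ins for the principal component scores and the target.
rng = np.random.default_rng(2)
blob_centers = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.5]])
pc_scores = np.vstack([c + 0.6 * rng.normal(size=(100, 2)) for c in blob_centers])
default_flag = rng.integers(0, 2, size=len(pc_scores))  # hypothetical 0/1 target

k = 3  # assumed choice from the scree plot in Step 3
centers = kmeans(pc_scores, k)
labels = assign(pc_scores, centers)          # cluster membership per record
counts = np.bincount(labels, minlength=k)    # number of records in each cluster

rates = []
for j in range(k):
    # Default rate within the cluster; NaN if the cluster has no members.
    rate = default_flag[labels == j].mean() if counts[j] else float("nan")
    rates.append(rate)
    print(f"cluster {j}: n={counts[j]}, center={np.round(centers[j], 2)}, "
          f"default rate={rate:.2f}")
```

If the default rates differ markedly between clusters on the real data, cluster membership carries some information about default; similar rates across clusters suggest it does not.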
Step 5: Describe the Clusters Using Decision Trees
Using the original data from Step 2, predict cluster membership using a Decision Tree.
Display the Decision Tree.
Using the Decision Tree plot, describe or tell a story of each cluster. Comment on whether the clusters make sense.
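A possible sketch of this step, assuming scikit-learn is available, is shown below. It is not the course's method: the data is synthetic, the names var_1 to var_4 are hypothetical, and the tree depth of 3 is an arbitrary choice made to keep the rules readable. The clusters are built on the principal components, while the tree is fitted on the original variables so that its splits can be read in the variables' own units.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for four continuous input variables (hypothetical names).
rng = np.random.default_rng(3)
names = ["var_1", "var_2", "var_3", "var_4"]
group_means = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
X = np.vstack([m + rng.normal(size=(100, 4)) for m in group_means])

# Clusters come from the first two principal components, as in Steps 2 to 4.
pcs = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(pcs)

# The tree predicts cluster membership from the ORIGINAL variables.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)

# Text rendering of the tree: each path from root to leaf describes a cluster.
rules = export_text(tree, feature_names=names)
print(rules)

accuracy = tree.score(X, labels)
print(f"share of records whose cluster the tree reproduces: {accuracy:.2f}")
```

Each root-to-leaf path reads as a plain-language rule (for example, "high on one variable and low on another"), which is the raw material for describing each cluster and judging whether it makes sense.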
Step 6: Comment
Discuss how you might use these clusters in a corporate setting.
