Question
In this project, you will use the diamonds dataset and answer the questions. This classic dataset contains the prices and other attributes of almost 54,000 diamonds. It's a great dataset for beginners learning to work with data analysis and visualization.

Content
price --> price in US dollars (range is between $326 and $18,823)
carat --> weight of the diamond (range is between 0.2 and 5.01)
cut --> quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color --> diamond colour, from J (worst) to D (best)
clarity --> a measurement of how clear the diamond is (I1 (worst), SI2, SI1, VS2, VS1, VVS2, VVS1, IF (best))
x --> length in mm (range is between 0 and 10.74)
y --> width in mm (range is between 0 and 58.9)
z --> depth in mm (range is between 0 and 31.8)
depth --> total depth percentage = z / mean(x, y) = 2*z / (x + y) (range is between 43 and 79)
table --> width of top of diamond relative to widest point (range is between 43 and 95)

(Figure: diamond diagram labeling Table, Width and Depth)

# Include all the modules here

You can see and use pandas DataFrame applications on this set from this website: https://www.w3resource.com/python-exercises/pandas/practice-set1/index.php

# Load the dataset into a pandas DataFrame, show the size of the file and show the first 5 rows of the data

# Take only the columns with numeric data and put them into a DataFrame

# Find and show the zero values in x, y and z and remove them from your df (low outliers are x < 0.1)
# (Be careful: depth is in %, so you need to convert it to numeric values by dividing by 100)

# Remove these entries from the DataFrame

Question 1. How many samples were there in the original data, and how many samples are left after you clean the outliers? (10 Points)
Answer:

Correlation and Heatmaps

# Find and show the correlation matrix

# Use a seaborn heatmap to see the correlations

Question 2. Which feature has the highest correlation with the price? Please explain your graph.
(15 Points)
Answer:

PCA Analysis

# Create a PCA object, calculate the principal components and show them.
# Also transform the data using the PCA.

Question 3. How many principal components will be calculated if you don't specify the number of components for this example? Why? Explain it. (15 Points)
Answer:

# Show the variance and the variance ratios in %

# Plot the variance ratios as a bar graph

Question 4. As a good data visualization expert, how many principal components should be used to transform this data without losing much information? Explain your calculations. (15 Points)
Answer:

# Create a new DataFrame using the new data set pca_data

# Draw the figure according to your answer from Question 4 (2D, 3D or 4D graph; be careful!)

# Show the most important PC and show the column which has the highest effect on the variance

Question 5. Which feature or features have the highest effect on the variance, or in other words, which feature is most responsible for the differences between diamonds? Explain it. (15 Points)
Answer:

Question 6. Find and show the relations between the color and the price. Explain your approach and discuss your calculations and the graphs below. (30 Points)
Answer:
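The notebook steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the expected solution, and it rests on several assumptions: pandas and scikit-learn are installed; the data sits in a local CSV whose name (`diamonds.csv`) is a hypothetical placeholder (seaborn's `load_dataset("diamonds")` is another source); an outlier is any row where x, y or z is below 0.1; and features are z-score standardized before PCA. Standardizing is an assumption, not part of the assignment. Without it, price dominates the first component simply because its values are so much larger than the others. The helper names and the 95% default in `components_for` are likewise illustrative.

```python
# Sketch of the diamonds workflow: cleaning, correlation, PCA, color vs. price.
# Assumptions: hypothetical file "diamonds.csv", scikit-learn's PCA,
# and z-score standardization before PCA (not required by the assignment).
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def load_diamonds(path: str = "diamonds.csv") -> pd.DataFrame:
    """Load the dataset (file name is a placeholder) and show size and first rows."""
    df = pd.read_csv(path)
    print(df.shape)   # (rows, columns)
    print(df.head())  # first 5 rows
    return df


def clean_numeric(df: pd.DataFrame, low: float = 0.1) -> pd.DataFrame:
    """Keep numeric columns, drop rows with x, y or z below `low`, rescale depth."""
    num = df.select_dtypes(include="number")
    # Drop a leftover CSV index column such as "Unnamed: 0", if present.
    num = num.loc[:, ~num.columns.str.startswith("Unnamed")]
    bad = (num[["x", "y", "z"]] < low).any(axis=1)
    print("rows with x, y or z below", low, ":", int(bad.sum()))
    num = num.loc[~bad].copy()
    num["depth"] = num["depth"] / 100  # percent -> fraction
    return num


def price_correlations(num: pd.DataFrame) -> pd.Series:
    """Correlation of every feature with price, strongest (by |r|) first."""
    corr = num.corr()
    return corr["price"].drop("price").sort_values(key=lambda s: s.abs(), ascending=False)


def run_pca(num: pd.DataFrame, standardize: bool = True):
    """Fit PCA with n_components left at its default and transform the data."""
    data = num.to_numpy(dtype=float)
    if standardize:  # assumption: put all features on a common scale
        data = (data - data.mean(axis=0)) / data.std(axis=0)
    pca = PCA()  # default keeps min(n_samples, n_features) components
    names = [f"PC{i + 1}" for i in range(min(data.shape))]
    pca_data = pd.DataFrame(pca.fit_transform(data), columns=names, index=num.index)
    ratios = pd.Series(pca.explained_variance_ratio_ * 100, index=names)  # in %
    loadings = pd.DataFrame(pca.components_, columns=num.columns, index=names)
    return pca, pca_data, ratios, loadings


def components_for(ratios: pd.Series, target: float = 95.0) -> int:
    """Smallest number of PCs whose cumulative variance ratio (%) reaches `target`."""
    cumulative = ratios.cumsum().to_numpy()
    return int(min(np.searchsorted(cumulative, target) + 1, len(cumulative)))


def top_feature(loadings: pd.DataFrame, pc: str = "PC1") -> str:
    """Column with the largest absolute loading on `pc` (loading signs are arbitrary)."""
    return loadings.loc[pc].abs().idxmax()


def color_price_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and median price (and price per carat) per color, ordered D (best) to J."""
    tmp = df.assign(price_per_carat=df["price"] / df["carat"])
    summary = tmp.groupby("color", observed=True)[["price", "price_per_carat"]].agg(["mean", "median"])
    return summary.reindex([c for c in "DEFGHIJ" if c in summary.index])
```

Typical use: `raw = load_diamonds()` and `num = clean_numeric(raw)`, then compare `len(raw)` with `len(num)`. For the plots, `seaborn.heatmap(num.corr(), annot=True)` draws the correlation heatmap and `ratios.plot.bar()` draws the variance-ratio bar chart (both need matplotlib). `color_price_summary(raw.loc[num.index])` groups the cleaned rows by color; the price-per-carat column is one possible way to look at color without diamond size dominating the comparison.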