Question
Problem 1: Linear Regression
You are hired by a company, Gem Stones co ltd, which is a cubic zirconia manufacturer. You are provided with a dataset containing the prices and other attributes of almost 27,000 cubic zirconia stones (cubic zirconia is an inexpensive diamond alternative with many of the same qualities as a diamond). The company earns different profits in different price slots. You have to help the company predict the price of a stone on the basis of the details given in the dataset, so that it can distinguish between higher-profit and lower-profit stones and improve its profit share. Also, provide them with the 5 attributes that are most important.
Data Dictionary:
Variable Name | Description |
Carat | Carat weight of the cubic zirconia. |
Cut | Describes the cut quality of the cubic zirconia. Quality in increasing order: Fair, Good, Very Good, Premium, Ideal. |
Color | Colour of the cubic zirconia, with D being the best and J the worst. |
Clarity | Clarity refers to the absence of inclusions and blemishes. In order from best to worst in terms of average price: IF, VVS1, VVS2, VS1, VS2, SI1, SI2, I1. |
Depth | The Height of cubic zirconia, measured from the Culet to the table, divided by its average Girdle Diameter. |
Table | The Width of the cubic zirconia's Table expressed as a Percentage of its Average Diameter. |
Price | The price of the cubic zirconia. |
X | Length of the cubic zirconia in mm. |
Y | Width of the cubic zirconia in mm. |
Z | Height of the cubic zirconia in mm. |
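The ordinal orderings in the data dictionary can be written down as lookup tables before any modelling. The following is only a minimal sketch: the names `CUT_ORDER`, `COLOR_ORDER`, `CLARITY_ORDER`, `rank_map` and `encode_stone` are made up for illustration, and the colour levels between D and J are assumed to run alphabetically, since the dictionary only names the two ends.

```python
# Ordinal levels taken from the data dictionary, listed worst -> best.
CUT_ORDER = ["Fair", "Good", "Very Good", "Premium", "Ideal"]
# Assumption: colours between J (worst) and D (best) follow the alphabet.
COLOR_ORDER = ["J", "I", "H", "G", "F", "E", "D"]
CLARITY_ORDER = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]


def rank_map(levels):
    """Map each level to an integer rank, where 0 is the worst level."""
    return {level: rank for rank, level in enumerate(levels)}


CUT_RANK = rank_map(CUT_ORDER)
COLOR_RANK = rank_map(COLOR_ORDER)
CLARITY_RANK = rank_map(CLARITY_ORDER)


def encode_stone(cut, color, clarity):
    """Return the (cut, color, clarity) ranks for one stone."""
    return (CUT_RANK[cut], COLOR_RANK[color], CLARITY_RANK[clarity])


# Row 1 of the sample data: Ideal cut, colour E, clarity SI1.
print(encode_stone("Ideal", "E", "SI1"))
```

Integer ranks keep the ordering that one-hot encoding would throw away, but they also assume equal spacing between levels, which may or may not hold for price.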
Please explain the following with Python code and a detailed explanation, so that I can understand it, practice, and do it on my own without any further doubts.
1.1 Read the data and do exploratory data analysis. Describe the data briefly (check the null values, data types, shape, and duplicate values). Perform univariate and bivariate analysis.
1.2 Impute null values if present, and check for values that are equal to zero. Do they have any meaning, or do we need to change or drop them? Check for the possibility of combining the sub-levels of an ordinal variable and take action accordingly. Explain why you are combining these sub-levels, with appropriate reasoning.
1.3 Encode the data (having string values) for modelling. Split the data into train and test sets (70:30). Apply linear regression using scikit-learn. Check for significant variables using an appropriate method from statsmodels. Create multiple models and check the performance of predictions on the train and test sets using R-squared, RMSE, and adjusted R-squared. Compare these models and select the best one with appropriate reasoning.
1.4 Inference: Based on these predictions, what are the business insights and recommendations?
Please explain and summarise the various steps performed in this project. There should be proper business interpretation and actionable insights present.
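As a starting point for tasks 1.1 and 1.2, the checks can be sketched with pandas on a handful of rows from the sample data. This is an illustration under assumptions, not a full solution: the inline `SAMPLE_CSV` string stands in for `cubic_zirconia.csv`, the leading row-number column is left out, and median imputation and dropping zero-dimension rows are one reasonable choice among several.

```python
import io

import pandas as pd

# Five rows copied from the sample data; the fourth (row 27 in the sample)
# has no depth value. With the real file, pd.read_csv("cubic_zirconia.csv")
# would replace the StringIO call.
SAMPLE_CSV = """carat,cut,color,clarity,depth,table,x,y,z,price
0.3,Ideal,E,SI1,62.1,58,4.27,4.29,2.66,499
0.33,Premium,G,IF,60.8,58,4.42,4.46,2.7,984
0.9,Very Good,E,VVS2,62.2,60,6.04,6.12,3.78,6289
0.34,Ideal,D,SI1,,57,4.5,4.44,2.74,803
0.71,Good,F,SI2,64,58,5.65,5.59,3.6,1818
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# 1.1: shape, data types, null counts, duplicates, summary statistics.
print(df.shape)
print(df.dtypes)
null_counts = df.isnull().sum()
print(null_counts)
n_duplicates = int(df.duplicated().sum())
print("duplicate rows:", n_duplicates)
print(df.describe())

# 1.2: a stone cannot have a length, width or height of 0 mm, so zeros in
# x, y or z are treated here as bad records and dropped (an assumption).
zero_dims = (df[["x", "y", "z"]] == 0).any(axis=1)
n_zero_rows = int(zero_dims.sum())
df_clean = df.loc[~zero_dims].copy()

# Impute the missing depth with the column median, which is less sensitive
# to outliers than the mean.
df_clean["depth"] = df_clean["depth"].fillna(df_clean["depth"].median())
print(df_clean)
```

For univariate and bivariate analysis, `df.hist()`, `df.corr(numeric_only=True)` and box plots of price by cut, color and clarity are common next steps.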
Dataset for Problem 1: cubic_zirconia.csv
# | carat | cut | color | clarity | depth | table | x | y | z | price |
1 | 0.3 | Ideal | E | SI1 | 62.1 | 58 | 4.27 | 4.29 | 2.66 | 499 |
2 | 0.33 | Premium | G | IF | 60.8 | 58 | 4.42 | 4.46 | 2.7 | 984 |
3 | 0.9 | Very Good | E | VVS2 | 62.2 | 60 | 6.04 | 6.12 | 3.78 | 6289 |
4 | 0.42 | Ideal | F | VS1 | 61.6 | 56 | 4.82 | 4.8 | 2.96 | 1082 |
5 | 0.31 | Ideal | F | VVS1 | 60.4 | 59 | 4.35 | 4.43 | 2.65 | 779 |
6 | 1.02 | Ideal | D | VS2 | 61.5 | 56 | 6.46 | 6.49 | 3.99 | 9502 |
7 | 1.01 | Good | H | SI1 | 63.7 | 60 | 6.35 | 6.3 | 4.03 | 4836 |
8 | 0.5 | Premium | E | SI1 | 61.5 | 62 | 5.09 | 5.06 | 3.12 | 1415 |
9 | 1.21 | Good | H | SI1 | 63.8 | 64 | 6.72 | 6.63 | 4.26 | 5407 |
10 | 0.35 | Ideal | F | VS2 | 60.5 | 57 | 4.52 | 4.6 | 2.76 | 706 |
11 | 0.32 | Ideal | E | VS2 | 61.6 | 56 | 4.4 | 4.43 | 2.72 | 637 |
12 | 1.1 | Premium | D | SI1 | 60.7 | 55 | 6.74 | 6.71 | 4.08 | 6468 |
13 | 0.5 | Good | E | VS1 | 61.1 | 58.2 | 5.08 | 5.12 | 3.11 | 1932 |
14 | 0.71 | Ideal | D | SI2 | 61.6 | 55 | 5.74 | 5.76 | 3.54 | 2767 |
15 | 1.5 | Fair | G | VS2 | 66.2 | 53 | 7.12 | 7.08 | 4.7 | 10644 |
16 | 0.31 | Ideal | G | VS2 | 61.6 | 55 | 4.37 | 4.39 | 2.7 | 544 |
17 | 0.34 | Ideal | G | SI1 | 61.2 | 57 | 4.56 | 4.53 | 2.78 | 650 |
18 | 1.01 | Ideal | D | VS2 | 59.8 | 56 | 6.52 | 6.49 | 3.89 | 7127 |
19 | 0.9 | Good | D | SI1 | 61.9 | 64 | 6 | 6.09 | 3.74 | 3567 |
20 | 0.54 | Premium | G | VS2 | 60 | 59 | 5.42 | 5.22 | 3.19 | 1637 |
21 | 1.04 | Premium | D | VVS2 | 61.1 | 60 | 6.54 | 6.51 | 3.99 | 10984 |
22 | 0.4 | Ideal | F | VS2 | 62.9 | 57 | 4.72 | 4.69 | 2.96 | 1080 |
23 | 1.52 | Ideal | D | SI2 | 62.7 | 56 | 7.35 | 7.28 | 4.59 | 8631 |
24 | 1.19 | Ideal | J | SI2 | 61.7 | 56 | 6.8 | 6.85 | 4.21 | 4508 |
25 | 0.66 | Ideal | H | SI1 | 62.4 | 58 | 5.53 | 5.56 | 3.46 | 1609 |
26 | 1.5 | Premium | H | SI2 | 61.4 | 62 | 7.4 | 7.25 | 4.49 | 7187 |
27 | 0.34 | Ideal | D | SI1 | | 57 | 4.5 | 4.44 | 2.74 | 803 |
28 | 0.52 | Ideal | E | VS2 | 61.5 | 56 | 5.18 | 5.2 | 3.19 | 1576 |
29 | 0.31 | Ideal | H | SI1 | 61.9 | 56 | 4.38 | 4.35 | 2.7 | 595 |
30 | 0.71 | Good | F | SI2 | 64 | 58 | 5.65 | 5.59 | 3.6 | 1818 |
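To make task 1.3 concrete, here is a rough sketch that fits a scikit-learn linear regression on the 30 sample rows above, using only carat, cut, color and clarity. Treat it as a walkthrough of the mechanics: 30 rows are far too few for reliable conclusions, the rank encodings assume colours run alphabetically between D and J, and `evaluate`, `random_state=1` and the other helper names are choices made for this example.

```python
import io

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# carat, cut, color, clarity and price for the 30 sample rows.
SAMPLE_CSV = """carat,cut,color,clarity,price
0.3,Ideal,E,SI1,499
0.33,Premium,G,IF,984
0.9,Very Good,E,VVS2,6289
0.42,Ideal,F,VS1,1082
0.31,Ideal,F,VVS1,779
1.02,Ideal,D,VS2,9502
1.01,Good,H,SI1,4836
0.5,Premium,E,SI1,1415
1.21,Good,H,SI1,5407
0.35,Ideal,F,VS2,706
0.32,Ideal,E,VS2,637
1.1,Premium,D,SI1,6468
0.5,Good,E,VS1,1932
0.71,Ideal,D,SI2,2767
1.5,Fair,G,VS2,10644
0.31,Ideal,G,VS2,544
0.34,Ideal,G,SI1,650
1.01,Ideal,D,VS2,7127
0.9,Good,D,SI1,3567
0.54,Premium,G,VS2,1637
1.04,Premium,D,VVS2,10984
0.4,Ideal,F,VS2,1080
1.52,Ideal,D,SI2,8631
1.19,Ideal,J,SI2,4508
0.66,Ideal,H,SI1,1609
1.5,Premium,H,SI2,7187
0.34,Ideal,D,SI1,803
0.52,Ideal,E,VS2,1576
0.31,Ideal,H,SI1,595
0.71,Good,F,SI2,1818
"""
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Ordinal encoding, worst level = 0 (orderings from the data dictionary).
cut_rank = {c: i for i, c in enumerate(["Fair", "Good", "Very Good", "Premium", "Ideal"])}
color_rank = {c: i for i, c in enumerate("JIHGFED")}
clarity_rank = {c: i for i, c in enumerate(["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"])}
df["cut"] = df["cut"].map(cut_rank)
df["color"] = df["color"].map(color_rank)
df["clarity"] = df["clarity"].map(clarity_rank)

X = df[["carat", "cut", "color", "clarity"]]
y = df["price"]

# 70:30 train/test split; random_state is fixed only for repeatability.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = LinearRegression().fit(X_train, y_train)


def evaluate(fitted, X_part, y_part):
    """Return R-squared, adjusted R-squared and RMSE for one data split."""
    pred = fitted.predict(X_part)
    r2 = r2_score(y_part, pred)
    n, p = X_part.shape
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)
    rmse = float(np.sqrt(mean_squared_error(y_part, pred)))
    return {"r2": r2, "adj_r2": adj_r2, "rmse": rmse}


metrics = {"train": evaluate(model, X_train, y_train),
           "test": evaluate(model, X_test, y_test)}
print(metrics)
print(dict(zip(X.columns, model.coef_)))
```

For the significance check, statsmodels' `sm.OLS(y_train, sm.add_constant(X_train)).fit().summary()` reports a p-value per coefficient; variables with high p-values are candidates to drop when building the alternative models to compare.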