
Question


# Week 9 - Exercise
### Notebook created by Jonathan Penava
For this exercise you are going to try to predict the carat range of a diamond based on some of its properties. Start by importing the standard libraries and reading in the 'diamonds.csv' file.
Drop the 'color', 'clarity', and 'Unnamed: 0' columns.
You will need to change the 'cut' values into integer values.
Referencing what we did in week 5, create a new column for carat ranges with the values '0-1', '1-2', '2-3', '3-4', '4-5', '5-6'.
Drop the column 'carat' after you have created the carat ranges column.
Using K-Nearest Neighbours, classify your carat ranges based on the other values in your dataframe. Print out an accuracy score and a classification report for your algorithm.
Create a diagram of error rates for different values of K. What value of K will give you a more accurate result?
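The last question can also be answered programmatically once an error rate has been recorded for each K, rather than by reading the plot. The following is a minimal sketch: the helper name `best_k` is hypothetical, the numbers in the usage lines are illustrative and are not results from the diamonds data, and breaking ties in favour of the smallest K is a design choice, not a requirement of the exercise.

```python
def best_k(k_values, error_rates):
    """Return the K with the lowest error rate (smallest K wins ties)."""
    pairs = list(zip(k_values, error_rates))
    if not pairs:
        raise ValueError("no K values to compare")
    # Sort key: error rate first, then K, so ties resolve to the smaller K.
    return min(pairs, key=lambda pair: (pair[1], pair[0]))[0]


# Illustrative numbers only, not results from the diamonds data.
example_ks = [1, 2, 3, 4]
example_errors = [0.10, 0.08, 0.08, 0.09]
print(best_k(example_ks, example_errors))  # prints 2
```

With the lists built in the error-rate loop, the call would be `best_k(k_values, error_rates)`.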
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
diamonds_df = pd.read_csv('diamonds.csv')
diamonds_df.drop(columns=['color', 'clarity', 'Unnamed: 0'], inplace=True)
cut_mapping = {'Fair': 1, 'Good': 2, 'Very Good': 3, 'Premium': 4, 'Ideal': 5}
diamonds_df['cut'] = diamonds_df['cut'].map(cut_mapping)
carat_bins = [0, 1, 2, 3, 4, 5, 6]
carat_labels = ['0-1', '1-2', '2-3', '3-4', '4-5', '5-6']
diamonds_df['carat_range'] = pd.cut(diamonds_df['carat'], bins=carat_bins, labels=carat_labels)
diamonds_df.drop(columns=['carat'], inplace=True)
X = diamonds_df.drop(columns=['carat_range'])
y = diamonds_df['carat_range']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
knn = KNeighborsClassifier(n_neighbors=5) # You can change the value of K
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
error_rates = []
k_values = range(1, 21)
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    error_rate = 1 - accuracy_score(y_test, y_pred)
    error_rates.append(error_rate)
plt.plot(k_values, error_rates, marker='o')
plt.xlabel('K Value')
plt.ylabel('Error Rate')
plt.title('Error Rates for Different K Values')
plt.show()

According to the above Week 9 exercise, do the following:

# Week 10 - Exercise
### Notebook created by Jonathan Penava
For this exercise you are going to modify your week 9 exercise to be a web service.
Have a sample URL that will return an example of JSON input.
Have an evaluate URL that will return the carat range based on the input.
Create a separate notebook (Notes Part 3) that will test out your URLs.
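The exercise does not name a web framework, so the following is only a rough sketch of the two URLs using Python's standard library. Everything specific in it is an assumption: the `/sample` and `/evaluate` paths, port 8000, the feature list (the columns expected to remain after the Week 9 drops), the example payload values, and the helper names. `predict_carat_range` is a placeholder that always answers '0-1' so the block runs on its own; in the real notebook it would call the fitted `knn` from Week 9.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed feature order: the columns left after the Week 9 drops.
FEATURES = ["cut", "depth", "table", "price", "x", "y", "z"]

# Illustrative payload returned by the sample URL (values are examples).
SAMPLE_INPUT = {"cut": 5, "depth": 61.5, "table": 55.0, "price": 326,
                "x": 3.95, "y": 3.98, "z": 2.43}


def predict_carat_range(row):
    """Placeholder for the Week 9 model; always answers '0-1'.

    With the fitted classifier available, this would instead return
    something like str(knn.predict([row])[0]).
    """
    return "0-1"


def evaluate(payload):
    """Validate a JSON payload and return (status code, response body)."""
    if not isinstance(payload, dict):
        return 400, {"error": "expected a JSON object"}
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return 400, {"error": "missing fields: " + ", ".join(missing)}
    try:
        row = [float(payload[name]) for name in FEATURES]
    except (TypeError, ValueError):
        return 400, {"error": "all fields must be numeric"}
    return 200, {"carat_range": predict_carat_range(row)}


class DiamondHandler(BaseHTTPRequestHandler):
    def _send_json(self, status, body):
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Sample URL: shows callers what the evaluate URL expects.
        if self.path == "/sample":
            self._send_json(200, SAMPLE_INPUT)
        else:
            self._send_json(404, {"error": "not found"})

    def do_POST(self):
        # Evaluate URL: reads the JSON body and returns the carat range.
        if self.path != "/evaluate":
            self._send_json(404, {"error": "not found"})
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except ValueError:
            self._send_json(400, {"error": "invalid JSON"})
            return
        self._send_json(*evaluate(payload))


def run(port=8000):
    """Start the service; blocks until interrupted."""
    HTTPServer(("localhost", port), DiamondHandler).serve_forever()


def try_service(base_url="http://localhost:8000"):
    """Client side: fetch the sample, then post it back to /evaluate."""
    with urllib.request.urlopen(f"{base_url}/sample") as resp:
        sample = json.load(resp)
    request = urllib.request.Request(
        f"{base_url}/evaluate",
        data=json.dumps(sample).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling `run()` in the service notebook and `try_service()` in the separate test notebook should return a dictionary with a `carat_range` key. A framework such as Flask could serve the same two routes with less handler code, and the `evaluate` function would carry over unchanged.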
