Question

This assignment asks you to examine the k-NN and Logistic algorithms for classification. Provide your answers to the questions in a Word document named Assign1-2_LastName.doc along with your source code saved as Assign1-2_LastName. If you plan to use R-Markdown, please save it as Assign1-2_LastName. Then click the Assignment title link, upload your files, and submit.

Please download the ParisHousing dataset from this link: https://www.kaggle.com/datasets/mssmartypants/paris-housing-classification

'data.frame': 10000 obs. of 18 variables:

$ squareMeters : int 75523 80771 55712 32316 70429 39223 58682 86929 51522 39686 ...

$ numberOfRooms : int 3 39 58 47 19 36 10 100 3 42 ...

$ hasYard : chr "No" "Yes" "No" "No" ...

$ hasPool : chr "Yes" "Yes" "Yes" "No" ...

$ floors : int 63 98 19 6 90 17 99 11 61 15 ...

$ cityCode : int 9373 39381 34457 27939 38045 39489 6450 98155 9047 71019 ...

$ cityPartRange : int 3 8 6 10 3 8 10 3 8 5 ...

$ numPrevOwners : int 8 6 8 4 7 6 9 4 3 8 ...

$ made : int 2005 2015 2021 2012 1990 2012 1995 2003 2012 2021 ...

$ isNewBuilt : chr "No" "Yes" "No" "No" ...

$ hasStormProtector: chr "Yes" "No" "No" "Yes" ...

$ basement : int 4313 3653 2937 659 8435 2009 5930 6326 632 5198 ...

$ attic : int 9005 2436 8852 7141 2429 4552 9453 4748 5792 5342 ...

$ garage : int 956 128 135 359 292 757 848 654 807 591 ...

$ hasStorageRoom : chr "No" "Yes" "Yes" "No" ...

$ hasGuestRoom : int 7 2 9 3 4 1 5 10 5 3 ...

$ price : num 7559082 8085990 5574642 3232561 7055052 ...

$ category : chr "Basic" "Luxury" "Basic" "Basic" ...

Explore and prepare the data by using the str() function. Display the probability (proportion) of each value ('Basic' and 'Luxury') of the "category" variable that you plan to predict, and normalize the entire dataset. (15 pts)
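As a minimal sketch of this step, the R code below inspects a small made-up stand-in for the data, computes the class proportions with prop.table(), and applies min-max normalization. The toy values, the helper name normalize, and the choice of min-max scaling (rather than z-scores) are assumptions for illustration, not part of the assignment.

```r
# Toy stand-in for the ParisHousing data (values are made up for illustration).
paris <- data.frame(
  squareMeters = c(75523, 80771, 55712, 32316, 70429, 39223),
  hasYard      = c("No", "Yes", "No", "No", "Yes", "Yes"),
  price        = c(7559082, 8085990, 5574642, 3232561, 7055052, 3926647),
  category     = c("Basic", "Luxury", "Basic", "Basic", "Luxury", "Basic"),
  stringsAsFactors = FALSE
)

str(paris)

# Proportion of each class in the target variable.
class_share <- prop.table(table(paris$category))
print(round(class_share, 3))

# Recode Yes/No columns to 1/0 so that every predictor is numeric.
paris$hasYard <- ifelse(paris$hasYard == "Yes", 1, 0)

# Min-max normalization: rescales a numeric vector to the range [0, 1].
normalize <- function(x) (x - min(x)) / (max(x) - min(x))

predictors <- setdiff(names(paris), "category")
paris_n <- as.data.frame(lapply(paris[predictors], normalize))
paris_n$category <- factor(paris$category)

summary(paris_n)
```

With the real file, the same steps would apply to every Yes/No column and every numeric predictor; the target column is left out of the normalization.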

Split the dataset into training and testing sets in a 7:3 ratio. Using this partition, develop a model with the k-NN classifier algorithm, evaluate its performance for different values of k, and propose the best value of k. (15 pts)
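One way to approach this step is sketched below on simulated, already-normalized data. It uses knn() from the class package; the feature names f1 to f3, the seed, and the list of candidate k values are arbitrary assumptions.

```r
library(class)  # provides knn()

set.seed(123)  # arbitrary seed so the split is reproducible

# Simulated, already-normalized stand-in for the prepared data.
n <- 500
x <- data.frame(f1 = runif(n), f2 = runif(n), f3 = runif(n))
y <- factor(ifelse(x$f1 + x$f2 > 1, "Luxury", "Basic"))

# 7:3 train/test partition using random row indices.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train_x <- x[train_idx, ]
test_x  <- x[-train_idx, ]
train_y <- y[train_idx]
test_y  <- y[-train_idx]

# Test-set accuracy for several candidate values of k.
k_values <- c(1, 3, 5, 7, 11, 15, 21)
accuracy <- sapply(k_values, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})

results <- data.frame(k = k_values, accuracy = accuracy)
print(results)

# Confusion matrix for the k with the highest test accuracy.
best_k <- results$k[which.max(results$accuracy)]
best_pred <- knn(train_x, test_x, cl = train_y, k = best_k)
print(table(predicted = best_pred, actual = test_y))
```

Choosing k by test-set accuracy is the simplest comparison, but the resulting accuracy estimate is somewhat optimistic; a separate validation split or cross-validation would give a fairer figure.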

Build a logistic regression model with the same dataset and the data partition to develop the best possible machine learning algorithm to determine whether the house is Luxury or not. For this step, you need to normalize the data. Please compare the performance of the logistic regression model and the best k-NN model and provide a detailed explanation about the comparison. (70 pts)

Hint: To build a logistic regression model, you need to recode the dependent variable. In the original dataset, the 'category' variable takes the values 'Luxury' (L) or 'Basic' (B); recode it so that Luxury is 1 and Basic is 0, and include this recoding step in your submission. You can use the recode() function in the 'car' package, for example:

install.packages("car")
library(car)
paris1 <- read.csv("C:/../ParisHousingClass.csv")
parisL <- data.frame(paris1)
parisL$category <- recode(parisL$category, "'Luxury'=1")
parisL$category <- recode(parisL$category, "'Basic'=0")
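If the car package is not available, the same mapping can be sketched in base R with ifelse(). The short vector below is a made-up stand-in for the category column, and category_num is a hypothetical name.

```r
# Small illustrative vector standing in for parisL$category.
category <- c("Basic", "Luxury", "Basic", "Basic", "Luxury")

# Base R equivalent of the two recode() calls: Luxury -> 1, Basic -> 0.
category_num <- ifelse(category == "Luxury", 1, 0)

# Check the mapping before overwriting the original column.
print(table(original = category, recoded = category_num))
```

The cross-tabulation is a quick way to confirm that every 'Luxury' row became 1 and every 'Basic' row became 0.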

Explore and prepare the data by using the str() function. Display the probability (proportion) of each value ('Basic' and 'Luxury') of the "category" variable that you plan to predict, normalize the entire dataset, and partition it into training and testing sets in a 7:3 ratio.

Develop the logistic regression model, and evaluate the performance. After developing the model, predict the classification of the new data below.
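A minimal sketch of fitting and evaluating a logistic regression with glm() follows. It runs on simulated data with a 0/1 outcome; the features f1 and f2, the noise model, and the 0.5 classification threshold are assumptions for illustration.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Simulated normalized predictors with a noisy 0/1 outcome
# (noise keeps the classes from being perfectly separable).
n <- 500
dat <- data.frame(f1 = runif(n), f2 = runif(n))
p_true <- plogis(6 * (dat$f1 + dat$f2 - 1))
dat$category <- rbinom(n, size = 1, prob = p_true)

# Same 7:3 partition scheme as for k-NN.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]

# Logistic regression: binomial family with the default logit link.
model <- glm(category ~ ., data = train, family = binomial)
print(summary(model))

# Predicted probabilities on the test set, thresholded at 0.5.
prob <- predict(model, newdata = test, type = "response")
pred <- ifelse(prob > 0.5, 1, 0)

conf_mat <- table(predicted = pred, actual = test$category)
print(conf_mat)

accuracy <- mean(pred == test$category)
print(accuracy)
```

The confusion matrix and accuracy computed here can be set side by side with the figures from the best k-NN model, provided both models use the same train/test partition.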

Suppose that you plan to build a new house for your client. The company wants to know whether the new house would be classified as luxury or basic. You plan to apply the logistic regression model developed above. The features of the new house are as follows:

squareMeters : 50000
numberOfRooms : 20
hasYard : yes
hasPool : yes
floors : 50
cityCode : 9047
cityPartRange : 7
hasStormProtector : no
basement : 2009
attic : 2557
garage : 500
hasStorageRoom : yes
hasGuestRoom : 1
price : 2500000
numPrevOwners : 0
made : 2022
isNewBuilt : 1
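When scoring a new observation, it has to be scaled with the minimum and maximum from the training data, not with its own values. The sketch below shows the idea on simulated data with only two of the predictors. The helper scale_with, the simulated outcome, and the 0.5 threshold are assumptions; a real solution would include all 17 predictors.

```r
set.seed(123)  # arbitrary seed for reproducibility

# Simulated raw training data with two of the predictors (illustrative only).
n <- 300
train_raw <- data.frame(
  squareMeters = runif(n, 100, 100000),
  hasPool      = rbinom(n, 1, 0.5)
)
score <- scale(train_raw$squareMeters)[, 1] + 2 * train_raw$hasPool - 1
train_raw$category <- rbinom(n, 1, plogis(score))

# Store the training min and max so that new data is scaled the same way.
predictors <- c("squareMeters", "hasPool")
mins <- sapply(train_raw[predictors], min)
maxs <- sapply(train_raw[predictors], max)
scale_with <- function(df) {
  as.data.frame(Map(function(col, lo, hi) (col - lo) / (hi - lo),
                    df[predictors], mins, maxs))
}

train_n <- scale_with(train_raw)
train_n$category <- train_raw$category
model <- glm(category ~ ., data = train_n, family = binomial)

# New house: Yes/No recoded to 1/0, then scaled with the training min/max.
new_house <- data.frame(squareMeters = 50000, hasPool = 1)
new_prob <- predict(model, newdata = scale_with(new_house), type = "response")
new_class <- ifelse(new_prob > 0.5, "Luxury", "Basic")
print(new_prob)
print(new_class)
```

A new value outside the training range (for example, made : 2022 if the training data ends at 2021) would scale to slightly above 1, which glm() handles but which is worth noting in the write-up.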
