Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Remove missing values from your dataset(s) before answering the following questions. 1) Use mobile.csv dataset that classifies mobile phones with respect to their prices

image

Remove missing values from your dataset(s) before answering the following questions. 1) Use mobile.csv dataset that classifies mobile phones with respect to their prices into four categories (price_range column). Use 70% of the dataset for training and the remaining as the test set. a. Generate a decision tree classifier that uses gini index as the impurity measure with a maximum depth of 3. i. Plot the resulting decision tree obtained after training the classifier. ii. Compute the accuracy on the test set. iii. Plot the accuracy for different depth values (try values between 2 and 20) for both training and test. Do you observe overfitting as depth increases? b. Generate a k-nn classifier for different number of nearest neighbors, k (hyperparameter). Consider k values between 2 and 15. Plot the accuracies. C. Generate a support vector machine with a linear kernel where hyperparameter C can get values [0.001, 0.01, 0.1, 1]. Plot the accuracies for different C values. d. Generate ensemble classifiers, bagging, boosting, and adaboost, with 150 base- classifiers with a maximum depth of 5. Plot the accuracies for both training and test. Which one performs better than the others? 2) Use California Housing Dataset (data1.csv) which contains data drawn from the 1990 U.S. Census and apply K-means clustering using longitude, latitude, and median income columns. a. Pick k and form k clusters by assigning each instance to its nearest centroid. b. Compute the centroid of each cluster. C. Is the k that you pick good enough? Determine the number of clusters (k) in the data using sum-of-square-errors. 3) Use the first 50 row in the Customer Segmentation Dataset (data2.csv) and apply three hierarchical clustering algorithms provided by the Python scipy library: (1) single link (MIN), (2) complete link (MAX), and (3) group average and plot the dendograms. 4) Use data2.csv dataset (use every row) and apply DBSCAN algorithm. Consider annual income as the x axis and spending score as the y axis and plot the clusters.

Step by Step Solution

3.56 Rating (156 Votes )

There are 3 Steps involved in it

Step: 1

1 Mobile Price Classification a Decision Tree Classifier Load the mobilecsv dataset Preprocess the dataset by removing any missing values Split the dataset into a training set 70 and a test set 30 Cre... blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image_2

Step: 3

blur-text-image_3

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

The Legal Environment of Business A Critical Thinking Approach

Authors: Nancy K Kubasek, Bartley A Brennan, M Neil Browne

6th Edition

978-0132666688, 132666685, 132664844, 978-0132664844

More Books

Students also viewed these Programming questions

Question

Explain relationships between trademarks and trade secrets.

Answered: 1 week ago