
Question



Business Scenario: Second-Hand Car Sales

A large Australian second-hand car dealer, Pre-Loved Cars (P-LC), with dealerships across all Australian states, asked you to develop a method of estimating the purchase price of any new car brought into their dealerships. At the moment, each car is described using structured data only, which is based on the car's detailed evaluation in one of their workshops. However, in the future P-LC would like to proactively seek business opportunities by identifying prospective clients' personal advertising placed on social media.

P-LC provided you with data from past car evaluations and would like you to clean up and explore the car data, develop and evaluate a model predicting car prices (the label), and minimize the classification or estimation error in the process. They have already undertaken some model development and attached the preliminary results for your comment. In several cases, the original numeric label was discretised.

Data

P-LC provided you with a sample of 55,870 car evaluations, which include 13 attributes: car ID, brand, model and their popular name (title), type of discount given, odometer/mileage reading (kilometres), body type, transmission, engine, state where the acquisition was made (e.g. Victoria), seller type (e.g. Private Seller), year of car manufacturing and price (label). P-LC are also planning to include a new attribute, advert, to hold text sourced from social media. Some attributes have missing values or outliers.

In the context of the Classification Performance, define these concepts and explain how they are used in the evaluation of a classifier: A. Classification Error, B. Kappa, C. Recall, D. False positive rate.

Provide the following information:

  • Concept A is defined as and is used to
  • Concept B is defined as and is used to
  • Concept C is defined as and is used to
  • Concept D is defined as and is used to
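The four concepts in the question can all be read off a binary confusion matrix. The sketch below is an illustration only, not the expert solution: the class name `ConfusionMatrix`, the function `classification_metrics`, and the counts in the usage example are hypothetical and do not come from the P-LC data. It assumes "expensive" is the positive class and that both classes occur at least once.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier (hypothetical helper, not from the source)."""
    tp: int  # positives predicted positive
    fn: int  # positives predicted negative
    fp: int  # negatives predicted positive
    tn: int  # negatives predicted negative


def classification_metrics(cm: ConfusionMatrix) -> dict:
    n = cm.tp + cm.fn + cm.fp + cm.tn
    accuracy = (cm.tp + cm.tn) / n

    # A. Classification error: share of all examples that were misclassified.
    classification_error = (cm.fp + cm.fn) / n

    # C. Recall (true positive rate): share of actual positives that were found.
    recall = cm.tp / (cm.tp + cm.fn)

    # D. False positive rate: share of actual negatives wrongly flagged as positive.
    false_positive_rate = cm.fp / (cm.fp + cm.tn)

    # B. Cohen's kappa: agreement beyond what chance alone would produce,
    # where chance agreement is derived from the row and column totals.
    p_chance_pos = ((cm.tp + cm.fn) / n) * ((cm.tp + cm.fp) / n)
    p_chance_neg = ((cm.tn + cm.fp) / n) * ((cm.tn + cm.fn) / n)
    p_chance = p_chance_pos + p_chance_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return {
        "classification_error": classification_error,
        "kappa": kappa,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
    }


if __name__ == "__main__":
    # Made-up, imbalanced counts: 100 expensive cars out of 1,000.
    example = ConfusionMatrix(tp=40, fn=60, fp=10, tn=890)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these made-up counts the error is only 7%, yet recall is 40% and kappa is 0.5, which shows why kappa and recall are worth reporting alongside accuracy when the classes are imbalanced.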
Charts and tables

The following charts and tables will assist you in your tasks.

Figure 3 (Part C): Cluster Cumulative Variance Plot

Figure 1 (Part C): Data set attributes (text attribute "advert" to be included in the future)

Figure 1 (Part B): Purchased cars in different states, by the year of their manufacture

Figure 4 (Part A): Class distribution of the label attribute, discretised at $50K

k-NN k=7 Performance
  accuracy: 94.44% +/- 0.13% (micro average: 94.44%)
  kappa: 0.470 +/- 0.007 (micro average: 0.470)
  AUC: 0.958 +/- 0.003 (micro average: 0.958) (positive class: expensive)

Decision Tree Performance
  accuracy: 93.30% +/- 0.24% (micro average: 93.30%)
  kappa: 0.694 +/- 0.026 (micro average: 0.694)
  AUC: 0.968 +/- 0.008 (micro average: 0.968) (positive class: expensive)

Logistic Regression
  accuracy: 93.90% +/- 0.22% (micro average: 93.90%)
  kappa: 0.437 +/- 0.029 (micro average: 0.437)
  AUC: 0.929 +/- 0.005 (micro average: 0.929) (positive class: expensive)

Figure 4 (Part B): Performance of three classification models

Figure 3 (Part A): k-Means Centroid Chart (with selected attributes)

Figure 3 (Part B): Cluster Scatter Plot with SVD

Figure 2 (Part C): Distribution of residuals (left) and predicted (blue continuous) vs. actual (red dashed) label values (right)

... elimination of collinear features, and ridge regularisation in use)

PerformanceVector:
  root_mean_squared_error: 11572.666 +/- 703.127 (micro average: 11591.869 +/- 0.000)
  absolute_error: 7225.007 +/- 149.505 (micro average: 7225.007 +/- 9064.806)
  relative_error: 42.55% +/- 0.64% (micro average: 42.55% +/- 73.47%)
  correlation: 0.745 +/- 0.020 (micro average: 0.744)
  squared_correlation: 0.556 +/- 0.030 (micro average: 0.554)
  prediction_average: 24609.882 +/- 214.859 (micro average: 24609.882 +/- 17359.467)

Figure 2 (Part B): Performance of linear regression
