Question

AA1.csv

Download the CSV file. Write a script in Python (you can use any IDE) to load the CSV file AA1 into a Pandas data frame, and name the frame df_firstname (where firstname is your first name). In your script, carry out the following, then answer the last set of questions in point 5, Analysis, in the HTML box:

(Note: Once your script is ready, please attach the Python script and the required screenshot(s) to this question by clicking the "Add file" button, and then follow the notes to upload your script.)
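
As a possible starting point for the "Explore the data" and "Visualize the data" steps listed below, here is a minimal sketch. It assumes the file has the columns the question mentions (Model, Color, millage, value); the sample rows are invented stand-ins so the script runs without AA1.csv, and the axis labels are only one reasonable choice.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to view plots on screen
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in rows (NOT the contents of AA1.csv).
# With the real file, use: df_firstname = pd.read_csv("AA1.csv")
SAMPLE_CSV = """Model,Color,millage,value
A,red,12000,9500
B,blue,,7200
A,red,30500,6100
C,green,45000,4800
B,blue,8000,9900
C,red,61000,3500
"""
df_firstname = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Explore the data
print(df_firstname.columns.tolist())       # names of columns
print(df_firstname.dtypes)                 # types of columns
for col in df_firstname.columns:           # unique values in each column
    print(col, df_firstname[col].unique())
print(df_firstname.describe())             # count, mean, std, min, quartiles, max
print(df_firstname.head(4))                # first four records
print(df_firstname.isnull().sum())         # missing values per column
for col in ["Model", "Color"]:             # count of each unique value
    print(df_firstname[col].value_counts())

# Visualize the data
fig_hist, ax_hist = plt.subplots()
ax_hist.hist(df_firstname["millage"].dropna(), bins=10)
ax_hist.set(xlabel="millage", ylabel="frequency", title="firstname_millage")

fig_scatter, ax_scatter = plt.subplots()
ax_scatter.scatter(df_firstname["millage"], df_firstname["value"])
ax_scatter.set(xlabel="millage", ylabel="value", title="firstname_millage_scatter")

# Scatter matrix of the numeric columns with kernel density estimates
# on the diagonal (the "kde" option relies on SciPy being installed).
axes = pd.plotting.scatter_matrix(df_firstname, diagonal="kde")
```

Replace "firstname" in the frame name and plot titles with your own first name, and call plt.show() (without the Agg line) when you need the plots on screen for a screenshot.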

1. Explore the data

  1. Print the names of the columns.
  2. Print the types of the columns.
  3. Print the unique values in each column.
  4. Print the statistics (count, min, mean, standard deviation, 1st quartile, median, 3rd quartile, max) of all the numeric columns (use one command).
  5. Print the first four records.
  6. Print a summary of all missing values in all columns (use one command).
  7. Print the total number (count) of each unique value in the following categorical columns:
    1. Model
    2. Color

2. Visualize the data

  1. Plot a histogram for the millage using 10 bins; name the x and y axes appropriately and give the plot the title "firstname_millage".
  2. Create a scatterplot showing "millage" versus "value"; name the x and y axes appropriately and give the plot the title "firstname_millage_scatter".
  3. Plot a "scatter matrix" showing the relationship between all columns of the dataset; on the diagonal of the matrix, plot the kernel density function.

3. Pre-process the data

  1. Properly remove (drop) the column with the most missing values. (Hint: make sure you review and set the right arguments.)
  2. Replace the missing values in the "millage" column with the mean of that column.
  3. Check that there are no missing values.
  4. Convert all the categorical columns into numeric values and drop/delete the original columns. (Hint: use get dummies.)
  5. Make sure your new data frame is completely numeric, and name it df_firstname_numeric.

4. Build a model and validate

  Build a predictive model, namely a tree classifier, using sklearn. Take the following into consideration:

  1. Name the model dt_firstname, where firstname is your first name.
  2. Split your data 70% for training and 30% for testing.
  3. Use entropy for the decisions.
  4. The maximum depth of the tree is 6.
  5. Split a node only when it reaches 15 observations.
  6. For validation, use 8-fold cross-validation and print the mean accuracy of the validation.
  7. Use the model you created with the training data to test the 30% testing data, and print:
    1. The accuracy of the test
    2. The confusion matrix
  8. Take a screenshot illustrating the accuracy of the test and the confusion matrix, and name it firstname_screenshotAA1.
  9. Prune the tree: vary the maximum depth of your predictive model from 1 to 8 and print the mean k-fold accuracy of each run on the training data.
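
The pre-processing and modelling steps above could be sketched roughly as follows. The question does not name the target column or show the data, so the frame here is invented stand-in data, and "sparse_col" and "target" are hypothetical column names. Reading "15 observations per node" as min_samples_split=15, running the 8-fold validation on the training split, and fixing random_state are assumptions as well.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Invented stand-in data (NOT AA1.csv); "sparse_col" and "target" are hypothetical.
rng = np.random.default_rng(0)
n = 200
millage = rng.uniform(5_000, 90_000, n)
df_firstname = pd.DataFrame({
    "Model": rng.choice(["A", "B", "C"], n),
    "Color": rng.choice(["red", "blue"], n),
    "millage": millage,
    "sparse_col": np.where(rng.random(n) < 0.6, np.nan, 1.0),
    "target": (millage > 40_000).astype(int),
})
df_firstname.loc[::25, "millage"] = np.nan  # a few missing millage values

# Pre-process: drop the column with the most missing values, impute millage.
worst = df_firstname.isnull().sum().idxmax()
df_firstname = df_firstname.drop(columns=[worst])
df_firstname["millage"] = df_firstname["millage"].fillna(df_firstname["millage"].mean())
assert df_firstname.isnull().sum().sum() == 0  # no missing values remain

# One-hot encode the categorical columns; get_dummies drops the originals.
df_firstname_numeric = pd.get_dummies(df_firstname, columns=["Model", "Color"], dtype=int)

# 70/30 split (random_state is an assumption, added for repeatability).
X = df_firstname_numeric.drop(columns=["target"])
y = df_firstname_numeric["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

dt_firstname = DecisionTreeClassifier(
    criterion="entropy", max_depth=6, min_samples_split=15, random_state=42
)

# 8-fold cross-validation on the training split.
cv_scores = cross_val_score(dt_firstname, X_train, y_train, cv=8)
print("Mean 8-fold accuracy:", cv_scores.mean())

# Fit on the training data, then evaluate on the held-out 30%.
dt_firstname.fit(X_train, y_train)
y_pred = dt_firstname.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Test accuracy:", test_accuracy)
print("Confusion matrix:\n", cm)

# Pruning: vary max_depth from 1 to 8 and compare mean 8-fold accuracy.
depth_scores = {}
for depth in range(1, 9):
    tree = DecisionTreeClassifier(
        criterion="entropy", max_depth=depth, min_samples_split=15, random_state=42
    )
    depth_scores[depth] = cross_val_score(tree, X_train, y_train, cv=8).mean()
    print(f"max_depth={depth}: mean accuracy={depth_scores[depth]:.3f}")
```

With the real file, replace the stand-in frame with the loaded df_firstname and set the target to whichever column your assignment designates as the class label.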

5. Analysis

In the box below, answer the following three questions, numbering your responses to match the question numbers:

  1. What are the key highlights of the original dataset you loaded?
  2. Based on the results of pruning the tree, recommend a maximum depth and explain why you recommend it.
  3. Looking at the confusion matrix you generated in point 4.7, what are the key findings? (Hint: think in terms of precision, recall, true negatives, and so on.)
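
For question 3, the quantities named in the hint can be read off a confusion matrix as sketched here. The counts are hypothetical, not results from AA1.csv, and the sketch assumes a binary target laid out the way sklearn's confusion_matrix returns it (rows are true classes, columns are predicted classes).

```python
import numpy as np

# Hypothetical counts for illustration only.
# Layout for a binary target: [[TN, FP],
#                              [FN, TP]]
cm = np.array([[50, 10],
               [5, 35]])
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)       # of the predicted positives, how many were right
recall = tp / (tp + fn)          # of the actual positives, how many were found
specificity = tn / (tn + fp)     # of the actual negatives, how many were found
accuracy = (tp + tn) / cm.sum()  # overall share of correct predictions

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"specificity={specificity:.3f} accuracy={accuracy:.3f}")
```

Comparing precision with recall shows whether the model errs more through false positives or false negatives, which is usually the core of the "key findings" discussion.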
