Question
Analyzing a messy dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Reading in the data
Set the variable salesdf to a Pandas DataFrame from the RealEstateSalesGLcsv file and display the first rows.
Note: This is a fairly large dataset, so when you call the function to read in the file, add the argument low_memory=False.
salesdf
salesdf.head()
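A minimal sketch of the read step follows. Because the real file is not available here, it reads from an in-memory string instead; the rows and the column names List Year, Town, and Property Type are assumptions made for illustration. With the real file, pass its path to pd.read_csv in place of csv_text.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file; rows and some column names
# are invented for illustration.
csv_text = io.StringIO(
    "List Year,Date Recorded,Town,Assessed Value,Sale Amount,Property Type\n"
    "2020,04/14/2021,Ansonia,133000,248400,Residential\n"
    "2020,05/26/2021,Avon,110500,239900,Residential\n"
)

# low_memory=False tells pandas to process the file in one pass rather than
# in chunks, which avoids mixed-dtype guesses on large files.
salesdf = pd.read_csv(csv_text, low_memory=False)

# head() returns the first rows (five by default).
print(salesdf.head())
```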
Cleaning up the data
For our analysis, we will want to do a breakdown of real estate sales by year of sale, town, assessed value, sale amount, and property type.
Remove all the unnecessary columns from salesdf. Hint: we are interested in the year of sale, not the year the property was listed.
Change the dtypes for Assessed Value and Sale Amount to int.
salesdf.head()
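One way to approach the cleanup, sketched on a small made-up frame. The columns Serial Number, List Year, Town, and Property Type are assumed names; only Assessed Value, Sale Amount, and Date Recorded are named in the task. Date Recorded is kept because the year of sale is extracted from it later.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset has more columns and rows.
salesdf = pd.DataFrame({
    "Serial Number": [1, 2],
    "List Year": [2020, 2020],
    "Date Recorded": ["04/14/2021", "05/26/2021"],
    "Town": ["Ansonia", "Avon"],
    "Assessed Value": [133000.0, 110500.0],
    "Sale Amount": [248400.0, 239900.0],
    "Property Type": ["Residential", "Residential"],
})

# Keep only the columns needed for the analysis (assumed names).
keep = ["Date Recorded", "Town", "Assessed Value", "Sale Amount", "Property Type"]
salesdf = salesdf[keep]

# Convert both money columns from float to int in one call.
salesdf = salesdf.astype({"Assessed Value": int, "Sale Amount": int})

print(salesdf.dtypes)
```

Note that astype(int) raises an error if a column still contains NaN, so this conversion only works when both columns are fully populated.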
Display how many values in each column are NaN. This does not need to be saved as a variable. Hint: a function that results in a boolean can be chained with the sum function to get a count.
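The hint can be sketched as follows, on invented data: isna() returns a boolean frame, and sum() counts the True values per column. The result is stored in nan_counts here only so it can be printed and checked; the task does not require a variable.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a few missing values.
salesdf = pd.DataFrame({
    "Town": ["Ansonia", "Avon", None],
    "Sale Amount": [248400, 239900, 150000],
    "Property Type": ["Residential", np.nan, np.nan],
})

# isna() marks missing cells as True; sum() treats True as 1 per column.
nan_counts = salesdf.isna().sum()
print(nan_counts)
```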
We need to extract the year the property was sold.
Remove any row with a NaN.
Set the variable yearsold to a list of years as integers extracted from the Date Recorded column.
Add a new column named Year Sold for those data.
salesdf.head()
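A possible sketch of these three steps, on invented rows. It assumes Date Recorded holds strings in MM/DD/YYYY form; if the real column uses another format, the format argument would need to change.

```python
import pandas as pd

# Hypothetical sample; the date format is an assumption.
salesdf = pd.DataFrame({
    "Date Recorded": ["04/14/2021", None, "09/13/2020"],
    "Sale Amount": [248400, 239900, 150000],
})

# Drop every row that contains at least one NaN.
salesdf = salesdf.dropna()

# Parse the dates, take the year, and convert to a plain list of ints.
yearsold = (
    pd.to_datetime(salesdf["Date Recorded"], format="%m/%d/%Y")
    .dt.year
    .tolist()
)

# A list is assigned positionally, so its length must match the row count.
salesdf["Year Sold"] = yearsold

print(salesdf.head())
```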
Let's get an overview of the Sale Amount data. Use the Seaborn package to create a scatter plot of Sale Amount as a function of Year Sold.
plt.show()
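A minimal sketch of the scatter plot, using invented values that include one extreme sale so the squashing effect is visible. The Agg backend line is there only so the sketch runs without a display; in a notebook it can be omitted.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample; the last row plays the role of the outlier.
salesdf = pd.DataFrame({
    "Year Sold": [2019, 2020, 2021, 2021],
    "Sale Amount": [248400, 239900, 150000, 5_000_000_000],
})

# Sale Amount as a function of Year Sold: year on x, amount on y.
ax = sns.scatterplot(data=salesdf, x="Year Sold", y="Sale Amount")
plt.show()
```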
The data appear squashed because of an outlier. Display the outliers by returning rows from salesdf that have Sale Amount values greater than e. You do not need to save this as a new variable.
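Filtering with a boolean mask is one way to display such rows. In the sketch below the cutoff of 1e9 is a hypothetical value chosen for the invented data, not the threshold from the task. The result is stored in outliers only so it can be printed and checked.

```python
import pandas as pd

# Hypothetical sample; the last row is an obvious outlier.
salesdf = pd.DataFrame({
    "Year Sold": [2019, 2020, 2021, 2021],
    "Sale Amount": [248400, 239900, 150000, 5_000_000_000],
})

threshold = 1e9  # hypothetical cutoff, chosen for this sample only

# The comparison yields a boolean Series; indexing with it keeps True rows.
outliers = salesdf[salesdf["Sale Amount"] > threshold]
print(outliers)
```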
Let's remove the outlier and then calculate the average property sale price by year since the housing market crash in
Remove the outlier from salesdf
Set a new variable meansdf to a DataFrame containing the mean sale and assessment values by year.
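These two steps might look like the sketch below, again on invented data with a hypothetical cutoff of 1e9. No starting-year filter is applied because the year is not given above; a condition on Year Sold could be added to the same mask.

```python
import pandas as pd

# Hypothetical sample; the last row is the outlier to remove.
salesdf = pd.DataFrame({
    "Year Sold": [2020, 2020, 2021, 2021],
    "Assessed Value": [100000, 200000, 150000, 2_000_000_000],
    "Sale Amount": [150000, 250000, 200000, 5_000_000_000],
})

threshold = 1e9  # hypothetical cutoff, chosen for this sample only

# Keep only the rows at or below the cutoff.
salesdf = salesdf[salesdf["Sale Amount"] <= threshold]

# Group by year and average the two value columns; Year Sold becomes the index.
meansdf = salesdf.groupby("Year Sold")[["Assessed Value", "Sale Amount"]].mean()
print(meansdf)
```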
Using Matplotlib, create a single figure with two lineplots:
Grey line plot with circle markers of mean assessment value by year from to
Steelblue line plot with square markers of mean sale price by year from to
Title the graph "Comparison of Average Property Values and Sale Prices from to
Include x- and y-axis labels
Include a legend with labels that correspond to the column headers
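The plotting requirements above can be sketched as follows. The meansdf values are invented, the title leaves out the year range because it is not given above, and the y-axis label text is an assumption. The Agg backend line only lets the sketch run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for this sketch; omit in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly means, indexed by year as a groupby would produce.
meansdf = pd.DataFrame(
    {
        "Assessed Value": [150000.0, 160000.0, 175000.0],
        "Sale Amount": [200000.0, 215000.0, 240000.0],
    },
    index=pd.Index([2019, 2020, 2021], name="Year Sold"),
)

fig, ax = plt.subplots()

# Grey line with circle markers: mean assessed value by year.
ax.plot(meansdf.index, meansdf["Assessed Value"],
        color="grey", marker="o", label="Assessed Value")

# Steelblue line with square markers: mean sale price by year.
ax.plot(meansdf.index, meansdf["Sale Amount"],
        color="steelblue", marker="s", label="Sale Amount")

ax.set_title("Comparison of Average Property Values and Sale Prices")
ax.set_xlabel("Year Sold")
ax.set_ylabel("Mean value")  # assumed label text
ax.legend()  # uses the label= strings, which match the column headers
plt.show()
```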