Question

Analyzing a messy dataset
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Reading in the data
Set the variable sales_df to a Pandas DataFrame from the Real_Estate_Sales_2001-2020_GL.csv file and display the first 5 rows.
Note: This is a fairly large dataset, so when you call the function to read in the file, add the argument low_memory=False.
sales_df =
sales_df.head()
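A minimal sketch of the read step follows. Because the real CSV is not available here, it reads from an in-memory stand-in; the column names and rows in the sample are assumptions for illustration, not values from the actual file.

```python
import io

import pandas as pd

# Stand-in for Real_Estate_Sales_2001-2020_GL.csv so the sketch runs anywhere.
# The column names and rows below are assumptions, not taken from the real file.
sample_csv = io.StringIO(
    "List Year,Date Recorded,Town,Assessed Value,Sale Amount,Property Type\n"
    "2008,03/15/2009,Ansonia,150000,210000,Residential\n"
    "2009,07/01/2010,Avon,320000,450000,Residential\n"
    "2010,11/20/2010,Bethel,98000,130000,Commercial\n"
)

# low_memory=False tells pandas to process the whole file at once rather than
# in chunks, which avoids mixed-dtype guesses on large files.
sales_df = pd.read_csv(sample_csv, low_memory=False)
print(sales_df.head())
```

With the real data, the first argument would be the filename string "Real_Estate_Sales_2001-2020_GL.csv" instead of the in-memory buffer.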
Cleaning up the data
For our analysis, we will want to do a breakdown of real estate sales by year of sale, town, assessed value, sale amount, and property type.
Remove all the unnecessary columns from sales_df. Hint: we are interested in the year of sale, not the year the property was listed.
Change the dtypes for Assessed Value and Sale Amount to int.
sales_df.head()
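One way to approach the column cleanup is sketched below on a tiny made-up frame. The names "List Year", "Address", "Town", and "Property Type" are assumed column headers; only "Assessed Value", "Sale Amount", and "Date Recorded" are named in the assignment text.

```python
import pandas as pd

# Tiny made-up frame; column names other than "Assessed Value",
# "Sale Amount", and "Date Recorded" are assumptions.
sales_df = pd.DataFrame({
    "List Year": [2008, 2009],
    "Date Recorded": ["03/15/2009", "07/01/2010"],
    "Town": ["Ansonia", "Avon"],
    "Address": ["1 Main St", "2 Elm St"],
    "Assessed Value": [150000.0, 320000.0],
    "Sale Amount": [210000.0, 450000.0],
    "Property Type": ["Residential", None],
})

# Keep only the columns the analysis needs; "Date Recorded" is kept because
# the year of sale is extracted from it later.
keep = ["Date Recorded", "Town", "Assessed Value", "Sale Amount", "Property Type"]
sales_df = sales_df[keep]

# Cast both money columns to int in a single call.
sales_df = sales_df.astype({"Assessed Value": int, "Sale Amount": int})
print(sales_df.dtypes)
```

Note that astype(int) raises an error if either column still contains NaN, so on the real data it may be necessary to check for missing values in those two columns first.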
Display how many values in each column are NaN. This does not need to be saved as a variable. Hint: a function that results in a boolean can be chained with the sum() function to get a count.
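The hint can be illustrated on a small made-up frame: isna() returns a boolean frame, and sum() counts the True values per column. The result is stored in a variable here only so it can be printed and checked; the assignment does not require saving it.

```python
import numpy as np
import pandas as pd

# Made-up data with a few missing values.
sales_df = pd.DataFrame({
    "Town": ["Ansonia", "Avon", "Bethel"],
    "Property Type": ["Residential", np.nan, np.nan],
    "Date Recorded": ["03/15/2009", np.nan, "11/20/2010"],
})

# True counts as 1 and False as 0, so the sum is the NaN count per column.
nan_counts = sales_df.isna().sum()
print(nan_counts)
```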
We need to extract the year the property was sold.
Remove any row with a NaN.
Set the variable year_sold to a list of years (as integers) extracted from the Date Recorded column.
Add a new column named Year Sold for those data.
sales_df.head()
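The three steps above could look like the sketch below. It assumes the dates in Date Recorded are strings in MM/DD/YYYY form, which the assignment does not state; the sample rows are made up.

```python
import numpy as np
import pandas as pd

# Made-up rows; the MM/DD/YYYY date format is an assumption.
sales_df = pd.DataFrame({
    "Date Recorded": ["03/15/2009", "07/01/2010", np.nan],
    "Sale Amount": [210000, 450000, 130000],
})

# Drop every row that has a NaN in any column.
sales_df = sales_df.dropna()

# Parse the dates, take the year component, and convert to a plain list of ints.
year_sold = pd.to_datetime(
    sales_df["Date Recorded"], format="%m/%d/%Y"
).dt.year.tolist()

# The list has one entry per remaining row, so it can be assigned directly.
sales_df["Year Sold"] = year_sold
print(sales_df.head())
```

An alternative with the same result would be splitting each string on "/" and converting the last piece with int().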
Let's get an overview of the Sale Amount data. Use the Seaborn package to create a scatter plot of Sale Amount as a function of Year Sold.
plt.show()
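A possible form of the Seaborn call is sketched here on made-up numbers, one of which is deliberately extreme to mimic the squashing effect the next step describes.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up values; the last sale is an artificial outlier.
sales_df = pd.DataFrame({
    "Year Sold": [2008, 2009, 2010, 2011],
    "Sale Amount": [210000, 450000, 130000, 5_000_000_000],
})

# Passing column names via x= and y= also sets the axis labels.
ax = sns.scatterplot(data=sales_df, x="Year Sold", y="Sale Amount")
```

Calling plt.show() afterwards, as in the cell above, renders the figure.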
The data appear squashed because of an outlier. Display the outliers by returning rows from sales_df that have Sale Amount values greater than 1e9. You do not need to save this as a new variable.
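Boolean indexing is one way to return those rows. The sketch below uses made-up data, and the result is stored in a variable only so it can be printed and checked.

```python
import pandas as pd

# Made-up data with one artificial outlier above 1e9.
sales_df = pd.DataFrame({
    "Town": ["Ansonia", "Avon", "Bethel"],
    "Sale Amount": [210000, 5_000_000_000, 130000],
})

# The comparison yields a boolean Series used as a row mask.
outliers = sales_df[sales_df["Sale Amount"] > 1e9]
print(outliers)
```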
Let's remove the outlier and then calculate the average property sale price by year since the housing market crash in 2008.
Remove the outlier from sales_df.
Set a new variable means_df to a DataFrame containing the mean sale and assessment values by year.
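These two steps might be sketched as follows, again on made-up data. Restricting the rows to 2008 onward before averaging is one reading of "since the housing market crash in 2008" and is an assumption; filtering at plot time would work as well.

```python
import pandas as pd

# Made-up data; the last row is an artificial outlier.
sales_df = pd.DataFrame({
    "Year Sold": [2007, 2008, 2008, 2009, 2009],
    "Assessed Value": [100000, 100000, 200000, 300000, 500000],
    "Sale Amount": [120000, 150000, 250000, 400000, 5_000_000_000],
})

# Keep everything except rows whose sale amount exceeds 1e9.
sales_df = sales_df[sales_df["Sale Amount"] <= 1e9]

# Assumption: only years from 2008 onward are of interest.
recent = sales_df[sales_df["Year Sold"] >= 2008]

# Mean assessed value and sale amount per year; the index is "Year Sold".
means_df = recent.groupby("Year Sold")[["Assessed Value", "Sale Amount"]].mean()
print(means_df)
```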
Using Matplotlib, create a single figure with two lineplots:
Grey line plot with circle markers of mean assessment value by year from 2008 to 2021
Steelblue line plot with square markers of mean sale price by year from 2008 to 2021
Title the graph "Comparison of Average Property Values and Sale Prices from 2008 to 2021"
Include x- and y-axis labels
Include a legend with labels that correspond to the column headers
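The plotting requirements above can be sketched like this. The means_df values are made up, its layout (years as the index, one column per measure) is an assumption, and the y-axis label text is a placeholder since the assignment does not specify the wording.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up yearly means; the index/column layout is an assumption.
means_df = pd.DataFrame(
    {
        "Assessed Value": [180000.0, 175000.0, 190000.0],
        "Sale Amount": [260000.0, 240000.0, 275000.0],
    },
    index=pd.Index([2008, 2009, 2010], name="Year Sold"),
)

fig, ax = plt.subplots()

# Grey line with circle markers for the mean assessed value.
ax.plot(means_df.index, means_df["Assessed Value"],
        color="grey", marker="o", label="Assessed Value")

# Steelblue line with square markers for the mean sale price.
ax.plot(means_df.index, means_df["Sale Amount"],
        color="steelblue", marker="s", label="Sale Amount")

ax.set_title(
    "Comparison of Average Property Values and Sale Prices from 2008 to 2021"
)
ax.set_xlabel("Year Sold")
ax.set_ylabel("Mean Value")  # placeholder wording
ax.legend()  # uses the label= strings, which match the column headers
```

Both lines share one Axes object, which is what puts them in a single figure; plt.show() would display it.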
