Question

The code runs without errors but always gives a NaN correlation. I don't understand what the problem is; I have tried every solution, but it keeps producing the same output. I have attached the four datasets I have to use to write the code and compute a correlation for the four hypotheses, and I have also attached the result. My DataFrame is always empty no matter how I edit the code. There are no null values or string values in my dataset either. Please help me with error-free code that gives the right correlation. Removing rows, merging, everything has been done, but it still gives a NaN correlation. Give the correct code; no explanation is needed if the code still gives NaN output.
import pandas as pd
import numpy as np
from scipy.stats import pearsonr
import matplotlib.pyplot as plt
# Read data files
house_prices_df = pd.read_excel("MeanHousePricesClean-1.xlsx")
crime_df = pd.read_excel("CrimeClean-1-1.xlsx")
population_df = pd.read_excel("PopulationClean.xlsx")
area_df = pd.read_excel("SuburbAreas-1.xlsx", header=None)
# Step B: Clean and prepare data
def prepare_data(df, columns):
    df = df.dropna(subset=columns)  # Remove rows with missing values in key columns
    return df
# Rename columns for consistency
house_prices_df = house_prices_df.rename(columns={'Year': 'year'})
crime_df = crime_df.rename(columns={'Year': 'year', 'Crime rate per 100,000 population': 'crime_rate',
'Local Government Area': 'local_government_area'})
population_df = population_df.rename(columns={'Year': 'year'})
area_df = area_df.rename(columns={0: 'local_government_area', 1: 'area'})
# Clean the area DataFrame to remove non-relevant rows
area_df = area_df[area_df['local_government_area']!= 'Property']
# Step C: Analysis functions
def analyze_correlation(df, col1, col2):
    df = df.dropna(subset=[col1, col2])
    # Pearson correlation needs at least two observations
    if len(df) < 2:
        return np.nan
    correlation, _ = pearsonr(df[col1], df[col2])
    return correlation
# Reshape data to long format
house_prices_long = pd.melt(house_prices_df, id_vars=['year'], var_name='local_government_area', value_name='mean_house_price')
population_long = pd.melt(population_df, id_vars=['year'], var_name='local_government_area', value_name='population')
# Merge the datasets on 'year' and 'local_government_area'
merged_df = pd.merge(crime_df, house_prices_long, on=['year', 'local_government_area'], how='inner')
merged_df = pd.merge(merged_df, population_long, on=['year', 'local_government_area'], how='inner')
merged_df = pd.merge(merged_df, area_df, on='local_government_area', how='inner')
# Calculate population density
merged_df['population_density']= merged_df['population']/ merged_df['area']
# Step D: Prepare the data by cleaning
merged_df = prepare_data(merged_df,['mean_house_price', 'crime_rate', 'population_density'])
# Step E: Perform correlation analysis
house_price_population_corr = analyze_correlation(merged_df, 'mean_house_price', 'population_density')
crime_house_price_corr = analyze_correlation(merged_df, 'crime_rate', 'mean_house_price')
crime_population_density_corr = analyze_correlation(merged_df, 'crime_rate', 'population_density')
# Step F: Print the results
print(f"Correlation between house prices and population density: {house_price_population_corr}")
print(f"Correlation between crime rate and house prices: {crime_house_price_corr}")
print(f"Correlation between crime rate and population density: {crime_population_density_corr}")
# Plotting for visual analysis
plt.figure(figsize=(10,6))
plt.scatter(merged_df['population_density'], merged_df['mean_house_price'])
plt.title('House Price vs Population Density')
plt.xlabel('Population Density (people per square km)')
plt.ylabel('Mean House Price')
plt.grid(True)
plt.show()
plt.figure(figsize=(10,6))
plt.scatter(merged_df['mean_house_price'], merged_df['crime_rate'])
plt.title('Crime Rate vs House Price')
plt.xlabel('Mean House Price')
plt.ylabel('Crime Rate (per 100,000 population)')
plt.grid(True)
plt.show()
plt.figure(figsize=(10,6))
plt.scatter(merged_df['population_density'], merged_df['crime_rate'])
plt.title('Crime Rate vs Population Density')
plt.xlabel('Population Density (people per square km)')
plt.ylabel('Crime Rate (per 100,000 population)')
plt.grid(True)
plt.show()
Output of the above code:
Correlation between house prices and population density: nan
Correlation between crime rate and house prices: nan
Correlation between crime rate and population density: nan
Problem:
Empty DataFrame
Columns: [Incidents recorded, crime_rate, mean_house_price, year, population, local_government_area, area, population_density]
Index: []
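
An inner merge that returns an empty DataFrame usually means the key columns never match exactly between the files. The datasets are not shown here, so the cause is an assumption: stray whitespace, different letter case, or different dtypes in 'local_government_area' or 'year' are common culprits. The sketch below uses made-up toy frames (the suburb name and all numbers are hypothetical) and two hypothetical helpers, normalize_key and report_key_overlap, to show how to check the overlap before merging and how normalizing the keys changes the result.

```python
import pandas as pd


def normalize_key(series: pd.Series) -> pd.Series:
    """Return the key as trimmed, lower-case text so near-identical labels compare equal."""
    return series.astype(str).str.strip().str.lower()


def report_key_overlap(left: pd.DataFrame, right: pd.DataFrame, key: str) -> dict:
    """Count shared key values and list the values found on only one side."""
    left_keys = set(left[key].dropna().unique())
    right_keys = set(right[key].dropna().unique())
    return {
        "shared": len(left_keys & right_keys),
        "left_only": sorted(left_keys - right_keys),
        "right_only": sorted(right_keys - left_keys),
    }


# Toy frames (hypothetical values) that reproduce an empty inner merge:
# the same suburb is written with a trailing space in one file and in
# upper case in the other.
crime = pd.DataFrame({
    "year": [2019, 2020],
    "local_government_area": ["Northside ", "Northside "],
    "crime_rate": [5100.0, 4900.0],
})
prices = pd.DataFrame({
    "year": [2019, 2020],
    "local_government_area": ["NORTHSIDE", "NORTHSIDE"],
    "mean_house_price": [410000.0, 450000.0],
})

keys = ["year", "local_government_area"]

# Before cleaning: no shared key values, so the inner merge is empty.
before = report_key_overlap(crime, prices, "local_government_area")
empty_merge = pd.merge(crime, prices, on=keys, how="inner")

# Normalize the text key and force 'year' to one integer dtype in both frames.
for frame in (crime, prices):
    frame["local_government_area"] = normalize_key(frame["local_government_area"])
    frame["year"] = pd.to_numeric(frame["year"], errors="coerce").astype("Int64")

after = report_key_overlap(crime, prices, "local_government_area")
fixed_merge = pd.merge(crime, prices, on=keys, how="inner")

# An outer merge with indicator=True labels each row as 'both',
# 'left_only' or 'right_only', which shows where rows are being lost.
audit = pd.merge(crime, prices, on=keys, how="outer", indicator=True)

print("before:", before, "rows merged:", len(empty_merge))
print("after:", after, "rows merged:", len(fixed_merge))
print(audit["_merge"].value_counts())
```

Running the same overlap check on each pair of real frames (crime vs. house prices, then population, then area) should show which merge step drops every row. Because the area file is read with header=None, it is also worth confirming that its first column really holds the same names as the other files.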