Question

The purpose of the Data Analysis Project is to work with a team of your peers on a multi-week analysis project comprised of several smaller assignments. You will be placed in your teams by the end of Week 1.
Instructions
For the data analysis project, your team will be working with the Ames Housing dataset from Kaggle. Follow this link to see a full description of the dataset.
This Excel data file called Ames Housing Dataset.xlsx is the same as the train.csv data file on Kaggle. A brief description of the data fields is available under Data on the competition page. You can find a detailed description of all data fields in the data_description.txt file if you sign up for a free account with Kaggle.
Ames Housing Dataset
Part I. Assigned Week 1, due by the end of Week 2
Start by downloading the data to your computer and getting familiar with the dataset. Use Excel to answer the questions below. Use a separate worksheet in your Excel file to show your work for each question. Label worksheets using the Q_# format. Report your findings in the provided Word document template, and include all the relevant output from Excel. Submit both your Excel file and your written project report as a Word document.
List all the quantitative variables in the order they appear in the dataset in a separate worksheet named Q1.
In the worksheet named Data, create two new variables: house age at the time of sale, HouseAgeSale, and total number of bathrooms, BRTotal. Show the formulas you use to create the new variables.
Copy the two columns with the new variables into a new worksheet, Q2. Calculate the mean and standard deviation for HouseAgeSale and BRTotal.
Take a random sample of 250 observations, and save it in a separate worksheet labeled Q3_Sample.
From this point on, continue working with your random sample of 250 observations. Construct a histogram for SalePrice. Take the natural log of SalePrice and name the new variable LogPrice. Construct a histogram for LogPrice. Label the axes clearly and give titles to your histograms. Comment on the shape of the distribution for each variable. What do you observe after the log transformation?
Create a scatterplot for GrLivArea and SalePrice. Describe the relationship between the two variables, and point out any unusual features you might observe.
Choose a categorical variable with at least four categories and construct a bar chart for it. Make sure your bar chart has a legend and is easy to read.
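
The assignment asks for Excel, so the following is only a cross-check sketch of the calculations in the questions above, written in Python with the standard library. It assumes the Kaggle column names YearBuilt, YrSold, FullBath, HalfBath, BsmtFullBath, and BsmtHalfBath, and it assumes that a half bath counts as 0.5 of a bathroom; the assignment does not specify either formula, so your team may define BRTotal differently. The five rows are made-up values, not records from the dataset.

```python
import math
import random
import statistics

# Toy rows standing in for the Data worksheet (values are invented for illustration).
rows = [
    {"YearBuilt": 1990, "YrSold": 2008, "FullBath": 2, "HalfBath": 1,
     "BsmtFullBath": 1, "BsmtHalfBath": 0, "SalePrice": 200000},
    {"YearBuilt": 1975, "YrSold": 2007, "FullBath": 1, "HalfBath": 0,
     "BsmtFullBath": 0, "BsmtHalfBath": 1, "SalePrice": 150000},
    {"YearBuilt": 2005, "YrSold": 2009, "FullBath": 2, "HalfBath": 1,
     "BsmtFullBath": 0, "BsmtHalfBath": 0, "SalePrice": 250000},
    {"YearBuilt": 1960, "YrSold": 2010, "FullBath": 1, "HalfBath": 1,
     "BsmtFullBath": 1, "BsmtHalfBath": 0, "SalePrice": 120000},
    {"YearBuilt": 2000, "YrSold": 2006, "FullBath": 3, "HalfBath": 0,
     "BsmtFullBath": 1, "BsmtHalfBath": 0, "SalePrice": 310000},
]


def add_derived(row):
    """Return a copy of row with HouseAgeSale, BRTotal, and LogPrice added."""
    out = dict(row)
    # Assumed definition: age at sale = year sold minus year built.
    out["HouseAgeSale"] = row["YrSold"] - row["YearBuilt"]
    # Assumed definition: full baths count as 1, half baths as 0.5.
    out["BRTotal"] = (row["FullBath"] + row["BsmtFullBath"]
                      + 0.5 * (row["HalfBath"] + row["BsmtHalfBath"]))
    # Natural log of the sale price, as in the LogPrice question.
    out["LogPrice"] = math.log(row["SalePrice"])
    return out


data = [add_derived(r) for r in rows]

ages = [r["HouseAgeSale"] for r in data]
baths = [r["BRTotal"] for r in data]

mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)      # sample standard deviation (n - 1 divisor)
mean_baths = statistics.mean(baths)
sd_baths = statistics.stdev(baths)

# Simple random sample without replacement; a fixed seed makes it repeatable.
# The assignment uses k=250 on the full dataset; k=3 fits the toy data.
rng = random.Random(42)
sample = rng.sample(data, k=3)
```

statistics.stdev uses the n - 1 divisor, which matches Excel's STDEV.S rather than STDEV.P, so the two tools should agree if the same formulas are used for the new variables.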
