Question
The purpose of the Data Analysis Project is to work with a team of your peers on a multi-week analysis project made up of several smaller assignments. You will be placed in your teams by the end of Week
Instructions
For the data analysis project, your team will be working with the Ames Housing dataset from Kaggle. Follow this link to see a full description of the dataset.
The Excel data file, Ames Housing Dataset.xlsx, contains the same data as the train.csv file on Kaggle. A brief description of the data fields is available under Data on the competition page. If you sign up for a free Kaggle account, you can find a detailed description of all data fields in the data_description.txt file.
Ames Housing Dataset
Part I. Assigned Week due by the end of Week
Start by downloading the data to your computer and getting familiar with the data set. Use Excel to answer the questions below. Use a separate worksheet in your Excel file to show your work for each question. Label the worksheets using the Q# format. Report your findings in the provided Word document template, and include all the relevant output from Excel. Submit both your Excel file and your written project report as a Word document.
List all the quantitative variables in the order they appear in the dataset in a separate worksheet named Q
In the worksheet named Data, create two new variables: house age at the time of sale, HouseAgeSale, and number of bathrooms, BRTotal. Show the formulas you use to create the new variables.
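The assignment asks for Excel formulas, but the arithmetic behind the two new variables can be sketched in Python as a cross-check. This is only an illustration under stated assumptions: it takes house age to be YrSold minus YearBuilt, and it counts each half bathroom as 0.5 of a bathroom, which is one common convention and not something the instructions specify. The record below is made up and is not taken from the dataset.

```python
def add_derived_fields(row):
    """Return a copy of one record with HouseAgeSale and BRTotal added."""
    out = dict(row)
    # Assumption: age at sale is the sale year minus the construction year.
    out["HouseAgeSale"] = row["YrSold"] - row["YearBuilt"]
    # Assumption: half baths count as 0.5; basement baths are included.
    out["BRTotal"] = (
        row["FullBath"]
        + row["BsmtFullBath"]
        + 0.5 * (row["HalfBath"] + row["BsmtHalfBath"])
    )
    return out


# Made-up record for illustration only.
example = {
    "YrSold": 2008,
    "YearBuilt": 2003,
    "FullBath": 2,
    "HalfBath": 1,
    "BsmtFullBath": 1,
    "BsmtHalfBath": 0,
}
print(add_derived_fields(example))
```

If your team defines the bathroom count differently (for example, counting half baths as whole rooms), state that definition next to the formula in the worksheet.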
Copy the two columns with the new variables into a new worksheet, Q Calculate the mean and standard deviation for HouseAgeSale and BRTotal.
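As a rough cross-check of the Excel output, the same summary statistics can be computed with Python's standard library. The sketch below reports both the sample standard deviation (the n - 1 version, which is what Excel's STDEV.S returns) and the population version (STDEV.P), since the instructions do not say which one is expected. The values are made up for illustration.

```python
import statistics


def summarize(values):
    """Return the mean plus sample and population standard deviations."""
    return {
        "mean": statistics.mean(values),
        "sd_sample": statistics.stdev(values),       # divides by n - 1
        "sd_population": statistics.pstdev(values),  # divides by n
    }


# Made-up house ages, not taken from the dataset.
ages = [2, 4, 4, 4, 5, 5, 7, 9]
print(summarize(ages))
```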
Take a random sample of observations, and save it in a separate worksheet labeled QSample.
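The idea of drawing a simple random sample without replacement can be sketched as follows. The sample size `n` and the `seed` are hypothetical parameters introduced for this illustration; use the sample size given in your assignment. Fixing a seed simply makes the draw repeatable, which is helpful when several team members need to work from the same sample.

```python
import random


def draw_sample(rows, n, seed=0):
    """Draw n rows without replacement; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


# Made-up row identifiers standing in for the observations in the Data sheet.
row_ids = list(range(1, 21))
print(draw_sample(row_ids, n=5))
```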
From this point on, continue working with your random sample of observations. Construct a histogram for SalePrice. Take the natural log of SalePrice and name the new variable LogPrice. Construct a histogram for LogPrice. Label the axes clearly and give titles to your histograms. Comment on the shape of the distribution for each variable. What do you observe after the log transformation?
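To see what a histogram is counting, and why the log transformation changes its shape, here is a minimal sketch that takes the natural log of each price and tallies equal-width bins by hand. The prices are made up and the number of bins is an arbitrary choice for the illustration. Because the log compresses large values more than small ones, a long right tail in the raw prices is pulled in on the log scale.

```python
import math


def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            idx = 0  # all values identical: everything lands in the first bin
        else:
            # The maximum value is assigned to the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up sale prices with a long right tail.
prices = [100000, 120000, 130000, 150000, 180000, 200000, 250000, 400000, 755000]
log_prices = [math.log(p) for p in prices]

print(histogram_counts(prices, 4))
print(histogram_counts(log_prices, 4))
```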
Create a scatterplot for GrLivArea and SalePrice. Describe the relationship between the two variables, and point out any unusual features you observe.
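The assignment asks for a description based on the scatterplot, not for a statistic. Even so, a Pearson correlation coefficient (what Excel's CORREL returns) can support a statement about the direction and strength of a linear relationship. The sketch below computes it from the definition; the living areas and prices are made up for illustration. A single coefficient can hide outliers, so it supplements the plot and does not replace it.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up living areas (square feet) and sale prices.
areas = [900, 1200, 1500, 1800, 2400]
sale_prices = [110000, 135000, 172000, 190000, 260000]
print(round(pearson_r(areas, sale_prices), 3))
```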
Choose a categorical variable with at least four categories and construct a bar chart for it. Make sure your bar chart has a legend and is easy to read.
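A bar chart of a categorical variable plots one bar per category, with the bar height equal to the number of observations in that category. The frequency table behind such a chart can be sketched as below. The category labels are made up for illustration, and the choice of variable remains up to your team.

```python
from collections import Counter


def category_counts(labels):
    """Return (category, count) pairs sorted from most to least frequent."""
    return Counter(labels).most_common()


# Made-up category labels standing in for one categorical column.
labels = ["A", "B", "A", "C", "D", "A", "B", "D", "D", "D"]
for category, count in category_counts(labels):
    print(category, count)
```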