
Question



Python, only Tasks 9 and 10

A national-level investment company called askaan business co. is looking for data scientists to help it understand the patterns that affect real estate prices. The company currently buys and sells real estate across the country and is interested in estimating the price of a property 5 years after the date of purchase. Such a prediction system will help the company invest in estates likely to generate substantial profit margins. The company has provided the relevant data it has collected over the years. The following table presents an overview of the given data:

Table 1: Data Description (Any non-applicable value is set to NA)

Field: Description

Sale-Price: Sale price of the property 5 years after the date of purchase, in millions of SAR.
Purchase-Date: Month and year when the property was purchased.
Purchase-Price: Property's price at the time of purchase, in millions of SAR.
Type: Type of the property. The property could be open-land, villa, duplex, or flat.
Class: Legal classification of the property; one of residential, industrial, or commercial.
Location: Where the property is located w.r.t. the nearby city. 'Center' implies center of the city, 'Border' implies at the entry/exit of the city, 'Outskirts' implies on the outskirts of the city.
Shape: Shape of the property. It could be rectangle, trapezoid, or irregular.
U-Index: Index based on the number of utilities available, on a scale of 1 to 5. A value of 5 indicates all utilities are available.
Proximity: Proximity to the nearest metro station, in meters.
N-Rank: Rank based on neighborhood facilities that make the property attractive, on a scale of 1 to 10. A value of 1 indicates the best neighborhood.
P-Chance: Probability of finding a parking space on adjacent roads at a given time. It is a value between 0 and 1, where 1 indicates sure availability of a parking space.
Built: Original year of construction. Applicable for villa, duplex, flat.
Renovate: Latest renovation year. Applicable for villa, duplex, flat. A value of 0 implies no renovation done so far or renovation not applicable.
Access: Type of direct access to the property, which could be street, alley, or highway.
Crime-Rate: Average number of crimes reported per year in the neighborhood.
C-Rating: Pleasantness of the climate throughout the year, on a scale of 1 to 5. A value of 5 indicates a pleasant climate.
Gov-Index: Expected level of government infrastructure projects and/or developments in the neighborhood, on a scale of 1 to 10. A value of 10 indicates that huge developments are planned by the government.
Contour: Flatness of the property. Applicable only for the open-land type. A value of C indicates the slope of the property is irregular. A value of F indicates the property has a smooth slope.
Garage: Is there a private parking garage? Yes or No. Applicable to the flat or duplex type. All villas have a private garage.
Swimming: Is there a swimming pool? Yes or No. Applicable to the villa type.

Aim. The aim of this project is to explore the data and find possible patterns/relationships in the data. The key variable of interest to askaan business co. is Sale-Price. Any pattern that shows a connection between the input variables and the output variable (Sale-Price) will be considered fruitful by askaan. Assume that properties that appreciate by 100% or less over the five years are low potential estates, and those that appreciate by 400% or more are high potential estates. The percentage increase or decrease is defined as

\[ \frac{\text{Sale-Price} - \text{Purchase-Price}}{\text{Purchase-Price}} \times 100 \]
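The appreciation rule above can be sketched in a few lines of pandas. This is only an illustration under stated assumptions: the rows are made up, the helper names `appreciation_pct` and `label_potential` are hypothetical, and the label "medium" for the range between 100% and 400% is an assumption, since the problem statement leaves that range unnamed.

```python
import pandas as pd

def appreciation_pct(sale_price, purchase_price):
    """Percentage change from purchase price to sale price."""
    return (sale_price - purchase_price) / purchase_price * 100

def label_potential(pct):
    """Map a percentage change to the potential classes defined in the Aim."""
    if pct <= 100:
        return "low"
    if pct >= 400:
        return "high"
    return "medium"  # assumed name; the source does not name this range

# Made-up rows for illustration only (prices in millions of SAR).
df = pd.DataFrame({
    "Purchase-Price": [1.0, 2.0, 0.5],
    "Sale-Price": [1.5, 6.0, 2.5],
})
df["Appreciation"] = appreciation_pct(df["Sale-Price"], df["Purchase-Price"])
df["Potential"] = df["Appreciation"].apply(label_potential)
print(df)
```

The resulting `Potential` column could then serve as the grouping variable for the plots requested in Tasks 4 to 6.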

Data. The data related to the project is provided in three different files, named in the following format: Group_XX_A, Group_XX_B and Group_XX_C, where XX is your group number. In addition, Table 1 presents the metadata related to the given data.

Expectations. At the end of this project, you are expected to provide askaan with answers to the following questions. Support your answers with corresponding/appropriate data science methods and visualizations (wherever applicable).

Task-1: Prepare the data given in the Group_XX_A file, i.e., handle the missing values, remove outliers, and fix inconsistencies. You can pick any set of methods, but clearly justify your approach.
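A minimal sketch of one possible cleaning pass for Task-1 follows. The choices here (lower-casing labels, median imputation, the 1.5 × IQR rule) are assumptions picked for illustration, not methods prescribed by the task, and the sample rows are made up. Note also that Table 1 uses NA for non-applicable values, so in the real files those should be told apart from genuinely missing entries before any imputation.

```python
import numpy as np
import pandas as pd

def clean(df, numeric_cols, categorical_cols):
    """Return a cleaned copy: consistent labels, imputed values, IQR-filtered rows."""
    out = df.copy()
    # Fix inconsistencies: trim whitespace and unify letter case in text columns.
    for col in categorical_cols:
        out[col] = out[col].str.strip().str.lower()
    # Handle missing values: the median is less sensitive to outliers than the mean.
    for col in numeric_cols:
        out[col] = out[col].fillna(out[col].median())
    # Remove outliers: keep rows within 1.5 * IQR of the quartiles in every numeric column.
    keep = pd.Series(True, index=out.index)
    for col in numeric_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        keep &= out[col].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[keep].reset_index(drop=True)

# Made-up rows: one missing price, one extreme price, inconsistent type labels.
raw = pd.DataFrame({
    "Purchase-Price": [1.0, 1.2, np.nan, 1.1, 0.9, 50.0],
    "Type": ["Villa", " villa", "duplex", "Duplex ", "flat", "flat"],
})
cleaned = clean(raw, ["Purchase-Price"], ["Type"])
print(cleaned)
```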

For the following tasks use the Group_XX_B file:

  1. Task-2: Draw the pair-wise plots between all the input variables and the output variable (Sale-Price).

  2. Task-3: Identify the top three and bottom three numerical variables in terms of how strongly they are related to the output variable (Sale-Price). Use a relevant analysis approach.

  3. Task-4: Show whether the input variables carry enough information to separate low- and high-performing estates. Use plots to justify.

  4. Task-5: What are the common patterns for the low performance of the estates? Use plots to justify.

  5. Task-6: What are the common patterns for the high performance of the estates? Use plots to justify.
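For Task-3, one common way to rank numerical variables is by the absolute value of their correlation with Sale-Price. The sketch below assumes Pearson correlation (Spearman is an alternative when relationships are monotonic but not linear) and uses synthetic data, since the Group_XX_B file is not available here; `rank_by_correlation` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

def rank_by_correlation(df, target="Sale-Price", method="pearson"):
    """Rank numeric inputs by absolute correlation with the target, strongest first."""
    corr = df.select_dtypes("number").corr(method=method)[target].drop(target)
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Synthetic stand-in for Group_XX_B; the relationships are invented for the demo.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "Purchase-Price": rng.uniform(0.5, 5.0, n),
    "Gov-Index": rng.integers(1, 11, n),
    "C-Rating": rng.integers(1, 6, n),
})
data["Sale-Price"] = (3 * data["Purchase-Price"] + 0.2 * data["Gov-Index"]
                      + rng.normal(0, 0.1, n))

ranked = rank_by_correlation(data)
print("Strongest three:", list(ranked.index[:3]))
print("Weakest three:", list(ranked.index[-3:]))
```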

For the following tasks use the Group_XX_B and Group_XX_C files:

  1. Task-7: From the input and output columns given in the Group_XX_B file, identify how the input variables together are related to the output. Assume that all the input variables are relevant to the output variable (Sale-Price).

  2. Task-8: It was observed that some of the input columns are correlated, and this may make the above analysis unreliable. Redo Task-7, taking into account the correlation between input variables.

  3. Task-9: It was observed that some of the input columns may not be relevant to the output variable, and this may make the above analysis unreliable. Redo Task-7, taking into account possibly unrelated input variables.

  4. Task-10: Predict the estimated Sale-Price values for the data given in the Group_XX_C file. Consider all the numerical and categorical variables for the analysis. If you skip any column, provide strong justification. Also, justify your transformations and modifications of the columns for the analysis.

