Question
Python, only Tasks 9 and 10
A national-level investment company called askaan business co. is looking for data scientists to help it understand the possible patterns that affect real estate prices. The company currently purchases and sells real estate across the country and is interested in estimating the price of a property five years after the date of purchase. Such a prediction system will help the company invest in estates with the potential to generate substantial profit margins. The company has provided the relevant data it has collected over the years. The following table presents an overview of the given data:
Table 1: Data Description (Any non-applicable value is set to NA)
| Field | Description |
|---|---|
| Sale-Price | Sale price of the property five years after the date of purchase, in millions of SAR. |
| Purchase-Date | Month and year when the property was purchased. |
| Purchase-Price | Property's price at the time of purchase, in millions of SAR. |
| Type | Type of the property: open-land, villa, duplex, or flat. |
| Class | Legal classification of the property: residential, industrial, or commercial. |
| Location | Where the property is located relative to the nearby city: 'Center' (center of the city), 'Border' (at the entry/exit of the city), or 'Outskirts' (on the outskirts of the city). |
| Shape | Shape of the property: rectangle, trapezoid, or irregular. |
| U-Index | Index based on the number of available utilities, on a scale of 1 to 5. A value of 5 indicates that all utilities are available. |
| Proximity | Distance to the nearest metro station, in meters. |
| N-Rank | Rank based on neighborhood facilities that make the property attractive, on a scale of 1 to 10. A value of 1 indicates the best neighborhood. |
| P-Chance | Probability of finding a parking space on adjacent roads at a given time, between 0 and 1. A value of 1 indicates a parking space is always available. |
| Built | Original year of construction. Applicable to villa, duplex, and flat. |
| Renovate | Latest renovation year. Applicable to villa, duplex, and flat. A value of 0 means no renovation has been done so far or renovation is not applicable. |
| Access | Type of direct access to the property: street, alley, or highway. |
| Crime-Rate | Average number of crimes reported per year in the neighborhood. |
| C-Rating | Pleasantness of the climate throughout the year, on a scale of 1 to 5. A value of 5 indicates a pleasant climate. |
| Gov-Index | Expected level of government infrastructure projects and/or developments in the neighborhood, on a scale of 1 to 10. A value of 10 indicates that huge developments are planned by the government. |
| Contour | Flatness of the property. Applicable only to open-land properties. A value of C indicates the slope of the property is irregular; a value of F indicates the property has a smooth slope. |
| Garage | Whether there is a private parking garage (Yes or No). Applicable to flats and duplexes. All villas have a private garage. |
| Swimming | Whether there is a swimming pool (Yes or No). Applicable to villas. |
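Table 1 encodes several conventions that matter before any modelling: NA marks non-applicable values, Renovate = 0 means no renovation, and villas always have a garage even though the Garage column only applies to flats and duplexes. The Python sketch below shows one possible way to turn those rules into clean features. The function and feature names, and the choice of measuring age at the purchase year, are assumptions for illustration rather than anything specified in the brief.

```python
from typing import Optional

NA = "NA"  # Table 1: any non-applicable value is set to NA


def to_float(raw: str) -> Optional[float]:
    """Convert a raw cell to float, mapping NA (or an empty cell) to None."""
    raw = raw.strip()
    return None if raw in ("", NA) else float(raw)


def age_features(built: Optional[int], renovate: int, purchase_year: int) -> dict:
    """Age-related features for one property (hypothetical feature names).

    Assumption: ages are measured at the purchase year. Built is None for
    open land, and Renovate == 0 means the property was never renovated.
    """
    if built is None:  # open land: Built/Renovate are not applicable
        return {"age": None, "renovated": 0, "years_since_work": None}
    renovated = 1 if renovate > 0 else 0
    last_work = renovate if renovated else built  # latest construction work
    return {
        "age": purchase_year - built,
        "renovated": renovated,
        "years_since_work": purchase_year - last_work,
    }


def garage_flag(prop_type: str, garage: Optional[str]) -> Optional[int]:
    """1/0 garage indicator; according to Table 1 all villas have one."""
    if prop_type.strip().lower() == "villa":
        return 1
    if garage is None:  # e.g. open land, where Garage is not applicable
        return None
    return 1 if garage.strip().lower() == "yes" else 0


print(age_features(built=1995, renovate=2010, purchase_year=2015))
# {'age': 20, 'renovated': 1, 'years_since_work': 5}
print(garage_flag("Villa", None))  # 1
```

Deriving "years since last work" from Built and Renovate keeps the information from both columns while avoiding the meaningless value 0 in Renovate.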
Aim. The aim of this project is to explore the data and find possible patterns/relationships in it. The key variable of interest to askaan business co. is Sale-Price. Any patterns that show connections between the input variables and the output variable (Sale-Price) will be considered fruitful by askaan. Assume that properties that appreciate by 100% or less over the five years are low-potential estates, and those that appreciate by 400% or more are high-potential estates. The percentage increase or decrease is defined as

$$\text{Change (\%)} = \frac{\text{Sale-Price} - \text{Purchase-Price}}{\text{Purchase-Price}} \times 100$$
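The definition and thresholds above translate directly into code. This minimal Python sketch computes the percentage change and assigns a potential label; calling the in-between group (more than 100% but less than 400%) "medium" is our own naming, since the brief leaves that group unnamed.

```python
def percent_change(purchase_price: float, sale_price: float) -> float:
    """Percentage change over the five years, as defined in the brief."""
    if purchase_price <= 0:
        raise ValueError("Purchase-Price must be positive")
    return (sale_price - purchase_price) / purchase_price * 100.0


def potential(purchase_price: float, sale_price: float) -> str:
    """Label an estate: <= 100% is low, >= 400% is high, otherwise medium."""
    change = percent_change(purchase_price, sale_price)
    if change <= 100.0:
        return "low"
    if change >= 400.0:
        return "high"
    return "medium"  # label name chosen here; the brief does not name this group


print(percent_change(2.0, 5.0))  # 150.0
print(potential(2.0, 5.0))       # medium
```

Tasks 4 to 6 can then compare the input variables between the "low" and "high" groups produced by this labelling.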
Data. The data for the project is provided in three different files, named Group_XX_A, Group_XX_B and Group_XX_C, where XX is your group number. In addition, Table 1 presents the metadata for the given data.
Expectations. At the end of this project, you are expected to provide askaan with answers to the following questions. Support your answers with corresponding/appropriate data science methods and visualizations (wherever applicable).
For the following task, use the Group_XX_A file:
- Task-1: Prepare the data given in the Group_XX_A file, i.e., handle missing values, remove outliers, and fix inconsistencies. You may pick any set of methods, but clearly justify your approach.
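Task-1 leaves the choice of methods open. One common, easily justified combination is sketched below in Python: normalize inconsistent category labels, fill missing numeric values with the column median (robust to the skew typical of prices), and drop values outside Tukey's IQR fences. These specific methods and the factor k = 1.5 are assumptions, not requirements of the brief, and the sketch works on a single column for clarity.

```python
import statistics
from typing import List, Optional, Tuple


def normalize_label(raw: str) -> str:
    """Fix inconsistent spellings such as ' Villa' or 'VILLA' -> 'villa'."""
    return raw.strip().lower()


def impute_median(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the median of the observed ones."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR] from the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Keep only values inside the Tukey fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


prices = [1.2, None, 1.5, 1.4, 1.3, 50.0, 1.6]  # toy Purchase-Price column
filled = impute_median(prices)  # None becomes 1.45, the median of observed values
clean = drop_outliers(filled)   # 50.0 falls outside the fences and is removed
print(normalize_label(" Villa "), clean)
```

On the real data, the outlier mask would be computed per column and used to drop whole rows, so that each remaining row stays internally consistent.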
For the following tasks, use the Group_XX_B file:
- Task-2: Draw pair-wise plots between all the input variables and the output variable (Sale-Price).
- Task-3: Identify the top three and bottom three numerical variables in terms of how strongly they are related to the output variable (Sale-Price). Use a relevant analysis approach.
- Task-4: Show whether the input variables carry enough information to separate low- and high-performing estates. Use plots to justify.
- Task-5: What are the common patterns associated with the low performance of estates? Use plots to justify.
- Task-6: What are the common patterns associated with the high performance of estates? Use plots to justify.
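For Task-3, one relevant approach is to rank the numerical inputs by the absolute value of their Pearson correlation with Sale-Price. The dependency-free Python sketch below does this; Pearson only captures linear relationships, so Spearman rank correlation is a reasonable alternative if the pair plots show monotonic but curved trends. The column names in the example are made up.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def rank_by_correlation(
    columns: Dict[str, List[float]], target: List[float], k: int = 3
) -> Tuple[List[str], List[str]]:
    """Return the k most and k least correlated columns (by |r|)."""
    scores = {name: pearson(vals, target) for name, vals in columns.items()}
    ordered = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ordered[:k], ordered[-k:]


sale_price = [1.0, 2.0, 3.0, 4.0, 5.0]
toy = {  # hypothetical numerical inputs
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "b": [5.0, 4.0, 3.0, 2.0, 1.0],
    "c": [1.0, 2.0, 3.0, 5.0, 4.0],
    "d": [1.0, 3.0, 2.0, 5.0, 4.0],
    "e": [3.0, 1.0, 4.0, 1.0, 5.0],
    "f": [2.0, 2.0, 3.0, 2.0, 2.0],
}
top, bottom = rank_by_correlation(toy, sale_price)
print(top, bottom)
```

Ranking by |r| treats strong negative relationships (such as "b") as just as informative as strong positive ones.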
For the following tasks, use the Group_XX_B and Group_XX_C files:
- Task-7: Using the input and output columns given in the Group_XX_B file, identify how the input variables together are related to the output. Assume that all the input variables are relevant to the output variable (Sale-Price).
- Task-8: Some of the input columns were observed to be correlated, which may make the above analysis unreliable. Redo Task-7, taking the correlation between input variables into account.
- Task-9: Some of the input columns were observed to be possibly irrelevant to the output variable, which may make the above analysis unreliable. Redo Task-7, taking possibly unrelated input variables into account.
- Task-10: Predict the Sale-Price values for the properties given in the Group_XX_C file. Consider all the numerical and categorical variables in the analysis. If you skip any column, provide a strong justification. Also justify any transformations and modifications of the columns made for the analysis.
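Since the question asks for Python code for Tasks 9 and 10, here is a minimal NumPy sketch under stated assumptions: a linear model fitted by ordinary least squares (the Task-7 baseline), backward elimination driven by adjusted R² to discard inputs that do not help (Task-9), and one-hot encoding of categorical columns before predicting new rows (Task-10). Adjusted-R² elimination is our choice of technique; p-value-based elimination or Lasso would be equally defensible. The column names x1, x2, x3 and loc and the synthetic data are hypothetical stand-ins for the Table 1 columns in files B and C, and missing values are assumed to have been handled beforehand.

```python
import numpy as np


def build_design(data, numeric, categorical, levels):
    """Stack numeric columns and one-hot dummies into a design matrix X.

    The first entry of levels[name] is the reference category and gets no
    dummy column, which avoids perfect collinearity with the intercept.
    Values never seen in training encode as all zeros (the reference level).
    """
    cols, names = [], []
    for name in numeric:
        cols.append(np.asarray(data[name], dtype=float))
        names.append(name)
    for name in categorical:
        for level in levels[name][1:]:
            cols.append(np.array([1.0 if v == level else 0.0 for v in data[name]]))
            names.append(f"{name}={level}")
    return np.column_stack(cols), names


def with_intercept(X):
    """Prepend a column of ones for the intercept term."""
    return np.column_stack([np.ones(X.shape[0]), X])


def fit_ols(X, y):
    """Ordinary least squares with an intercept (coef[0] is the intercept)."""
    coef, *_ = np.linalg.lstsq(with_intercept(X), y, rcond=None)
    return coef


def adjusted_r2(X, y):
    """Adjusted R^2 of an OLS fit of y on the columns of X."""
    resid = y - with_intercept(X) @ fit_ols(X, y)
    n, p = X.shape
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1))


def backward_eliminate(X, y, names):
    """Task-9 sketch: repeatedly drop the column whose removal raises
    adjusted R^2 the most; stop when no single removal raises it."""
    keep = list(range(X.shape[1]))
    best = adjusted_r2(X, y)
    while len(keep) > 1:
        trials = [(adjusted_r2(X[:, [j for j in keep if j != d]], y), d) for d in keep]
        score, drop = max(trials)
        if score <= best:
            break
        best, keep = score, [j for j in keep if j != drop]
    return keep, [names[j] for j in keep]


def predict(X_new, coef):
    """Task-10 sketch: apply fitted coefficients to new rows (e.g. file C)."""
    return with_intercept(X_new) @ coef


# Synthetic stand-in for file B (hypothetical columns x1, x2, x3, loc).
rng = np.random.default_rng(0)
n = 60
train = {
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),  # pure noise: unrelated to the target
    "loc": rng.choice(["border", "center", "outskirts"], size=n),
}
y = (2.0 + 3.0 * train["x1"] - 1.5 * train["x2"]
     + 1.0 * (train["loc"] == "center") + rng.normal(scale=0.1, size=n))

levels = {"loc": ["border", "center", "outskirts"]}
X, names = build_design(train, ["x1", "x2", "x3"], ["loc"], levels)
keep, kept_names = backward_eliminate(X, y, names)
coef = fit_ols(X[:, keep], y)

# Stand-in for one row of file C, encoded with the training levels.
new_rows = {"x1": [1.0], "x2": [0.0], "x3": [0.0], "loc": ["center"]}
X_new, _ = build_design(new_rows, ["x1", "x2", "x3"], ["loc"], levels)
print(kept_names)
print(predict(X_new[:, keep], coef))  # the noise-free value would be 6.0
```

This sketch drops dummy columns one at a time, so it can keep only some levels of a categorical variable; removing all of a variable's dummies together is a common refinement.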