Question

Task 1

1. Download the house data from LMS and save it in your PC's working directory (folder).
2. The data has 12 columns. Columns 1 to 11 are the independent variables (X1 to X11). Column 12 is the dependent variable (Y).
3. Remove inconsistent data from some ROWS. For example, "bathrooms" values should be integers only (for example, 1, 2, 4, 5, etc.). If any row in "bathrooms" contains a real number (1.2, 3.4, 7.3, etc.), remove the complete row from the dataset. Please check the other COLUMNS carefully: if you think any of them should contain integer values only, remove the row(s) that contain real value(s). Write code to perform the checking and removing process. Do not ask me which one should be checked or removed.
4. Perform the following Data Exploration processes:
   - Median: for all columns.
   - Range: for all columns.
   - Frequency: for the following columns ONLY: "bathrooms", "floors", "condition", "grade".
5. Check if there are any "missing values" in the data. If you find a missing value, use the "mean" to replace it for real-valued columns and the "min" for integer-valued columns.
6. Check all columns' value ranges. If you think normalisation is needed, use the "Min-Max" method.
7. Apply a proper regression model, either Binary Logistic Regression or Linear Regression. You need to decide which one best fits this dataset. Use the following for the selected model:
   - Assign random values to all Beta parameters (B0 to B11). All random values MUST be real values between "0" and "1". Make SURE all parameters (B0 to B11) DO NOT have the same VALUE.
   - Use "Yest" for the predicted (estimated) value. Do not use a different name.
   - Perform the steps of the selected regression model.
   - If you selected Binary Logistic Regression, calculate and print the accuracy. If you selected Linear Regression, calculate and print the Sum of Squared Errors (SSE).
8. Apply the Gradient Descent Algorithm (GDA) to optimise the selected regression model for 500 iterations. Check Algorithm 1 Pseudocode for the GDA steps on Page 35 of Lecture Note Week 6. Set Theta (the learning rate) to 0.01. Use the following to calculate the partial derivatives for the Beta parameters:
   - For B0: (1 / No. of samples) * Σ (Yest - Y)
   - For the other parameters, Bi with i = 1 to 11: (1 / No. of samples) * Σ (Yest - Y) * Xi
9. Print the final results based on the selected regression model: either the accuracy or the SSE.

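The integer-consistency check in the task can be automated so that nobody has to name the columns by hand. The sketch below is one possible approach, assuming the CSV has already been read into a list of rows (lists of numbers, with None for missing cells) plus a parallel list of column names. "bathrooms" is always checked because the task names it; any other column is treated as integer-only when at least 90% of its non-missing values are already whole numbers. That threshold, and the placeholder names "X3" and "Y", are assumptions for illustration.

```python
# Sketch: find integer-only columns and drop rows holding real values there.
# Assumptions (not from the task): rows are lists of numbers with None for
# missing cells, and the 0.9 "mostly whole numbers" threshold is arbitrary.

def is_whole(value):
    """True for whole numbers such as 3 or 3.0; None (missing) is not judged here."""
    return value is None or float(value).is_integer()


def integer_columns(rows, columns, always=("bathrooms",), min_share=0.9):
    """Return the names of columns that should hold integers only.

    A column qualifies if it is listed in `always` (the task names "bathrooms")
    or if at least `min_share` of its non-missing values are already whole.
    """
    chosen = []
    for j, name in enumerate(columns):
        present = [row[j] for row in rows if row[j] is not None]
        if not present:
            continue
        share = sum(is_whole(v) for v in present) / len(present)
        if name in always or share >= min_share:
            chosen.append(name)
    return chosen


def drop_inconsistent_rows(rows, columns, int_cols):
    """Keep only rows whose integer-only columns contain whole numbers."""
    idx = [columns.index(name) for name in int_cols]
    return [row for row in rows if all(is_whole(row[j]) for j in idx)]


# Toy rows made up for illustration; "X3" and "Y" are placeholder names.
columns = ["bathrooms", "floors", "X3", "Y"]
rows = [
    [1, 1, 0.52, 200.0],
    [2, 2, 0.47, 340.0],
    [1.5, 1, 0.61, 250.0],  # real value in "bathrooms" -> row is removed
    [3, 2, 0.38, 410.0],
    [2, 1, None, 300.0],
]
int_cols = integer_columns(rows, columns)
cleaned = drop_inconsistent_rows(rows, columns, int_cols)
print(int_cols)      # ['bathrooms', 'floors', 'Y']
print(len(cleaned))  # 4
```

Missing cells are deliberately skipped by the check, so rows with gaps survive until the missing-value step fills them.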

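The Data Exploration step (median and range for all columns, frequency for four named columns) maps directly onto the standard library. This minimal sketch uses the same list-of-rows layout as above, skips missing cells because exploration comes before imputation in the task order, and takes "range" to mean max - min (reporting the min and max pair would be an equally valid reading). The toy rows and the name "Y" are made up for illustration.

```python
from collections import Counter
from statistics import median

FREQ_COLS = ("bathrooms", "floors", "condition", "grade")  # named in the task


def present_values(rows, j):
    """Non-missing values of column j (missing cells are None)."""
    return [row[j] for row in rows if row[j] is not None]


def explore(rows, columns, freq_cols=FREQ_COLS):
    """Median and range for every column, frequencies for the listed columns."""
    medians, ranges, freqs = {}, {}, {}
    for j, name in enumerate(columns):
        values = present_values(rows, j)
        medians[name] = median(values)
        ranges[name] = max(values) - min(values)
        if name in freq_cols:
            freqs[name] = Counter(values)
    return medians, ranges, freqs


# Toy rows made up for illustration; "Y" is a placeholder for column 12.
columns = ["bathrooms", "floors", "condition", "grade", "Y"]
rows = [
    [1, 1, 3, 7, 200.0],
    [2, 2, 4, 8, 340.0],
    [3, 2, 3, 7, 410.0],
    [2, 1, 5, 9, 300.0],
]
medians, ranges, freqs = explore(rows, columns)
for name in columns:
    print(f"{name}: median={medians[name]}, range={ranges[name]}")
for name, counts in freqs.items():
    print(f"{name} frequency: {dict(sorted(counts.items()))}")
```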

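The missing-value rule (mean for real-valued columns, min for integer-valued ones) and Min-Max normalisation can be sketched as below. Two assumptions are baked in: a column counts as integer-valued when every non-missing value is whole, and every column, including Y, is normalised. The second assumes the column ranges differ widely (for example prices next to room counts), which keeps gradient descent with a fixed Theta of 0.01 stable; if Y is left unnormalised, the SSE comes out in Y's original units instead.

```python
from statistics import mean


def fill_missing(rows):
    """Replace None with the column min (integer columns) or mean (real columns).

    A column counts as integer-valued when every non-missing value is whole;
    each column is assumed to have at least one non-missing value.
    """
    filled = [list(row) for row in rows]
    for j in range(len(rows[0])):
        present = [row[j] for row in rows if row[j] is not None]
        is_integer = all(float(v).is_integer() for v in present)
        replacement = min(present) if is_integer else mean(present)
        for row in filled:
            if row[j] is None:
                row[j] = replacement
    return filled


def min_max(rows):
    """Min-Max normalisation to [0, 1]; a constant column becomes all zeros."""
    n_cols = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_cols)]
    highs = [max(row[j] for row in rows) for j in range(n_cols)]
    return [
        [(row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.0
         for j in range(n_cols)]
        for row in rows
    ]


# Toy rows made up for illustration (None marks a missing value).
rows = [
    [1, 0.5, 100.0],
    [None, 0.7, 300.0],
    [3, None, 200.0],
]
filled = fill_missing(rows)
normalised = min_max(filled)
print(filled)
print(normalised)
```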

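On the choice of model: Binary Logistic Regression needs a 0/1 target, while column 12 of a house dataset is presumably a continuous value such as price. Under that assumption (the question does not name column 12), Linear Regression with SSE is the natural fit. A minimal sketch of the model step: random Betas in [0, 1) that are redrawn until all values are distinct, the prediction stored under the required name `Yest`, and the SSE. The toy data has two already-normalised features instead of eleven and is made up for illustration.

```python
import random


def init_betas(n_features, seed=None):
    """Draw B0..Bn uniformly from [0, 1), redrawing until no two values are equal."""
    rng = random.Random(seed)
    while True:
        betas = [rng.random() for _ in range(n_features + 1)]
        if len(set(betas)) == len(betas):
            return betas


def predict(betas, X):
    """Linear model: Yest = B0 + B1*X1 + ... + Bn*Xn for every row of X."""
    return [betas[0] + sum(b * x for b, x in zip(betas[1:], row)) for row in X]


def sse(Y, Yest):
    """Sum of Squared Errors between actual Y and estimated Yest."""
    return sum((yh - y) ** 2 for y, yh in zip(Y, Yest))


# Toy, already-normalised data made up for illustration (2 features, not 11).
X = [[0.0, 0.5], [0.5, 1.0], [1.0, 0.0], [0.25, 0.75]]
Y = [0.2, 0.6, 0.4, 0.5]
betas = init_betas(n_features=len(X[0]), seed=42)
Yest = predict(betas, X)
print("Betas:", betas)
print("SSE:", sse(Y, Yest))
```

With the real data, `n_features` would be 11, giving B0 to B11.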

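The gradient descent step uses the partial derivatives given in the task: the mean error for B0 and the mean of error times Xi for each Bi, with Theta treated as the learning rate. This is a generic batch gradient descent sketch, not a transcription of the lecture's Algorithm 1, which is not reproduced here. All Betas are updated together from the same `Yest` in each of the 500 iterations, and the final SSE is printed as the task's last step requires. The data is the same made-up toy set as before.

```python
import random


def predict(betas, X):
    """Yest = B0 + B1*X1 + ... + Bn*Xn for every row of X."""
    return [betas[0] + sum(b * x for b, x in zip(betas[1:], row)) for row in X]


def sse(Y, Yest):
    """Sum of Squared Errors."""
    return sum((yh - y) ** 2 for y, yh in zip(Y, Yest))


def gradient_descent(X, Y, betas, theta=0.01, iterations=500):
    """Batch gradient descent using the partial derivatives from the task.

    dB0 = (1/m) * sum(Yest - Y);  dBi = (1/m) * sum((Yest - Y) * Xi)
    All betas are updated together from the same Yest in each iteration.
    """
    m = len(Y)
    betas = list(betas)
    for _ in range(iterations):
        Yest = predict(betas, X)
        errors = [yh - y for yh, y in zip(Yest, Y)]
        grads = [sum(errors) / m]
        for i in range(len(betas) - 1):
            grads.append(sum(e * row[i] for e, row in zip(errors, X)) / m)
        betas = [b - theta * g for b, g in zip(betas, grads)]
    return betas


# Toy, already-normalised data made up for illustration (2 features, not 11).
X = [[0.0, 0.5], [0.5, 1.0], [1.0, 0.0], [0.25, 0.75]]
Y = [0.2, 0.6, 0.4, 0.5]
rng = random.Random(0)
start = [rng.random() for _ in range(len(X[0]) + 1)]  # random betas in [0, 1)
final = gradient_descent(X, Y, start, theta=0.01, iterations=500)
Yest = predict(final, X)
print("SSE before GDA:", sse(Y, predict(start, X)))
print("SSE after 500 iterations:", sse(Y, Yest))
```

With features scaled to [0, 1], a Theta of 0.01 is small enough that the SSE should fall steadily; 500 iterations may still stop short of the minimum, which is expected for a fixed iteration budget.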