Question
Task 1

1. Download the house data from LMS and save it on your PC in your working directory (folder).
2. The data has 12 columns. Columns 1 to 11 are independent variables (X1 to X11). Column 12 is the dependent variable (Y).
3. Remove inconsistent data from some rows. For example, "bathrooms" values should be integers only (for example, 1, 2, 4, 5, etc.). If any row in "bathrooms" contains a real number (1.2, 3.4, 7.3, etc.), remove the complete row from the dataset. Please check the other columns carefully. If you think any one of these columns should contain integer values only, remove the row(s) that contain real value(s). Write code to perform the checking and removing process. Do not ask me which one should be checked or removed.
4. Perform the following data exploration processes:
   - Median: for all columns.
   - Range: for all columns.
   - Frequency: for the following columns ONLY: "bathrooms", "floors", "condition", "grade".
5. Check if there are any missing values in the data. If you find a missing value, use the "mean" to replace it for a real-valued column and the "min" for an integer-valued column.
6. Check the value ranges of all columns. If you think normalisation is needed, use the "Min-Max" method.
7. Apply a proper regression model: either Binary Logistic Regression or Linear Regression. You need to think about which one best fits this dataset. Use the following for the selected model:
   o Assign random values to all Beta parameters (B0 to B11). All random values MUST be real values between "0" and "1". Make SURE all parameters (B0 to B11) DO NOT have the same value.
   o Use "Yest" for the predicted (estimated) value. Do not use a different name.
   o Perform the steps of the selected regression model.
   o If you selected Binary Logistic Regression, you need to calculate and print the accuracy. If you selected Linear Regression, you need to calculate and print the Sum of Squared Errors (SSE).
8. Apply the Gradient Descent Algorithm (GDA) to optimise the selected regression model for 500 iterations. Check the Algorithm 1 pseudocode for the GDA steps on page 35 of the Week 6 lecture notes. Set Theta to 0.01. Use the following to calculate the partial derivatives for the Beta parameters:
   - For B0: B0 = (1 / No. of samples) * (Yest - Y)
   - For the other parameters (B1 to B11), for i = 1 to 11: Bi = (1 / No. of samples) * ((Yest - Y) * Xi)
9. Print the final results based on the selected regression model: either accuracy or SSE.
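The cleaning, exploration, scaling, and gradient descent steps in the task can be sketched roughly as follows. This is only an illustration under several assumptions, not a reference solution: it picks Linear Regression on the guess that Y is a continuous value (the question does not say what Y holds), it treats Theta as the learning rate, and it reads the partial-derivative formulas as sums over all samples. The house data file is not available here, so the script builds a small synthetic array with two made-up feature columns in place of X1 to X11. The helper names (`drop_non_integer_rows`, `min_max`, `sse`, `gradient_descent`) are hypothetical.

```python
import numpy as np

def drop_non_integer_rows(data, int_cols):
    """Keep only rows whose values in int_cols are whole numbers."""
    cols = data[:, int_cols]
    keep = np.all(cols == np.round(cols), axis=1)
    return data[keep]

def min_max(data):
    """Min-Max scale every column to [0, 1]; constant columns map to 0."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span

def sse(X, Y, B):
    """Sum of Squared Errors for the linear model Yest = B0 + X @ B[1:]."""
    Yest = B[0] + X @ B[1:]
    return float(np.sum((Yest - Y) ** 2))

def gradient_descent(X, Y, B, theta=0.01, iterations=500):
    """Assumed reading of the task: Theta is the learning rate and each
    partial derivative is summed over samples, then divided by their count."""
    n = len(Y)
    B = B.copy()
    for _ in range(iterations):
        Yest = B[0] + X @ B[1:]
        err = Yest - Y
        dB0 = err.sum() / n        # (1/n) * sum(Yest - Y)
        dBi = (X.T @ err) / n      # (1/n) * sum((Yest - Y) * Xi)
        B[0] -= theta * dB0
        B[1:] -= theta * dBi
    return B

# Synthetic stand-in for the house data (made up for illustration only).
rng = np.random.default_rng(0)
raw = rng.uniform(0, 10, size=(200, 3))          # columns: X1, X2, Y
raw[:, 0] = rng.integers(1, 6, size=200)         # integer "bathrooms"-like column
raw[::10, 0] += 0.5                              # inject 20 inconsistent rows
raw[:, 2] = 3 * raw[:, 0] + 2 * raw[:, 1] + rng.normal(0, 1, 200)

# Remove rows with real values in the integer-only column.
clean = drop_non_integer_rows(raw, [0])

# Data exploration: median and range per column, frequency for one column.
median = np.median(clean, axis=0)
value_range = clean.max(axis=0) - clean.min(axis=0)
values, counts = np.unique(clean[:, 0], return_counts=True)

# Min-Max normalisation, then split into features and target.
scaled = min_max(clean)
X, Y = scaled[:, :-1], scaled[:, -1]

# Random, distinct Beta parameters in [0, 1).
B_init = rng.uniform(0, 1, size=X.shape[1] + 1)

sse_before = sse(X, Y, B_init)
B_final = gradient_descent(X, Y, B_init, theta=0.01, iterations=500)
sse_after = sse(X, Y, B_final)

print("median:", median)
print("range:", value_range)
print("frequency:", dict(zip(values.tolist(), counts.tolist())))
print("SSE before GDA:", sse_before)
print("SSE after GDA:", sse_after)
```

With the real dataset, the synthetic block would be replaced by loading the 12-column file, and the list passed to `drop_non_integer_rows` would hold the indices of every column judged to be integer-only. Missing-value replacement is not shown here; it would run before the scaling step.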