Question
Use the following data to answer the questions below. In this assignment, you will build a linear regression model and test it.
In [0]:
display(dbutils.fs.ls('/databricks-datasets/wine-quality'))
Q1
Import the file winequality-white.csv as a Spark DataFrame. Make sure the data is parsed into separate columns, and assign the correct data type to each column in the schema.
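A minimal sketch of one way to do this. It assumes the file follows the UCI white-wine layout (a header row, semicolon-separated fields, eleven numeric features plus an integer quality column) and that the file sits directly under the directory listed above; neither is confirmed by the assignment text. The names FEATURE_COLUMNS, build_schema_ddl, and load_white_wine are hypothetical helpers, and spark is the session object that Databricks notebooks predefine.

```python
# Column names as assumed from the UCI white-wine file header.
FEATURE_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide",
    "density", "pH", "sulphates", "alcohol",
]
LABEL_COLUMN = "quality"


def build_schema_ddl():
    """Build a DDL schema string: features as DOUBLE, label as INT."""
    # Backticks are needed because several column names contain spaces.
    fields = [f"`{name}` DOUBLE" for name in FEATURE_COLUMNS]
    fields.append(f"`{LABEL_COLUMN}` INT")
    return ", ".join(fields)


def load_white_wine(spark, path="/databricks-datasets/wine-quality/winequality-white.csv"):
    """Read the CSV with an explicit schema (path and separator are assumptions)."""
    return spark.read.csv(path, schema=build_schema_ddl(), sep=";", header=True)
```

In a notebook this might be used as wine_df = load_white_wine(spark) followed by wine_df.printSchema() to confirm that every column was parsed with the intended type.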
Q2
Show the distinct values of the quality column. This is your label column.
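One possible way to list the label values, sketched under the assumption that the DataFrame has a column named quality. The helper name distinct_quality is hypothetical; select, distinct, and orderBy are standard DataFrame methods.

```python
def distinct_quality(df, label_col="quality"):
    """Return the distinct label values, sorted ascending."""
    return df.select(label_col).distinct().orderBy(label_col)
```

Calling distinct_quality(wine_df).show() on the loaded DataFrame would print one row per quality score.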
Q3
Split the data into train and test sets. Use 42 as your seed number to be able to reproduce the same results next time the notebook gets run.
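A short sketch of a seeded split. The assignment fixes the seed at 42 but does not give a ratio, so the 80/20 split here is an assumption, and split_train_test is a hypothetical helper name. randomSplit is the standard DataFrame method for this.

```python
def split_train_test(df, train_fraction=0.8, seed=42):
    """Split a DataFrame into (train, test) with a fixed seed for reproducibility."""
    # The 0.8 default is an assumed ratio; the assignment only specifies the seed.
    train_df, test_df = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)
    return train_df, test_df
```

Note that randomSplit samples rows probabilistically, so the resulting sizes are only approximately 80% and 20% of the data.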
Q4
Select a couple of columns from the dataset to build a feature vector for linear regression. Use VectorAssembler to transform your train dataset. Show a few rows of the VectorAssembler output.
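A possible sketch of the feature-vector step. The three columns chosen below are an arbitrary illustration (the assignment leaves the choice open), and their names assume the UCI header. assemble_features is a hypothetical helper; PySpark is imported inside it so that the rest of the snippet loads even where PySpark is not installed.

```python
# Illustrative choice of feature columns; any numeric columns would do.
SELECTED_FEATURES = ["alcohol", "density", "residual sugar"]


def assemble_features(train_df, input_cols=SELECTED_FEATURES, output_col="features"):
    """Add a vector column combining the selected input columns."""
    from pyspark.ml.feature import VectorAssembler  # requires PySpark at call time

    assembler = VectorAssembler(inputCols=list(input_cols), outputCol=output_col)
    return assembler.transform(train_df)
```

To inspect the result, something like assemble_features(train_df).select("features", "quality").show(5, truncate=False) prints a few rows with the full vector visible.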
Q5
Apply linear regression to your train set. Show model coefficients and the intercept.
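A hedged sketch of fitting the model and printing its parameters. It assumes the train set already has a features vector column and a quality label column. fit_linear_regression and describe_model are hypothetical helper names; LinearRegression and the fitted model's coefficients and intercept attributes come from pyspark.ml.regression.

```python
def fit_linear_regression(train_vec, features_col="features", label_col="quality"):
    """Fit a linear regression model on an assembled train DataFrame."""
    from pyspark.ml.regression import LinearRegression  # requires PySpark at call time

    lr = LinearRegression(featuresCol=features_col, labelCol=label_col)
    return lr.fit(train_vec)


def describe_model(feature_names, coefficients, intercept):
    """Format one coefficient per feature, followed by the intercept."""
    lines = [f"{name}: {coef:.4f}" for name, coef in zip(feature_names, coefficients)]
    lines.append(f"intercept: {intercept:.4f}")
    return "\n".join(lines)
```

With a fitted model, print(describe_model(feature_names, model.coefficients.toArray(), model.intercept)) lists each coefficient next to the column it belongs to, which is easier to read than the raw vector.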
Q6
Transform your train dataset with the linear model and show quality and prediction. Show only 10 rows.
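One way this might look, assuming the model was fitted with the default prediction column name, prediction. The helper quality_vs_prediction is hypothetical; transform and select are the standard model and DataFrame methods.

```python
def quality_vs_prediction(model, assembled_df, label_col="quality"):
    """Apply a fitted model and keep only the label and prediction columns."""
    return model.transform(assembled_df).select(label_col, "prediction")
```

Calling quality_vs_prediction(model, train_vec).show(10) limits the display to 10 rows.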
Q7
Evaluate your model with RegressionEvaluator. Show the Root Mean Square Error (RMSE) and the R-squared value to demonstrate the model's performance.
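A sketch of the evaluation step, together with a plain-Python version of the two metrics to show what the evaluator computes. It assumes the predictions DataFrame has quality and prediction columns. Both function names are hypothetical; RegressionEvaluator and its rmse and r2 metric names come from pyspark.ml.evaluation.

```python
import math


def rmse_and_r2(labels, predictions):
    """Compute RMSE and R-squared by hand for two equal-length sequences."""
    n = len(labels)
    sse = sum((y - p) ** 2 for y, p in zip(labels, predictions))  # residual sum of squares
    mean_y = sum(labels) / n
    sst = sum((y - mean_y) ** 2 for y in labels)  # total sum of squares
    return math.sqrt(sse / n), 1.0 - sse / sst


def evaluate_with_spark(predictions_df, label_col="quality"):
    """Return (rmse, r2) for a predictions DataFrame using RegressionEvaluator."""
    from pyspark.ml.evaluation import RegressionEvaluator  # requires PySpark at call time

    evaluator = RegressionEvaluator(labelCol=label_col, predictionCol="prediction")
    rmse = evaluator.evaluate(predictions_df, {evaluator.metricName: "rmse"})
    r2 = evaluator.evaluate(predictions_df, {evaluator.metricName: "r2"})
    return rmse, r2
```

As a small worked case, labels [5, 6, 7] with predictions [5, 6, 8] give a residual sum of squares of 1 and a total sum of squares of 2, so RMSE is the square root of 1/3 and R-squared is 0.5. Evaluating on the held-out test set as well as the train set gives a fairer picture of how the model generalizes.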