Question

Use the following data to answer the questions below. In this assignment, you will build a linear regression model and then test it.

In [0]:

display(dbutils.fs.ls('/databricks-datasets/wine-quality'))

Q1

Import the file winequality-white.csv as a Spark DataFrame. Make sure the data is parsed into separate columns, and assign the correct schema (a suitable data type for each column).

Q2

Show the distinct values of the quality column. This is your label column.

Q3

Split the data into train and test sets. Use 42 as the seed so that the same split is reproduced the next time the notebook runs.

Q4

Select a few columns from the dataset to build a feature vector in preparation for linear regression. Use VectorAssembler to transform your train dataset. Show a few rows of the VectorAssembler output.

Q5

Apply linear regression to your train set. Show model coefficients and the intercept.

Q6

Transform your train dataset with the linear model and show the quality and prediction columns. Show only 10 rows.

Q7

Evaluate your model with RegressionEvaluator. Show the Root Mean Square Error (RMSE) and the R-squared value to demonstrate the model's performance.
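
To make the two metrics concrete, the plain-Python sketch below computes them from paired labels and predictions using their standard definitions. It is not Spark code: in the notebook, `RegressionEvaluator` with `metricName="rmse"` or `metricName="r2"` would report these values directly. The function names `rmse` and `r_squared` are hypothetical, and the sketch can serve as a sanity check on a small collected sample.

```python
import math


def rmse(labels, predictions):
    """Root mean square error: square root of the mean squared residual."""
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(labels, predictions):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_label = sum(labels) / len(labels)
    ss_res = sum((y - p) ** 2 for y, p in zip(labels, predictions))
    ss_tot = sum((y - mean_label) ** 2 for y in labels)
    return 1.0 - ss_res / ss_tot
```

A lower RMSE means predictions sit closer to the true quality scores, measured in the same units as the label. An R-squared of 1 means a perfect fit, while a model that always predicts the mean label scores 0.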
