Question
This exercises some basic Scala and some basic Spark.
Given the training set: X (age: Double), Y (yearly visits to parks: Double)
X = (10 5 1 6 7 3 4 5 1 8)
Y = (2 4 4 2 4 5 4 5 6 4)
First, write your own file containing this data (CSV or plain text).
You can use scala.io to read the file in Scala, and "spark.read. ...." to read it in Spark.
See the cookbook!
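One possible way to create and read the file in plain Scala is sketched below. The object name ParkData, the "age,visits" header, and the one-pair-per-line layout are assumptions made for illustration; the exercise leaves the file format up to you.

```scala
import java.io.PrintWriter
import scala.io.Source

// Hypothetical helper (not part of the assignment): writes the training set
// to a CSV file and reads it back with scala.io.Source.
object ParkData {
  val ages: Vector[Double]   = Vector(10.0, 5.0, 1.0, 6.0, 7.0, 3.0, 4.0, 5.0, 1.0, 8.0)
  val visits: Vector[Double] = Vector(2.0, 4.0, 4.0, 2.0, 4.0, 5.0, 4.0, 5.0, 6.0, 4.0)

  // Write a header line followed by one "age,visits" pair per line.
  def write(path: String): Unit = {
    val out = new PrintWriter(path)
    try {
      out.println("age,visits")
      ages.zip(visits).foreach { case (a, v) => out.println(s"$a,$v") }
    } finally out.close()
  }

  // Read the file back, skipping the header and any blank lines.
  def read(path: String): Vector[(Double, Double)] = {
    val src = Source.fromFile(path)
    try {
      src.getLines().drop(1).filter(_.trim.nonEmpty).map { line =>
        val parts = line.split(",").map(_.trim.toDouble)
        (parts(0), parts(1))
      }.toVector
    } finally src.close()
  }
}
```

Calling ParkData.write("parks.csv") once and then ParkData.read("parks.csv") should give back the ten (age, visits) pairs; the file name is only an example.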
PART I: Do a Scala analysis and compute the regression coefficient, intercept, SST, SSR, SSE, correlation coefficient, R, R^2, and the angle between x and y. Draw the statistical triangle and identify the legs.
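A minimal sketch of the Part I calculations in plain Scala follows, using the usual least-squares formulas: slope = Sxy / Sxx, intercept = ybar - slope * xbar, SST = SSR + SSE, and r = Sxy / sqrt(Sxx * Syy). The names SimpleRegression and Fit are made up for this example. It also assumes that "angle between x and y" means the angle between the mean-centered x and y vectors, whose cosine is r; if the course means the angle between (y - ybar) and (yhat - ybar) instead, its cosine is sqrt(R^2).

```scala
// Hypothetical helper for simple (one-feature) least-squares regression.
object SimpleRegression {
  final case class Fit(slope: Double, intercept: Double,
                       sst: Double, ssr: Double, sse: Double,
                       r: Double, r2: Double, angleDeg: Double)

  def fit(x: Seq[Double], y: Seq[Double]): Fit = {
    require(x.nonEmpty && x.length == y.length, "x and y must be non-empty and equal length")
    val n    = x.length
    val xBar = x.sum / n
    val yBar = y.sum / n
    val dx   = x.map(_ - xBar)                       // centered x
    val dy   = y.map(_ - yBar)                       // centered y
    val sxx  = dx.map(d => d * d).sum
    val syy  = dy.map(d => d * d).sum
    val sxy  = dx.zip(dy).map { case (a, b) => a * b }.sum

    val slope     = sxy / sxx
    val intercept = yBar - slope * xBar
    val yHat      = x.map(xi => intercept + slope * xi)

    val sst = syy                                              // total sum of squares
    val ssr = yHat.map(h => (h - yBar) * (h - yBar)).sum       // explained by the line
    val sse = y.zip(yHat).map { case (yi, h) => (yi - h) * (yi - h) }.sum // residual

    val r  = sxy / math.sqrt(sxx * syy)              // correlation coefficient
    val r2 = ssr / sst                               // coefficient of determination
    val angleDeg = math.toDegrees(math.acos(r))      // angle between centered x and y
    Fit(slope, intercept, sst, ssr, sse, r, r2, angleDeg)
  }
}
```

For the data above (xbar = 5, ybar = 4, Sxx = 76, Syy = 14, Sxy = -22) this should give slope = -11/38 ≈ -0.2895, intercept = 207/38 ≈ 5.4474, SST = 14, SSR ≈ 6.3684, SSE ≈ 7.6316, r ≈ -0.6745, R^2 ≈ 0.4549, and an angle of roughly 132.4 degrees. In the statistical triangle, sqrt(SSR) ≈ 2.52 and sqrt(SSE) ≈ 2.76 are the legs and sqrt(SST) ≈ 3.74 is the hypotenuse, since SST = SSR + SSE.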
PART II: Do a Spark analysis of this data set.
0. (Optional) Construct your schema to match the incoming data.
1. Read in your file (this will give you a DataFrame (DF)).
2. Convert the DataFrame to a Dataset (see ch. 11 of the Spark Guide).
3. Construct a VectorAssembler to convert the age column to a features vector.
(Can you use basic Spark datatypes to do this manually?)
4. Make sure the "visits" column is called "label" (that is, get your DS ready for regression).
5. Do a Spark regression and verify the values calculated earlier in Scala.
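The Part II steps might be sketched as follows. This assumes Spark 3.x with spark-sql and spark-mllib on the classpath, a CSV file with an "age,visits" header, and a local master; the names ParkVisit, ParkRegression and run are hypothetical, and spark.read.csv is just one of the readers the "spark.read. ...." hint could refer to. LinearRegression is left at its defaults (no regularization), so it should reproduce ordinary least squares.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Top-level case class so Spark can derive an Encoder for the Dataset.
final case class ParkVisit(age: Double, visits: Double)

object ParkRegression {
  // Step 0: explicit schema matching the assumed "age,visits" header.
  val schema: StructType = StructType(Seq(
    StructField("age", DoubleType, nullable = false),
    StructField("visits", DoubleType, nullable = false)))

  /** Returns (slope, intercept, R^2, SSE) as computed by Spark. */
  def run(spark: SparkSession, path: String): (Double, Double, Double, Double) = {
    import spark.implicits._

    // Step 1: read the file into a DataFrame.
    val df: DataFrame = spark.read.option("header", "true").schema(schema).csv(path)
    // Step 2: convert the DataFrame to a typed Dataset.
    val ds: Dataset[ParkVisit] = df.as[ParkVisit]

    // Step 3: assemble the age column into a "features" vector column.
    val assembler = new VectorAssembler()
      .setInputCols(Array("age"))
      .setOutputCol("features")
    // Step 4: rename "visits" to "label", the default label column name.
    val prepared = assembler.transform(ds).withColumnRenamed("visits", "label")

    // Manual alternative to steps 3 and 4 using basic ML datatypes (shown only, not used below).
    val manual = ds.map(v => (Vectors.dense(v.age), v.visits)).toDF("features", "label")

    // Step 5: fit with default settings and read back the statistics.
    val model   = new LinearRegression().fit(prepared)
    val summary = model.summary
    val sse     = summary.meanSquaredError * summary.numInstances
    (model.coefficients(0), model.intercept, summary.r2, sse)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("park-visits").master("local[*]").getOrCreate()
    try {
      val (slope, intercept, r2, sse) = run(spark, args.headOption.getOrElse("parks.csv"))
      println(f"slope=$slope%.4f intercept=$intercept%.4f R2=$r2%.4f SSE=$sse%.4f")
    } finally spark.stop()
  }
}
```

If the hand calculation in Part I is right, the printed slope, intercept, R^2 and SSE should agree with it to several decimal places. In spark-shell, where a session named spark already exists, calling ParkRegression.run(spark, "parks.csv") directly should be enough.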