Question
Need accurate answers for all questions, please.
Please do your work independently and do not copy or distribute the exam sheet, thank you!

- Let x1 be the last digit of your student ID, x2 be the second-to-last digit, and so on.
- x1=_, x2=_, x3=_, x4=_, x5=_, x6=_, x7=_.
- Example ID: 7654321, then x1=1, x2=2, x3=3, x4=4, x5=5, x6=6, x7=7.
- Use the values of x1, ..., x7 for all the hands-on assignments below.
- For all of the following questions, to receive points, please submit 1) the answers to the following questions and 2) the corresponding IPython notebook files to the CANVAS assignment section "Final Hands on Q1, Q2, Q3".

1. [10 pts] Spark SQL. Input file: every (15+x1)-th sample of minute_weather.csv. If we impute the missing values in the "air pressure at 9 am" column with the average value, how many "air pressure at 9 am" measurements have values between (910+x2) and (920+x3)?

2. [10 pts] Spark Decision Tree Classification and Evaluation. Input file: every (15+x4)-th sample of minute_weather.csv. If we train a decision tree classifier with a train-test split ratio of 0.7 and 0.3, seed = (x5+123), and the maximum depth of the tree set to (x6+6), what is the false positive rate after classification?

3. [10 pts] Spark k-means Clustering. Input file: daily_weather.csv. If we perform clustering with (x7+8) clusters (and seed = x1+10), which cluster appears to identify Santa Ana conditions (lowest humidity and highest wind speeds)?
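Since every question depends on the digits x1 to x7, it may help to compute the derived parameters once before opening a notebook. The following is a minimal sketch in plain Python, not part of the original exam sheet: the helper names (`id_digits`, `derived_parameters`, `false_positive_rate`) and the dictionary keys are hypothetical, and the false positive rate uses the standard definition FP / (FP + TN), which is assumed to be what question 2 intends.

```python
def id_digits(student_id: str) -> dict:
    """Map x1..x7 to the digits of a 7-digit ID, counting from the last digit."""
    if len(student_id) != 7 or not student_id.isdigit():
        raise ValueError("expected a 7-digit student ID")
    # reversed() makes the last digit come first, so it becomes x1.
    return {f"x{i}": int(d) for i, d in enumerate(reversed(student_id), start=1)}


def derived_parameters(x: dict) -> dict:
    """Compute the per-question settings stated in the exam sheet (key names are hypothetical)."""
    return {
        "q1_sample_step": 15 + x["x1"],    # take every (15+x1)-th row
        "q1_pressure_low": 910 + x["x2"],  # lower bound of the pressure range
        "q1_pressure_high": 920 + x["x3"], # upper bound of the pressure range
        "q2_sample_step": 15 + x["x4"],    # take every (15+x4)-th row
        "q2_seed": x["x5"] + 123,          # seed for the train-test split
        "q2_max_depth": x["x6"] + 6,       # maximum depth of the decision tree
        "q3_num_clusters": x["x7"] + 8,    # k for k-means
        "q3_seed": x["x1"] + 10,           # seed for k-means
    }


def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """Standard definition (assumed here): FP / (FP + TN)."""
    total_negatives = false_positives + true_negatives
    if total_negatives == 0:
        raise ValueError("no actual negatives; the rate is undefined")
    return false_positives / total_negatives


# Example ID from the exam sheet.
x = id_digits("7654321")
params = derived_parameters(x)
```

With the example ID, `params` gives a sampling step of 16 for question 1 and 19 for question 2; the counts passed to `false_positive_rate` would come from the confusion matrix of the trained classifier.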