
Question

Using Scala

In-class lab

Introduction:
Machine learning algorithms are now widely deployed in industry. No single approach fits every problem: each algorithm has multiple levers, both in the overall system design and internally. At the system level, we can choose whether to split our data into training and test sets and how much feature engineering to do. Internally, algorithms can be adjusted through a set of hyperparameters. This project explores these concepts using SparkML.

Problem Definition and Dataset Descriptions
There are two different problems; choose ONE to solve. Use whatever algorithm you deem appropriate.

1. Problem 1
The diabetes.csv dataset has historical data on individuals who eventually either developed diabetes or did not. Diabetes is a condition in which the body does not produce enough insulin, or cannot use it effectively, to regulate blood glucose. Without medication, diabetes can damage cells and vital organs and eventually cause death. In this problem, we want to predict whether a person is at risk of becoming diabetic based on the individual's data. This information can then be used to begin preventative measures for the individual (for example, lifestyle changes in diet and exercise). The features and label for the dataset are described below.

Features or Independent Variables:
  Pregnancies: Number of times pregnant
  Glucose: Plasma glucose concentration, 2 hours in an oral glucose tolerance test
  BloodPressure: Diastolic blood pressure (mm Hg)
  SkinThickness: Triceps skin fold thickness (mm)
  Insulin: 2-hour serum insulin (mu U/ml)
  BMI: Body mass index (weight in kg/(height in m)^2)
  DiabetesPedigreeFunction: Diabetes pedigree function, a score based on a person's genetic factors (diabetes is closely related to family history)
  Age: Age (years)

Label or Dependent Variable:
  Outcome: No Diabetes = 0, Diabetes = 1

2. Problem 2
The cancer.csv dataset deals with cancer patients. A tumour is a set of cells that has grown in a specific part of the body. Tumours can be classified as either cancerous or non-cancerous based on various factors. Cancerous tumours continue to grow uncontrollably and spread to different parts of the body and eventually to the bloodstream. At this stage, they begin interfering with body functions, which can lead to death (for example, a heart attack from clogged arteries). Classifying tumours correctly is important because it is generally expensive and risky to try to remove all tumours. In this problem, we want to predict whether a person's tumour is cancerous in order to decide whether surgery is necessary.

Features or Independent Variables:
  ID: Sample code number
  Clump Thickness: 1 - 10
  Uniformity of Cell Size: 1 - 10
  Uniformity of Cell Shape: 1 - 10
  Marginal Adhesion: 1 - 10
  Single Epithelial Cell Size: 1 - 10
  Bare Nuclei: 1 - 10
  Bland Chromatin: 1 - 10
  Normal Nucleoli: 1 - 10
  Mitoses: 1 - 10

Label or Dependent Variable:
  Class: 2 for benign, 4 for malignant

Deliverables
1. Provide the code for the algorithm in a plain text file.
2. Provide a short presentation (video) going over the algorithm text and running the code (live). The demo must be live; do not just go over a PDF document!
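The three levers named in the introduction (train/test split, feature engineering, hyperparameters) can be sketched for Problem 1 roughly as follows. This is a minimal sketch, not a reference solution. It assumes that diabetes.csv sits in the working directory, that it has a header row whose column names match the feature list above, and that Spark runs in local mode. The choice of logistic regression, the 80/20 split, the seed, the grid values, and the object name DiabetesLab are all illustrative assumptions; the assignment leaves the algorithm open.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.{StandardScaler, VectorAssembler}
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Hypothetical object name; any name works.
object DiabetesLab {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DiabetesLab")
      .master("local[*]") // assumption: local run for the live demo
      .getOrCreate()

    // Assumption: header row present, column names as listed in the assignment.
    val raw = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("diabetes.csv")

    // Spark ML classifiers expect a numeric label column of type double.
    val data = raw.withColumn("label", col("Outcome").cast("double"))

    val featureCols = Array(
      "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
      "Insulin", "BMI", "DiabetesPedigreeFunction", "Age")

    // Lever 1: hold out a test set (80/20 split, fixed seed for repeatability).
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), 42L)

    // Lever 2: feature engineering (assemble columns into a vector, then scale).
    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("rawFeatures")
    val scaler = new StandardScaler()
      .setInputCol("rawFeatures")
      .setOutputCol("features")

    val lr = new LogisticRegression()
      .setLabelCol("label")
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(assembler, scaler, lr))

    // Lever 3: hyperparameters, searched over a small illustrative grid.
    val grid = new ParamGridBuilder()
      .addGrid(lr.regParam, Array(0.01, 0.1, 1.0))
      .addGrid(lr.elasticNetParam, Array(0.0, 0.5, 1.0))
      .build()

    val evaluator = new BinaryClassificationEvaluator()
      .setLabelCol("label")
      .setMetricName("areaUnderROC")

    // 3-fold cross-validation on the training set only.
    val cv = new CrossValidator()
      .setEstimator(pipeline)
      .setEstimatorParamMaps(grid)
      .setEvaluator(evaluator)
      .setNumFolds(3)

    val model = cv.fit(train)

    // Evaluate the best model found by cross-validation on the held-out test set.
    val auc = evaluator.evaluate(model.transform(test))
    println(f"Test area under ROC = $auc%.3f")

    spark.stop()
  }
}
```

The same structure should carry over to Problem 2 with a few changes: drop the ID column from the features, and map Class to a 0/1 label (2 to 0, 4 to 1) before fitting. The actual column names in cancer.csv are not given here, so they would need to be checked against the file first.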
