Question
The cancer.csv dataset contains data on tumours from cancer patients. A tumour is a mass of cells that has grown in a specific part of the body. Tumours can be classified as either cancerous (malignant) or non-cancerous (benign) based on various factors. Cancerous tumours continue to grow uncontrollably and can spread to other parts of the body, for example through the bloodstream. At this stage they begin to interfere with body functions, which can lead to death. Classifying tumours correctly matters because it is generally expensive and risky to remove every tumour. In this problem, we want to predict whether a person's tumour is cancerous in order to decide whether surgery is necessary.

Features (independent variables):
- ID: sample code number
- Clump Thickness: 1-10
- Uniformity of Cell Size: 1-10
- Uniformity of Cell Shape: 1-10
- Marginal Adhesion: 1-10
- Single Epithelial Cell Size: 1-10
- Bare Nuclei: 1-10
- Bland Chromatin: 1-10
- Normal Nucleoli: 1-10
- Mitoses: 1-10

Label (dependent variable):
- Class: 2 for benign, 4 for malignant
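Spark ML classifiers expect label values 0, 1, ..., k-1, so the 2/4 Class codes need remapping before training, and the ID column should stay out of the feature set because it identifies a sample rather than describing the tumour. A minimal plain-Python sketch of that encoding, assuming hypothetical snake_case column names (the actual header of cancer.csv is not given):

```python
# Hypothetical snake_case names for the columns described above; the real
# header row of cancer.csv may differ.
ID_COL = "id"  # sample code number: an identifier, not a predictor
FEATURE_COLS = [
    "clump_thickness",
    "uniformity_of_cell_size",
    "uniformity_of_cell_shape",
    "marginal_adhesion",
    "single_epithelial_cell_size",
    "bare_nuclei",
    "bland_chromatin",
    "normal_nucleoli",
    "mitoses",
]
CLASS_COL = "class"

# Class codes from the problem statement mapped to binary labels.
CLASS_TO_LABEL = {2: 0.0, 4: 1.0}  # 2 = benign, 4 = malignant


def to_label(class_code: int) -> float:
    """Return 0.0 for benign (2) or 1.0 for malignant (4)."""
    if class_code not in CLASS_TO_LABEL:
        raise ValueError(f"unexpected Class value: {class_code!r}")
    return CLASS_TO_LABEL[class_code]


def in_documented_range(score: int) -> bool:
    """Each feature is documented as a score from 1 to 10."""
    return 1 <= score <= 10
```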
Solve the problem using Spark ML.
Step by Step Solution
Step: 1
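One possible first step is to load the CSV into a Spark DataFrame, cast the scores to integers, drop rows with missing or non-numeric values, and add a 0/1 label column. This sketch assumes cancer.csv has a header row with its 11 columns in the order listed in the question, and renames them to hypothetical snake_case names. The UCI Breast Cancer Wisconsin (Original) data, which has the same columns, marks missing Bare Nuclei values with "?"; whether cancer.csv does the same is an assumption, which is why non-numeric cells are turned into nulls. The pyspark imports sit inside the function so the pure-Python helper can be checked without a Spark installation.

```python
import re

# Hypothetical column names, in the order listed in the question.
FEATURE_COLS = [
    "clump_thickness", "uniformity_of_cell_size", "uniformity_of_cell_shape",
    "marginal_adhesion", "single_epithelial_cell_size", "bare_nuclei",
    "bland_chromatin", "normal_nucleoli", "mitoses",
]
ALL_COLS = ["id"] + FEATURE_COLS + ["class"]

# Cells matching this pattern are treated as integers; anything else
# (for example "?") becomes null and the row is later dropped.
INT_PATTERN = r"^\s*\d+\s*$"


def is_int_string(value: str) -> bool:
    """Pure-Python mirror of the rlike check applied on the Spark side."""
    return re.match(INT_PATTERN, value) is not None


def load_and_prepare(spark, path="cancer.csv"):
    """Return a DataFrame with integer features and a 0.0/1.0 label column."""
    from pyspark.sql import functions as F

    # Read every column as a string, then rename the columns by position.
    raw = spark.read.csv(path, header=True, inferSchema=False).toDF(*ALL_COLS)

    # Cast only cells that look like integers; all other cells become null.
    cleaned = raw.select(
        *[
            F.when(F.col(c).rlike(INT_PATTERN), F.trim(F.col(c)).cast("int")).alias(c)
            for c in ALL_COLS
        ]
    )

    # Class 2 -> 0.0 (benign), 4 -> 1.0 (malignant); any other code stays null.
    labelled = cleaned.withColumn(
        "label",
        F.when(F.col("class") == 2, 0.0).when(F.col("class") == 4, 1.0),
    )
    return labelled.dropna(subset=FEATURE_COLS + ["label"])
```

With a SparkSession named spark (for example from SparkSession.builder.appName("cancer").getOrCreate()), load_and_prepare(spark) would return the cleaned rows ready for feature assembly.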
Step: 2
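A second step could combine the nine scores into a single vector and fit a classifier inside a Spark ML Pipeline. Logistic regression is used here as one reasonable choice for a binary label; the question does not name a model, and a DecisionTreeClassifier or RandomForestClassifier would fit in the same slot. The sketch assumes a prepared DataFrame whose columns are an id column, the nine integer feature columns, a class column and a 0.0/1.0 label column; the function names and the 80/20 split are hypothetical choices.

```python
def select_feature_columns(columns, exclude=("id", "class", "label")):
    """Keep only predictor columns, dropping the identifier and label columns."""
    return [c for c in columns if c not in exclude]


def build_pipeline(feature_cols):
    """VectorAssembler -> LogisticRegression, reading from the 'label' column."""
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=100)
    return Pipeline(stages=[assembler, lr])


def train(df, seed=42):
    """Split 80/20, fit the pipeline on the training part, return (model, test_df)."""
    train_df, test_df = df.randomSplit([0.8, 0.2], seed=seed)
    pipeline = build_pipeline(select_feature_columns(df.columns))
    model = pipeline.fit(train_df)
    return model, test_df
```

Fixing the seed makes the split reproducible between runs, which helps when comparing models on the same held-out rows.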
Step: 3
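A third step could measure how well the model separates the two classes on held-out data. Because the question weighs the cost and risk of unnecessary surgery against the danger of an untreated malignant tumour, this sketch reports precision and recall for the malignant class alongside accuracy and area under the ROC curve. It assumes a predictions DataFrame with label and prediction columns holding 0.0/1.0 plus a rawPrediction column, which a fitted Spark ML classification model's transform adds by default; the function names are hypothetical.

```python
def summarize_counts(counts):
    """Compute metrics from {(label, prediction): count}, where 1.0 = malignant."""
    tp = counts.get((1.0, 1.0), 0)  # malignant, predicted malignant
    tn = counts.get((0.0, 0.0), 0)  # benign, predicted benign
    fp = counts.get((0.0, 1.0), 0)  # benign, predicted malignant
    fn = counts.get((1.0, 0.0), 0)  # malignant, predicted benign
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def evaluate(predictions):
    """Return accuracy, precision, recall and AUC for a predictions DataFrame."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    auc = BinaryClassificationEvaluator(
        labelCol="label", rawPredictionCol="rawPrediction", metricName="areaUnderROC"
    ).evaluate(predictions)

    # Confusion-matrix cells as a small local dictionary.
    rows = predictions.groupBy("label", "prediction").count().collect()
    counts = {(row["label"], row["prediction"]): row["count"] for row in rows}

    metrics = summarize_counts(counts)
    metrics["auc"] = auc
    return metrics
```

For a fitted PipelineModel named model and a held-out DataFrame test_df (hypothetical names), evaluate(model.transform(test_df)) would return all four metrics in one dictionary. A low recall would mean malignant tumours are being labelled benign, the more dangerous kind of error here.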