Question
Please help solve this problem. For the coding part, please use Java or Python.
In this problem, the Iris data set will be used to begin understanding how to apply the algorithms in the first four modules to a well-known data set. The Iris Plants Database contains 3 classes of 50 instances each, where each class refers to a type of Iris plant. Four attributes/features (in centimeters) were collected for each plant instance. A fifth attribute is provided, which is the class label of the plant type. The data can be downloaded from iris.arff on the Sample Weka Data Sets webpage (https://storm.cis.fordham.edu/~gweiss/data-mining/datasets.html).
4. Outlier Removal (25 points)
(a) Develop an algorithm (pseudocode) to remove, in sequential order, the observations that are furthest from the data class mean.
(b) Provide the running time and total running time of your algorithm in O-notation and T(n).
(c) Implement your algorithm in your language of choice.
(d) Determine whether the data contains an outlier by plotting each class individually. The key is to plot two features at a time in different combinations, e.g., feature 1 vs. feature 2, etc.
(e) Provide an explanation of the results:
i. Was there any class that had obvious outliers? If so, how did you determine the outlier? If not, why not?
ii. What was the metric used to determine separation? Explain why the metric was chosen.
Step by Step Solution
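One possible reading of parts (a) through (c) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive answer: it assumes Euclidean distance as the metric, recomputes the class mean after every removal, and takes the number of removals `k` as a caller-supplied parameter. The function names `class_mean` and `remove_furthest` and the toy data are hypothetical; in practice each Iris class (50 rows of 4 features) would be passed in separately.

```python
import math


def class_mean(rows):
    """Component-wise mean of a non-empty list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def remove_furthest(rows, k):
    """Remove, one at a time, the k observations furthest from the class mean.

    Assumption: Euclidean distance is the separation metric.
    The mean is recomputed after every removal.
    Returns (kept, removed), where removed holds (row, distance) pairs
    in the order in which they were removed.
    """
    kept = [list(r) for r in rows]
    removed = []
    for _ in range(min(k, len(kept) - 1)):  # always keep at least one row
        mean = class_mean(kept)
        dists = [math.dist(r, mean) for r in kept]
        idx = max(range(len(kept)), key=dists.__getitem__)
        removed.append((kept.pop(idx), dists[idx]))
    return kept, removed


# Hypothetical toy class: four clustered points plus one obvious stray.
sample = [[5.0, 3.4], [5.1, 3.5], [4.9, 3.3], [5.0, 3.5], [9.0, 7.0]]
kept, removed = remove_furthest(sample, k=1)
print(removed[0][0])  # [9.0, 7.0]
```

For a class with n observations and d features, each pass of this sketch computes the mean and all distances in O(n·d) time, so k removals cost O(k·n·d). With d fixed at 4, that is O(k·n), and O(n²) if nearly every observation is removed. Whether a removed point is a genuine outlier still has to be judged from the pairwise feature plots requested in part (d).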