Question
Please use Python to answer all the questions.
1. Data Analysis - Module 3

Note: this is not a Collaborative Problem. 20 points total.

In this problem, implement code to analyze the Iris data set by feature and plant species (class) using the test statistics listed in Table 1. Table 1, found in the Probability document under Content in Module 3, can be used as a reference.

(a) (10 points) Compute the statistics of each feature and class using the test statistics listed in Table 1. You may use built-in functions for your solution. Your results should be in a table that is easy to follow and reference.

(b) (10 points) Analyze the results and explain what each statistic reveals about the data. The analysis should reference your table from Part (a). What conclusions can you make based on these statistics? For clarification, the analysis should be done by feature, followed by class (flower species). This analysis should provide insight into the Iris data set. The analysis should be put into tables for easy understanding and referencing.

The Iris data set is represented by the $[150 \times 4]$ matrix $X$. The $[1 \times 4]$ vector $x$ is the mean of the four features over all observations, $x_1$ is the $[150 \times 1]$ vector representing the sepal length, $x_2$ is the $[150 \times 1]$ vector representing the sepal width, $x_3$ is the $[150 \times 1]$ vector representing the petal length, and $x_4$ is the $[150 \times 1]$ vector representing the petal width.

Taking the notation a step further, let $x_{1,c}$ represent the vector for sepal length by class (species) $c = [1, 2, 3]$ (Setosa, Versicolor, Virginica). Specifically, let $x_{1,1}$ be the $[50 \times 1]$ vector representing the sepal length for class 1 (Setosa), $x_{1,2}$ be the $[50 \times 1]$ vector representing the sepal length for class 2 (Versicolor), and $x_{1,3}$ be the $[50 \times 1]$ vector representing the sepal length for class 3 (Virginica).

Note: The Trimmed Mean is a variation of the mean that is calculated by removing values from the beginning and end of a sorted set of data. The average is then taken over the remaining values. This allows potential outliers to be removed when calculating the statistics of the data. Assuming the data in $x_s = [x_{1,s}, x_{2,s}, \ldots, x_{n,s}]$ is sorted, the resulting trimmed data is $x_{s,p} = [x_{1+p,s}, x_{2+p,s}, \ldots, x_{n-p,s}]$. The trimmed mean therefore keeps extreme values from influencing the mean of the data. The Trimmed Mean removes a given percentage of values from the beginning and from the end of the sorted data, where $p$ represents the number of observations corresponding to that percentage.
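The trimmed mean described in the note can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference solution. The function name `trimmed_mean`, the `percent` parameter, and the rule `p = floor(n * percent / 100)` are assumptions, since the problem statement does not say how the percentage is rounded to a whole number of observations. The sample values are made up purely to show the effect of an outlier.

```python
import math


def trimmed_mean(values, percent):
    """Mean of the data after dropping p values from each end of the sorted list.

    Assumption: p = floor(n * percent / 100); the rounding rule is not
    given in the problem statement.
    """
    xs = sorted(values)                  # x_s: the sorted data
    n = len(xs)
    p = math.floor(n * percent / 100)    # observations removed per end
    if 2 * p >= n:
        raise ValueError("percent removes every observation")
    kept = xs[p:n - p]                   # x_{s,p} = [x_{1+p,s}, ..., x_{n-p,s}]
    return sum(kept) / len(kept)


# Illustrative values only (not Iris data): one extreme value at the end.
sample = [2.0, 3.0, 4.0, 5.0, 100.0]
plain = sum(sample) / len(sample)        # pulled upward by the outlier
trimmed = trimmed_mean(sample, 20)       # drops 2.0 and 100.0
print(plain, trimmed)
```

If SciPy is available, `scipy.stats.trim_mean(a, proportiontocut)` offers a built-in alternative; note that it takes a fraction (for example 0.2) rather than a percentage, so its results should be checked against whatever convention Table 1 specifies.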
Step by Step Solution
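As a possible starting point for part (a), the sketch below groups measurements by feature and class and prints one row of statistics per pair. It is a hedged outline under several assumptions: Table 1 is not reproduced here, so the four statistics shown (minimum, maximum, mean, sample standard deviation) are placeholders for whatever Table 1 actually lists; the names `FEATURES`, `CLASSES`, `ROWS`, `summarize`, and `print_table` are hypothetical; and `ROWS` is only a nine-row stand-in in the Iris layout, to be replaced by the full 150-observation data set.

```python
import statistics
from collections import defaultdict

FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
CLASSES = ["All", "Setosa", "Versicolor", "Virginica"]

# Stand-in sample in the Iris layout; load the full data set in practice.
ROWS = [
    (5.1, 3.5, 1.4, 0.2, "Setosa"),
    (4.9, 3.0, 1.4, 0.2, "Setosa"),
    (4.7, 3.2, 1.3, 0.2, "Setosa"),
    (7.0, 3.2, 4.7, 1.4, "Versicolor"),
    (6.4, 3.2, 4.5, 1.5, "Versicolor"),
    (6.9, 3.1, 4.9, 1.5, "Versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Virginica"),
    (5.8, 2.7, 5.1, 1.9, "Virginica"),
    (7.1, 3.0, 5.9, 2.1, "Virginica"),
]


def summarize(rows):
    """Return {(feature, class): {statistic: value}} for every pair."""
    groups = defaultdict(list)
    for *measurements, species in rows:
        for feature, value in zip(FEATURES, measurements):
            groups[(feature, species)].append(value)  # per-class vector
            groups[(feature, "All")].append(value)    # all observations
    return {
        key: {
            "min": min(vals),
            "max": max(vals),
            "mean": statistics.mean(vals),
            "std": statistics.stdev(vals),            # sample std (n - 1)
        }
        for key, vals in groups.items()
    }


def print_table(summary):
    """Print one row per (feature, class), ordered by feature then class."""
    print(f"{'feature':<14}{'class':<12}{'min':>7}{'max':>7}{'mean':>7}{'std':>7}")
    for feature in FEATURES:
        for species in CLASSES:
            s = summary[(feature, species)]
            print(f"{feature:<14}{species:<12}{s['min']:>7.2f}"
                  f"{s['max']:>7.2f}{s['mean']:>7.2f}{s['std']:>7.2f}")


summary = summarize(ROWS)
print_table(summary)
```

Keeping the results in a dictionary keyed by `(feature, class)` makes it straightforward to add further columns, such as a trimmed mean, and to reference individual cells when writing the analysis for part (b).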