Question
Please help me answer the questions below:
1. You're creating a well-normalized log-binned histogram. You chose your first bin to be [1, 3). In this first bin, you counted 10 data points. For the second bin, you counted 120 data points.
What is the ratio (B/A) between the height of the bar for the first bin (A) and the height of the bar for the second bin (B)?
a. 4.0
b. 40/3.333
c. 1/12
d. 12
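A minimal sketch of the arithmetic behind question 1, assuming the log bins grow by a constant factor of 3 so that the second bin is [3, 9) (the question does not state this edge explicitly). In a normalized histogram the bar height is count / (total * bin width), so the total cancels in the ratio. The variable names are illustrative only.

```python
# Assumption: log-binning with a constant factor of 3, so the bins are [1, 3) and [3, 9).
edges = [1, 3, 9]
counts = [10, 120]

# The total may include other bins; it cancels out in the ratio B/A.
total = sum(counts)

# Bin widths: 2 for [1, 3) and 6 for [3, 9).
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]

# Density-normalized bar heights: count / (total * width).
heights = [c / (total * w) for c, w in zip(counts, widths)]

ratio_b_over_a = heights[1] / heights[0]
print(ratio_b_over_a)  # approximately 4
```

Under that assumption the ratio is (120 / 6) / (10 / 2), which points to option a.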
2. The CDF F(x) is defined by F(x) = P(X ≤ x) (empirically, the fraction of data points that are smaller than or equal to the given value). Which of the following is NOT a property of the CDF?
a. F(x) is a monotonically increasing function.
b. The CDF lets us figure out percentile points.
c. For any dataset, as x increases, F(x) approaches 1.
d. The empirical CDF cannot be defined if we have too few data points.
3. You are using the Kernel Density Estimation method with a rectangular kernel to estimate the underlying distribution of your data (X = [5, 1, 3, 1, 2, 3, 5, 5, 6, 5]). The width of the rectangular kernel is 1.2. If we examine the resulting distribution, what would be the area under the distribution/curve (or the estimated probability mass) that is within the range [0, 4]?
What would be the value?
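One way to check question 3 numerically is sketched below. It assumes that "width" means the full width of the box and that each box is centered on its data point, so a point x contributes a rectangle on [x - 0.6, x + 0.6] with area 1/n. The helper name box_kde_mass is hypothetical.

```python
def box_kde_mass(data, width, lo, hi):
    """Probability mass of a rectangular-kernel KDE inside [lo, hi].

    Assumption: each kernel is a box of full width `width`,
    centered on its data point, with area 1 / len(data).
    """
    half = width / 2
    mass = 0.0
    for x in data:
        # Length of the overlap between the box and the query interval.
        overlap = max(0.0, min(hi, x + half) - max(lo, x - half))
        # Each box has height 1 / (n * width), so mass = overlap * height.
        mass += overlap / (width * len(data))
    return mass

X = [5, 1, 3, 1, 2, 3, 5, 5, 6, 5]
mass_0_to_4 = box_kde_mass(X, 1.2, 0, 4)
print(mass_0_to_4)  # approximately 0.5
```

With these assumptions, the five boxes centered at 1, 1, 2, 3, and 3 lie entirely inside [0, 4], and the boxes at 5 and 6 lie entirely outside, giving 5/10. If the width were instead interpreted as a half-width, the result would differ.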
4. You received a dataset about the number of items sold for each of your inventories. It's essentially a sorted list of numbers like this: X = [1, 1, 1, 2, 2, 5, 10, 14, 101, 252]. You decided to use the empirical CDF to visualize the data distribution. If you normalize your CDF so that it accumulates to 1.0 (the value at the largest data point becomes 1.0), what would be the value of this CDF at x = 100?
What would be the value?
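The empirical CDF in question 4 can be sketched as the fraction of data points at or below a given value. The function name ecdf is illustrative, not taken from the question.

```python
def ecdf(data, x):
    """Fraction of data points less than or equal to x."""
    return sum(1 for v in data if v <= x) / len(data)

X = [1, 1, 1, 2, 2, 5, 10, 14, 101, 252]
value_at_100 = ecdf(X, 100)
print(value_at_100)
```

Eight of the ten values are at or below 100, so the normalized CDF there is 8/10. Because no data point equals 100, using a strict "smaller than" comparison gives the same count.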
5. When you obtain a 2D plot from t-SNE or UMAP, you can interpret the coordinates of each data point as a linear combination of the original (high-dimensional) features. - True or False
6. Two distinct data distributions can lead to exactly the same box plots. - True or False
7. On a log scale, the apparent distance between 1 and 100 is the same as the distance between 50 and ____
a. 500
b. 0.5
c. 50000
d. 149
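For question 7, distance on a log axis depends only on the ratio of the two values, and it is unsigned. A small sketch that compares each listed option against the 1-to-100 distance (base 10 is chosen here for convenience; any base gives the same comparison):

```python
import math

def log_distance(a, b):
    """Apparent (unsigned) distance between a and b on a log10 axis."""
    return abs(math.log10(b) - math.log10(a))

target = log_distance(1, 100)  # two decades
candidates = [500, 0.5, 50000, 149]

# Keep the options whose log distance from 50 matches the target.
matches = [c for c in candidates if math.isclose(log_distance(50, c), target)]
print(matches)
```

The value 5000 would also match but is not among the options; 0.5 matches because 50 / 0.5 = 100, the same ratio as 100 / 1.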
8. In KDE, we put a "kernel" onto each data point. Imagine a rectangular kernel with (band)width of 0.5. If you have 10 data points, what would be the height of each kernel that we are adding to obtain the KDE?
a. 0.1
b. 0.2
c. 2
d. 1
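A short sketch of the reasoning for question 8, assuming the stated 0.5 is the full width of the rectangle. For the KDE to integrate to 1, each of the n kernels must carry area 1/n, so its height is 1 / (n * width).

```python
n_points = 10
width = 0.5  # assumption: full width of the rectangular kernel

# Each kernel has area 1 / n, so height = (1 / n) / width.
kernel_height = 1 / (n_points * width)

# Sanity check: all kernels together should have total area 1.
total_area = n_points * kernel_height * width
print(kernel_height, total_area)
```

Under that reading the height is 1 / (10 * 0.5), which corresponds to option b.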