Question

Please help me with the below questions to answer:

1. You're creating a well-normalized log-binned histogram. You chose your first bin to be [1, 3). In this first bin, you counted 10 data points. For the second bin, you counted 120 data points.

What is the ratio (B/A) between the height of the bar for the first bin (A) and the height of the bar for the second bin (B)?

a. 4.0

b. 40/3.333

c. 1/12

d. 12
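One way to reason about question 1 is sketched below. It assumes that "well-normalized" means each bar's height is a density (count divided by total count times bin width) and that the log-spaced bins grow by a constant factor of 3, so the second bin is [3, 9). Neither assumption is stated explicitly in the question, and the names `edges`, `counts`, and `total` are illustrative only.

```python
import math

# Assumed log-spaced bin edges: each edge is 3x the previous one,
# so the bin after [1, 3) is taken to be [3, 9).
edges = [1, 3, 9]
counts = [10, 120]   # counts given in the question
total = sum(counts)  # illustrative only; the total cancels in the ratio

# Density-normalized bar height = count / (total * bin width)
heights = [
    c / (total * (right - left))
    for c, left, right in zip(counts, edges[:-1], edges[1:])
]

ratio_b_over_a = heights[1] / heights[0]
print(ratio_b_over_a)
```

Under these assumptions the second bin is three times as wide as the first, so a 12-fold difference in counts shrinks to roughly a 4-fold difference in bar height.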

2. The CDF F(x) is defined by F(x) = P(X ≤ x) (empirically, the fraction of data points that are smaller than or equal to the given value). Which of the following is NOT a property of the CDF?

a. F(x) is a monotonically increasing function.

b. The CDF lets us figure out percentile points.

c. For any dataset, as x increases, F(x) approaches 1.

d. The empirical CDF cannot be defined if we have too few data points.

3. You are using the Kernel Density Estimation method with a rectangular kernel to estimate the underlying distribution of your data (X = [5, 1, 3, 1, 2, 3, 5, 5, 6, 5]). The width of the rectangular kernel is 1.2. If we examine the resulting distribution, what would be the area under the distribution/curve (or the estimated probability mass) that is within the range [0, 4]?

What would be the value?
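A minimal sketch for question 3, assuming that the stated width of 1.2 is the full width of the rectangle and that each rectangle is centered on its data point (so it spans x - 0.6 to x + 0.6). If "width" instead meant the half-width, the result would differ. The helper `rect_kde_mass` is a hypothetical name introduced here for illustration.

```python
def rect_kde_mass(data, width, lo, hi):
    """Probability mass of a rectangular-kernel KDE inside [lo, hi].

    Assumes each kernel is a rectangle of total width `width`,
    centered on its data point, with area 1 / len(data).
    """
    n = len(data)
    half = width / 2
    mass = 0.0
    for x in data:
        left, right = x - half, x + half
        # Length of the kernel's support that falls inside [lo, hi]
        overlap = max(0.0, min(right, hi) - max(left, lo))
        # Fraction of this kernel inside the range, weighted by 1/n
        mass += (overlap / width) / n
    return mass

X = [5, 1, 3, 1, 2, 3, 5, 5, 6, 5]
mass_0_to_4 = rect_kde_mass(X, width=1.2, lo=0, hi=4)
print(mass_0_to_4)
```

With these assumptions, the kernels on the five points at 1, 2, and 3 lie entirely inside [0, 4], and the kernels on the points at 5 and 6 lie entirely outside, so the mass comes out to about 0.5.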

4. You received a dataset about the number of items sold for each of your inventories. It's essentially a sorted list of numbers like this: X = [1, 1, 1, 2, 2, 5, 10, 14, 101, 252]. You decide to use the empirical CDF to visualize the data distribution. If you normalize the CDF so that it accumulates to 1.0 (its value at the largest data point is 1.0), what would be the value of this CDF at x = 100?

What would be the value?
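The empirical CDF in question 4 can be sketched as follows, assuming the usual convention that it returns the fraction of data points less than or equal to x. The function name `ecdf` is illustrative only.

```python
from bisect import bisect_right

def ecdf(sorted_data, x):
    """Fraction of points in `sorted_data` that are <= x.

    Assumes `sorted_data` is already sorted in ascending order.
    """
    return bisect_right(sorted_data, x) / len(sorted_data)

X = [1, 1, 1, 2, 2, 5, 10, 14, 101, 252]
value_at_100 = ecdf(X, 100)
print(value_at_100)
```

Eight of the ten values are at most 100, so the call returns 0.8. The two large values (101 and 252) only contribute once x reaches them.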

5. When you obtain a 2D plot from t-SNE or UMAP, you can interpret the coordinates of each data point as a linear combination of the original (high-dimensional) features. - True or False

6. Two distinct data distributions can lead to exactly the same box plot. - True or False

7. On a log scale, the apparent distance between 1 and 100 is the same as the distance between 50 and ____.

Group of answer choices

a. 500

b. 0.5

c. 50000

d. 149
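For question 7, a short sketch can check each option, assuming that "apparent distance" on a log axis means the absolute difference of the base-10 logarithms. The choice of base does not affect which option matches. The helper `log_distance` is a hypothetical name.

```python
import math

def log_distance(a, b):
    """Distance between a and b as drawn on a log10 axis."""
    return abs(math.log10(b) - math.log10(a))

target = log_distance(1, 100)        # two decades
candidates = [500, 0.5, 50000, 149]  # options a-d from the question

matches = [c for c in candidates
           if math.isclose(log_distance(50, c), target)]
print(matches)
```

On a log axis, equal distances correspond to equal ratios, so the matching option is the one whose ratio to 50 is a factor of 100 in either direction.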

8. In KDE, we put a "kernel" onto each data point. Imagine a rectangular kernel with (band)width of 0.5. If you have 10 data points, what would be the height of each kernel that we are adding to obtain the KDE?

a. 0.1

b. 0.2

c. 2

d. 1
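Question 8 follows from requiring the KDE to integrate to 1. The sketch below assumes that the (band)width of 0.5 is the full width of each rectangle and that all kernels carry equal weight 1/n. The name `kernel_height` is illustrative only.

```python
def kernel_height(n_points, width):
    """Height of each rectangular kernel so the summed KDE has area 1.

    Each kernel must have area 1 / n_points, and
    area = height * width, so height = 1 / (n_points * width).
    """
    return 1.0 / (n_points * width)

height = kernel_height(n_points=10, width=0.5)
total_area = 10 * height * 0.5  # sanity check: all kernels together
print(height, total_area)
```

Under these assumptions each kernel has height 0.2, and the ten rectangles together have total area 1.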
