Question

In Python AND Statistics!

please answer fully and provide detail and explanations to what you did. Answer ALL parts of a question as well.

------------------------------------------------------------------------------------------------------------------------------------------------------------

Instructions

The aim of this problem set is to work with and interpret hypothesis tests and t-tests. To do so appropriately, we will also need to be competent in data exploration, visualization, and transformations.

In this dataset you will use a sample of AirBnB listings from Beijing and Seattle. The data is downloaded from AirBnB, http://insideairbnb.com/get-the-data.html. The sample, however, only contains two columns:

    city: "Beijing" or "Seattle"
    price: price (in USD)

HERE IS THE DATASET TO WORK WITH: https://1drv.ms/x/s!AtfXPbdjkmO7oJoJA6QLYJSydgeCUQ?e=MyuQ9i

------------------------------------------------------------------------------------------------------------------------------------------------------------

For the remainder of this problem set, we no longer ask you to convert from log-price differences to price differences. Please be aware of this in the interpretation of the remainder of your responses!

PART 5: Canned t-test Function

Finally, we use a ready-made library: scipy.stats.ttest_ind provides a ready-made t-test function. Remember: work with log price!

1. Compute the t-value and the probability (p-value) using ttest_ind. Note: you have to specify equal_var=False to tell the function that Beijing and Seattle prices may have different variances.

2. Finally, state your conclusion: is Beijing more expensive than Seattle? Go through all of your methods: do the three earlier ones (simulations, 99% CI, t-value) and Python's t-test agree?
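A minimal sketch of the ttest_ind call described in Part 5, under stated assumptions: the real sample is not reproduced here, so the two price arrays below are synthetic lognormal draws with arbitrary parameters, and the names beijing_price and seattle_price are hypothetical. Only the call pattern carries over to the real data.

```python
import numpy as np
from scipy import stats

# ASSUMPTION: synthetic stand-in prices (USD). Replace these two arrays with
# the "price" column of the real sample, split by the "city" column.
rng = np.random.default_rng(0)
beijing_price = rng.lognormal(mean=5.5, sigma=0.9, size=2000)
seattle_price = rng.lognormal(mean=5.0, sigma=0.6, size=1500)

# The problem set asks for log prices, not raw prices.
log_beijing = np.log(beijing_price)
log_seattle = np.log(seattle_price)

# equal_var=False requests Welch's t-test, which does not assume that the
# two cities share the same variance.
result = stats.ttest_ind(log_beijing, log_seattle, equal_var=False)
t_value, p_value = result.statistic, result.pvalue

print(f"difference in mean log price: {log_beijing.mean() - log_seattle.mean():.3f}")
print(f"t-value: {t_value:.2f}")
print(f"two-sided p-value: {p_value:.3g}")
```

The p-value returned here is two-sided by default. A positive t-value means the first argument (Beijing) has the larger mean log price; comparing the p-value with 0.01 matches the 99% confidence level used earlier in the problem set.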

PART 6: How long do you need to simulate to get the difference in mean log-prices between Beijing and Seattle that you actually observe in the data, 0.739? If you did the previous tasks well, you noticed that the simulated differences are far smaller than the actual difference, and even millions of experiments do not bring you close. But how long do you have to run the simulations to actually get close?

1. First, time your simulations. Run 3.5 but for a larger number of repetitions, at least seven figures, and measure how long it takes on your computer. Your computer should run for at least five seconds before proceeding (this will help with accuracy). Based on that figure, calculate how long it would take to run 10^12 or so experiments. Hint: check out the %timeit and %time magic macros.

2. Second, what is the probability of receiving such enormous t-values? You need to calculate the probability for your t-values yourself; they will not be in any tables. Assume we are dealing with a normal distribution. (Not quite, but we are close.) You have to compute the probability that you get a value larger than the t-value you computed. This can be done along the lines:

    from scipy import stats
    norm = stats.norm()
    norm.cdf(-1.96)  # close to 0.025
    ## 0.024997895148220435

Except you replace 1.96 with your actual t-value. Explain: why does the example use norm.cdf(-1.96) instead of norm.cdf(1.96)?

3. How many iterations do you need? Let's go through a shortcut: if the probability p is small, you need roughly 3/p iterations. So if p = 0.001, you need 3000 iterations.

4. Based on the timings you did above, how many years do you have to run the simulations? If one had started the computer the year your grandfather was born, would it be there now? If the first Seattle inhabitants had started it when they moved here following the melting ice, 10,000 or so years ago? If the last dinosaurs had started it 66,000,000 years ago? (But it must have been in Idaho or somewhere else, the land where Seattle is now did not exist back then.)
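The four steps of Part 6 can be sketched roughly as follows. Several pieces are assumptions rather than the assignment's own code: simulate_max_diff is a hypothetical stand-in for the simulation from task 3.5, the group size and repetition count are kept small so the sketch runs quickly, and t_value = 40.0 is a placeholder to be replaced with the t-value you actually computed. The sketch works in logarithms because norm.cdf may underflow to zero for very large t-values, while norm.logcdf stays finite.

```python
import time
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def simulate_max_diff(n_reps, n_per_group=50, batch=5_000):
    """HYPOTHETICAL stand-in for task 3.5: draw two groups from the same
    standard normal distribution n_reps times and return the largest
    absolute difference in group means."""
    largest = 0.0
    done = 0
    while done < n_reps:
        k = min(batch, n_reps - done)
        a = rng.standard_normal((k, n_per_group))
        b = rng.standard_normal((k, n_per_group))
        diffs = a.mean(axis=1) - b.mean(axis=1)
        largest = max(largest, float(np.abs(diffs).max()))
        done += k
    return largest

# Step 1: time the simulation. The assignment asks for at least seven
# figures of repetitions; a small count is used here to keep the sketch fast.
n_reps = 20_000
start = time.perf_counter()
largest_diff = simulate_max_diff(n_reps)
elapsed = time.perf_counter() - start
seconds_per_rep = elapsed / n_reps
seconds_for_1e12 = seconds_per_rep * 1e12

# Step 2: tail probability for the t-value, assuming a normal distribution.
t_value = 40.0  # PLACEHOLDER: replace with your own t-value
log_p = stats.norm.logcdf(-t_value)  # natural log of P(Z < -t)
log10_p = log_p / np.log(10)

# Step 3: rule of thumb from the text, roughly 3/p iterations.
log10_iterations = np.log10(3) - log10_p

# Step 4: convert iterations to years using the measured time per repetition.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
log10_years = (log10_iterations + np.log10(seconds_per_rep)
               - np.log10(SECONDS_PER_YEAR))

print(f"largest simulated difference: {largest_diff:.3f}")
print(f"time for 1e12 experiments: {seconds_for_1e12 / SECONDS_PER_YEAR:.2f} years")
print(f"log10 of tail probability: {log10_p:.1f}")
print(f"log10 of iterations needed: {log10_iterations:.1f}")
print(f"log10 of years needed: {log10_years:.1f}")
```

In a notebook, the %time magic in front of the simulation call gives a comparable wall-clock measurement to the time.perf_counter pair used here. The log10 of years can then be compared directly with the time spans in step 4: 66,000,000 years is a log10 of about 7.8.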
