Question: Data dredging and p-hacking are umbrella terms for the dangerous practice of automatically testing a large number of hypotheses on the entirety or subsets of a single dataset

Data dredging and p-hacking are umbrella terms for the dangerous practice of automatically testing a large number of hypotheses on the entirety or subsets of a single dataset in order to find statistically significant results. In this exercise we will focus on the idea of testing hypotheses on subsets of a single dataset.
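To make the danger concrete, here is a minimal sketch of how quickly false positives accumulate. It assumes the tests are independent and that every null hypothesis is true, which is a simplification (tests on overlapping subsets of one dataset are generally not independent). The function name is illustrative and not part of the exercise.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Probability of at least one false positive among `num_tests`
    independent tests, each run at significance level `alpha`,
    when every null hypothesis is actually true."""
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Compare a single test with 4 and 20 tests at the same level.
    for k in (1, 4, 20):
        print(k, round(family_wise_error_rate(0.05, k), 4))
```

Under these assumptions, testing four subsets separately at the 0.05 level already gives roughly a 19% chance of at least one spurious "significant" result.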

Nefaria Octopain has landed her first data science internship at an aquarium. Her primary summer project has been to design and test a new feeding regimen for the aquarium's octopus population. To test her regimen, her supervisors have allowed her to deploy her new feeding regimen to 4 targeted octopus subpopulations of 40 octopuses each, every day, for a month.

The effectiveness of the new diet is measured by the rate at which the food is consumed, defined as the proportion of octopuses that eat the food (POOTEF). The aquarium's standard octopus diet has a POOTEF of 0.90. Nefaria is hoping to land a permanent position at the aquarium when she graduates, so she's really motivated to show her supervisors that the POOTEF of her new diet regimen is a (statistically) significant improvement over their previous diet.

The data from Nefaria's summer experiment can be found in pootef.csv. Load this dataset as a Pandas DataFrame.

   Group        Date  Fed  Ate
0      1  Oct 1 2018   40   37
1      1         NaN   40   37
2      1         NaN   40   35
3      1         NaN   40   35
4      1  Oct 5 2018   40   36

Part A: State the null and alternative hypotheses that Nefaria should test to see if her new feeding regimen is an improvement over the aquarium's standard feeding regimen with a POOTEF of 0.90.
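One way to write these hypotheses, as a sketch: it assumes $p$ denotes the true POOTEF of the new regimen (the symbol is not in the original problem) and that "improvement" means a strictly higher POOTEF, so the test is one-sided.

```latex
H_0 : p = 0.90 \qquad \text{versus} \qquad H_1 : p > 0.90
```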

Part B: Test the hypothesis from Part A at the α = 0.05 significance level using a p-value test. Is there sufficient evidence for Nefaria to conclude that her feeding regimen is an improvement?

Group     2.50000
Fed      40.00000
Ate      36.08871
dtype: float64
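A minimal sketch of one possible Part B computation, using only the column means printed above. It assumes a pooled one-sample z-test for a proportion (normal approximation), treats every octopus-feeding as an independent trial, and assumes 124 feedings in total (4 groups times 31 days in October). That row count is an inference and is not stated in the problem. The function name is illustrative.

```python
import math


def one_sided_proportion_z_test(successes: float, trials: float, p0: float):
    """One-sample z-test of H0: p = p0 against H1: p > p0.

    Returns (p_hat, z, p_value) using the normal approximation.
    """
    p_hat = successes / trials
    standard_error = math.sqrt(p0 * (1.0 - p0) / trials)
    z = (p_hat - p0) / standard_error
    # Upper-tail probability of a standard normal: P(Z >= z).
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return p_hat, z, p_value


if __name__ == "__main__":
    # ASSUMPTION: 4 groups x 31 days = 124 feedings; the row count is
    # not given in the problem text.
    num_feedings = 124
    fed_per_feeding = 40   # mean of the "Fed" column
    mean_ate = 36.08871    # mean of the "Ate" column

    trials = num_feedings * fed_per_feeding
    successes = num_feedings * mean_ate
    p_hat, z, p_value = one_sided_proportion_z_test(successes, trials, 0.90)
    print(f"p_hat={p_hat:.4f}, z={z:.3f}, p-value={p_value:.3f}")
    print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Under these assumptions the p-value comes out at roughly 0.3, well above 0.05, so the full dataset would not support the claim of an improvement. A different choice of test or sample size would change the exact numbers.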

Part C: Bummer, Nefaria thinks. This is the part where she decides to resort to some questionable science. Maybe there is a reasonable subset of the data for which her alternative hypothesis is supported? Can she find it? Can she come up with a reasonable justification for why this subset of the data should be considered while the rest should be discarded?

Here are the rules: Nefaria cannot modify the original data (e.g. by adding nonexistent feedings or bites to certain groups or days) because her boss will surely notice. Instead she needs to find a subset of the data for which her hypothesis is supported by a p-value test at the α = 0.05 significance level and be able to explain to her supervisors why her sub-selection of the data is reasonable.

In addition to your explanation of why your successful subset of the data is potentially reasonable, be sure to thoroughly explain the details of the tests that you perform and show all of your Python computation.
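The mechanical part of Part C can be sketched as a scan that runs the same one-sided test on every subset defined by one column. This is an illustration, not a solution: the helper names are hypothetical, the rows are only the five visible in the data preview (not the full dataset), and the pooled z-test with a normal approximation is an assumed choice of test.

```python
import math
from collections import defaultdict


def upper_tail_p_value(ate: int, fed: int, p0: float) -> float:
    """p-value for H0: p = p0 vs H1: p > p0 (normal approximation)."""
    z = (ate / fed - p0) / math.sqrt(p0 * (1.0 - p0) / fed)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def scan_subsets(rows, key, p0=0.90):
    """Pool Fed/Ate counts within each value of `key` and test each subset."""
    totals = defaultdict(lambda: [0, 0])  # subset value -> [ate, fed]
    for row in rows:
        totals[row[key]][0] += row["Ate"]
        totals[row[key]][1] += row["Fed"]
    return {value: upper_tail_p_value(ate, fed, p0)
            for value, (ate, fed) in totals.items()}


if __name__ == "__main__":
    # Only the five preview rows shown in the problem; NOT the full dataset.
    preview = [
        {"Group": 1, "Fed": 40, "Ate": 37},
        {"Group": 1, "Fed": 40, "Ate": 37},
        {"Group": 1, "Fed": 40, "Ate": 35},
        {"Group": 1, "Fed": 40, "Ate": 35},
        {"Group": 1, "Fed": 40, "Ate": 36},
    ]
    for group, p_value in scan_subsets(preview, "Group").items():
        print(group, round(p_value, 3))
```

With the full data in a DataFrame `df`, `df.to_dict("records")` produces rows in this shape, and `key` could be "Group" or a column derived from the date. Every extra subset scanned raises the chance of a false positive, which is exactly why the justification for a subset is supposed to come before looking at its p-value.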

Note: I am unable to attach the CSV file, but the sample mean is 0.9022, while the population mean is 0.9. I believe that is all that's needed from the CSV file.
