Question: Data Dredgingandp-hackingare umbrella terms for the dangerous practice of automatically testing a large number of hypotheses on the entirety or subsets of a single dataset

Data Dredgingandp-hackingare umbrella terms for the dangerous practice of automatically testing a large number of hypotheses on the entirety or subsets of a single dataset in order to find statistically significant results. In this exercise we will focus on the idea of testing hypotheses on subsets of a single data set.

Nefaria Octopain has landed her first data science internship at an aquarium. Her primary summer project has been to design and test a new feeding regimen for the aquarium's octopus population. To test her regimen, her supervisors have allowed her to deploy her new feeding regimen to 4 targeted octopus subpopulations of 40 octopuses each, every day, for a month.

The effectiveness of the new diet is measured simply by the rate at which the food is consumed, which is simply defined to be theproportionof octopuses that eat the food (POOTEF). The aquarium's standard octopus diet has a POOTEF of0.90

0.90. Nefaria is hoping to land a permanent position at the aquarium when she graduates, so she'sreallymotivated to show her supervisors that the POOTEF of her new diet regimen is a (statistically) significant improvement over their previous diet.

The data from Nefaria's summer experiment can be found inpootef.csv. Load this dataset as a Pandas DataFrame.

[53]:

 

[53]:

GroupDateFedAte01Oct 1 2018403711NaN403721NaN403531NaN403541Oct 5 20184036

Part A: State the null and alternate hypotheses that Nefaria should test to see if her new feeding regimen is an improvement over the aquarium's standard feeding regimen with a POOTEF of0.90

0.90.

 

  

Part B: Test the hypothesis fromPart Aat the=0.05

=0.05significance level using a p-value test. Is there sufficient evidence for Nefaria to conclude that her feeding regimen is an improvement?

[67]:

 

[67]:

Group 2.50000 Fed 40.00000 Ate 36.08871 dtype: float64 

 

 

Part C: Bummer, Nefaria thinks. This is the part where she decides to resort to some questionable science. Maybe there is a reasonablesubsetof the data for which her alternative hypothesis is supported? Can she find it? Can she come up for a reasonable justification for why this subset of the data should be considered while the rest should be discarded?

Here are therules: Nefaria cannot modify the original data (e.g. by adding nonexistent feedings or bites to certain groups or days) because her boss will surely notice. Instead she needs to find a subset of the data for which her hypothesis is supported by a p-value test at the=0.05

=0.05significance levelandbe able to explain to her supervisors why her sub-selection of the data is reasonable.

In addition to your explanation of why your successful subset of the data is potentially reasonable, be sure to thoroughly explain the details of the tests that you perform and show all of your Python computation.

note: I am unable to add csv file but the sample mean is 0.9022, while population mean is 0.9. I believe that is all that's needed from the csv file.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Mathematics Questions!