
Question


Data dredging and p-hacking are umbrella terms for the dangerous practice of automatically testing a large number of hypotheses on the entirety or subsets of a single dataset in order to find statistically significant results. In this exercise we will focus on the idea of testing hypotheses on subsets of a single dataset.
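A small simulation can show why testing many subsets is risky. The sketch below assumes the null hypothesis is true (every octopus eats with probability 0.90) and splits the data into 16 hypothetical subsets of 280 feedings each, for example 4 groups by 4 weeks of 7 days with 40 octopuses per feeding. That split, the function names, and the number of trials are illustrative assumptions, not part of the exercise. It estimates how often at least one subset passes a one-sided z-test at the 0.05 level purely by chance.

```python
import math
import random


def one_sided_p_value(ate, fed, p0=0.90):
    """Upper-tail p-value of a one-sample z-test for a proportion."""
    p_hat = ate / fed
    se = math.sqrt(p0 * (1 - p0) / fed)
    z = (p_hat - p0) / se
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for standard normal Z


def dredge_once(rng, n_subsets=16, fed_per_subset=280, p0=0.90, alpha=0.05):
    """Return True if any subset looks 'significant' although H0 is true."""
    for _ in range(n_subsets):
        # Simulate feedings where the true eating probability equals p0.
        ate = sum(rng.random() < p0 for _ in range(fed_per_subset))
        if one_sided_p_value(ate, fed_per_subset, p0) < alpha:
            return True
    return False


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 1000
false_alarm_rate = sum(dredge_once(rng) for _ in range(trials)) / trials
print(f"Share of runs with at least one spurious 'discovery': {false_alarm_rate:.2f}")
```

Any single test has roughly a 5% chance of a false positive, but the chance that at least one of many tests fires is far larger, which is what makes subset hunting misleading.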

Nefaria Octopain has landed her first data science internship at an aquarium. Her primary summer project has been to design and test a new feeding regimen for the aquarium's octopus population. To test her regimen, her supervisors have allowed her to deploy her new feeding regimen to 4 targeted octopus subpopulations of 40 octopuses each, every day, for a month.

The effectiveness of the new diet is measured by the rate at which the food is consumed, defined as the proportion of octopuses that eat the food (POOTEF). The aquarium's standard octopus diet has a POOTEF of 0.90. Nefaria is hoping to land a permanent position at the aquarium when she graduates, so she's really motivated to show her supervisors that the POOTEF of her new diet regimen is a (statistically) significant improvement over their previous diet.

The data from Nefaria's summer experiment can be found in pootef.csv. Load this dataset as a Pandas DataFrame.

[53]:

import pandas as pd

# Load the experiment data and preview the first five rows
df = pd.read_csv('pootef.csv')
df.head()

[53]:

   Group        Date  Fed  Ate
0      1  Oct 1 2018   40   37
1      1         NaN   40   37
2      1         NaN   40   35
3      1         NaN   40   35
4      1  Oct 5 2018   40   36

Part A: State the null and alternate hypotheses that Nefaria should test to see if her new feeding regimen is an improvement over the aquarium's standard feeding regimen with a POOTEF of 0.90.
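One natural way to write the hypotheses, taking p to be the true POOTEF under the new regimen (the symbol p is introduced here for illustration), is a one-sided test, since Nefaria only cares about an improvement:

```latex
H_0 : p = 0.90
\qquad \text{versus} \qquad
H_1 : p > 0.90
```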

 

Part B: Test the hypothesis from Part A at the α = 0.05 significance level using a p-value test. Is there sufficient evidence for Nefaria to conclude that her feeding regimen is an improvement?
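A minimal sketch of such a test follows, assuming a one-sample z-test for a proportion with an upper-tail alternative (the normal approximation and the helper name proportion_z_test are assumptions, not something the exercise prescribes). Because the full pootef.csv is not reproduced here, the demo uses only the five rows previewed by df.head(), so its result says nothing about the complete dataset.

```python
import math


def proportion_z_test(ate, fed, p0=0.90, alpha=0.05):
    """One-sided (upper-tail) z-test of H0: p = p0 against H1: p > p0."""
    p_hat = ate / fed                                  # observed POOTEF
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / fed)  # test statistic
    p_value = 0.5 * math.erfc(z / math.sqrt(2))        # P(Z >= z) under H0
    return z, p_value, p_value < alpha


# With the full data one would pass df['Ate'].sum() and df['Fed'].sum().
# Only the five preview rows from df.head() are used here, for illustration.
ate_preview = [37, 37, 35, 35, 36]
fed_preview = [40, 40, 40, 40, 40]
z, p_value, reject = proportion_z_test(sum(ate_preview), sum(fed_preview))
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The null hypothesis is rejected only when the p-value falls below α, so a sample proportion at or below 0.90 can never support the claim of an improvement.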

[60]:

 

[60]:

-2.3263478740408408 

 

 

Part C: Bummer, Nefaria thinks. This is the part where she decides to resort to some questionable science. Maybe there is a reasonable subset of the data for which her alternative hypothesis is supported? Can she find it? Can she come up with a reasonable justification for why this subset of the data should be considered while the rest should be discarded?

Here are the rules: Nefaria cannot modify the original data (e.g. by adding nonexistent feedings or bites to certain groups or days) because her boss will surely notice. Instead she needs to find a subset of the data for which her hypothesis is supported by a p-value test at the α = 0.05 significance level and be able to explain to her supervisors why her sub-selection of the data is reasonable.

In addition to your explanation of why your successful subset of the data is potentially reasonable, be sure to thoroughly explain the details of the tests that you perform and show all of your Python computation.
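One way to organize the search is to pool the feedings within each candidate subset and run the same upper-tail z-test on each. The sketch below does this per group. The helper names are hypothetical, the demo rows are synthetic placeholders and not values from pootef.csv, and the Bonferroni column is an addition meant to show how much stricter the threshold becomes once the number of subsets tried is taken into account.

```python
import math
from collections import defaultdict


def upper_tail_p(ate, fed, p0=0.90):
    """Upper-tail p-value of a one-sample z-test for a proportion."""
    z = (ate / fed - p0) / math.sqrt(p0 * (1 - p0) / fed)
    return 0.5 * math.erfc(z / math.sqrt(2))


def scan_groups(records, p0=0.90, alpha=0.05):
    """records: iterable of (group, ate, fed) rows.

    Returns {group: (p_value, naive_significant, bonferroni_significant)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for group, ate, fed in records:
        totals[group][0] += ate
        totals[group][1] += fed
    m = len(totals)  # number of subsets tested
    results = {}
    for group, (ate, fed) in sorted(totals.items()):
        p = upper_tail_p(ate, fed, p0)
        results[group] = (p, p < alpha, p < alpha / m)
    return results


# SYNTHETIC placeholder rows for illustration only, NOT taken from pootef.csv.
demo = [(1, 36, 40), (1, 35, 40), (2, 39, 40), (2, 38, 40),
        (3, 36, 40), (4, 34, 40)]
results = scan_groups(demo)
for group, (p, naive, corrected) in results.items():
    print(f"group {group}: p = {p:.4f}, naive: {naive}, Bonferroni: {corrected}")
```

With the real DataFrame, the rows could be supplied as df[['Group', 'Ate', 'Fed']].itertuples(index=False). A subset that is significant only under the naive threshold is exactly the kind of result that data dredging produces.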
