Answered step by step

Verified Expert Solution

Link Copied!

Question

1 Approved Answer

Posted on Sep 07, 2024

Please provide solution and explanation to the following python exercise. Thank you! 4. [25 pts] Often, there is data missing in many biomedical datasets (e.g.

Please provide solution and explanation to the following python exercise. Thank you!

image text in transcribed

image text in transcribed

4. [25 pts] Often, there is data missing in many biomedical datasets (e.g. electronic health records, epidemiology, etc.). For time-series data with missing values, one solution is to use the existing data to fill in the missing time point values. In this problem, we will use a time-series dataset to explore different data imputation techniques. a. Upload the data from the zipped dataset file. Plot the original data (line plot) and describe any patterns that are visible in the time series data. Hint: using Python, please convert the 'Month' column into pd.DatetimeIndex b. In this dataset, x-axis will be the time axis (i.e. date) as the variable, and y-axis will be the number of passengers traveling as the observation that is a function of the date variable. Please use Linear Regression to model the relationship between the date (the original 'Month' column) and '\#Passengers'. Plot the original data and the line of best fit in the same figure (Hint: sklearn.linear model). Calculate the R-squared using the sklearn package. What is the equation of the final linear model (including calculated values for model parameters)? Calculate the significance (p-value) of the model variables using the statsmodels package (Hint: statsmodels.org). c. Plot the 12-month rolling variance across time (Hint: Rolling.var). What assumptions are made when calculating the significance of features during ordinary least squares (OLS) linear regression? Does this data fulfill these assumptions? How does this affect the interpretation of these results? d. Generate a copy of the original dataset with 100 randomly missing timepoints values (Hint: set '\#Passengers' values to np.nan without removing the entire row). Plot the new dataset with missing values alongside a plot of the original dataset (two line or scatter plots). Make the missing values obvious from your plot to obtain full credit. e. Use three different interpolation techniques to fill in the missing values (Hint: pandas.DataFrame.interpolate). Use Linear interpolation, Nearest Neighbor interpolation, and Spline (order 3). Plot the original dataset, the missing dataset, and the three interpolated datasets (within a single figure). Compare and contrast the techniques. What features of the data described in 2.a are missing in the interpolated data if there are any. f. Calculate the correlation of imputed datasets with the original dataset (Hint: . Which approach performed the best/worst? For each imputation method, please explain an example situation where they would be the most appropriate (1-2 sentences for each imputation method)

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Modern Database Management

Modern Database Management

Authors: Jeffrey A. Hoffer Fred R. McFadden

9th Edition

B01JXPZ7AK, 9780805360479

More Books

Students also viewed these Databases questions

Question

★★★★★

Cable ABCDE supports three loads as shown. Determine (a) The distance dC for which portion CD of the cable is horizontal, (b) The corresponding reactions at the supports. 2 ft 2ft 21 de de 2.4 t edp...

Answered: 1 week ago

Question

★★★★★

Dura-Conduit Corporation manufactures plastic conduit that is used in the cable industry. A conduit is a tube that encircles and protects the underground cable. In the process for making the plastic...

Answered: 1 week ago

Question

★★★★★

=+ (b) Suppose that 2 F and that } is closed under the formation of comple- ments and finite disjoint unions. Show that F need not be a field.

Answered: 1 week ago

Question

★★★★★

Change in Estimate Assume that Bloomer Company purchased a new machine on January 1, 2010, for $80,000. The machine has an estimated useful life of nine years and a residual value of $8,000. Bloomer...

Answered: 1 week ago

Question

★★★★★

Any 3 companies or 1 company that has strategic goals, tactical objectives, and operational objectives. This week's safari is a little different. READ CAREFULLY!!! Find me an example of the three...

Answered: 1 week ago

Question

★★★★★

As George Harman of Centralia Construction Corporation (CCC) reviewed the drawings and cost estimates for the Myerson Industries plant expansion, he wondered how aggressively he should bid on the...

Answered: 1 week ago

Question

★★★★★

Read and Answer questions below The Invisible Hand Can Park Your Car In many cities, finding an available parking spot on the street seems about as likely as winning the lottery. But if local...

Answered: 1 week ago

Question

★★★★★

Question 3 The temperature in C over a flat surface is described by the function T(x, y) = xy+ 2x + y. (a) Determine all local stationary points of the temperature function and classify them....

Answered: 1 week ago

Question

★★★★★

Use Simpson's Rule with n = 10 to estimate the arc length of the curve. (Round your answer to four decimal places.) y = ex 0 x 6 Find the answer produced by a calculator or computer to compare with...

Answered: 1 week ago

Question

★★★★★

RC Ice Cream produces three products and allocates overhead cost using a plantwide rate of $2 per direct labor hour. Strawberry and Vanilla uses the facilities of Dept SV while Chocolate uses Dept C....

Answered: 1 week ago

Question

★★★★★

What are the equivalent units of production for materials? - What are the equivalent units of production for labor and overhead? - What is the total material unit cost? - What is the total conversion...

Answered: 1 week ago

Question

★★★★★

Case Study: Property Purchase Strategy Glenn Foreman, president of Oceanview Development Corporation, is considering submitting a bid to purchase property that will be sold by sealed bid at a county...

Answered: 1 week ago

Question

★★★★★

(Appendices) Many employers have both group short-term and long-term disability-income plans. Compare (1) short-term plans with (2) long-term plans with respect to each of the following: LO8 a....

Answered: 1 week ago

Question

★★★★★

(Appendices) Brandon, age 32, is married and has a son, age one. Six months ago, Brandon purchased an individual health insurance policy covering the entire family. His son was recently diagnosed...

Answered: 1 week ago

Question

★★★★★

(Appendices) Ken, age 52, works only part-time and has no health insurance. The cartilage in both his knees is severely eroded from osteoarthritis, which causes severe pain during his daily...

Answered: 1 week ago

Previous Question Next Question