Question


Posted on Jul 04, 2024


I'm not sure how to approach this question. Can you assist me?

Use the link in the Jupyter Notebook activity to access your Python script. Once you have made your calculations, complete this discussion. The script will output answers to the questions given below. You must attach your Python script output as an HTML file and respond to the questions below.

In this discussion, you will apply the statistical concepts and techniques covered in this week's reading about hypothesis testing for the difference between two population proportions. In the previous week's discussion, you studied a manufacturing process at a factory that produces ball bearings for automotive manufacturers. The factory wanted to estimate the average diameter of a particular type of ball bearing to ensure that it was being manufactured to the factory's specifications.

Recently, the factory began a new production line that is more efficient than the existing production line. However, the factory still needs ball bearings to meet the same specifications. To compare the accuracy of the new process against the existing process, the factory decides to take two random samples of ball bearings. The first sample consists of 50 randomly selected ball bearings from the existing production line, and the second consists of 50 randomly selected ball bearings from the new production line. The diameter of each ball bearing in both samples is measured.

The two samples will be generated using Python's NumPy module. These data sets will be unique to you, and therefore your answers will be unique as well. Run Step 1 in the Python script to generate your unique sample data.

Suppose that the factory claims that the proportion of ball bearings with diameter values less than 2.20 cm in the existing manufacturing process is the same as the proportion in the new process. At α = 0.05, is there enough evidence to reject the claim that the two proportions are the same? Perform a hypothesis test for the difference between two population proportions to test this claim.
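For reference, here is a sketch of the standard pooled two-proportion z statistic, assuming the factory's claim of equal proportions is taken as the null hypothesis (the usual setup for a claim of equality). Here p_1 and p_2 are the population proportions of bearings with diameter below 2.20 cm in the existing and new processes, x_1 and x_2 are the sample counts (count1 and count2 in the script), and n_1 = n_2 = 50. This pooled form should correspond to what statsmodels' proportions_ztest computes with its default settings, but check your course reading for the exact notation it expects.

$$
H_0: p_1 = p_2 \qquad H_a: p_1 \neq p_2
$$

$$
\hat{p}_1 = \frac{x_1}{n_1}, \qquad \hat{p}_2 = \frac{x_2}{n_2}, \qquad \bar{p} = \frac{x_1 + x_2}{n_1 + n_2}
$$

$$
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$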

In your initial post, address the following items:

Define the null and alternative hypotheses in mathematical terms as well as in words.
Identify the level of significance.
Include the test statistic and the P-value. See Step 2 in the Python script. (Note that the Python method used in the script returns a two-tailed P-value. You must report the correct P-value based on the alternative hypothesis.)
Provide a conclusion and interpretation of the test: Should the null hypothesis be rejected? Why or why not?
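The P-value note in the third item can be illustrated with a small sketch. It needs only a z statistic and evaluates the standard normal distribution with math.erfc from the Python standard library. The labels "two-sided", "larger", and "smaller" mirror the values of the alternative parameter of statsmodels' proportions_ztest, but the helpers p_value_for_alternative and reject_null are hypothetical. Note that a one-tailed P-value equals half the two-tailed value only when z points in the direction of the alternative hypothesis.

```python
import math


def p_value_for_alternative(z, alternative="two-sided"):
    """P-value of a z statistic under the standard normal distribution.

    "larger":    Ha is p1 > p2, so P = P(Z >= z)
    "smaller":   Ha is p1 < p2, so P = P(Z <= z)
    "two-sided": Ha is p1 != p2, so P = P(|Z| >= |z|)
    """
    if alternative == "larger":
        return 0.5 * math.erfc(z / math.sqrt(2))   # upper tail
    if alternative == "smaller":
        return 0.5 * math.erfc(-z / math.sqrt(2))  # lower tail
    if alternative == "two-sided":
        return math.erfc(abs(z) / math.sqrt(2))    # both tails
    raise ValueError(f"unknown alternative: {alternative!r}")


def reject_null(p_value, alpha=0.05):
    """Reject H0 when the P-value is at most the significance level."""
    return p_value <= alpha


# Usage with the rounded z statistic printed by Step 2.
z = 2.67
for alt in ("two-sided", "larger", "smaller"):
    p = p_value_for_alternative(z, alt)
    decision = "reject H0" if reject_null(p) else "fail to reject H0"
    print(alt, round(p, 4), decision)
```

Because z is rounded to two decimals, the two-sided value from this sketch can differ slightly in the last digit from the P-value printed by the notebook.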

Here is a copy of my notebook:

Module Four Discussion: Hypothesis Testing for the Difference in Two Population Proportions

This notebook contains the step-by-step directions for your Module Four discussion. It is very important to run through the steps in order. Some steps depend on the outputs of earlier steps. Once you have completed the steps in this notebook, be sure to answer the questions about this activity in the discussion for this module.

Reminder: If you have not already reviewed the discussion prompt, please do so before beginning this activity. That will give you an idea of the questions you will need to answer with the outputs of this script.

Initial post (due Thursday)

Step 1: Generating sample data

This block of Python code will generate two samples, both of size 50, that you will use in this discussion. The datasets will be unique to you and therefore your answers will be unique as well. The NumPy module in Python is used to draw each data set from a normal distribution. The data sets will be saved in pandas DataFrames and will be used in later calculations.

Click the block of code below and hit the Run button above.


```python
import pandas as pd
import numpy as np

# create 50 randomly chosen values from a normal distribution. (arbitrarily using mean=2.48 and standard deviation=0.500)
diameters_sample1 = np.random.normal(2.48, 0.500, 50)
# convert the array into a dataframe with the column name "diameters" using pandas library
diameters_sample1_df = pd.DataFrame(diameters_sample1, columns=['diameters'])
diameters_sample1_df = diameters_sample1_df.round(2)

# create 50 randomly chosen values from a normal distribution. (arbitrarily using mean=2.50 and standard deviation=0.750)
diameters_sample2 = np.random.normal(2.50, 0.750, 50)
# convert the array into a dataframe with the column name "diameters" using pandas library
diameters_sample2_df = pd.DataFrame(diameters_sample2, columns=['diameters'])
diameters_sample2_df = diameters_sample2_df.round(2)

# print the dataframe to see the first 5 observations (note that the index of dataframe starts at 0)
print("Diameters data frame of the first sample (showing only the first five observations)")
print(diameters_sample1_df.head())
print()
print("Diameters data frame of the second sample (showing only the first five observations)")
print(diameters_sample2_df.head())
```

Output:

```
Diameters data frame of the first sample (showing only the first five observations)
   diameters
0       2.19
1       2.45
2       2.40
3       1.18
4       2.40

Diameters data frame of the second sample (showing only the first five observations)
   diameters
0       2.62
1       2.63
2       2.34
3       2.03
4       2.90
```
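As written, Step 1 calls np.random.normal without a seed, so every rerun of the cell draws new samples and changes the Step 2 results. If you want your own numbers to stay fixed while you write the post, one option is to draw from a seeded NumPy Generator, as in this sketch. The seed value and the helper name generate_samples are hypothetical; the assignment may deliberately leave the draw unseeded, so check before changing the official script.

```python
import numpy as np

SEED = 12345  # hypothetical seed; any fixed integer makes the draws repeatable


def generate_samples(seed):
    """Draw both samples (same means, SDs, and sizes as Step 1) from a seeded generator."""
    rng = np.random.default_rng(seed)
    sample1 = np.round(rng.normal(2.48, 0.500, 50), 2)  # existing production line
    sample2 = np.round(rng.normal(2.50, 0.750, 50), 2)  # new production line
    return sample1, sample2


s1a, s2a = generate_samples(SEED)
s1b, s2b = generate_samples(SEED)
print(np.array_equal(s1a, s1b) and np.array_equal(s2a, s2b))  # True
```

The arrays can be wrapped in pd.DataFrame(..., columns=['diameters']) exactly as in Step 1 if the later steps should stay unchanged.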

Step 2: Performing hypothesis test for the difference in population proportions

The z-test for proportions can be used to test for the difference in proportions. The proportions_ztest method in the statsmodels.stats.proportion submodule runs this test. The input to this method is a list of counts meeting a certain condition (given in the problem statement) and a list of sample sizes for the two samples.

counts: Python list that is assigned the number of observations in each sample with diameter values less than 2.20.

n: Python list that is assigned the total number of observations in each sample.
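The two lists can be illustrated on the first five observations printed in Step 1 (the notebook itself uses all 50 values in each sample). This plain-Python sketch applies the same strict "less than 2.20" condition as the pandas filter in Step 2; the helper count_below is hypothetical.

```python
# First five observations of each sample, copied from the Step 1 output.
sample1_head = [2.19, 2.45, 2.40, 1.18, 2.40]
sample2_head = [2.62, 2.63, 2.34, 2.03, 2.90]

THRESHOLD = 2.20  # cm, from the problem statement


def count_below(values, threshold=THRESHOLD):
    """Number of values strictly less than threshold, like df['diameters'] < 2.20."""
    return sum(1 for v in values if v < threshold)


counts = [count_below(sample1_head), count_below(sample2_head)]
n = [len(sample1_head), len(sample2_head)]
print(counts, n)  # [2, 1] [5, 5]
```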

Click the block of code below and hit the Run button above.


```python
from statsmodels.stats.proportion import proportions_ztest

# number of observations in the first sample with diameter values less than 2.20.
count1 = len(diameters_sample1_df[diameters_sample1_df['diameters'] < 2.20])
# number of observations in the second sample with diameter values less than 2.20.
count2 = len(diameters_sample2_df[diameters_sample2_df['diameters'] < 2.20])
# counts Python list
counts = [count1, count2]

# number of observations in the first sample
n1 = len(diameters_sample1_df)
# number of observations in the second sample
n2 = len(diameters_sample2_df)
# n Python list
n = [n1, n2]

# perform the hypothesis test. output is a Python tuple that contains test_statistic and the two-sided P_value.
test_statistic, p_value = proportions_ztest(counts, n)
print("test-statistic =", round(test_statistic, 2))
print("two tailed p-value =", round(p_value, 4))
```

Output:

```
test-statistic = 2.67
two tailed p-value = 0.0077
```
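To cross-check the Step 2 output by hand, the pooled two-proportion z-test can be written with only the Python standard library. This is a sketch of the textbook pooled formula, which is the form proportions_ztest uses by default for two samples. The function name two_proportion_ztest and the counts in the usage line are hypothetical, not the poster's data; substituting your own count1, n1, count2, and n2 should reproduce the printed test statistic and two-tailed P-value up to rounding.

```python
import math


def two_proportion_ztest(count1, n1, count2, n2):
    """Pooled two-proportion z-test of H0: p1 = p2 against Ha: p1 != p2.

    Returns (z statistic, two-tailed P-value).
    """
    p1_hat = count1 / n1
    p2_hat = count2 / n2
    p_pooled = (count1 + count2) / (n1 + n2)  # common proportion under H0
    variance = p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        raise ValueError("pooled proportion is 0 or 1; the z statistic is undefined")
    z = (p1_hat - p2_hat) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)
    return z, p_two_sided


# Hypothetical counts for illustration only.
z, p = two_proportion_ztest(15, 50, 10, 50)
print(round(z, 2), round(p, 4))
```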