Question:
We are asked to complete a hypothesis test on two samples of ball bearings, one from each of two manufacturing processes, to test for a difference between the population proportions of bearings whose diameter falls below a threshold, using a preset Python script. Each sample has 50 observations, and the claim is that there is no difference between the two processes in the proportion of bearings with a diameter less than 2.20 cm.
My Python script produced a test statistic of 1.1 and a two-tailed p-value of 0.2705.
The instructor has asked an additional question: which process is better? Based on this information, I am not sure how to answer. Below are the code and output provided with the Python script.
Step 1: Generating sample data
In[1]:
import pandas as pd
import numpy as np

# create 50 randomly chosen values from a normal distribution. (arbitrarily using mean=2.48 and standard deviation=0.500)
diameters_sample1 = np.random.normal(2.48,0.500,50)

# convert the array into a dataframe with the column name "diameters" using pandas library
diameters_sample1_df = pd.DataFrame(diameters_sample1, columns=['diameters'])
diameters_sample1_df = diameters_sample1_df.round(2)

# create 50 randomly chosen values from a normal distribution. (arbitrarily using mean=2.50 and standard deviation=0.750)
diameters_sample2 = np.random.normal(2.50,0.750,50)

# convert the array into a dataframe with the column name "diameters" using pandas library
diameters_sample2_df = pd.DataFrame(diameters_sample2, columns=['diameters'])
diameters_sample2_df = diameters_sample2_df.round(2)

# print the dataframe to see the first 5 observations (note that the index of dataframe starts at 0)
print("Diameters data frame of the first sample (showing only the first five observations)")
print(diameters_sample1_df.head())
print()
print("Diameters data frame of the second sample (showing only the first five observations)")
print(diameters_sample2_df.head())
Diameters data frame of the first sample (showing only the first five observations)
   diameters
0       2.17
1       2.65
2       2.62
3       2.66
4       2.67

Diameters data frame of the second sample (showing only the first five observations)
   diameters
0       2.51
1       2.84
2       3.59
3       2.65
4       2.50
Step 2: Performing hypothesis test for the difference in population proportions
The z-test for proportions can be used to test for a difference between two population proportions. The proportions_ztest function in the statsmodels.stats.proportion submodule runs this test. Its inputs are a list of counts of observations meeting a certain condition (given in the problem statement) and a list of sample sizes for the two samples.
counts: Python list that is assigned the number of observations in each sample with diameter values less than 2.20.
n: Python list that is assigned the total number of observations in each sample.
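To make the inputs above concrete, here is a minimal sketch of the pooled two-proportion z-test written with only the standard library. The function name two_proportion_ztest and the counts 17 and 12 are hypothetical (the actual counts are not shown in the post); the pooled standard error is the usual textbook form and is assumed to match the default behaviour of proportions_ztest.

```python
import math

def two_proportion_ztest(count1, n1, count2, n2):
    """Pooled two-sample z-test of H0: p1 == p2, two-sided (illustrative helper)."""
    p1_hat = count1 / n1                      # sample proportion, first sample
    p2_hat = count2 / n2                      # sample proportion, second sample
    pooled = (count1 + count2) / (n1 + n2)    # pooled proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # standard normal CDF evaluated at |z|, computed from the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)                   # two-tailed p-value
    return z, p_value

# Hypothetical counts of diameters below 2.20 in each sample of 50
z, p = two_proportion_ztest(17, 50, 12, 50)
print("test-statistic =", round(z, 2))
print("two tailed p-value =", round(p, 4))
```

The sign of the statistic follows the order of the samples: a positive value means the first sample had the larger share of diameters below 2.20.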
In[2]:
from statsmodels.stats.proportion import proportions_ztest

# number of observations in the first sample with diameter values less than 2.20.
count1 = len(diameters_sample1_df[diameters_sample1_df['diameters']<2.20])

# number of observations in the second sample with diameter values less than 2.20.
count2 = len(diameters_sample2_df[diameters_sample2_df['diameters']<2.20])

# counts Python list
counts = [count1, count2]

# number of observations in the first sample
n1 = len(diameters_sample1_df)

# number of observations in the second sample
n2 = len(diameters_sample2_df)

# n Python list
n = [n1, n2]

# perform the hypothesis test. output is a Python tuple that contains test_statistic and the two-sided P_value.
test_statistic, p_value = proportions_ztest(counts, n)

print("test-statistic =", round(test_statistic,2))
print("two tailed p-value =", round(p_value,4))
test-statistic = 1.1
two tailed p-value = 0.2705
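One possible way to approach the "which process is better" question is to put the two sample proportions next to the p-value. The sketch below is only an illustration: the helper summarize_processes, the 0.05 significance level, and the rule that a smaller share of diameters below 2.20 counts as "better" are all assumptions, not part of the assignment, and the counts are hypothetical.

```python
def summarize_processes(count1, n1, count2, n2, p_value, alpha=0.05):
    """Compare sample proportions and report a verdict (illustrative helper).

    Assumes a smaller proportion of diameters below 2.20 is preferable;
    alpha=0.05 is an assumed significance level.
    """
    p1_hat = count1 / n1
    p2_hat = count2 / n2
    if p_value >= alpha:
        # fail to reject H0: the data do not separate the two processes
        verdict = "no significant difference"
    elif p1_hat < p2_hat:
        verdict = "process 1 has the lower proportion"
    else:
        verdict = "process 2 has the lower proportion"
    return p1_hat, p2_hat, verdict

# Hypothetical counts, combined with the p-value reported above
p1_hat, p2_hat, verdict = summarize_processes(17, 50, 12, 50, p_value=0.2705)
print("sample proportions:", p1_hat, p2_hat)
print("verdict:", verdict)
```

With a p-value of 0.2705, any conventional significance level leads to failing to reject the claim of no difference, so under these assumptions the sample alone does not support calling either process better.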