Question

1 Approved Answer

Posted on Oct 14, 2024

UNIVERSITY OF BRADFORD
WORKSHOP 2, 2016
Module PH3004D

Learning Outcomes: By the end of this workshop you will be able to:
1) Use SPSS to check the assumption that the data are normally distributed
2) Edit graphs produced in SPSS
3) Use SPSS to perform an ANOVA test
4) Use SPSS to perform a Chi-Square test

In this workshop we will further investigate the purity of the materials before and after shipment.

Go to your Blackboard website, log in and click the Learning Material tab on the left side of the page
Click on the "Semester 2" folder
Click on "Week 6 /spps files"
Click on the "Obdata1.sav" file and select "open with IBM SPSS 22...." (default option)
Select File from the top menu bar
Click on Save As
Save the file in your M folder

You may want descriptive data about purity before shipment for the two material groups for both Monday and Thursday shipments. You may also want to look at frequency distributions, in the form of histograms and stem-and-leaf plots, within those groupings. This can be achieved by combining a split of the data by day of shipment with the Explore function.

First, make sure the Data View window is open:
Click on Data
Click on Split File
Click on Compare groups
Transfer Day into the Groups Based on box
Click OK

This will split the data by day.
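For readers who want to see what the split amounts to outside SPSS, the following is a minimal Python sketch of the same idea: group the cases by day, then by material, and summarise each group. The records are hypothetical and are not taken from Obdata1.sav; the helper name split_by is also an assumption made for this illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (day, material, purity before shipment).
# The real values live in Obdata1.sav and are not reproduced here.
records = [
    ("Monday", 1, 84.2), ("Monday", 1, 83.1), ("Monday", 2, 85.0),
    ("Monday", 2, 86.4), ("Thursday", 1, 82.7), ("Thursday", 1, 84.9),
    ("Thursday", 2, 85.8), ("Thursday", 2, 87.1),
]

def split_by(rows, key_index):
    """Group rows by the value found at position key_index."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return dict(groups)

# First split by day, then by material within each day.
by_day = split_by(records, 0)
summary = {}
for day, day_rows in by_day.items():
    for material, rows in split_by(day_rows, 1).items():
        values = [r[2] for r in rows]
        summary[(day, material)] = (len(values), mean(values), stdev(values))

# Print N, mean and standard deviation for every day/material group.
for key in sorted(summary):
    n, m, s = summary[key]
    print(key, n, round(m, 2), round(s, 2))
```

Each key of summary plays the role of one block of output that SPSS produces once the file is split.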
Now you will need to Explore the data in the variable PBS:
Click on Analyze
Click on Descriptive Statistics
Click on Explore

This will result in a new window opening, the Explore window (Figure 8).
Transfer PBS into the Dependent List
Place Material into the Factor List and make sure you click the Both option in the Display box

Figure 1 Fragment of the Display box in the Explore window

Place ID in the Label Cases by box
Click Statistics and make sure the Descriptives and Outliers boxes are ticked
Click Continue
Click on the Plots button in the top right corner of the Explore window and make sure the Histogram and Normality plots with tests boxes are ticked
Click Continue
Click OK

This will result in a data output window opening in which you will find:
I. A Case Processing Summary
II. Descriptives
III. Extreme Values
IV. Tests of Normality
V. Histograms
VI. Normal Q-Q Plots
VII. Detrended Normal Q-Q Plots
VIII. Boxplots

i. The Case Processing Summary
The Case Processing Summary is self-explanatory and just provides a summary of the data you have analysed.

ii. Descriptives
You have met Descriptives (Figure 2) before, but you will note a statistic you may not have encountered before, the 5% trimmed mean. SPSS creates this by trimming the top and bottom 5% of cases and recalculating the mean. Comparing the 5% trimmed mean with the actual mean lets you see how much extreme values affect the data. Also important in the descriptive statistics are:
a) Skewness - a measure of the symmetry of the distribution of the data. A skewness of 0 (zero) indicates a perfectly symmetrical distribution, as in a normal distribution; negative skewness indicates a clustering of data on the right hand side of the distribution, and positive skewness indicates a clustering of data on the left hand side of the distribution.
b) Kurtosis - a measure of the peakedness of the data.
A kurtosis of 0 (zero) indicates a distribution whose peakedness resembles that of the normal distribution, whereas a positive kurtosis indicates that the distribution is clustered more tightly around the mean (more peaked), and a negative kurtosis indicates a flatter distribution with the data spread more widely.

Figure 2 Descriptive Output. It is important to look at the mean, the 5% trimmed mean, skewness and kurtosis

Looking at the descriptive statistics for the purity before shipment (Monday) (Figure 2), the mean of 84.15 is not that dissimilar to the 5% trimmed mean of 84.11, so extreme values have little influence on the mean. This is supported by the skewness, which has a value of 0.178 and indicates a slight clustering of the data to the left. The kurtosis has a negative value of -0.995, which indicates a flatter distribution in which the data peak away from the mean.

iii. Extreme values
The Extreme Values output (Figure 3), as the name suggests, lists the extreme values.

Figure 3 Extreme values table

iv. Tests of Normality
The Tests of Normality table (Figure 4) allows you to assess normality using the Kolmogorov-Smirnov test. This tests the null hypothesis that there is no difference between the actual distribution of the data and the ideal normal distribution for the data's mean. So a Sig (p-value) of more than 0.05 indicates that the null hypothesis is not rejected and the data might be normally distributed. A Sig of less than 0.05 indicates that the null hypothesis is rejected and that the data is not normally distributed.

Figure 4 The Tests of Normality for the materials groups

If you look at your Tests of Normality table you will note that the Sig value in the Kolmogorov-Smirnov test for all materials groups is written as 0.200*. The * indicates that this statistic is accompanied by a footnote, which in this case states that "This is a lower bound of the true significance". What this tells us is that the Sig value (p-value) is actually greater than (>) 0.200. Thus the Sig value is > 0.05, so the data in all the groups seem to be normally distributed.
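The shape measures from the Descriptives table and the Kolmogorov-Smirnov statistic can be sketched in plain Python to show what is being calculated. This is an illustration under stated assumptions, not a reproduction of SPSS: the trimmed mean here drops whole cases only, the skewness and kurtosis use the common bias-adjusted sample formulas (assumed, not confirmed, to match the SPSS output), and only the K-S distance D is computed, not its Sig value. The sample values are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def trimmed_mean(values, proportion=0.05):
    """Mean after dropping `proportion` of cases from each end (whole cases only)."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of cases cut from each tail
    return mean(ordered[k:len(ordered) - k])

def skewness(values):
    """Bias-adjusted sample skewness (0 for a symmetrical sample)."""
    n, m, s = len(values), mean(values), stdev(values)
    total = sum(((x - m) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total

def excess_kurtosis(values):
    """Bias-adjusted sample excess kurtosis (about 0 for normal data)."""
    n, m, s = len(values), mean(values), stdev(values)
    total = sum(((x - m) / s) ** 4 for x in values)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return lead * total - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

def ks_statistic(values):
    """Largest gap between the empirical CDF and a normal CDF fitted to the sample."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    gaps = []
    for i, x in enumerate(ordered, start=1):
        cdf = fitted.cdf(x)
        # Check the gap just after and just before the step at x.
        gaps.append(max(i / n - cdf, cdf - (i - 1) / n))
    return max(gaps)

# Hypothetical purity values, not taken from Obdata1.sav.
sample = [80.2, 81.0, 81.7, 82.3, 82.9, 83.1, 83.6, 83.8, 84.0, 84.3,
          84.5, 84.9, 85.2, 85.6, 86.0, 86.4, 86.9, 87.5, 88.1, 88.8]
print("mean", round(mean(sample), 2), "trimmed", round(trimmed_mean(sample), 2))
print("skewness", round(skewness(sample), 3))
print("kurtosis", round(excess_kurtosis(sample), 3))
print("K-S D", round(ks_statistic(sample), 3))
```

A small D means the sample follows the fitted normal curve closely. Turning D into a Sig value needs the corrected significance tables that SPSS applies for you.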
If the values were less than (<) 0.05 the null hypothesis would have been rejected and the data would be described as non-normally distributed, for which non-parametric tests are appropriate.

v. Histograms
Histograms, such as the histogram for the purity before shipment (Material 1, Monday) (Figure 5), also allow you to visually confirm the results of the Kolmogorov-Smirnov test.

Figure 5 The histogram for the purity before shipment (Material 1, Monday). The mean lies to the left of the mode, indicating that the data is slightly skewed to the left.

In the case of the purity before shipment (Material 1, Monday), the data has a mean of 84.15 Kg and this sits slightly to the left of the mode value. This indicates that the data is very slightly skewed to the left, but is peaked, with peaks located away from the mean. Referring back to the skewness and kurtosis values for this data (0.178 and -0.995) supports this view. However, the distribution looks similar to a normal distribution, which of course supports the outcome of the Kolmogorov-Smirnov test.

vi. Normal Q-Q plots and Detrended Q-Q plots
Normal Q-Q plots and Detrended Q-Q plots allow further visual investigation of normality. The Normal Q-Q plot (Figure 6a) plots the observed value against the expected value. If the data are normally distributed the plot should be a straight line.

Figure 6 a) Normal Q-Q plot for the purity before shipment (Material 1, Monday). b) Detrended Q-Q plot for the purity before shipment (Material 1, Monday).

In Figure 6a the data points mostly follow the straight line, indicating that the data is likely to be normally distributed. The Detrended Normal Q-Q plot plots the deviation of the data points from the straight line. Here clustering of data indicates skewed data, as do large deviations from 0. In Figure 6b, there is little clustering of data points, although some data points do deviate a little from 0. This again indicates that there is some variance in the data but that the data tends towards a normal distribution.
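To make the two Q-Q plots less of a black box, here is a minimal Python sketch of the numbers behind them. It assumes Blom's plotting positions, (rank - 3/8) / (n + 1/4), for the expected normal values; whether SPSS was set to use this formula in the workshop is an assumption. The data are hypothetical and the function name qq_points is made up for this illustration.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Return (observed, expected normal z, deviation) for each sorted case."""
    ordered = sorted(values)
    n = len(ordered)
    m, s = mean(ordered), stdev(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, x in enumerate(ordered, start=1):
        p = (rank - 0.375) / (n + 0.25)          # Blom plotting position (assumed)
        expected_z = standard_normal.inv_cdf(p)  # where a normal case of this rank would sit
        observed_z = (x - m) / s                 # where the case actually sits
        points.append((x, expected_z, observed_z - expected_z))
    return points

# Hypothetical purity values, not taken from Obdata1.sav.
sample = [82.1, 83.0, 83.4, 83.9, 84.2, 84.5, 85.1, 85.6, 86.3, 87.0]
for observed, expected_z, deviation in qq_points(sample):
    print(f"{observed:6.1f}  {expected_z:6.3f}  {deviation:6.3f}")
```

Plotting the first two columns against each other gives the Normal Q-Q plot. Plotting the third column against the first gives the detrended version, where normally distributed data scatter around 0 with no pattern.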
Both these observations support the results of the Kolmogorov-Smirnov test, indicating that this data is normally distributed.

vii. Boxplots
Boxplots consist of a central rectangle and whiskers. The central rectangle represents the middle 50% of the data: the borders of the box mark the 1st and 3rd quartiles, and the line bisecting the box marks the median (the 2nd quartile). The whiskers then extend out to the smallest and largest values in the data set that are not outliers. If the median is positioned close to one end of the box this could indicate a skewed, non-normally distributed data set. Boxplots are also used to detect outliers (data points that are positioned more than 1.5 box lengths away from the edge of the box). The outliers are indicated as small circles accompanied by a number. Extreme outliers (more than 3 box lengths from the edge of the box) are marked with an asterisk. In this data set there are no outliers, but if there were, the number that accompanies them is the ID of that data point, which allows you to check in Data View that the data point is correct. In the case of the purity before shipment (Monday), the medians are located close to the centre of the boxes, and this again supports the view that the data is normally distributed.

Figure 7 Boxplot for the purity before shipment (Monday)

Comparison of means acquired from independently collected data

Independent t-tests are used to explore the difference between two different groups on a specific continuous variable.

First let's see if there is a significant difference in mean purity between Materials 1 and 2 before shipment. Before doing this you need to un-split the data by clicking on Data, then Split File, and in the menu box clicking on Analyze all cases.

Now you can do the test:
Click on Analyze
Click on Compare Means
Click on Independent-Samples T Test
Move the continuous variable PBS into the Test Variable box and Material into the Grouping Variable box
Click Define Groups and type 1 into the Group 1 box (remember 1 represents Material 1) and 2 into the Group 2 box (2 represents Material 2)
Click on Continue and OK

Figure 8 The Independent T-test Group Statistics table

This will result in the output of two tables: the first (Figure 8) is the Group Statistics table, and the second (Figure 9) is the Independent Samples Test table.

Figure 9 The Independent Samples Test table

The Group Statistics table is useful because it allows you to check the means, standard deviations and standard errors of both data sets, and also allows you to see the sample sizes. The sample sizes can alert you to missing data.

The Independent Samples Test table allows you to look at the output of Levene's test for equality of variances, which tells you which of the t-values provided by SPSS should be used for your data. If your Levene's Sig value is greater than 0.05 then equal variances are assumed and you should use the first line in the t-test table. If your Levene's Sig value is less than or equal to 0.05 then the variances are unequal and you should use the second line in the t-test table. In Figure 9 the Levene's Sig value is .468, so use the first line of the t-test table. If the Sig (2-tailed) is equal to or less than 0.05, then there is a significant difference between your groups, and if the Sig (2-tailed) is greater than 0.05 then there is no significant difference between the two groups. Our Sig (2-tailed) is .020, so the mean purities of Materials 1 and 2 are significantly different from one another before shipment.

This example allowed you to determine if there was a significant difference in purity before shipment across both days of shipment combined. However, you may only want to look at one group, for example shipments on Monday. To do this you first have to SELECT CASES.
Click on Data and choose Select Cases
Click on If condition is satisfied
Click on If
Transfer Day into the box and click on the = (equals) key
Type in the value representing the case you want to use, in this case 1 (where 1 = Monday)
Click Continue and OK

Exercise 1
Perform an independent t-test to determine whether there is a significant difference in purity between Materials 1 and 2 before shipment on Monday. Do the assumptions of the independent t-test seem to be met for your data?

Paired t-test - used to explore the difference in a specific variable in one group at two different times.

So far you have looked at differences between groups within a single variable, but ultimately you may want to check if there is a significant difference in purity before and after shipment. Before starting a statistical procedure make sure that all previous splitting and selecting of data has been reversed. This is important: if you don't, the results of your statistical tests may be wrong!

Having switched off the select and split functions you can now continue with your paired t-test:
Click Data and Select Cases
Click If
Transfer Material into the box and click on the = (equals) key
Type in the value representing the case you want to use (2, where 2 = Material 2)
Click Continue and OK

Now the statistics will only be performed on Material 2. To perform the paired t-test on this data:
Click on Analyze
Click on Compare Means
Click on Paired-Samples T Test
Transfer both PBS and PAS into the Paired Variables box and click OK

This will result in the opening of an output window containing the following output (Figure 10).

Figure 10 Output of the paired-samples t-test used to determine whether there is a significant difference in mean purity before and after shipment (Material 2).

The Paired-Samples T Test produces three tabulated outputs.
The Paired Samples Statistics table allows you to compare the means of the two samples, and the Paired Samples Test table allows you to determine if these two means are significantly different from one another. The significance can be determined by looking at the Sig (2-tailed) value: if it is above 0.05 then there is no significant difference between the two means, but if it is less than or equal to 0.05 then the null hypothesis can be rejected and there is a significant difference between the two means. The Sig (2-tailed) value is .000 (which means that Sig. is less than 0.001), which is less than 0.05, and so the mean purities are significantly different from one another.

Exercise 2
Perform the paired t-test to find out if there is a significant difference in the mean purity before and after shipment (Material 2, Thursday).

Exercise 3
Perform an appropriate test to determine if there are significant differences in purity between Monday and Thursday shipments before shipment, and then to see if there are significant differences in purity between Monday and Thursday shipments after shipment.

Statistical hypothesis testing - ANOVA

ANOVA analyses whether the values of two or more unknown population means are likely to be different. You can use it in situations where there is a need to compare three or more samples.
ANOVA assumes that the data in each sample are drawn from normal distributions and that the populations from which the samples were drawn all have the same variance.

We will carry on investigating the purity of the materials with some more data (3 materials and 3 days).

Go to your Blackboard website, log in and click the Learning Material tab on the left side of the page
Click on the "Semester 2" folder
Click on "Week 6 /spps files"
Click on the "Obdata2.sav" file and select "open with IBM SPSS 22...." (default option)
Select File from the top menu bar
Click on Save As
Save the file in your M folder

First we will check whether the equality of variance condition is met. We will start by creating error bar charts for our groups to estimate the variability across the groups.

To create an error bar chart:
Click on Graphs in the top menu
Click on Chart Builder...
Select the Gallery tab
Select Bar from the list of chart types
Drag and drop the Simple Error Bar icon onto the canvas area
Drag and drop Purity_Diff onto the y axis
Right-click Material group and select Nominal for the measurement level
Drag and drop Material group onto the x axis
Click Element Properties
In the Error Bars Represent group, click Standard Error
Click Continue and OK

The error bars plotted on the graph are a reasonable representation of the pooled variance as long as the group sizes are equal. The more robust method is the test of homogeneity of variances. We can perform this test, and so check the assumption of homogeneity of variances, while we perform the ANOVA.
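The two statistics involved here, the ANOVA F value and the homogeneity of variances statistic, can be sketched in a few lines of Python. This is a minimal illustration, assuming the homogeneity test is Levene's test in its mean-centred form (a one-way ANOVA carried out on each case's absolute deviation from its own group mean). The groups are hypothetical and the Sig values are not computed.

```python
from statistics import mean

def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the cases around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

def levene_statistic(groups):
    """Levene's statistic: a one-way ANOVA on absolute deviations from each group mean."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return one_way_f(deviations)

# Hypothetical purity differences for three material groups (not from Obdata2.sav).
demo_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
print("ANOVA:", one_way_f(demo_groups))
print("Levene:", levene_statistic(demo_groups))
```

In the demo data every group has the same spread, so the Levene statistic is zero while the ANOVA F value is large because the group means differ.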
To run a one-way ANOVA:
Click on Analyze
Click on Compare Means
Select One-Way ANOVA

This will result in a new window opening, the One-Way ANOVA dialog box.
Transfer Purity_Diff into the Dependent List
Transfer Material into the Factor box
Click on Options; this will open a new dialogue box
Select the Descriptive and Homogeneity of variance test options
Click on Continue to come back to the One-Way ANOVA dialogue box
Click Post Hoc; this will open a new dialogue box
Select Tukey
Click on Continue to come back to the One-Way ANOVA dialogue box
Click OK

This will result in the following four tables: Descriptives, Test of Homogeneity of Variances, ANOVA, Multiple Comparisons.

Figure 12. Descriptive Table

The Descriptives table includes descriptive statistics for the Purity Difference variable stratified by the Material group variable.

The p-value of the Test of Homogeneity of Variances is more than 0.05 (Sig. = 0.638), indicating that we cannot reject the null hypothesis of equal variances, hence we can assume equality of variances among the groups.

The p-value in the ANOVA (F test) table is less than 0.001. Thus, we reject the hypothesis that the mean purity differences are equal across the material groups.

Now we need to learn more about these differences. The results of the post hoc test indicate which pairs of groups within the study differ. You can use the Sig. value in the Multiple Comparisons table to find the groups that show a difference in mean purity difference. For example, the "Material 1" and "Material 2" groups report Sig.
Value (0.003), therefore there is a significant difference in mean purity difference between those groups.

Chi-Square Test

This test is based on frequencies alone; it can therefore be applied to any type of data, including nominal/categorical (or "limited" ordinal) data. It is commonly applied to questionnaires, i.e. when trying to explore the connection, if any, between two responses:
(Null hypothesis) There is no association between Variables A and B
(Alternative hypothesis) There is an association between Variables A and B

In the next example a customer satisfaction survey was performed in four laboratories, and each laboratory was asked about its level of satisfaction with the quality of the material. The laboratories could choose from the following five answers:
Strongly negative
Somewhat negative
Neutral
Somewhat positive
Strongly positive

We will use the Chi-Square test to find out whether satisfaction levels are similar across the four laboratories.

Go to your Blackboard website, log in and click the Learning Material tab on the left side of the page
Click on the "Semester 2" folder
Click on "Week 6 /spps files"
Click on the "lab.sav" file and select "open with IBM SPSS 22...." (default option)
Select File from the top menu bar
Click on Save As
Save the file in your M folder
Click on Analyze in the top menu
Click on Descriptive Statistics
Click on Crosstabs...
Select Lab as the row variable
Select Service satisfaction as the column variable
Click Statistics
Select Chi-square, Contingency coefficient, Phi and Cramer's V, Lambda, and Uncertainty coefficient
Click Continue
Click OK in the Crosstabs dialog box

Lab * Service satisfaction Crosstabulation
Count
                      Service satisfaction
Lab      Strongly   Somewhat   Neutral   Somewhat   Strongly   Total
         Negative   Negative             Positive   Positive
Lab 1       25         20         38        30         33       146
Lab 2       26         30         34        27         19       136
Lab 3       15         20         41        33         29       138
Lab 4       27         35         44        22         34       162
Total       93        105        157       112        115       582

The Lab * Service satisfaction Crosstabulation table shows the frequency of each response at each laboratory. If each lab provides a similar level of service, the pattern of responses should be similar across them. At each laboratory, the majority of responses occur in the middle. Lab 2 appears to have fewer satisfied respondents. Lab 3 appears to have fewer dissatisfied respondents. From the Lab * Service satisfaction Crosstabulation table alone, it's impossible to tell whether these differences are real or due to chance variation. Check the chi-square test to be sure.

The chi-square test measures the discrepancy between the observed cell counts and what you would expect if the rows and columns were unrelated. The two-sided asymptotic significance of the chi-square statistic is greater than 0.10, so the differences are consistent with chance variation, which implies that satisfaction levels in all four laboratories are similar.

Chi-Square Tests
                               Value      df   Asymp. Sig. (2-sided)
Pearson Chi-Square             16.293a    12   .178
Likelihood Ratio               17.012     12   .149
Linear-by-Linear Association     .084      1   .772
N of Valid Cases                  582
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 21.73.

Regression Analysis

In the following example we would like to predict the purity difference for a batch of materials before and after shipment.
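Before continuing with the regression example, it may help to see where the Pearson Chi-Square value in the chi-square output comes from. The following minimal Python sketch recomputes the statistic, its degrees of freedom and the minimum expected count from the counts in the Lab * Service satisfaction crosstabulation. Only the counts come from the workshop; the variable names are chosen for this illustration, and the Sig value is not computed.

```python
# Observed counts copied from the Lab * Service satisfaction crosstabulation.
observed = [
    [25, 20, 38, 30, 33],  # Lab 1
    [26, 30, 34, 27, 19],  # Lab 2
    [15, 20, 41, 33, 29],  # Lab 3
    [27, 35, 44, 22, 34],  # Lab 4
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell if lab and satisfaction were unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
min_expected = min(min(row) for row in expected)

print("chi-square:", round(chi_square, 3))
print("df:", df)
print("minimum expected count:", round(min_expected, 2))
```

The results should agree, to rounding, with the Pearson Chi-Square value (16.293), the df (12) and the minimum expected count (21.73) in the Chi-Square Tests table.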
We're going to use three predictors: Material, Day and Department.

Go to your Blackboard website, log in and click the Learning Material tab on the left side of the page
Click on the "Semester 2" folder
Click on "Week 6 /spps files"
Click on the "Obdata3.sav" file and select "open with IBM SPSS 22...." (default option)
Select File from the top menu bar
Click on Save As
Save the file in your M folder
Go to Analyze
Click on Regression
Click on Linear
Transfer Purity_Diff into the Dependent box
Place the Material, Day and Department variables into the Independents box
Click Statistics and select Descriptives
Click Continue
From the Method drop-down menu select Forward
Click OK

The first table, "Descriptive Statistics", presents descriptives that you are already familiar with:

Descriptive Statistics
                          Mean      Std. Deviation   N
Purity_Diff               -1.7412   1.90027          68
Material type              2.1176    .83808          68
Day                        2.0000    .82859          68
Department of dispatch     2.0735    .85197          68

The Descriptives option also gives you a correlation matrix, showing the Pearson correlation between the variables:

Correlations
                                               Purity_Diff   Material type   Day     Department of dispatch
Pearson Correlation   Purity_Diff               1.000         -.471          -.055    -.234
                      Material type             -.471         1.000           .043     .134
                      Day                       -.055          .043          1.000     .000
                      Department of dispatch    -.234          .134           .000    1.000
Sig. (1-tailed)       Purity_Diff                 .            .000           .328     .027
                      Material type              .000           .             .364     .138
                      Day                        .328          .364            .       .500
                      Department of dispatch     .027          .138           .500      .
N                     Purity_Diff                 68            68             68       68
                      Material type               68            68             68       68
                      Day                         68            68             68       68
                      Department of dispatch      68            68             68       68

The "Model Summary" table tells you what percentage of the variability in Purity_Diff is accounted for by Material (this is the R Square). The footnote on this table tells you which variables were included in the equation (in this case Material).

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .471a   .222       .210                1.68881
a. Predictors: (Constant), Material type

The ANOVA table gives you an F-test to determine whether the model is a good fit for the data.
According to this p-value, it is.

The Coefficients table gives the coefficient associated with each predictor. (Use the "unstandardized coefficients", because the constant [beta zero] is included.) Based on this table, the equation for the regression line is:

y = .521 - 1.068(Material)

Using this equation, given a value for Material you can come up with a prediction for the "Purity difference" variable.

Coefficients(a)
                       Unstandardized Coefficients   Standardized Coefficients
Model                  B         Std. Error          Beta                        t        Sig.
1   (Constant)          .521     .560                                             .930    .356
    Material type     -1.068     .246                -.471                      -4.339    .000
a. Dependent Variable: Purity_Diff

The "Variables Entered/Removed" table tells you which variables were included in the model at each step. In our case: Material type.

Variables Entered/Removed(a)
Model   Variables Entered   Variables Removed   Method
1       Material type       .                   Forward (Criterion: Probability-of-F-to-enter <= .050)
a. Dependent Variable: Purity_Diff

The "Excluded Variables" table tells you which variables were excluded from the model. In our case: Day and Department of dispatch.

Excluded Variables(a)
                                                                               Collinearity Statistics
Model                          Beta In   t        Sig.   Partial Correlation   Tolerance
1   Day                        -.035b    -.318    .752   -.039                 .998
    Department of dispatch     -.174b   -1.608    .113   -.196                 .982
a. Dependent Variable: Purity_Diff
b. Predictors in the Model: (Constant), Material type

EXERCISE
Perform a regression analysis for the variable Purity_Diff using the backward selection method. Compare the results with those of the forward selection method.
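As a cross-check on the forward selection output, the slope, constant, R Square and Adjusted R Square for a one-predictor model can be recovered from the means, standard deviations and correlation in the Descriptive Statistics and Correlations tables. The following minimal Python sketch does this. The input numbers are copied from those tables; the variable names and the predict helper are assumptions made for this illustration.

```python
# Summary statistics copied from the Descriptive Statistics and Correlations tables.
mean_y, sd_y = -1.7412, 1.90027   # Purity_Diff
mean_x, sd_x = 2.1176, 0.83808    # Material type
r = -0.471                        # Pearson correlation between the two
n = 68                            # number of cases

# With a single predictor, the slope is r scaled by the ratio of standard deviations,
# and the fitted line passes through the point of means.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

r_squared = r ** 2
# Adjustment for one predictor: (n - 1) / (n - 1 - 1).
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

def predict(material):
    """Predicted Purity_Diff for a given Material code."""
    return intercept + slope * material

print("slope:", round(slope, 3), "constant:", round(intercept, 3))
print("R Square:", round(r_squared, 3), "Adjusted:", round(adjusted_r_squared, 3))
for material in (1, 2, 3):
    print("Material", material, "->", round(predict(material), 3))
```

Because the correlation is only given to three decimal places, the recovered constant can differ from the SPSS value in the third decimal place.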