Question

Posted on Jul 15, 2024

Assignment for Business Statistics 1

The assignment consists of several questions to be solved with SPSS. Only questions 3-7 need to be answered in the report. When writing the report, try to imagine that you are writing for someone with the same knowledge of statistics as a student taking Business Statistics 1. It is also important that the Instruction for lab reports in statistics (available on Canvas) is followed. The deadline is the 22nd of October. The lab reports should be uploaded to Canvas on the Assignments page.

A movie enthusiast has collected information on movies from four streaming platforms (Netflix, Hulu, PrimeVideo and Disney+) in a dataset called Movies.sav.

1. Open the data in SPSS

To open the dataset in SPSS, simply double-click the file or follow the instructions below.
1. Start SPSS from the start menu.
2. Use File > Open > Data to open the Movies.sav file.
3. Save the file, for example on g:.

2. Labelling variables and values

The names of the variables for the release year, the IMDB score and whether the movie is available on Disney+ have not yet been specified; they are just called var001, var002 and var003. To specify proper names for these variables, click the Variable View tab in the lower left corner. To rename the variable var001, click in the var001 cell and write e.g. Year. Now rename var002 and var003 to e.g. IMDB and Disney, respectively.

The variable named Age is the age recommendation of the movie. It includes 5 different age recommendations, which can be seen in the table below.

Numeric code   Age recommendation
1              all
2              7+
3              13+
4              16+
5              18+

At this moment only the numeric codes are available in the data. To make the data easier to understand, we can use so-called value labels to indicate that the numeric value 1 means that the movie is suitable for all ages, that the numeric value 2 means that the movie has an age recommendation of 7+, etc. To do this, click the Values cell for the variable Age.
In the box that appears on the screen, write 1 in the Value window and all in the Value Label window, then click Add. Similarly, give the label 7+ to the value 2, 13+ to the value 3, 16+ to the value 4, and 18+ to the value 5.

3. Graphical analysis

The first step in a data analysis is usually to perform a graphical analysis of the data. The graphical functions are available in the Graphs menu under Legacy Dialogs.

First, we will investigate the proportion of movies available on Netflix in the dataset by using a pie chart. Select Pie and then, in the box that pops up, Summaries for groups of cases. Finally, define slices by Netflix. The graph is a visual presentation of the Netflix variable. After the graph is generated, double-click the graph to enter the Chart Editor. Go to Elements > Show Data Labels and display percentages.

Next, we wish to check the distribution of Age for the movies available on Netflix. Bar charts are useful for this purpose. From the Graphs menu, choose Legacy Dialogs and then Bar. In the box that appears on the screen, choose Clustered and Summaries for groups of cases. Use Age on the category axis and define clusters by Netflix.

Copy the graphs by selecting Copy from the Edit menu in the output window, and paste them into a Word document.

Questions:
a. What is the proportion of movies available on Netflix in the dataset?
b. What is the mode for the Netflix variable?
c. Is the mode a good measure of central tendency for the variable "Netflix"?
d. What is the mode of the Age variable? Is the mode the same for movies available and movies not available on Netflix in the dataset?
When answering questions a and d, you should refer to the diagrams.

4. Measures of location and spread

The graphs do not provide all relevant measures. We would like to have several different measures of location and spread, for comparison. Use Analyze > Descriptive Statistics > Frequencies.
Choose the following variables: Age, Disney and Runtime. Use the Statistics button to calculate the Mean, Median, Mode, Variance, Standard Deviation, Min, Max, and Range for the three variables.

As we can see from the output, frequency tables are a good way to display the distribution of variables with few outcomes (often used for categorical variables), but when a variable has many outcomes, like Runtime, the frequency table is too long and messy to be useful. Instead, the distributions of such variables are often displayed in histograms. To remove the frequency table for Runtime, simply select the table (click on it in the output window) and press Delete. To create a histogram for the variable Runtime, go to Graphs > Legacy Dialogs > Histogram and select Runtime.

Questions:
Include the descriptive table, the frequency tables for Age and Disney, and the histogram for the variable Runtime in your report and answer the following questions.
a. At what scale of measurement (i.e. ratio, interval, ordinal or nominal scale) are the variables measured? Discuss the three variables Age, Disney and Runtime.
b. Are all three measures of central tendency (mode, median and mean) relevant for all three variables? Discuss each variable (Age, Disney and Runtime) and each measure of central tendency, for instance: Age: the mean can/cannot be used here because... the median can/cannot be used because...
c. Are the measures of variability (variance, standard deviation and range) relevant for all three variables? Discuss each variable (Age, Disney and Runtime) and each measure of variability.
d. Are there missing data? If so, how many, and for which variables?

5. Confidence interval and hypothesis test

The movie enthusiast likes older movies and believes that movies made before the year 2000 are better than movies produced in the year 2000 or later. Conduct a t-test to see whether the IMDB score supports the enthusiast's belief.
To do this, we need to create a new variable indicating whether the movie was released before 2000 or not. Go to Transform > Compute Variable. In the cell Target Variable, specify the name of the variable to create, for example Pre2000. In the cell Numeric Expression, specify what the variable should be equal to. In the first step, set the variable equal to 1 and press OK. Now we have created a variable that is equal to 1 for all observations.

However, we want the variable to be equal to 1 if the movie was released before 2000 and 0 otherwise. To make this change, go to Transform > Compute Variable again. Let Target Variable be the name of the variable you created (Pre2000 if you followed the suggestion). In the Numeric Expression cell, write 0 this time and press the If button. Select Include if case satisfies condition and write Year>1999. This ensures that we change the variable to 0 only for the movies released in the year 2000 or later. Press Continue, OK and OK.

To conduct an independent-samples t-test, use Analyze > Compare Means > Independent-Samples T Test. The test variable should be IMDB and the grouping variable should be the variable you created above (Pre2000). Click Define Groups and let the movies released before 2000 be group 1 and the movies released in 2000 or later be group 2 by writing the value 1 for group 1 and the value 0 for group 2 (which group you define as 1 and 2 does not really matter).

Questions:
a. Conduct an independent-samples t-test using a 5% significance level to decide whether the IMDB score is higher for the movies released before 2000. You should clearly state your hypotheses and use both the p-value approach and the critical value approach. Your conclusion should also be clearly written.

The movie enthusiast is thinking of subscribing to Netflix but is only willing to do so if more than 30% of all available movies are good. The enthusiast regards a movie as good if it has an IMDB score of at least 7.
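As a cross-check on question 5a, the Pre2000 recode and the test statistic can be reproduced outside SPSS. The following is only a minimal sketch: the function names and the toy (Year, IMDB) rows are made up for illustration and are not taken from Movies.sav, and it assumes the equal-variances (pooled) version of the test. The critical value and p-value still have to come from the distribution table or the SPSS output.

```python
from math import sqrt
from statistics import mean, variance


def recode_pre2000(year):
    """Return 1 for movies released before 2000 and 0 otherwise."""
    return 1 if year < 2000 else 0


def pooled_t_statistic(group1, group2):
    """Independent-samples t statistic assuming equal variances.

    Returns the t statistic and its degrees of freedom (n1 + n2 - 2).
    """
    n1, n2 = len(group1), len(group2)
    # Pooled sample variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(group1) - mean(group2)) / std_error
    return t, n1 + n2 - 2


# Toy data for illustration only (hypothetical, not from Movies.sav).
toy_movies = [(1985, 7.5), (1994, 8.0), (1999, 7.0), (2003, 6.0), (2010, 6.5), (2018, 7.0)]

old = [imdb for year, imdb in toy_movies if recode_pre2000(year) == 1]
new = [imdb for year, imdb in toy_movies if recode_pre2000(year) == 0]

t_stat, df = pooled_t_statistic(old, new)
print(f"t = {t_stat:.3f}, df = {df}")
```

For the one-sided question in 5a, the t statistic is compared with the upper-tail critical value at the 5% level for the given degrees of freedom.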
To begin the analysis, we need to create a variable indicating whether a movie has an IMDB score of at least 7. Go to Transform > Compute Variable and press the Reset button to clear everything from the previous computation. Now, call the variable GoodMovie and let it be equal to 0 for all observations in the first step. In the second step, go again to Transform > Compute Variable and let Target Variable be GoodMovie. In the Numeric Expression cell, write 1 this time and press the If button. Select Include if case satisfies condition and write IMDB>=7. Press Continue, OK and OK. The variable GoodMovie is now equal to 1 if the movie has an IMDB score of at least 7 and 0 otherwise.

To find the proportion of good movies on Netflix in the sample, go to Analyze > Descriptive Statistics > Crosstabs. Define the rows to be Netflix and the columns to be GoodMovie. From the output you can calculate the proportion of good movies on Netflix. If you want SPSS to do it for you, go to Analyze > Descriptive Statistics > Crosstabs, click the Cells button and tick the box for Row in the Percentages part of the window. Tables of this type are called contingency tables.

Questions:
b. Based on the output in the contingency table, calculate by hand a 95% confidence interval for the share of good movies on Netflix. Show your calculations and interpret the confidence interval.
c. By hand, conduct a hypothesis test at the 95% confidence level (i.e. $\alpha = 0.05$) to test the null hypothesis that the share of good movies on Netflix is at most 30% against the alternative hypothesis that the share of good movies on Netflix is greater than 30%. State the hypotheses, significance level, etc. and show your calculations. Use both the p-value approach and the critical value approach. State your conclusions based on the hypothesis test.
d. Based on the result, would you advise the enthusiast to subscribe to Netflix?

6. Repeated hypothesis test by splitting the data

A friend tells the enthusiast that the average IMDB score of a movie is 6.25.
The enthusiast wants to test, using a one-sample t-test, whether the average IMDB score of each streaming platform is equal to 6.25 or not. The nominal variable Platform identifies which streaming platform each movie is available on. The task can be done by splitting the data based on this variable. To split the data, choose Data > Split File, select Compare groups, select Platform and click OK. Note: you will only get a line of code in the output window when you do this. When you now run the t-test (see below), SPSS will automatically do a t-test for each streaming platform. In fact, anything you do from now on will be repeated for each streaming platform, which saves time if you want to repeat the same thing for multiple groups.

To begin the one-sample t-test, from the menus choose Analyze > Compare Means > One-Sample t-test. Select IMDB as the test variable. Type 6,25 as the test value. Click Options. Type 95 as the confidence interval percentage. Click Continue. Click OK in the One-Sample t-test dialog box.

The SPSS output reports (for each streaming platform) the t-statistic, the p-value (named Sig. (2-tailed) in SPSS), the mean difference $\bar{x} - \mu_0$, the square root of the sampling variance $\sqrt{s^2/n}$ (named Std. Error Mean in SPSS) and the upper and lower bounds of the confidence interval. Use the output to solve the following problems for the streaming platforms individually.

Questions:
a. Calculate the t-statistic by hand for one streaming platform and check that SPSS provided the correct value of the t-statistic. Use the values available in the descriptive table provided by SPSS.
b. Test $H_0: \mu = 6.25$ against $H_1: \mu \neq 6.25$ at the 5% level (i.e. $\alpha = 0.05$) by comparing the t-statistic to the critical value for all platforms. The critical values are found in the distribution table uploaded on Canvas; it is not part of the SPSS output.
c. Test $H_0: \mu = 6.25$ against $H_1: \mu \neq 6.25$ at the 10% level (i.e. $\alpha = 0.1$) by comparing the p-value (given in the table by SPSS) to $\alpha$. No calculations required.
d. Briefly comment on the result for each streaming platform. What can you say?
When answering the questions, you should refer to the tables.

7. ANOVA

Lastly, out of curiosity, the movie enthusiast wants to know whether the average runtime of the movies differs by age recommendation. This can be tested using an ANOVA test (F-test). Recall that the variable Age identifies the age recommendation of the movie.

First, we want to produce a descriptive table of the runtime of the movies for each age recommendation. To do this, we split the data based on the variable Age. Go to Data > Split File, select Compare groups, select Age and click OK. Then go to Analyze > Descriptive Statistics > Descriptives, select the variable Runtime, press the Options button and tick the boxes for Mean, Std. deviation, Minimum and Maximum. Press Continue and OK.

Before we perform the F-test, we must ensure that the file is no longer split by clicking Data > Split File and then clicking Reset and OK. To perform the test, go to Analyze > Compare Means > One-Way ANOVA. Specify the variable Runtime in the Dependent List, choose the variable Age as the Factor and then click OK. An ANOVA table will now appear in the output window. Include both the descriptive table and the ANOVA table in your report.

Questions:
a. Based on the sum of squares between groups and the sum of squares within groups found in the output, show how to calculate the mean square treatment (between groups) and the mean square error (within groups), and finally how to calculate the F-statistic.
b. Use the output to test $H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$ against $H_1$: not all $\mu_i$ ($i = 1,2,3,4,5$) are equal at the 1% level (i.e. $\alpha = 0.01$) by comparing the F-statistic to the critical value. You can use df=1000 for the denominator. State your conclusion of the test.
When answering the questions, you should refer to the tables.
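
The arithmetic asked for in question 7a can be sketched in a few lines. This is only an illustration: the function name `anova_f` is hypothetical, and the sums of squares and sample size below are placeholder numbers, not values from the SPSS output. With k groups and N observations, the degrees of freedom are k - 1 between groups and N - k within groups.

```python
def anova_f(ss_between, ss_within, n_groups, n_total):
    """Compute mean squares and the F statistic for a one-way ANOVA.

    ss_between / ss_within are the sums of squares from the ANOVA table,
    n_groups is the number of groups (k) and n_total the number of
    observations (N).
    """
    df_between = n_groups - 1        # k - 1
    df_within = n_total - n_groups   # N - k
    ms_between = ss_between / df_between  # mean square treatment
    ms_within = ss_within / df_within     # mean square error
    f_stat = ms_between / ms_within
    return ms_between, ms_within, f_stat


# Placeholder numbers for illustration only (not from the SPSS output).
ms_b, ms_w, f_stat = anova_f(ss_between=200.0, ss_within=4975.0, n_groups=5, n_total=1000)
print(f"MS between = {ms_b}, MS within = {ms_w}, F = {f_stat}")
```

The resulting F-statistic is then compared with the critical value from the F table for k - 1 numerator degrees of freedom and the denominator degrees of freedom suggested in question 7b.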