Question
Below are some notes made by a statistician when analysing some data. Using the statistician's notes, write a statistical report of the statistician's analysis. Your report should include the following sections.
- Summary [4]
- Introduction [4]
- Methods [4]
- Results [4]
- Discussion [4]
Remember that your report should be short and succinct. There are further marks for successfully extracting the main information from the notes and for not reporting the statistician's thought processes. [2]
The statistician's notes
Question of interest: Can the general public distinguish between extracts from books written by the nineteenth-century literary giant Charles Dickens, and extracts from books written by Edward Bulwer-Lytton (another Victorian author, considered by some to be the worst writer in history)?
The analysis is based on data given in the reference: Simkin, M. (2013) "Scientific evaluation of Charles Dickens", Journal of Quantitative Linguistics, vol. 20, pp. 68-73. All calculations were done in Minitab 17.
Data: 9461 people took a quiz, called "Great prose or not?", in which there were 12 extracts to read: 6 extracts were taken from books written by Dickens, and 6 were taken from books written by Bulwer-Lytton. The quiz did not give the author of each extract. For each of the 12 extracts, each person taking the quiz had to select either Dickens or Bulwer-Lytton as the author of the extract. For each extract, the percentage of people who selected Dickens and the percentage who selected Bulwer-Lytton were given in the reference.
For each person taking the quiz, there are two possible outcomes for each extract in the quiz: correct author selected and incorrect author selected. Let p be the proportion of correct author identifications across all 12 extracts and 9461 people taking the quiz. If everyone taking the quiz is guessing, then p = 0.5. But, if it is generally possible to tell the difference between extracts written by Dickens and Bulwer-Lytton, then p > 0.5. So test
H0: p = 0.5, H1: p > 0.5.
For this test, we need x, the total number of times the correct author was selected by the 9461 people taking the quiz, and n, the total number of author identifications when 9461 people took the quiz.
Since there were 9461 people who took the quiz and 12 extracts in the quiz, altogether there were n = 9461 × 12 = 113 532 author identifications. The value of n is certainly large enough to use large-sample methods for testing the proportion p.
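As a quick check of the arithmetic, the Python sketch below computes n and applies a common rule of thumb for the normal approximation: both n·p0 and n·(1 - p0) should be at least 5 under H0. The threshold of 5 is an assumed convention (some texts use 10); the notes do not state which rule was used.

```python
# Total number of author identifications in the quiz.
people = 9461
extracts = 12
n = people * extracts

# Rule-of-thumb check for the normal approximation (assumed threshold of 5):
# both n*p0 and n*(1 - p0) should be comfortably large under H0: p = 0.5.
p0 = 0.5
expected_correct = n * p0
expected_incorrect = n * (1 - p0)
large_sample_ok = min(expected_correct, expected_incorrect) >= 5

print(n, expected_correct, large_sample_ok)  # 113532 56766.0 True
```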
The reference gives the percentages, rather than the numbers, of times that the correct author was selected for each extract for the 9461 people taking the quiz. These percentages can be used to estimate the number of correct author identifications. An estimate of x, the total number of correct author identifications, is then 54 704.
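The notes do not say exactly how the percentages were converted into counts. One plausible approach, sketched below in Python, turns each extract's percentage of correct selections into an approximate number of people and sums over the extracts. Rounding each extract's count to a whole number is an assumption, and `estimate_correct_count` is a hypothetical helper. The actual percentages are in the reference and are not reproduced here.

```python
def estimate_correct_count(percent_correct, people=9461):
    """Estimate total correct identifications from per-extract percentages.

    percent_correct: percentage of people selecting the correct author,
    one value per extract. Each percentage is converted to an approximate
    count (rounded to a whole number, an assumed convention) and summed.
    """
    return sum(round(pct / 100 * people) for pct in percent_correct)
```

Because the published percentages are themselves rounded, any count obtained this way is an estimate, which is why the notes describe x = 54 704 as an estimate.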
So an estimate of p is 0.4818, and the value of the test statistic calculated by Minitab is -12.24 with p-value 1.000. Since the test statistic is large and negative, the data certainly don't suggest that p > 0.5; instead, it looks like there is evidence that p < 0.5!
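The figures above can be reproduced with the usual large-sample z test for one proportion, z = (p̂ - p0) / sqrt(p0(1 - p0)/n), where p̂ = x/n. The Python sketch below assumes Minitab used this normal-approximation form with the standard error computed under H0; that is an assumption about the software settings, though it matches the values in the notes. The function names are illustrative.

```python
import math

def one_proportion_z(x, n, p0=0.5):
    """Large-sample z statistic for H0: p = p0 (normal approximation)."""
    p_hat = x / n
    se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return p_hat, (p_hat - p0) / se

def upper_tail_p(z):
    """P(Z >= z) for standard normal Z: the p-value for H1: p > p0."""
    return 0.5 * math.erfc(z / math.sqrt(2))

p_hat, z = one_proportion_z(54704, 113532)
print(round(p_hat, 4), round(z, 2), round(upper_tail_p(z), 3))  # 0.4818 -12.24 1.0
```

With a z statistic this far below zero, almost all of the standard normal distribution lies above it, so the one-sided p-value for H1: p > 0.5 is essentially 1.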
To account for possible values of p both larger than and smaller than 0.5, instead test
H0: p = 0.5, H1: p ≠ 0.5.
The test statistic is still -12.24, but this time the p-value is 0.000. There is therefore (very) strong evidence to reject H0 with these hypotheses, and since the estimate of p is less than 0.5, the data suggest that the general public do even worse than guessing!
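For the two-sided alternative, the test statistic is unchanged and only the p-value changes: it becomes P(|Z| ≥ |z|) = erfc(|z|/√2) for a standard normal Z. A minimal Python sketch, again assuming the normal-approximation test with the standard error under H0 (`two_sided_test` is an illustrative name):

```python
import math

def two_sided_test(x, n, p0=0.5):
    """Return (z, p-value) for H0: p = p0 against H1: p != p0."""
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))

z, p_value = two_sided_test(54704, 113532)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")  # z = -12.24, p-value = 0.000
```

The exact p-value is far smaller than 0.001, which Minitab displays as 0.000.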
According to the reference, extract 10 is the Dickens extract that is most like a Bulwer-Lytton extract, and extract 12 is the Bulwer-Lytton extract that is most like a Dickens extract. The percentages of people who correctly identified the authors of these two extracts were indeed the lowest amongst the percentages for the 12 extracts. If the data for these two extracts are excluded from the analysis, do the general public still perform even worse than guessing?
When considering just 10 extracts from the quiz, altogether there were n = 9461 × 10 = 94 610 author identifications.
The estimate of x for all 12 extracts is 54 704. The estimate of the number of times that extract 10 was correctly identified is 3841, while the estimate of the number of times that extract 12 was correctly identified is 2431. So for the quiz excluding extracts 10 and 12, x, the total number of times the correct author is selected, is 54 704 - 3841 - 2431 = 48 432.
This time an estimate of p is 0.5119, which is greater than 0.5. Once again, test the hypotheses
H0: p = 0.5, H1: p ≠ 0.5.
When extracts 10 and 12 are excluded, the test statistic is 7.33 with p-value 0.000. Yet again there is strong evidence to reject H0. However, this time, since the estimate of p is greater than 0.5, the data suggest that the general public perform better than guessing.
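The reduced analysis can be checked the same way. The Python sketch below recomputes n and x without extracts 10 and 12, then applies the two-sided large-sample z test, under the same assumption as before that the standard error is computed under H0.

```python
import math

people = 9461
n = people * 10                  # 10 extracts after dropping extracts 10 and 12
x = 54704 - 3841 - 2431          # remove estimated correct counts for extracts 10 and 12

p_hat = x / n
z = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, H1: p != 0.5

print(n, x, f"{p_hat:.4f}", f"{z:.2f}", f"{p_value:.3f}")
# 94610 48432 0.5119 7.33 0.000
```

The sign of z flips from negative to positive once the two hardest extracts are dropped, which is what drives the change in conclusion.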
It seems that the question of whether the general public can select the correct author between Dickens and Bulwer-Lytton may depend to a certain extent on which extracts are being considered.