Question
Python 3
"Authorship attribution" is a system which attempts to determine who wrote a given document, based on analysis of the language used and style of that document. We will break the system down into multiple parts, to make it clearer what the different moving parts are, and make it easier for you to test your system.
The first step in our authorship attribution system will be to take a document, separate it out into its component words, and construct/return a dictionary of word frequencies. As we are focused on the English language, we will assume that "words" are separated by whitespace, in the form of spaces (' '), tabs ('\t') and newline characters ('\n').
We will also do something slightly unconventional in considering each "standalone" non-alphabetic character (i.e. any character other than whitespace, or upper- or lower-case alphabetic characters) to be a single word. For example, given the document 'Dynamic-typed variables, Python; really?!!', the component words, in sequence, would be 'Dynamic-typed' (noting that '-' here is not considered to be a word despite being non-alphabetic, as it is surrounded by alphabetic characters), 'variables', ',', 'Python', ';', 'really', '?', '!', '!'. Note that, had the document instead started with 'Dynamic--typed', the breakdown into words would be 'Dynamic', '-', '-', and 'typed', as each of the hyphens neighbours a non-alphabetic character. Note also that case should be preserved in the output (i.e. if a word is in upper case in the original, it should remain in upper case).
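The word definition above can be sketched as a single regular expression. This is only one possible reading of the rule, not a reference solution: the helper name split_words and the constant WORD_PATTERN are made up for illustration, and the sketch assumes that "alphabetic" means the ASCII letters A-Z and a-z and that any character matched by \s counts as whitespace.

```python
import re

# Assumption: "alphabetic" means ASCII letters A-Z and a-z only.
# A word is either
#   - a run of letters, optionally extended by a single non-alphabetic
#     character that has letters on both sides (e.g. 'Dynamic-typed'), or
#   - one standalone non-alphabetic, non-whitespace character.
WORD_PATTERN = re.compile(r"[A-Za-z]+(?:[^A-Za-z\s][A-Za-z]+)*|[^A-Za-z\s]")


def split_words(doc):
    """Return the words of doc in order (hypothetical helper name)."""
    return WORD_PATTERN.findall(doc)


print(split_words('Dynamic-typed variables, Python; really?!!'))
# ['Dynamic-typed', 'variables', ',', 'Python', ';', 'really', '?', '!', '!']
print(split_words('Dynamic--typed'))
# ['Dynamic', '-', '-', 'typed']
```

The first alternative only absorbs a non-alphabetic character when letters follow it immediately, which is why the doubled hyphen falls through to the second alternative and is emitted one character at a time.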
Write a function authattr_worddict(doc) that takes a single string argument doc and returns a dictionary (dict) of words contained in doc (as defined above), with the frequency of each word as an int. Note that, as the output is a dict, the order of those words may not correspond exactly to that indicated below, and that the testing will accept any word ordering within the dictionary.
Here are some example calls to your authattr_worddict function:
>>> authattr_worddict('Dynamic-typed variables, Python; really?!!')
{'Dynamic-typed': 1, 'Python': 1, 'really': 1, '!': 2, 'variables': 1, '?': 1, ',': 1, ';': 1}
>>> authattr_worddict('')
{}
>>> authattr_worddict("Truly, rooly, rooly, indisputably 'tis ..... Gr00vy")
{"'": 1, 'vy': 1, '.': 5, '0': 2, 'tis': 1, 'rooly': 2, 'Truly': 1, 'indisputably': 1, 'Gr': 1, ',': 3}
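One way to approach authattr_worddict is a single pass over the characters of doc, deciding for each non-alphabetic character whether it sits between two letters. The sketch below is a possible solution under stated assumptions rather than the expected answer: it treats only ASCII letters as alphabetic, uses str.isspace() to detect whitespace, and the names LETTERS, current and flush are illustrative choices.

```python
import string

# Assumption: "alphabetic" means ASCII letters only.
LETTERS = set(string.ascii_letters)


def authattr_worddict(doc):
    """Return a dict mapping each word in doc to its frequency."""
    freq = {}
    current = []  # characters of the word currently being built

    def flush():
        # Record the word built so far (if any) and start a new one.
        if current:
            word = ''.join(current)
            freq[word] = freq.get(word, 0) + 1
            current.clear()

    for i, ch in enumerate(doc):
        if ch in LETTERS:
            current.append(ch)
        elif ch.isspace():
            flush()  # whitespace ends the current word
        else:
            # Non-alphabetic character: it joins the current word only
            # when both of its neighbours are letters.
            prev_alpha = i > 0 and doc[i - 1] in LETTERS
            next_alpha = i + 1 < len(doc) and doc[i + 1] in LETTERS
            if prev_alpha and next_alpha:
                current.append(ch)
            else:
                flush()
                freq[ch] = freq.get(ch, 0) + 1  # standalone word

    flush()  # record the last word, if the document ends mid-word
    return freq


print(authattr_worddict('Dynamic--typed'))
# {'Dynamic': 1, '-': 2, 'typed': 1}
```

Because the result is a plain dict, comparing it with == against the expected dictionaries in the examples ignores key order, which matches how the question says the output will be tested.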