
Question


Python 3

"Authorship attribution" is a system which attempts to determine who wrote a given document, based on analysis of the language used and style of that document. We will break the system down into multiple parts, to make it clearer what the different moving parts are, and make it easier for you to test your system.

The first step in our authorship attribution system will be to take a document, separate it out into its component words, and construct/return a dictionary of word frequencies. As we are focused on the English language, we will assume that "words" are separated by whitespace, in the form of spaces (' '), tabs ('\t') and newline characters ('\n').

We will also do something slightly unconventional in considering each "standalone" non-alphabetic character (i.e. any character other than whitespace, or upper- or lower-case alphabetic characters) to be a single word. For example, given the document 'Dynamic-typed variables, Python; really?!!', the component words, in sequence, would be 'Dynamic-typed' (noting that '-' here is not considered to be a word despite being non-alphabetic, as it is surrounded by alphabetic characters), 'variables', ',', 'Python', ';', 'really', '?', '!', '!'. Note here that, if the document instead started with 'Dynamic--typed', the breakdown into words would be 'Dynamic', '-', '-', and 'typed', as each of the hyphens neighbours a non-alphabetic character. Note also that case should be preserved in the output (i.e. if a word is upper case in the original, it should remain in upper case).
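The splitting rule above can be sketched as a single left-to-right scan. This is only one possible reading of the rule, not an official solution: the helper name split_words is hypothetical, "alphabetic" is assumed to mean the ASCII letters a-z and A-Z, and a non-alphabetic character is kept inside a word only when both of its immediate neighbours are letters.

```python
import string

# Assumption: "alphabetic" means ASCII letters only.
LETTERS = set(string.ascii_letters)
# The question names exactly these three whitespace characters.
WHITESPACE = set(" \t\n")


def split_words(doc):
    """Hypothetical helper: return the words of doc, in order."""
    words = []
    current = ""  # the word being built, if any
    for i, ch in enumerate(doc):
        if ch in WHITESPACE:
            # Whitespace ends the current word and is itself discarded.
            if current:
                words.append(current)
                current = ""
        elif ch in LETTERS:
            current += ch
        else:
            # Non-alphabetic: part of a word only if flanked by letters.
            prev_alpha = i > 0 and doc[i - 1] in LETTERS
            next_alpha = i + 1 < len(doc) and doc[i + 1] in LETTERS
            if prev_alpha and next_alpha:
                current += ch
            else:
                if current:
                    words.append(current)
                    current = ""
                words.append(ch)  # standalone character is its own word
    if current:
        words.append(current)
    return words
```

Under these assumptions, split_words('Dynamic--typed') gives ['Dynamic', '-', '-', 'typed'], while split_words('Dynamic-typed') keeps the hyphenated form as one word.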

Write a function authattr_worddict(doc) that takes a single string argument doc and returns a dictionary (dict) of words contained in doc (as defined above), with the frequency of each word as an int. Note that, as the output is a dict, the order of those words may not correspond exactly to that indicated below, and that the testing will accept any word ordering within the dictionary.

Here are some example calls to your authattr_worddict function:

>>> authattr_worddict('Dynamic-typed variables, Python; really?!!')
{'Dynamic-typed': 1, 'Python': 1, 'really': 1, '!': 2, 'variables': 1, '?': 1, ',': 1, ';': 1}

>>> authattr_worddict('')
{}

>>> authattr_worddict("Truly, rooly, rooly, indisputably 'tis ..... Gr00vy")
{"'": 1, 'vy': 1, '.': 5, '0': 2, 'tis': 1, 'rooly': 2, 'Truly': 1, 'indisputably': 1, 'Gr': 1, ',': 3}
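One way the function might be written is with a regular expression and collections.Counter. This is a minimal sketch rather than the reference solution, and it rests on two assumptions: "alphabetic" means ASCII letters only, and whitespace means exactly space, tab and newline. The pattern name WORD_RE is introduced here for illustration.

```python
import re
from collections import Counter

# First alternative: a run of letters, optionally extended by a single
# non-alphabetic character that is immediately followed by more letters
# (so the character is flanked by letters on both sides).
# Second alternative: any other non-alphabetic, non-whitespace character,
# which counts as a one-character word.
# Assumption: letters are ASCII; whitespace is space, tab or newline.
WORD_RE = re.compile(r"[A-Za-z]+(?:[^A-Za-z \t\n][A-Za-z]+)*|[^A-Za-z \t\n]")


def authattr_worddict(doc):
    """Return a dict mapping each word in doc to its frequency (an int)."""
    return dict(Counter(WORD_RE.findall(doc)))
```

The regular expression tries the first alternative before the second, so a hyphen between two letters is absorbed into the surrounding word, whereas a hyphen next to another symbol, whitespace, or the start or end of the document falls through to the single-character case.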
