Question
Define build_ds(file) in Python that reads the file and builds the data structure needed for further processing. The returned data structure should be a 2D matrix (list of lists), where each sentence of the text is a row of the matrix and each word of the line is an element of the row. Any token that is not composed of letters only (e.g. numbers, punctuation marks) should be excluded; use a function built on the re module that takes a string and returns a boolean indicating whether it consists of letters only (both lower and upper case). The only sentence border is the newline.
sample.txt:
human language is filled with ambiguities .
usage exceptions , variations in sentence structurethese just a few of the irregularities of human language that take humans years to learn .
but that programmers must teach natural language-driven applications to recognize and understand accurately from the start, if those applications are going to be useful.
The output should look like this:
ds = build_ds("sample.txt")
ds[:2]
Out[1]: [['human', 'language', 'is', 'filled', 'with', 'ambiguities'], ['usage', 'exceptions', 'variations', 'in', 'sentence', 'just', 'a', ..., 'to', 'learn']]
#Task
def build_ds(file):
    """
    Read a file. Build a datastructure - 2D matrix (list of lists).
    Rows should be sentences, cells should contain the words.

    :param file: a txt file where the text is all lowercase and sentence border punctuation marks have spaces around them
    :rtype: 2D list
    """
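One possible way to fill in the stub, offered as a minimal sketch rather than the expected answer. It makes a few assumptions the task does not spell out: "letters only" is taken to mean ASCII a-z and A-Z, words are separated by whitespace, completely blank lines are skipped instead of producing empty rows, and the file is read as UTF-8. The helper name is_letters_only is hypothetical; the task only asks for some function based on the re module.

```python
import re

# Assumption: "letters only" means ASCII letters in either case.
_LETTERS_ONLY = re.compile(r"[A-Za-z]+")


def is_letters_only(token):
    """Return True if the whole token consists of ASCII letters (hypothetical helper name)."""
    return _LETTERS_ONLY.fullmatch(token) is not None


def build_ds(file):
    """
    Read a file and build a 2D matrix (list of lists).
    Each line is one sentence (row); each letters-only word is a cell.

    :param file: path to a txt file
    :rtype: 2D list
    """
    matrix = []
    # Assumption: the file is UTF-8 encoded text.
    with open(file, encoding="utf-8") as handle:
        for line in handle:
            # Assumption: blank lines are not sentences, so they are skipped.
            if not line.strip():
                continue
            # split() with no argument splits on any whitespace,
            # which also discards the trailing newline.
            row = [token for token in line.split() if is_letters_only(token)]
            matrix.append(row)
    return matrix
```

With this sketch, a token is kept only if every character is a letter, so tokens with attached punctuation in the sample, such as "start," or "language-driven", are dropped entirely rather than trimmed. If trimming were wanted instead, the filter would have to be replaced by a cleaning step, which goes beyond what the task describes.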