
Question


Write the functions calc_precision(test_set) and calc_recall(test_set), which both take a string test_set that refers to a CSV file of documents for testing (not included in our training set), and return dictionaries of the precision and recall, respectively, for each language. We have provided a mini test set small_test.csv for you to develop your code over. The format of this file (though it is not obvious from looking at it!) is:

lang1,text1
lang2,text2
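As a minimal sketch of reading this format, the helper below splits each line into a (language, text) pair. It assumes one document per line and that only the first comma separates the label from the text, so the text itself may contain commas. The name parse_labelled_lines is hypothetical and not part of the provided code.

```python
def parse_labelled_lines(lines):
    """Return a list of (language, text) pairs from 'lang,text' lines.

    Assumption: one document per line, and the first comma on the line
    separates the language label from the document text.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first comma only, since the text may contain commas.
        lang, _, text = line.partition(",")
        pairs.append((lang, text))
    return pairs


sample = ["German,Guten Morgen, Welt\n", "English,Good morning\n"]
print(parse_labelled_lines(sample))
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.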

We also provide you with an implementation of classify_doc(document, lang_counts), which takes a string document, and language counts, and returns a string representing the possible language of the document. We provide you with a small training set, which is loaded into default_lang_counts for you as per usual. This is a different training set to the other problems, made significantly smaller due to computational limitations.

Remember, for a particular language:

Precision = N(correct) / N(Predicted)

where N(correct) is the number of documents of that language the classifier got right, and N(Predicted) is the number of documents predicted to be in that language. Note that you should calculate the precision for only those languages where classify_doc has predicted one or more of the test documents to be of that language (for languages where there are no predictions, no precision should be calculated).
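The precision rule can be sketched on a list of (actual, predicted) label pairs. In the assignment, each predicted label would come from classify_doc(text, default_lang_counts), which is not reproduced here; the function name precision_per_language and the sample pairs below are hypothetical, chosen only to illustrate the counting.

```python
from collections import Counter


def precision_per_language(pairs):
    """pairs: iterable of (actual, predicted) language labels."""
    predicted = Counter()  # N(Predicted) per language
    correct = Counter()    # N(correct) per language
    for actual, pred in pairs:
        predicted[pred] += 1
        if actual == pred:
            correct[pred] += 1
    # Only languages that were predicted at least once get an entry.
    return {lang: correct[lang] / n for lang, n in predicted.items()}


# Hypothetical labels, not taken from small_test.csv.
pairs = [
    ("German", "German"),
    ("German", "German"),
    ("German", "Dutch"),
    ("Asturian", "Asturian"),
    ("Spanish", "Asturian"),
]
p = precision_per_language(pairs)
print(p["German"], p["Asturian"], p["Dutch"])  # 1.0 0.5 0.0
print("Spanish" in p)                          # False: never predicted
```

Iterating over the predicted counts is what keeps never-predicted languages out of the result, and it also avoids any division by zero.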

For recall:

Recall = N(correct) / N

where N is the number of documents written in that language in the test set. Recall should be calculated for all languages represented in the test set (in terms of the actual labels).
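Recall can be sketched the same way, counting over the actual labels instead of the predicted ones. As before, this works on (actual, predicted) label pairs rather than calling the assignment's classify_doc, and the name recall_per_language and the sample pairs are hypothetical.

```python
from collections import Counter


def recall_per_language(pairs):
    """pairs: iterable of (actual, predicted) language labels."""
    actual_counts = Counter()  # N per language (actual labels)
    correct = Counter()        # N(correct) per language
    for actual, pred in pairs:
        actual_counts[actual] += 1
        if actual == pred:
            correct[actual] += 1
    # Every language that appears as an actual label gets an entry.
    return {lang: correct[lang] / n for lang, n in actual_counts.items()}


# Hypothetical labels, not taken from small_test.csv.
pairs = [
    ("German", "German"),
    ("German", "German"),
    ("German", "Dutch"),
    ("Asturian", "Asturian"),
    ("Spanish", "Asturian"),
]
r = recall_per_language(pairs)
print(r["Asturian"], r["Spanish"])  # 1.0 0.0
print("Dutch" in r)                 # False: no test document is Dutch
```

Here German has three test documents of which two were classified correctly, so its recall is 2/3, while a language that is in the test set but never classified correctly (Spanish) still gets an entry of 0.0.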

Your functions should behave as follows:

>>> p = calc_precision('small_test.csv') 
>>> p['German'] 
1.0 
>>> p['Asturian'] 
0.5 
>>> r = calc_recall('small_test.csv') 
>>> r['German'] 
0.9259259259259259 

Once you have done that, visualise the precision and recall of your classifier with plot_pnr(precision_dict, recall_dict, langs), which takes as arguments a dictionary of precision per language (precision_dict), a dictionary of recall per language (recall_dict), and a list of the languages to be plotted (langs). The function then generates a bar chart. Have a look at the code included in plotter.py.

import matplotlib.pyplot as plt
import numpy as np


def plot_pnr(precision_dict, recall_dict, langs):
    """Takes dictionaries precision_dict and recall_dict containing the
    precision and recall per language and plots them per language as a
    bar chart"""
    # extract the precisions and recalls that we want to plot
    precisions = [precision_dict[lang] for lang in langs]
    recalls = [recall_dict[lang] for lang in langs]

    # a nice way of arranging the positions
    ind = np.arange(len(langs))
    # the width of the bars
    width = 0.35

    ax = plt.axes()

    # Since we are plotting both the precision and the recall, we will
    # make two bar charts, and put them on the same axes.
    rects1 = ax.bar(ind, precisions, width, color='b')
    # Notice how you can change the colour of the rectangles!
    # try one of the following letters: rcmykw
    rects2 = ax.bar(ind + width, recalls, width, color='g')

    # add some text for labels, title and axes ticks
    ax.set_ylabel('Precision/Recall')
    ax.set_title('Precision and Recall per language for classifier')
    ax.set_xticks(ind + width)
    ax.set_xticklabels(langs)

    # This changes the height of the plot, leaving headroom above the
    # tallest bar (whether it is a precision or a recall bar)
    plt.ylim(0, 1.1 * max(precisions + recalls))

    # Add a legend, explaining what the bars refer to.
    ax.legend((rects1[0], rects2[0]), ('Precision', 'Recall'))

    # An extra function to label the heights of the bars
    def autolabel(rects):
        # attach some text labels
        for rect in rects:
            height = rect.get_height()
            ax.text(rect.get_x() + rect.get_width() / 2., 1.05 * height,
                    '%.2f' % height, ha='center', va='bottom')

    autolabel(rects1)
    autolabel(rects2)
    plt.show()
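Because plot_pnr looks up every entry of langs in both dictionaries, a language that has a recall but no precision (it was never predicted) would raise a KeyError. One way to build a safe langs list is sketched below; the helper name plottable_langs and the sample values are hypothetical.

```python
def plottable_langs(precision_dict, recall_dict):
    """Return the languages that have both a precision and a recall."""
    return sorted(set(precision_dict) & set(recall_dict))


# Hypothetical values, for illustration only.
precision = {"German": 1.0, "Asturian": 0.5}
recall = {"German": 0.9, "Asturian": 1.0, "Spanish": 0.0}

langs = plottable_langs(precision, recall)
print(langs)  # ['Asturian', 'German']

# With plotter.py available, the chart could then be drawn with:
# plot_pnr(precision, recall, langs)
```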
