
Question

Task 1: Parallel Corpora
Parallel corpora contain a collection of texts in a given language and their translation to one or more other
languages. In this task, you will build a small parallel corpus using data from OpenSubtitles.org, a
database that allows you to search and download subtitles for various languages. It was previously used to
build the OpenSubtitles corpus, which consists of around 2.6 billion sentences and covers 60 languages.
Search for the film Monty Python and the Holy Grail (1975) on OpenSubtitles.org and download subtitles
for English, German, and a third language of your choosing. Open the files using a text editor (e.g. VS
Code) and familiarise yourself with the format. Your corpus will include sentences from a famous scene
that starts at 00:17:48 (first English sentence: Quiet! There are ways of telling whether she is a witch.),
and ends at 00:20:31 (last English sentence: ...knight of the Round Table.). Your goal is to clean up the
data, match subtitles in different languages and put the lines together, transforming them into the following
format:
line 1 in English
line 1 in German
line 1 in chosen language
line 2 in English
line 2 in German
line 2 in chosen language
...
You will see that this manual process is not feasible for greater amounts of data, and you will learn how to
automate a process like this later on in the course.
Save the created corpus as grail_corpus.txt and submit the file together with the assignment.
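
The task asks for a manual clean-up, but the same steps can be sketched in code to check the result. The following is a minimal sketch, not the course's intended solution. It assumes the downloaded files are in SubRip (.srt) format (numbered blocks with a timing line such as 00:17:48,120 --> 00:17:51,300 followed by text). The function names and the MISSING placeholder are hypothetical. It also assumes that the scene's cues line up one-to-one by position across languages, which often does not hold in real subtitle files, so the output still needs to be checked by hand.

```python
import re
from itertools import zip_longest

# SubRip timing line, e.g. "00:17:48,120 --> 00:17:51,300"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.]\d{3}\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.]\d{3}"
)
TAG = re.compile(r"<[^>]+>")  # inline markup such as <i>...</i>
MISSING = "[no match]"        # hypothetical placeholder for unaligned lines


def to_seconds(hms):
    """Convert "HH:MM:SS" to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def parse_srt(text):
    """Return a list of (start_seconds, cue_text) pairs from SubRip text."""
    cues = []
    normalised = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalised):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match is None:
                continue
            h, m, s = (int(g) for g in match.groups()[:3])
            # Join multi-line cues into one line and drop markup tags.
            body = " ".join(p.strip() for p in lines[i + 1:] if p.strip())
            body = TAG.sub("", body).strip()
            if body:
                cues.append((h * 3600 + m * 60 + s, body))
            break
    return cues


def scene(cues, start, end):
    """Keep the text of cues whose start time lies in [start, end] seconds."""
    return [text for t, text in cues if start <= t <= end]


def interleave(*languages):
    """Interleave line lists: line 1 of each language, then line 2, and so on.

    Naive positional alignment; shorter lists are padded with MISSING so that
    misalignments are visible instead of silently shifting later lines.
    """
    out = []
    for group in zip_longest(*languages, fillvalue=MISSING):
        out.extend(group)
    return out
```

To use it, read each subtitle file into a string (the file names and encodings depend on what you downloaded), call scene(parse_srt(text), to_seconds("00:17:48"), to_seconds("00:20:31")) for each language, and write "\n".join(interleave(english, german, third)) to grail_corpus.txt. Start times can differ slightly between subtitle files, so the window boundaries may need adjusting per language.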
