
Question


The goal of this assignment is to focus on Word Embedding and Named Entity Recognition (NER). You will work with a dataset containing sentences tagged with POS and named entity labels. The assignment is divided into three parts: understanding and pre-processing the dataset, implementing a Word2Vec model for word embedding, and designing and evaluating a sequential model for NER.
Part I - Understanding the Dataset and Preprocessing (3 marks)
You will become familiar with the dataset and perform the necessary pre-processing steps. This includes loading the dataset, exploring its structure, and preparing it for the subsequent modelling tasks.
Loading the Dataset (1 mark)
Load the provided dataset and display the first few rows.
Exploratory Data Analysis (1 mark)
Count the number of sentences.
Identify the unique POS tags and named entity tags.
Visualize the distribution of sentence lengths.
Visualize the frequency of different POS tags.
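The exploratory steps above can be sketched with the standard library alone. This is only an illustration under assumptions: the row layout (sentence id, word, POS tag, entity tag), the sample rows, and the helper name summarize are hypothetical, since the assignment does not specify the dataset's columns. The two frequency tables it returns are the numbers a histogram or bar chart would plot.

```python
from collections import Counter

# Hypothetical rows: (sentence_id, word, pos_tag, ner_tag).
# The real column names and tag set depend on the provided dataset.
rows = [
    (1, "John", "NNP", "B-per"),
    (1, "lives", "VBZ", "O"),
    (1, "in", "IN", "O"),
    (1, "Paris", "NNP", "B-geo"),
    (2, "He", "PRP", "O"),
    (2, "works", "VBZ", "O"),
]

def summarize(rows):
    """Return the sentence count, unique tags, and frequency tables."""
    # Number of tokens seen for each sentence id.
    tokens_per_sentence = Counter(sid for sid, _, _, _ in rows)
    # How often each POS tag occurs across all tokens.
    pos_freq = Counter(pos for _, _, pos, _ in rows)
    # Unique named entity tags, sorted for stable output.
    ner_tags = sorted({ner for _, _, _, ner in rows})
    # Maps a sentence length to the number of sentences with that length.
    length_dist = Counter(tokens_per_sentence.values())
    return {
        "n_sentences": len(tokens_per_sentence),
        "pos_tags": sorted(pos_freq),
        "ner_tags": ner_tags,
        "length_distribution": dict(length_dist),
        "pos_frequency": dict(pos_freq),
    }

summary = summarize(rows)
print(summary)
```

The length_distribution and pos_frequency dictionaries can be passed to any plotting library to produce the two required visualizations.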
Data Preprocessing (1 mark)
Tokenize the sentences and map each word to its corresponding POS and named entity tags.
Convert the words and tags into numerical representations suitable for modelling, such as word indices and tag indices.
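One possible way to turn words and tags into indices is shown below. It is a minimal sketch, not the required method: the sample sentences, the special tokens <PAD> and <UNK>, and the helper names build_index and encode are assumptions made for illustration. Reserving index 0 for padding is a common convention that keeps padded positions easy to mask later.

```python
PAD, UNK = "<PAD>", "<UNK>"  # assumed special tokens, not given by the assignment

def build_index(items, specials=()):
    """Assign consecutive integer ids, special tokens first."""
    index = {s: i for i, s in enumerate(specials)}
    for item in items:
        if item not in index:
            index[item] = len(index)
    return index

# Hypothetical tokenized sentences: lists of (word, pos_tag, ner_tag) triples.
sentences = [
    [("John", "NNP", "B-per"), ("lives", "VBZ", "O"),
     ("in", "IN", "O"), ("Paris", "NNP", "B-geo")],
    [("He", "PRP", "O"), ("works", "VBZ", "O")],
]

word2idx = build_index((w for s in sentences for w, _, _ in s), specials=(PAD, UNK))
tag2idx = build_index((t for s in sentences for _, _, t in s), specials=(PAD,))

def encode(sentence):
    """Map one sentence to parallel lists of word ids and entity-tag ids."""
    # Words not seen when the index was built fall back to the UNK id.
    words = [word2idx.get(w, word2idx[UNK]) for w, _, _ in sentence]
    tags = [tag2idx[t] for _, _, t in sentence]
    return words, tags

encoded = [encode(s) for s in sentences]
print(encoded)
```

The same build_index helper could be applied to the POS tags if they are used as an additional input feature.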
Part II - Word Embedding using Word2Vec (3 marks)
You will implement a Word2Vec model to create word embeddings from the dataset and evaluate its effectiveness. This involves training the Word2Vec model on the dataset and visualizing the resulting word embeddings.
Word2Vec Model Implementation (1 mark)
Implement a Word2Vec model to generate word embeddings from the dataset.
Ensure your model captures the contextual relationships between words effectively.
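Word2Vec learns context from (center, context) word pairs drawn from a sliding window. The sketch below shows only that pair-generation step for the skip-gram variant; it does not train any embeddings, and the function name skipgram_pairs and the window size are assumptions for illustration. In practice an existing Word2Vec implementation would normally handle both this step and the training.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                # leftmost context position
        hi = min(len(tokens), i + window + 1)  # one past the rightmost position
        for j in range(lo, hi):
            if j != i:                         # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["john", "lives", "in", "paris"], window=1)
print(pairs)
```

A larger window tends to emphasize topical similarity, while a smaller one tends to emphasize syntactic similarity, so the window size is worth reporting alongside the results.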
Word Embedding Visualization (1 mark)
Visualize the word embeddings using techniques such as t-SNE or PCA.
Display how different words and their embeddings relate to each other in a lower-dimensional space.
Evaluation (1 mark)
Evaluate the quality of the word embeddings by examining their effectiveness in capturing semantic relationships.
Present your evaluation results clearly and concisely.
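A common way to inspect semantic relationships is to rank words by the cosine similarity of their vectors. The following is a minimal sketch under assumptions: the three-dimensional vectors are invented purely to make the example runnable and are not results from any trained model, and the helper names cosine and most_similar are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    scores = [
        (other, cosine(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:topn]

# Made-up toy vectors, for illustration only.
toy_vectors = {
    "paris": [0.9, 0.1, 0.0],
    "london": [0.8, 0.2, 0.1],
    "runs": [0.0, 0.1, 0.9],
}

neighbours = most_similar("paris", toy_vectors, topn=2)
print(neighbours)
```

With real embeddings, listing the nearest neighbours of a few place names, person names, and verbs gives a quick qualitative check of whether related words cluster together.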
Part III - NER using Sequential Model (4 marks)
You will design and implement a sequential model (such as RNN, LSTM, or GRU) to perform NER. You will train your model, evaluate its performance, and discuss the results.
Model Design and Training (2 marks)
Choose a sequential model and justify your choice.
Design the architecture of your model, specifying the input layer, hidden layers, and output layer.
Train the model on the training dataset, ensuring proper handling of sequences and padding.
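Sequence handling usually means padding every sentence in a batch to a common length and keeping a mask so padded positions do not contribute to the loss or the metrics. The sketch below illustrates that idea in plain Python; the function name pad_sequences, the pad value 0, and the choice of right-padding are assumptions, and deep learning frameworks ship their own utilities for the same job.

```python
def pad_sequences(seqs, pad_value=0, max_len=None):
    """Right-pad (or truncate) id sequences to one length and build a 0/1 mask."""
    # Default to the longest sequence in the batch; assumes seqs is non-empty.
    max_len = max_len or max(len(s) for s in seqs)
    padded, mask = [], []
    for s in seqs:
        s = list(s)[:max_len]                      # truncate overly long sequences
        n_pad = max_len - len(s)
        padded.append(s + [pad_value] * n_pad)     # real ids, then padding
        mask.append([1] * len(s) + [0] * n_pad)    # 1 = real token, 0 = padding
    return padded, mask

padded, mask = pad_sequences([[2, 3, 4, 5], [6, 7]])
print(padded)
print(mask)
```

The same function should be applied to the tag sequences so that inputs and labels stay aligned position by position.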
Model Evaluation (1 mark)
Evaluate the performance of your trained model on the test dataset using metrics such as precision, recall, and F1-score for each named entity tag.
Present the evaluation results clearly and concisely.
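Per-tag precision, recall, and F1 can be computed from counts of true positives, false positives, and false negatives. The sketch below does this at the token level; the function name per_tag_scores and the sample labels are hypothetical, and an entity-level evaluation (scoring whole spans rather than individual tokens) would be stricter. It assumes padded positions have already been removed from both lists.

```python
from collections import Counter

def per_tag_scores(y_true, y_pred):
    """Token-level precision, recall, and F1 for each tag."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted tag was wrong here
            fn[truth] += 1  # the true tag was missed here
    scores = {}
    for tag in sorted(set(y_true) | set(y_pred)):
        predicted = tp[tag] + fp[tag]
        actual = tp[tag] + fn[tag]
        precision = tp[tag] / predicted if predicted else 0.0
        recall = tp[tag] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[tag] = {"precision": precision, "recall": recall, "f1": f1}
    return scores

# Hypothetical flattened labels for six tokens.
y_true = ["B-per", "O", "O", "B-geo", "O", "O"]
y_pred = ["B-per", "O", "O", "O", "O", "B-geo"]

scores = per_tag_scores(y_true, y_pred)
for tag, s in scores.items():
    print(tag, s)
```

Because the O tag usually dominates NER data, reporting scores per entity tag, as here, is more informative than overall accuracy.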
Analysis and Discussion (1 mark)
Discuss the strengths and weaknesses of your sequential model based on the evaluation results. Answers without justification will not be awarded marks.
Suggest potential improvements or alternative approaches to enhance performance. Answers without justification will not be awarded marks.
