
Question


In this task you need to:
Use the pretrained model 'bert-base-uncased' for BERT encoding.
Ignore the requirement 'A few transformer decoder layers, hidden dimension 768. You need to determine how many layers to use between 1 and 3.' (i.e., do not implement decoder layers).
The task is given below:
Transformer
Implement a simple Transformer neural network that is composed of the following layers:
Use BERT as a feature extractor for each token.
A few transformer encoder layers with hidden dimension 768. You need to determine how many layers to use, between 1 and 3.
A few transformer decoder layers with hidden dimension 768. You need to determine how many layers to use, between 1 and 3.
One hidden layer of size 512.
A final output layer with one cell for binary classification, predicting whether the two inputs are related or not.
Note that each input to this model should be the concatenation of a positive pair (i.e., question + one correct answer) or a negative pair (i.e., question + an unrelated sentence). The format is usually [CLS] + question + [SEP] + a positive/negative sentence.
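For concreteness, the following is a minimal sketch of such a model in PyTorch with the Hugging Face transformers library, assuming the decoder requirement is dropped as instructed above. The class name QARelevanceModel, the choice of 8 attention heads, the ReLU activation, and pooling at the [CLS] position are illustrative assumptions, not fixed by the task; the returned logits are intended to be trained with nn.BCEWithLogitsLoss.

```python
import torch
import torch.nn as nn
from transformers import BertModel


class QARelevanceModel(nn.Module):
    """BERT token features -> transformer encoder layers (dim 768) ->
    hidden layer (512) -> single output cell for binary classification."""

    def __init__(self, num_encoder_layers: int = 1, hidden_dim: int = 768):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # nhead=8 is an illustrative choice; 768 is divisible by 8.
        layer = nn.TransformerEncoderLayer(d_model=hidden_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer,
                                             num_layers=num_encoder_layers)
        self.hidden = nn.Linear(hidden_dim, 512)
        self.activation = nn.ReLU()
        self.output = nn.Linear(512, 1)  # one cell: related / not related

    def forward(self, input_ids, attention_mask):
        # Per-token features for the [CLS] question [SEP] sentence pair.
        features = self.bert(input_ids=input_ids,
                             attention_mask=attention_mask).last_hidden_state
        # True where a position is padding, so the encoder ignores it.
        padding_mask = attention_mask == 0
        encoded = self.encoder(features, src_key_padding_mask=padding_mask)
        pooled = encoded[:, 0, :]  # [CLS] position as a summary of the pair
        logits = self.output(self.activation(self.hidden(pooled)))
        return logits.squeeze(-1)  # pair with nn.BCEWithLogitsLoss
```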
Train the model on the training data, use the dev_test set to determine a good number of transformer layers, and report the final results on the test set. Again, remember to use the test set only after you have determined the optimal number of transformer layers.
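One way to organize this selection, sketched below, is to compare dev_test F1 for 1, 2 and 3 encoder layers and only then evaluate the winning configuration on the test set. The helpers train_fn(num_layers) and eval_fn(model, split) are hypothetical and not part of the task; scikit-learn's f1_score is used for the metric.

```python
from sklearn.metrics import f1_score


def select_num_layers(train_fn, eval_fn, candidate_layers=(1, 2, 3)):
    """train_fn(num_layers) -> trained model; eval_fn(model, split) ->
    (y_true, y_pred). Both are hypothetical helpers, not defined here."""
    best_f1, best_layers, best_model = -1.0, None, None
    for num_layers in candidate_layers:
        model = train_fn(num_layers)
        y_true, y_pred = eval_fn(model, "dev_test")
        dev_f1 = f1_score(y_true, y_pred)
        print(f"{num_layers} encoder layer(s): dev F1 = {dev_f1:.4f}")
        if dev_f1 > best_f1:
            best_f1, best_layers, best_model = dev_f1, num_layers, model
    # The test split is touched only once, after the layer count is fixed.
    y_true, y_pred = eval_fn(best_model, "test")
    return best_layers, f1_score(y_true, y_pred)
```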
Based on your experiments, comment on whether this system is better than the systems developed in the previous tasks.
NECESSARY STEPS:
The model has the correct layers, the correct activation functions, and the correct loss function.
The code passes the sentence text to the model correctly. The documentation needs to explain how to handle length differences for a batch of data (see the batching sketch after this list).
The code returns the IDs of the n sentences that have the highest prediction score for the given question (see the ranking helper in the same sketch).
The notebook reports the F1 score on the test set and comments on the results.
Good coding and documentation are required in this task. In particular, the code and results must include evidence that supports your choice of the best number of transformer layers. The explanations must be clear and concise. To make this task less time-consuming, use n = 1.
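As referenced in the list above, here is a minimal sketch of the batching and ranking code, assuming the QARelevanceModel class from the earlier sketch. encode_pairs handles length differences within a batch by letting the tokenizer pad to the longest sequence and return an attention mask that marks the real tokens; top_n_sentence_ids scores every (question, candidate) pair and returns the IDs of the n best candidates (n = 1 in this task). The (sentence_id, sentence_text) candidate format is an assumption.

```python
import torch
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")


def encode_pairs(questions, sentences, max_length=256):
    # Builds [CLS] question [SEP] sentence [SEP] for each pair; shorter
    # sequences in the batch are padded to the longest one, and the
    # returned attention_mask records which positions are real tokens.
    # This is how length differences within a batch are handled.
    return tokenizer(questions, sentences, padding=True, truncation=True,
                     max_length=max_length, return_tensors="pt")


@torch.no_grad()
def top_n_sentence_ids(model, question, candidates, n=1, device="cpu"):
    """Return the IDs of the n candidate sentences with the highest
    predicted relevance to the question. `candidates` is assumed to be
    a list of (sentence_id, sentence_text) tuples."""
    model.eval()
    batch = encode_pairs([question] * len(candidates),
                         [text for _, text in candidates])
    batch = {key: value.to(device) for key, value in batch.items()}
    scores = torch.sigmoid(model(batch["input_ids"],
                                 batch["attention_mask"]))
    best = torch.topk(scores, k=min(n, len(candidates))).indices.tolist()
    return [candidates[i][0] for i in best]
```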

