
Question


Posted on Sep 25, 2024


Java Implementation

Introduction

This assignment is designed to provide you with a practical understanding of the task of writing an assembler.

You will write an assembler for a low-level language called Shack (Simplified Hack). Shack is a language designed by me for this particular assessment so you should not expect to find any additional information about it elsewhere on the Web/Internet. This document is the definitive definition/description of Shack. Shack offers a way to write programs for the Hack architecture in a form that is slightly higher level than Hack assembly code. In other words, each instruction in Shack will often map to more than one Hack instruction. This makes it slightly easier to write commonly occurring operations. Rather than outputting Hack binary, your assembler will output equivalent Hack assembler code which you can then run on the Hack CPU Emulator if you wish.

The sections below describe the Shack language, the equivalent Hack instructions, and the requirements of the assembler you must write.

Outline requirements

You must submit the source files of a program that translates from Shack assembler to Hack assembler. The submission will be via an Upload area in the Assessments section of the module's Moodle page.

It must be possible to run your program from the command line on raptor.

Your program must take a single command-line argument which is the name of the Shack source file to be translated.

The assembler must only accept source files with a .shk file suffix.

The assembler must write the translated Hack version to a file whose name has the same prefix as the Shack source file but a .asm suffix. The file written to must be in the same directory/folder as the Shack source file.

You may implement the program in a language of your choice under the constraints that the language must be available to the marker on raptor and it must be possible for the marker both to compile (where required by the language) and execute your code without having to install a particular IDE or any other software, such as libraries or build tools. These constraints are important. Any libraries required must either be bundled with your submission or already pre-installed and accessible to the marker.

It is essential that your submission includes sufficiently detailed instructions to allow the marker to run your programs, particularly if it is written in a language other than Java. Further details of what this will mean in practice will be provided before the deadline.

Two files for running the program. First, Main.java:

    package assembler;

    import java.io.IOException;

    /**
     * Driver program for Shack ASM to Hack ASM.
     */
    public class Main {

        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) {
            if (args.length != 1) {
                // No source file, or more than one: a command-line error.
                System.err.println("Usage: sham file.shk");
            } else {
                String filename = args[0];
                if (filename.endsWith(".shk")) {
                    try {
                        Assembler assem = new Assembler();
                        assem.assemble(filename);
                    } catch (IOException ex) {
                        // The file does not exist or cannot be read.
                        System.err.println("Unable to read " + filename);
                    }
                } else {
                    // A wrong file suffix is also a command-line error.
                    System.err.println("Usage: sham file.shk");
                }
            }
        }
    }

Second, Assembler.java:

    package assembler;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;

    /**
     * Coordinate the translation of Shack assembly code to
     * Hack assembly code.
     *
     * @author
     * @version
     */
    public class Assembler {

        /**
         * Create an assembler.
         */
        public Assembler()
        {
        }

        /**
         * Translate the Shack file.
         * @param filename The file to be translated.
         * @throws IOException on any input issue.
         */
        public void assemble(String filename)
            throws IOException
        {
            // To be completed: read the .shk file, translate it and write the .asm file.
        }
    }
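The body of assemble is left empty in the skeleton. As a hedged sketch of just the file handling it will need, the helper below reads the source lines, hands them to a translation step, and writes the result to a file with the same prefix and a .asm suffix in the same directory. ShackFiles, asmNameFor and translateFile are hypothetical names that are not part of the supplied code, and the translation itself is passed in as a function so it can be developed separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical file-handling helper for Assembler.assemble; not part of the supplied skeleton. */
final class ShackFiles {
    private ShackFiles() {}

    /** Swaps the ".shk" suffix for ".asm"; only the suffix changes, so the directory stays the same. */
    static String asmNameFor(String shkName) {
        if (!shkName.endsWith(".shk")) {
            throw new IllegalArgumentException("Not a .shk file: " + shkName);
        }
        return shkName.substring(0, shkName.length() - ".shk".length()) + ".asm";
    }

    /**
     * Reads all Shack source lines, applies the supplied translator and writes the Hack lines
     * to the matching .asm file. Any IOException, from reading or writing, goes to the caller.
     */
    static Path translateFile(String shkName, UnaryOperator<List<String>> translator)
            throws IOException {
        List<String> source = Files.readAllLines(Paths.get(shkName));
        List<String> hack = translator.apply(source);
        Path out = Paths.get(asmNameFor(shkName));
        Files.write(out, hack);
        return out;
    }
}
```

With this in place, assemble could simply call ShackFiles.translateFile(filename, someTranslator), leaving Main's catch block to report a missing or unreadable file. Note that an IOException raised while writing would also reach that catch block.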

The Shack language: lexical and syntactic conventions


Shack programs are stored in files with a '.shk' suffix. The assembler must translate a single .shk file on each run. Any additional arguments must be ignored.

Comments: As in both Java and Hack, text beginning with two forward-slash characters (//) up to the end of the line on which it occurs is a human-readable comment and requires no translation. Comments are ignored by the assembler.

Whitespace: Blank lines are ignored by the assembler. Each Shack instruction must be written on a single line. Unlike in Hack, whitespace is used to separate an instruction mnemonic from its operands. Except where restricted below, additional whitespace may be used anywhere within a line, for instance to indent instructions or enhance readability.

Numeric constants: Numeric constants must be non-negative integer values, written in decimal notation, in the range 0-32767.

Labels: Labels are used as symbolic names for memory addresses. They consist of one or more alphabetic, numeric and underscore characters, starting with an alphabetic character. Shack distinguishes between labels for ROM instruction addresses and those for RAM data addresses. All labels for RAM data addresses must be 'declared' in advance in a 'data section' that appears before the instructions, which appear in a 'code section' (further details below). ROM labels are 'declared' by using them to label the following instruction. A label may not be used for a ROM address if it has been declared in the data section as a RAM label. A label that matches a Shack instruction name (opcode) must not be used.

Instruction mnemonics: All instruction mnemonics are either 3 or 4 case-sensitive alphabetic characters long, entirely in upper-case.

The Shack language: declarations and instructions

A Shack source file has two parts: a declaration section, for listing the names of RAM labels, and a code section.
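The lexical rules for comments, whitespace, numeric constants and labels can be checked with a few small helpers. This is a minimal sketch, assuming each line is processed independently; ShackLexer and its methods are hypothetical names, and rejecting labels that match an opcode is left to a later stage.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

/** Hypothetical lexical helpers for a single Shack source line. */
final class ShackLexer {
    private ShackLexer() {}

    // A label: letters, digits and underscores, starting with a letter.
    private static final Pattern LABEL = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");
    // A numeric constant: decimal digits only (no sign).
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");
    private static final BigInteger MAX = BigInteger.valueOf(32767);

    /** Drops everything from the first "//" to the end of the line. */
    static String stripComment(String line) {
        int start = line.indexOf("//");
        return start >= 0 ? line.substring(0, start) : line;
    }

    /** Whitespace-separated tokens of the non-comment text; blank lines yield none. */
    static String[] tokens(String line) {
        String code = stripComment(line).trim();
        return code.isEmpty() ? new String[0] : code.split("\\s+");
    }

    static boolean isLabel(String s) {
        return LABEL.matcher(s).matches();
    }

    /** Decimal constant in the range 0-32767; BigInteger avoids overflow on very long inputs. */
    static boolean isNumericConstant(String s) {
        return DIGITS.matcher(s).matches() && new BigInteger(s).compareTo(MAX) <= 0;
    }
}
```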
The RAM labels in the declaration section would be translated as variables in the Hack version. These names would ultimately be turned into sequential addresses starting at 16 in the Hack binary version, but the Shack assembler does not translate them into their numeric equivalents. Note that 'translation' of the .dec section does not directly result in any code being generated, because Hack variables are not declared. The following short example shows both sections and how they are introduced in a Shack source file:

    .dec
    sum
    x
    y
    .code
    LOAD D x
    ADDD y
    STO D sum

The RAM declarations are introduced by the symbol '.dec', which must be on a line by itself. If there are no declarations then the declaration section may be completely omitted. If present, there must be zero or more names for RAM locations, each on its own line.

The instructions are introduced by the symbol '.code', which must be on a line by itself. If there are no instructions then the code section may be completely omitted. If present, there …

The Shack language: instruction labels

Instruction labels may only appear in the code section. A label must be followed by a colon character, without any separating spaces. The label name does not include the colon symbol. A label must appear on a line by itself, but multiple labels, on successive lines, may be used to label the same instruction. For instance:

    start:
    loop:
    LOAD D x
    ADDD y
    STO D sum
    JMP loop

It is not permitted for an instruction label to be the same as a RAM label.

The Shack language: instructions

This section describes the instructions available in the Shack language and their translations into equivalent Hack assembly code. Please note the following definitions of operands, which are used in the descriptions:

An 'addr' operand refers to a RAM address, and 'addr' may be either a number (such as 2476) or a variable label (such as sum). An addr operand is always translated as a reference to location RAM[addr].
A '#value' operand must have no space between # and value, and 'value' may be either a number (such as 2476) or a variable label (such as sum). The operand represents the literal numerical value or the address to which the label corresponds.

Please ensure that you understand the difference in meaning between operands 256 and #256, for instance. One means RAM[256] and the other means the value 256. This is similar to the difference between M and A in Hack notation.

Instructions operating on the D register

The following instructions operate on the D register, combining it in some way with either a numeric value (#value) or a value stored in RAM (addr). The first column of the table describes the general form of the instruction. The second column shows the general translated form. The third column shows a specific Shack example and the fourth column shows how that specific example would be translated into Hack by your assembler.

    General form    Translation    Example Shack    Translation
    ADDD #value     @value         ADDD #13         @13
                    D=D+A                           D=D+A
    ADDD addr       @addr          ADDD 13          @13
                    D=D+M                           D=D+M
    ANDD #value     @value         ANDD #sum        @sum
                    D=D&A                           D=D&A
    ANDD addr       @addr          ANDD sum         @sum
                    D=D&M                           D=D&M
    ORD #value      @value         ORD #x           @x
                    D=D|A                           D=D|A
    ORD addr        @addr          ORD 2371         @2371
                    D=D|M                           D=D|M
    SUBD #value     @value         SUBD #13         @13
                    D=D-A                           D=D-A
    SUBD addr       @addr          SUBD 13          @13
                    D=D-M                           D=D-M

Zero-operand arithmetic and boolean instructions

The following instructions involve no explicit operands and operate on the D register. Because they take no operands, each example is identical to its general form.

    General form    Translation
    INC             D=D+1
    DEC             D=D-1
    CLR             D=0
    NEG             D=-D
    NOT             D=!D

Simple STO instruction

The following two-operand instructions store the value of either A or D into RAM at the given address.
    General form    Translation    Example Shack    Translation
    STO A addr      D=A            STO A 256        D=A
                    @addr                           @256
                    M=D                             M=D
    STO D addr      @addr          STO D sum        @sum
                    M=D                             M=D

Simple LOAD register instructions

The following instructions allow a value to be loaded into either the A or D register. They have simple translations to their Hack equivalents.

    General form     Translation    Example Shack    Translation
    LOAD D A         D=A            LOAD D A         D=A
    LOAD A D         A=D            LOAD A D         A=D
    LOAD A #value    @value         LOAD A #256      @256
    LOAD A addr      @addr          LOAD A 256       @256
                     A=M                             A=M

Complex LOAD instructions

The following LOAD instructions are designed to load a value into the D register. They are complicated by the need to avoid overwriting any value currently stored in the A register. To avoid this, the translation first saves the current value of A to R13 (RAM[13]) and then restores it afterwards. We will call the following sequence 'save A':

    D=A
    @R13
    M=D

and the following sequence 'restore A':

    @R13
    A=M

    General form     Translation    Example Shack    Translation
    LOAD D #value    save A         LOAD D #sum      D=A
                     @value                          @R13
                     D=A                             M=D
                     restore A                       @sum
                                                     D=A
                                                     @R13
                                                     A=M
    LOAD D addr      save A         LOAD D sum       D=A
                     @addr                           @R13
                     D=M                             M=D
                     restore A                       @sum
                                                     D=M
                                                     @R13
                                                     A=M

Jump instructions

All the relational jump instructions depend on the value stored in the D register. Note that 'dest' must be either a numeric value or a ROM label, but no '#' character is required before a numeric value. It is not permitted to use a RAM label as the destination of a jump instruction.

    General form    Translation    Example Shack    Translation
    JMP dest        @dest          JMP loop         @loop
                    0;JMP                           0;JMP
    JGT dest        @dest          JGT 45           @45
                    D;JGT                           D;JGT
    JEQ dest        @dest          JEQ loop         @loop
                    D;JEQ                           D;JEQ
    JGE dest        @dest          JGE 1295         @1295
                    D;JGE                           D;JGE
    JLT dest        @dest          JLT loop         @loop
                    D;JLT                           D;JLT
    JNE dest        @dest          JNE loop         @loop
                    D;JNE                           D;JNE
    JLE dest        @dest          JLE loop         @loop
                    D;JLE                           D;JLE

Detectable errors

Note that there are many potential error cases that are not covered in detail below.
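The instruction translation tables above map naturally onto a small lookup-driven translator. The sketch below is one possible approach, assuming the line has already been split into an opcode and operand strings; ShackTranslator and its methods are hypothetical names. It also checks operand counts, using the counts implied by the tables and the 'Incorrect number of operands for ABCD' message from the error rules, but it deliberately skips all other operand validation (for example, it assumes register operands are A or D).

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical translator for one tokenised Shack instruction; operand validation is omitted. */
final class ShackTranslator {
    private ShackTranslator() {}

    private static final Map<String, String> D_OPS =
            Map.of("ADDD", "+", "ANDD", "&", "ORD", "|", "SUBD", "-");
    private static final Map<String, String> ZERO_OPS = Map.of(
            "INC", "D=D+1", "DEC", "D=D-1", "CLR", "D=0", "NEG", "D=-D", "NOT", "D=!D");
    private static final Set<String> COND_JUMPS = Set.of("JGT", "JEQ", "JGE", "JLT", "JNE", "JLE");

    /** Returns the Hack lines for one instruction, or throws with the spec's message text. */
    static List<String> translate(String opcode, String... ops) {
        int expected = expectedOperands(opcode);   // also rejects unknown opcodes
        if (ops.length != expected) {
            throw new IllegalArgumentException("Incorrect number of operands for " + opcode);
        }
        if (ZERO_OPS.containsKey(opcode)) {
            return List.of(ZERO_OPS.get(opcode));
        }
        if (D_OPS.containsKey(opcode)) {
            // '#value' combines with A (the value itself); 'addr' combines with M (RAM[addr]).
            boolean imm = ops[0].startsWith("#");
            return List.of("@" + strip(ops[0]), "D=D" + D_OPS.get(opcode) + (imm ? "A" : "M"));
        }
        if (opcode.equals("JMP")) {
            return List.of("@" + ops[0], "0;JMP");
        }
        if (COND_JUMPS.contains(opcode)) {
            return List.of("@" + ops[0], "D;" + opcode);
        }
        if (opcode.equals("STO")) {
            // STO A copies A into RAM via D; STO D stores D directly.
            return ops[0].equals("A") ? List.of("D=A", "@" + ops[1], "M=D")
                                      : List.of("@" + ops[1], "M=D");
        }
        return load(ops[0], ops[1]);              // only LOAD remains at this point
    }

    private static List<String> load(String reg, String src) {
        if (reg.equals("D") && src.equals("A")) {
            return List.of("D=A");
        }
        if (reg.equals("A") && src.equals("D")) {
            return List.of("A=D");
        }
        boolean imm = src.startsWith("#");
        if (reg.equals("A")) {
            return imm ? List.of("@" + strip(src)) : List.of("@" + src, "A=M");
        }
        // LOAD D #value / LOAD D addr: save A in R13, load D, then restore A.
        return List.of("D=A", "@R13", "M=D", "@" + strip(src), imm ? "D=A" : "D=M", "@R13", "A=M");
    }

    private static String strip(String operand) {
        return operand.startsWith("#") ? operand.substring(1) : operand;
    }

    private static int expectedOperands(String opcode) {
        if (ZERO_OPS.containsKey(opcode)) {
            return 0;
        }
        if (D_OPS.containsKey(opcode) || opcode.equals("JMP") || COND_JUMPS.contains(opcode)) {
            return 1;
        }
        if (opcode.equals("STO") || opcode.equals("LOAD")) {
            return 2;
        }
        throw new IllegalArgumentException("Illegal opcode: " + opcode);
    }
}
```

Keeping the opcode tables in maps means each table row in the specification corresponds to a single map entry or branch, which makes it easy to check the translator against the tables line by line.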
The main priority for assessment is to write an assembler that correctly translates from Shack to Hack. I will not be trying to catch you out with 'sneaky' error cases as part of the assessment. If you are not sure what error to output for any particular error case, it will be acceptable to output simply:

    Unrecognised input

Any errors in the command-line arguments must result in the following error message, and the program must terminate without attempting to process any input:

    Usage: sham file.shk

Such errors would include no source file, more than one source file, or a source file with the incorrect file suffix. If the single argument is valid but the file either does not exist or cannot be read, the result should be the following error message, and the program must terminate:

    Unable to read file.shk

where 'file.shk' would be replaced by the command-line argument.

The assembler must attempt to translate as much of its source file as possible. As the input is line-based, an error on one line should not prevent the processing of subsequent lines. All error messages must be written to System.err, or its equivalent in the language you choose to implement in. You may assume that there will never be more than one declaration or code section in a source file and that, when present, they will always be in the correct order: declaration section first, code section second.

Illegal characters

Any illegal characters (outside of comments) must be reported in the order in which they occur in the source file. The format for each illegal character error message must be:

    Illegal character: x

where x would be the illegal character. Each character must be reported on a separate line. Illegal character error messages take precedence over all other error messages. If an illegal character is found on an input line then no further processing of the line should be undertaken.
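Because processing of a line stops at its first illegal character, the check can return as soon as it finds one. The specification does not list the legal characters explicitly, so the set used below (ASCII letters and digits, underscore, '#', ':', '.', and whitespace) is an assumption inferred from the lexical rules, as is the class name IllegalCharCheck.

```java
/**
 * Hypothetical illegal-character check. The set of legal characters is an ASSUMPTION
 * inferred from the language description; the specification does not list it explicitly.
 */
final class IllegalCharCheck {
    private IllegalCharCheck() {}

    // Assumed legal outside comments: ASCII letters and digits, '_', '#', ':', '.', whitespace.
    static boolean isLegal(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9')
                || ch == '_' || ch == '#' || ch == ':' || ch == '.' || Character.isWhitespace(ch);
    }

    /**
     * Returns the error message for the first illegal character before any "//" comment,
     * or null if the line is clean. The caller prints the message to System.err.
     */
    static String check(String line) {
        int comment = line.indexOf("//");
        String code = comment >= 0 ? line.substring(0, comment) : line;
        for (int i = 0; i < code.length(); i++) {
            if (!isLegal(code.charAt(i))) {
                return "Illegal character: " + code.charAt(i);
            }
        }
        return null;
    }
}
```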
In effect, this means that no more than one illegal character needs to be reported for any single line, but it does not matter if more than one illegal character is detected and reported for a single line.

Instruction operands

Any illegal instructions and operands must be reported in the order in which they occur in the source file. The format for each illegal opcode is:

    Illegal opcode: ABCD

where 'ABCD' would be the illegal opcode. The format for illegal operands is:

    Illegal operand: ABCD

where 'ABCD' would be the illegal operand. For instance, an operand such as #45 is not valid for a JMP instruction, so the instruction:

    JMP #45

should result in the error message:

    Illegal operand: #45

Note that illegal characters might occur within operands. For instance:

    JMP !45

In such cases, as indicated above, the 'Illegal character' error message would take precedence and no 'Illegal operand' error would be issued. If there are either too few or too many operands for an instruction then the error message would be:

    Incorrect number of operands for ABCD

where 'ABCD' would be the opcode. Reporting an illegal number of operands takes precedence over reporting any illegal characters in the operands, and only one error message would be printed in this case.

Labels used incorrectly

Because all RAM labels must be declared in advance, the assembler is able to detect when a RAM label that has not been declared is used. It must issue an error message in the following form:

    RAM label xyz has not been declared.

where 'xyz' would be replaced by the name of the label used incorrectly. If there are multiple RAM labels to report in this way then the report should be in the order in which the labels have been used in the code section. However, an undeclared label must be reported at most once, even if it has been used more than once.

A RAM label used as the destination of a jump is not permitted, and any occurrences must be reported in the order in which they occur in the source file.
The error message must be in the following form:

    RAM label xyz has been used as a jump destination.

where 'xyz' would be replaced by the name of the label used incorrectly. If a RAM label has been used more than once in this way then each occurrence must be reported. It is not an error for a RAM label to be declared more than once in the declaration section, as the assembler does not allocate addresses to labels.

If a ROM label is defined more than once, each redefinition must be reported as soon as it is detected, in the following form:

    ROM label xyz has been defined more than once.

where 'xyz' would be replaced by the name of the label used incorrectly. If a ROM label duplicates a RAM label it must be reported as soon as it is detected, in the following form:

    ROM label xyz has been defined as a RAM label.

where 'xyz' would be replaced by the name of the label used incorrectly. In the unlikely case that a ROM label is duplicated and also duplicates a RAM label, the 'more than once' error message must come before the 'RAM label' error message.

Attempting to use or define a RAM or ROM label that is the same as an opcode would result in an error message in the following form:

    ABCD is an opcode and may not be used as a label.

where 'ABCD' would be replaced by the illegal label. The error must be issued at the point the illegal label is first encountered or defined, and must only be issued once.

A ROM label that is used as the destination of a jump instruction but has not been defined must be reported. This will only be possible once the complete source file has been read, as forward references are permitted. The assembler must output a separate error message for each label used incorrectly in that way, in the following form:

    Instruction label xyz has not been defined.

where 'xyz' would be replaced by the name of the label used incorrectly.
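The label rules need state that persists across lines: the declared RAM labels, the ROM labels defined so far, and jump targets whose definitions may only appear later in the file. The following is a minimal sketch under those assumptions; LabelChecker and its methods are hypothetical, numeric operands are assumed to be filtered out by the caller, and opcode clashes inside the .dec section are not handled. A TreeSet provides the ascending order required for undefined instruction labels, using Java's natural String ordering, which sorts upper-case letters before lower-case ones.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical bookkeeping for the label errors; messages follow the spec's wording. */
final class LabelChecker {
    private final Set<String> ramLabels;                    // from the .dec section
    private final Set<String> opcodes;                      // all Shack mnemonics
    private final Set<String> romDefined = new HashSet<>();
    private final Set<String> jumpTargets = new HashSet<>();
    private final Set<String> reported = new HashSet<>();   // one-shot messages already issued
    final List<String> errors = new ArrayList<>();

    LabelChecker(Set<String> ramLabels, Set<String> opcodes) {
        this.ramLabels = ramLabels;
        this.opcodes = opcodes;
    }

    /** Reports a label that is an opcode, at most once per name; returns true if it was one. */
    private boolean opcodeClash(String name) {
        if (!opcodes.contains(name)) {
            return false;
        }
        if (reported.add("op:" + name)) {   // labels cannot contain ':', so keys cannot collide
            errors.add(name + " is an opcode and may not be used as a label.");
        }
        return true;
    }

    /** A RAM label used as an 'addr' or '#value' operand; each undeclared name is reported once. */
    void useRamLabel(String name) {
        if (opcodeClash(name)) {
            return;
        }
        if (!ramLabels.contains(name) && reported.add("ram:" + name)) {
            errors.add("RAM label " + name + " has not been declared.");
        }
    }

    /** A "name:" line in the code section; 'more than once' is reported before 'RAM label'. */
    void defineRomLabel(String name) {
        if (opcodeClash(name)) {
            return;
        }
        if (!romDefined.add(name)) {
            errors.add("ROM label " + name + " has been defined more than once.");
        }
        if (ramLabels.contains(name)) {
            errors.add("ROM label " + name + " has been defined as a RAM label.");
        }
    }

    /** A non-numeric jump destination; RAM labels are reported on every occurrence. */
    void jumpTo(String name) {
        if (opcodeClash(name)) {
            return;
        }
        if (ramLabels.contains(name)) {
            errors.add("RAM label " + name + " has been used as a jump destination.");
        } else {
            jumpTargets.add(name);
        }
    }

    /** Called once the whole file has been read: undefined targets in ascending sorted order. */
    List<String> undefinedRomLabels() {
        List<String> out = new ArrayList<>();
        for (String name : new TreeSet<>(jumpTargets)) {
            if (!romDefined.contains(name)) {
                out.add("Instruction label " + name + " has not been defined.");
            }
        }
        return out;
    }
}
```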
If there are multiple instruction labels to report in this way then the report must be in alphanumeric sorted order of ascending label; for instance:

    Instruction label abc has not been defined.
    Instruction label mno has not been defined.
    Instruction label xyz has not been defined.

In summary:

- RAM label errors are reported as they are encountered;
- Duplicate ROM label errors are reported as they are encountered;
- ROM label definitions that duplicate RAM labels are reported as they are encountered;
- Undefined ROM label errors are reported only at the end of the analysis.

Assessment

Marks will be based entirely on functionality. We will not evaluate your coding. Therefore, it is essential that your submission can be executed. If it cannot be executed, it will be impossible to assess its functionality and you will automatically receive a mark of zero. Your implementation will be marked with the help of an automatic test harness. The marking test harness will invoke the assembler with a variety of source files containing examples of the different instructions and other features of the Shack language. Some sample files will be provided before the deadline for you to self-test. Marks will be awarded according to how well your code performs in these tests. An implementation that functions correctly for all tests will score 100%. We will not be releasing the test data used for marking, so it is important that you supplement the tests we do provide with additional ones of your own.

Date of this version: 2021.12.27

Changes to the original release:

- Corrected the ADD instructions to be ADDD in the two examples of code.
- Errors to be written to System.err (or equivalent).
- The whole source file should be processed.
- Further specifications for error conditions and messages added.
- Labels that duplicate opcodes outlawed.
- Catch-all of 'Unrecognised input.' added for errors not otherwise covered explicitly.
- Clarifying note added to the 'declarations and instructions' section to make it clear that no code is generated directly from the 'translation'/parsing of the .dec section. This does not change the brief but addresses a potential misconception.
- Deadline extended.
- Text of the illegal characters section clarified.