Question
Goal: write an emulator for x86, capable of running simple assembler code (binary format).

Submission: executable, source, examples, readme. Make sure that the submission actually gets to me!

Specifics: Your program should be able to load a binary executable in .com format and perform the instructions one by one. To show that the program works, either implement some output functions or display the register contents; doing both is of course desirable. For a good project, all the instructions covered so far, plus those covered within the next two weeks, should be supported. These are the instructions covered in class.

6.1 Overall program structure and work

    #define byte unsigned char
    #define word unsigned short
    byte mem[0x100000];

Physical registers should be represented as variables, for example:

    byte AL,AH,...;
    word AX,BX,...;
    word IP,FLAGS,...;
    word CS,DS,...;

Notice that setting, say, AL should change not just AL but also AX!

There are two memory layout schemes that can be used in an emulator:

1 Simplified scheme: we define 64 KB arrays of byte

    byte code[0x10000];  // to contain the code
    byte data[0x10000];  // to contain the data (variables)
    byte stack[0x10000]; // to contain the stack

2 Correct scheme: we define a single 1 MB array

    byte memory[0x100000]; // to contain the code, data and stack

with the code, data and stack being allocated within this array:

    byte *code=&memory[CS*16];  // to contain the code
    byte *data=&memory[DS*16];  // to contain the data (variables)
    byte *stack=&memory[SS*16]; // to contain the stack

where CS, DS and SS are assigned some values, for example 0x1000, 0x2000, 0x3000. For .com programs one should initialize all three segment registers to the same value, for example 0x1000 (a segment register is 16 bits wide, so it corresponds to physical address 0x10000). In reality, segments are initialized by the OS upon loading the program.

Simpler programs will work with the simplified scheme, while more complex ones may fail; this certainly applies to programs that modify segment registers.
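The two requirements above (a write to AL must show up in AX, and segments live inside one 1 MB array) can be sketched as follows. This is only one possible approach, not the method required by the assignment: it stores just the 16-bit registers and derives the halves from them. The names Regs, get_lo, get_hi, set_lo, set_hi and phys are hypothetical, typedefs are used in place of the #define lines above, and wrapping the address to 20 bits is an assumption that mirrors the 8086's 20 address lines.

```c
#include <stdint.h>

typedef uint8_t  byte;   /* typedefs used here instead of the #defines in the text */
typedef uint16_t word;

/* Hypothetical register file: only the 16-bit values are stored. */
typedef struct {
    word AX, BX, CX, DX;
    word CS, DS, SS;
    word IP, SP, FLAGS;
} Regs;

/* Read the low (AL-like) or high (AH-like) half of a 16-bit register. */
static byte get_lo(word r) { return (byte)(r & 0xFF); }
static byte get_hi(word r) { return (byte)(r >> 8); }

/* Writing one half rewrites the full register, so AX cannot drift
   out of sync with AL and AH. */
static void set_lo(word *r, byte v) { *r = (word)((*r & 0xFF00) | v); }
static void set_hi(word *r, byte v) { *r = (word)((*r & 0x00FF) | ((word)v << 8)); }

/* Physical address of segment:offset inside the 1 MB array.
   Assumption: addresses wrap at 1 MB, as on the original 8086. */
static uint32_t phys(word seg, word off) {
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

With this layout, setting AL becomes set_lo(&r.AX, value), and a byte at DS:offset is mem[phys(r.DS, offset)]. A union of a word and two bytes is a common alternative, but it depends on the host's byte order.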
A part of the memory should be dynamically designated as code (this is where the .com file is read); this part is defined as mem[CS*16+i], where i varies between 0 and 0xFFFF.

6.2 .COM file and execution

The .com file is simply the exact image of the code being executed; it is loaded into memory as is, with no changes made. Assuming you invoke your program with

    C> x86 try.com

the following should be done:

1 initialize your program variables
2 ensure try.com exists
3 open the file and read it in its entirety to mem[CS*16+0x100]
4 close the file
5 set IP=0x100, SP=0xFFFE
6 execute instructions one by one.

Sample execution loop:

    byte codebyte;
    IP=0x100; SP=0xFFFE;
    while(1) {
        // possible menu (allow user to exit, for example)
        codebyte=mem[CS*0x10+IP]; IP++;
        switch (codebyte) {
            case 0x00: ...
            case 0x01: ...
            case 0x90: break; // NOP
            case 0xFF: ...
        }
        // possible output (show regs)
    }

(Some, actually most, of the branches will read more code bytes.)

6.3 Termination of the program. (JUMPING AHEAD)

This is accomplished by calling an interrupt. We do not fully explain the instruction now, only provide the essentials.

    INT n ; call OS function n (INTerrupt)

The value n is an unsigned byte. Magic number 20h indicates a terminate request (this is not the only way to do this!). INT n is encoded in two bytes: the first is 0xCD, the second is the interrupt number. We can implement it as follows (within the switch above):

    case 0xCD:
        codebyte=mem[CS*0x10+IP]; IP++;
        if (codebyte==0x20) exit(0);
        break;

Other ways to terminate are INT 21h with AH=0 (no return code) or AH=4Ch (return code in AL). We can implement it as follows (within the switch above):

    case 0xCD:
        codebyte=mem[CS*0x10+IP]; IP++;
        switch(codebyte){
            case 0x20: exit(0);
            case 0x21:
                if (AH==0) exit(0);
                else if (AH==0x4C) exit(AL);
                else ....
        }
        break;

6.4 Simple output

There are many ways of doing output. We only minimally cover the print char function here.
This is accomplished with INT 21h, function (AH) 6; the character in DL is printed (unless DL is 0xFF, in which case it is input). Sample code (adding to the terminate code above):

    case 0xCD:
        codebyte=mem[CS*0x10+IP]; IP++;
        switch(codebyte){
            case 0x20: exit(0);
            case 0x21:
                if (AH==0) exit(0);
                else if (AH==0x4C) exit(AL);
                else if (AH==0x06) {
                    if (DL!=0xFF) printf("%c",DL);
                    else ...
                }
                else if (AH==0x09) {
                    //print string....
                }
                else ....
        }
        break;

(Function 0x09 can print a string.)

6.5 How to make sample files.

The shortest program that should run correctly consists of 2 bytes, 0xCD and 0x20.
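Sample files can also be produced by writing the bytes directly from a small C helper. The sketch below is illustrative only and is not the assignment's reference solution. It writes a 12-byte program that prints "Hi" through INT 21h function 06h and ends with INT 20h, loads it at CS:0x100, and runs it with a deliberately tiny decoder. The function names (write_sample, load_com, fetch, run) and the output buffer are hypothetical. The decoder handles only NOP, MOV AH,imm8 (0xB4), MOV DL,imm8 (0xB2) and INT; any other opcode is reported as an error.

```c
#include <stdio.h>
#include <stdint.h>

typedef uint8_t  byte;
typedef uint16_t word;

static byte mem[0x100000];            /* 1 MB of emulated memory */

/* Write a 12-byte sample program to path; returns 0 on success. */
static int write_sample(const char *path) {
    static const byte prog[] = {
        0xB4, 0x06,                   /* MOV AH,06h  (print-char function) */
        0xB2, 'H',  0xCD, 0x21,       /* MOV DL,'H' ; INT 21h */
        0xB2, 'i',  0xCD, 0x21,       /* MOV DL,'i' ; INT 21h */
        0xCD, 0x20                    /* INT 20h     (terminate) */
    };
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = fwrite(prog, 1, sizeof prog, f);
    return (fclose(f) == 0 && n == sizeof prog) ? 0 : -1;
}

/* Read the file unchanged to CS:0100h; returns bytes read, or -1. */
static long load_com(const char *path, word cs) {
    uint32_t base = ((uint32_t)cs << 4) + 0x100;
    if (base + 0xFF00 > sizeof mem) return -1;   /* segment must fit in mem */
    FILE *f = fopen(path, "rb");
    if (!f) return -1;
    size_t n = fread(&mem[base], 1, 0xFF00, f);
    fclose(f);
    return (long)n;
}

/* Fetch the byte at CS:IP and advance IP. */
static byte fetch(word cs, word *ip) {
    byte b = mem[(((uint32_t)cs << 4) + *ip) & 0xFFFFFu];
    (*ip)++;
    return b;
}

/* Execute from CS:0100h. Printed characters are collected in out
   (capacity cap) so the result can be checked. Returns the number of
   characters stored on INT 20h, or -1 on anything unsupported. */
static int run(word cs, char *out, size_t cap) {
    word ip = 0x100;
    byte ah = 0, dl = 0;
    size_t len = 0;
    if (cap == 0) return -1;
    for (;;) {
        byte op = fetch(cs, &ip);
        switch (op) {
        case 0x90: break;                        /* NOP */
        case 0xB4: ah = fetch(cs, &ip); break;   /* MOV AH,imm8 */
        case 0xB2: dl = fetch(cs, &ip); break;   /* MOV DL,imm8 */
        case 0xCD: {                             /* INT n */
            byte n = fetch(cs, &ip);
            if (n == 0x20) { out[len] = '\0'; return (int)len; }
            if (n == 0x21 && ah == 0x06 && dl != 0xFF) {
                if (len + 1 < cap) out[len++] = (char)dl;
                break;
            }
            return -1;                           /* service not modelled */
        }
        default: return -1;                      /* opcode not modelled */
        }
    }
}
```

Calling write_sample("hi.com"), then load_com("hi.com", 0x1000), then run(0x1000, buf, sizeof buf) leaves the printed characters in buf. A real emulator would print them with printf, as in the sample code above, and would keep AH and DL in its register variables rather than in locals.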