联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2024-03-24 11:47


CSCI 2122 Assignment 4

Due date: 11:59pm, Friday, March 22, 2024, submitted via git

Objectives

The purpose of this assignment is to practice your coding in C, and to reinforce the concepts discussed in class on program representation.

In this assignment1 you will implement a binary translator2 like Rosetta3. Your program will translate from a simple instruction set (much simpler than x86) to x86 and generate x86 assembly code. The code will then be tested by assembling and running it. This assignment is divided into two parts to make it simpler. In the first part, you will implement the loader and a simple translator, which translates the simpler in- structions. In the second part, you will extend the translator to translate more complex instructions.

Preparation:

1. Complete Assignment 0 or ensure that the tools you would need to complete it are installed.

2. Clone your assignment repository:

https://git.cs.dal.ca/courses/2024-winter/csci-2122/assignment-4/????.git

where ???? is your CSID. Please see instructions in Assignment 0 and the tutorials on Brightspace if

you are not sure how.

Inside the repository there is one directory: xtra, where code is to be written. Inside the directory is a tests directory that contains tests that will be executed each time you submit your code. Please do not modifythetestsdirectoryorthe.gitlab-ci.ymlfilethatisfoundintherootdirectory. Modifying these files may break the tests. These files will be replaced with originals when the assignments are graded. You are provided with sample Makefile files that can be used to build your program. If you are using CLion, a Makefile will be generated from the CMakeLists.txt file generated by CLion.

Background:

For this assignment you will translate a binary in a simplified RISC-based 16-bit instruction set to x86-64 assembly. Specifically, the X instruction set comprises a small number (approximately 30) instructions, most of which are two bytes (one word) in size.

The X Architecture has a 16-bit word-size and 16 general purpose 16-bit registers (r0 . . . r15 ). Nearly all instructions operate on 16-bit chunks of data. Thus, all values and addresses are 16 bits in size. All 16-bit values are also encoded in big-endian format, meaning that the most-significant byte comes first.

Apart from the 16 general purpose registers, the architecture has two special 16-bit registers: a program counter (PC), which stores the address of the next instruction that will be executed, and the status (F), which stores bit-flags representing the CPU state. The least significant bit of the status register (F) is the condition flag, which represents the truth value of the last logical test operation. The bit is set to true if the condition was true, and to false otherwise.

1 The idea for this assignment came indirectly from Kyle Smith. 2 https://en.wikipedia.org/wiki/Binary_translation

3 https://en.wikipedia.org/wiki/Rosetta_(software)

   

Additionally, the CPU uses the last general-purpose register, r15, to store the pointer to the program stack. This register is incremented by two when an item is popped off the stack and decremented by two when an item is pushed on the stack. The program stack is used to store temporary values, arguments to a function, and the return address of a function call.

The X Instruction Set

The instruction set comprises approximately 30 instructions that perform arithmetic and logic, data move- ment, stack manipulation, and flow control. Most instructions take registers as their operands and store the result of the operation in a register. However, some instructions also take immediate values as oper- ands. Thus, there are four classes of instructions: 0-operand instructions, 1-operand instructions, 2-oper- and instructions, and extended instructions, which take two words (4 bytes) instead of one word.

All but the extended instructions are encoded as a single word (16 bits). The extended instructions are also one word but are followed by an additional one-word operand. Thus, if the instruction is an extended instruction, the PC needs an additional increment of 2 during the instruction’s execution. As mentioned previously, most instructions are encoded as a single word. The most significant two bits of the word indicates whether the instruction is a 0-operand instruction (00), a 1-operand instruction (01), a 2-operand instruction (10), or an extended instruction (11). For a 0-operand instruction encoding, the two most sig-

nificant bits are 00 and the next six bits represent the instruction identifier. The second byte of the instruction is 0.

For a 1-operand instruction encoding, the two most significant bits are 01, the next bit indicates whether the operand is an immediate or a register, and the next five bits represent the instruction identifier. If the third most significant bit is 0, then the four most significant bits of the second byte

encode the register that is to be operated on (0... 15). Otherwise, if the third most significant bit is 1, then the second byte encodes the immediate value.

For a 2-operand instruction encoding, the two most significant bits are 10, and

the next six bits represent the instruction identifier. The second byte encodes the two register operands in two four-bit chunks. Each of the 4-bit chunks identifies a register (r0 ... r15).

For an extended instruction encoding, the two most significant bits are 11, the next bit indicates whether a second register operand is used, and the next five bits represent the instruction identifier. If the third most significant bit is 0, then the instruction only uses the one-word immedi-

ate operand that follows the instruction. Otherwise, if the third most significant bit is 1, then the four most significant bits of the second byte encode a register (1 ... 15) that is the second operand.

The instruction set is described in Tables 1, 2, 3, and 4. Each description includes the mnemonic (and syntax), the encoding of the instruction, the instruction’s description, and function. For example, the add, loadi, and push instructions have the following descriptions:

loadi V, rD 11100001 D 0 Load immediate value or address V into rD ← memory[PC] register rD. PC ← PC + 2

        Mnemonic Encoding Description Function

 add rS, rD

  10000001 S D

  Add register rS to register rD.

  rD ← rD + rS

  push rS

  01000011 S 0

  Push register rS onto program stack.

  r15 ← r15 - 2 memory[r15 ] ← rS

First, observe that the add instruction takes two register operands and adds the first register to the sec- ond. All 2-operand instructions operate only on registers and the second register is both a source and destination, while the first is the source. It is a 2-operand instruction; hence the first two bits are 10, its instruction identifier is 000001 hence the first byte of the instruction is 0x81.

Second, the loadi instruction is an extended instruction that takes a 16-bit immediate and stores it in a register. Hence, the first two bits are 11, the register bit is set to 1, and the instruction identifier is 00001. Hence, the first byte is encoded as 0xE1.

Third, the push instruction is a 1-operand instruction, taking a single register operand. Hence, the first two bits are 01, the immediate bit is 0, and the instruction identifier is 00011. Hence, the first byte is encoded as 0x43.

Note that S and D are 4-bit vectors representing S and D. Table 1: 0-Operand Instructions

    Mnemonic Encoding Description Function

 ret

  00000001 0

  Return from a procedure call.

  P C ← memory[r15 ] r15 ← r15 + 2

cld     00000010 0 Table 1: 1-Operand Instructions

Stop debug mode

Logically negate register rD . Decrement rD .

Pop value from stack into register rD.

Branch relative to label L if condition bit is true.

Subtract register rS from register rD. And register rS with register rD .

Xor register rS with register rD .

See Debug Mode below.

rD ←!rD

rD ← rD – 1

rD ← memory[r15 ] r15 ← r15 + 2

if F & 0x0001 == 0x001: PC ← PC + L – 2

rD ← rD - rS rD ← rD & rS rD ← rD ^ rS

std

 00000011 S 0

  Start debug mode

  See Debug Mode below.

     Mnemonic Encoding Description Function

 neg rD

  01000001 D 0

  Negate register rD .

  rD ← ?rD

not rD dec rD

pop rD br L

01000010 D 0 01001001 D 0

01000100 D 0 01100001 L

 inc rD

  01001000 D 0

  Increment rD .

  rD ← rD + 1

  push rS

  01000011 S 0

  Push register rS onto the pro- gram stack.

  r15 ← r15 – 2 memory[r15] ← rS

  out rS

  01000111 S 0

  Output character in rS to std- out.

  output ← rS (see below)

  jr L

  01100010 L

  Jump relative to label L.

  PC ← PC + L – 2

Table 3: 2-Operand Instructions

    Mnemonic Encoding Description Function

 add rS , rD

  10000001 S D

  Add register rS to register rD .

  rD ← rD + rS

sub rS , rD and rS , rD xor rS , rD

10000010 S D 10000101 S D 10000111 S D

 mul rS , rD

  10000011 S D

  Multiply register rD by register rS.

  rD ← rD * rS

  or rS , rD

  10000110 S D

  Or register rS with register rD .

  rD ← rD | rS

   

   test rS1, rS2

  10001010 S D

 Set condition flag to true if and only if rS1 ∧ rS2 is not 0.

   ifrS1 &rS2 !=0:

F ← F | 0x0001

else:

F ← F & 0xFFFE

  cmp rS1, rS2

 10001011 S D

Set condition flag to true if and only If rS1 < rS2.

  if rS1 < rS2:

F ← F | 0x0001

else:

F ← F & 0xFFFE

   equ rS1, rS2

   10001100 S D

  Set condition flag to true if and only if rS1 == rS2.

    if rS1 == rS2:

F ← F | 0x0001

else:

F ← F & 0xFFFE

mov rS , rD stor rS , rD

10001101 S D 10001111 S D

10010001 S D Table 3: Extended Instructions

Copy register rS to register rD .

Store word from register rS to memory at address in register rD.

Store byte from register rS to memory at address in register rD.

rD ← rS memory[rD] ← rS

(byte)memory[rD] ← rS

 load rS , rD

  10001110 S D

  Load word into register rD from memory pointed to by register rS.

  rD ← memory[rS]

  loadb rS , rD

  10010000 S D

  Load byte into register rD from memory pointed to by register rS.

  rD ← (byte)memory[rS]

storb rS , rD

        Mnemonic Encoding Description Function

 jmp L

  11000001 0

  Absolute jump to label L.

  PC ← memory[PC]

  call L

11000010 0

Absolute call to label L..

 r15 ← r15 – 2 memory[r15] ← PC + 2 PC ← memory[PC]

 loadi V, rD

  11100001 D 0

  Load immediate value or address V into register rD.

  rD ← memory[PC] PC ← PC + 2

Note that in the case of extended instructions, the label L or value V are encoded as a single word (16-bit value) following the word containing the instruction. The 0 in the encodings above represents a 4-bit 0 vector.

An assembler is provided for you to use (if needed). Please see the manual at the end of the assignment.

The Xtra Translation Specification (IMPORTANT)

The binary translation is conducted in the following manner. The translator

1. Opens the specified file containing the X binary code.

2. Outputs a prologue (see below), which will be the same for all translations. 3. It then enters a loop that

a. Reads the next instruction from the binary

b. Decodes the instruction, and

c. Outputs the corresponding x86 assembly instruction(s). If the instruction is an extended,

an additional two bytes will need to be read.

d. The loop exits when the instruction composed of two 0 bytes is read.

4. Outputs an epilogue.

Prologue

The translator first outputs a simple prologue that is the same for all translations. The prologue is shown on the right. Epilogue

After the translator finishes translating, it outputs a simple ep- ilogue that is the same for all translations. The epilogue is shown on the right.

Translation

Each X instruction will need to be translated into

the corresponding instruction or instructions in

x86-64 assembly. See table on right for examples.

Most instructions will have a direct corresponding

instruction in x86 assembly so the translation will

beeasy. Someinstructions,liketheequ,test,andcmp,instructions may require multiple x86 instructions for a single X instruction. Note: The translator will need to perform a register mapping. Register Mapping

The X architecture has 16 general and the F status register. In x86-64

there are also 16 general purpose registers. The register mapping on

the right must be used when translating from X to x86-64. Note that

for this exercise register r13 will not be used by the X executables. In-

stead of r13 (X) being mapped to r15 (x86), the F register (X) is mapped

to register r15 (x86). Note: for this assignment, It is ok to map 16-bit registers to 64-bit registers. r9 Debug mode STD and CLD

.globl test

test:

   push %rbp

   mov %rsp, %rbp

   pop %rbp ret

      mov $42, %rax

      add %rax, %rdi

           %rbx

           %rdx

           %rdi

           %r9

           %r11

           %r13

           %r15

           %rsp

call debug

  X Instruction Output x86 Assembly

 mov r0, r1

  mov %rax, %rdi

 loadi 42, r0

add r0, r1

push r0

  push %rax

      X Registers x86 Registers

r0

  %rax

 r1 r3 r5 r7

r2

  %rcx

  r4

  %rsi

  r6

  %r8

  r8

  %r10

  r10

  %r12

 The std and cld X instructions enable and disable debug mode on the X architecture. However, debug mode does not exist in x86-64. Instead, when a std instruction is encountered, the translator should set an internal debug flag in the translator and, clear the debug flag when it encounters the cld instruction.

When the debug flag is true, the translator should output the assembly code on the right before translating each X instruction.

Output and the OUT Instruction (For Task 2)

r11 F r15

r12

  %r14

  r14

  %rbp

     On the X architecture, the out rN instruction outputs to the screen the character stored in register rN. However, no such instruction exists in x86-64. Instead, the out instruction is translated to a call to the function outchar(char c), which performs this function. Recall that the first argument is passed in register %rdi. Consequently, the corresponding translation code will need to save the current value of %rdi on the stack, move the byte to be printed into %rdi, call outchar, and restore %rdi.

Your task will be to implement the Xtra binary translator which is used to translate into x86 assembly programs assembled with the X assembler.

Task 1: Implement the Simple Xtra

Your first task is to implement a simple version of the translator that works for the simple instructions. The source file main.c should contain the main() function. The translator should:

1. Take one (1) argument on the command line: The argument is the object/executable file of the program to translate. For example, the invocation

./xtra hello.xo

instructs the translator to translate the program hello.xo into x86-64 assembly.

2. Open for reading the file specified on the command-line.

3. Output (to stdout) the prologue.

4. In a loop,

a. Read in instruction.

b. If the instruction is 0x00 0x00, break out of the loop.

c. Translate the instruction and output (to stdout) the x86-64 assembly.

5. Output (to stdout) the epilogue.

Input

The input to the program is via the command line. The program takes one argument, the name of the file containing the assembled X code.

Processing

All input shall be correct. All program files shall be at most 65536 bytes (64KB). The translator must be able to translate all instructions except:

  Instruction Description

 ret

  Return from a procedure call.

br L

jmp L

load rS , rD loadb rS , rD out rS

Branch relative to label L if condition bit is true.

Absolute jump to label L.

Load word into register rD from memory pointed to by register rS. Load byte into register rD from memory pointed to by register rS. Output character in rS to stdout.

 jr L

  Jump relative to label L.

  call L

  Absolute call to label L.

  stor rS , rD

  Store word from register rS to memory at address in register rD.

  storb rS , rD

  Store byte from register rS to memory at address in register rD.

  Recommendation: While no error checking is required, it may be helpful to still do error checking, e.g., make sure files are properly opened because it will help with debugging as well.

Output

Output should be to stdout. The output is the translated assembly code. The format should ATT style assembly. The exact formatting of the assembly is up to you, but the assembly code will be passed through the standard assembler (as) on timberlea. See next section for how to test your code. (See example)

Testing

To test your translator, the test scripts will assembler, link, and run the translated code! J A runit.sh script is provided. The script takes an X assembly file as an argument:

./runit.sh foo.xas The script:

 

1. Assembles the .xas file with the provided (xas) to create a .xo file.

2. Runs xtra on the .xo file, to create a corresponding x86 .s assembly file.

3. Assembles, compiles, and links the generated assembly file with some runner code, creating an

executable. Therunneriscomposedofrunner.c,regsdump.s,andthetranslated.sfile.

Please DO NOT delete the first two files.

4. Runs the executable.

This script is used by the test scripts and is also useful for you to test your code.

Most of the tests use the std instruction to turn on debugging and output the state of the registers after each instruction is executed. For most of the tests the output being compared are the register values.

Example

 Original X assembly code       Translated x86 code

 loadi 2, r0

 loadi 3, r1

 loadi 4, r2

 loadi 5, r3

 loadi 7, r5

 std         # turn debugging on

 add r2, r3

mul r2, r1

cld # turn debugging off neg r0

inc r5

.literal 0

   .globl test

test:

   push %rbp

   mov %rsp, %rbp

   mov $2, %rax

   mov $3, %rbx

   mov $4, %rcx

   mov $5, %rdx

   mov $7, %rdi

   call debug

   add %rcx, %rdx

   call debug

   imul %rcx, %rbx

   call debug

   neg %rax

   inc %rdi

   pop %rbp

   ret

  Task 2: The Full Translator

Your second task is to extend xtra to translate the instructions exempted in Task 1. lation for the following instructions.

Implement trans-

  Instruction Description

 ret

  Return from a procedure call.

br L

jmp L

load rS , rD loadb rS , rD out rS

Branch relative to label L if condition bit is true.

Absolute jump to label L.

Load word into register rD from memory pointed to by register rS. Load byte into register rD from memory pointed to by register rS. Output character in rS to stdout.

 jr L

  Jump relative to label L.

  call L

  Absolute call to label L.

  stor rS , rD

  Store word from register rS to memory at address in register rD.

  storb rS , rD

  Store byte from register rS to memory at address in register rD.

 

Input

The input is the same as Task 1.

Processing

The processing is the same as for Task 1. The challenge is that translation is a bit more challenging.

First, for many of the additional instructions you will need to emit more than one assembly instruction. This is particularly true for the conditional branching and output instructions.

Second, for the branching instructions you will need to compute the labels where to branch to. The easy solution is to create a label for each instruction being translated. The label should encode the address in the name. For example, L1234 would be the label for the X instruction at address 1234. By doing this, you will not need to keep a list or database of labels.

Third, the addresses used by the load and store are full 64-bit values.

Output

The output is the same as Task 1.

Example

 Original X assembly code       Translated x86 code

loadi 1, r0

jmp j1 j2:

loadi 3, r0

jmp j3 j1:

loadi 2, r0

jmp j2 j3:

 std         # turn debugging on

 loadi 4, r0

.literal 0

   .globl test

test:

push %rbp

   mov %rsp, %rbp

.L0000:

   mov $1, %rax

.L0004:

   jmp .L0010

.L0008:

   mov $3, %rax

.L000c:

   jmp .L0018

.L0010:

   mov $2, %rax

.L0014:

   jmp .L0008

.L0018:

.L001a:

   call debug

   mov $4, %rax

.L001e:

   call debug

   pop %rbp

   ret

  Hints and Suggestions

? Use the unsigned short type for all registers and indices.

? Use two files: one the main program and one for the translator loop.

? Start early, this is the hardest assignment of the term and there is a lot to digest in the assignment

specifications.

Assignment Submission

Submission and testing are done using Git, Gitlab, and Gitlab CI/CD. You can submit as many times as you wish, up to the deadline. Every time a submission occurs, functional tests are executed, and you can view the results of the tests. To submit use the same procedure as Assignment 0.

Grading

If your program does not compile, it is considered non-functional and of extremely poor quality, mean- ing you will receive 0 for the solution.

The assignment will be graded based on three criteria:

Functionality: “Does it work according to specifications?”. This is determined in an automated fashion by running your program on several inputs and ensuring that the outputs match the expected outputs. The score is determined based on the number of tests that your program passes. So, if your program passes t/T tests, you will receive that proportion of the marks.

Quality of Solution: “Is it a good solution?” This considers whether the approach and algorithm in your solution is correct. This is determined by visual inspection of the code. It is possible to get a good grade on this part even if you have bugs that cause your code to fail some of the tests.

Code Clarity: “Is it well written?” This considers whether the solution is properly formatted, well docu- mented, and follows coding style guidelines. A single overall mark will be assigned for clarity. Please see the Style Guide in the Assignment section of the course in Brightspace.

The following grading scheme will be used:

       Task 100% 80% 60% 40% 20% 0%

       Functionality (20 marks)

 Equal to the number of tests passed.

     Solution Quality Task 1

(15 marks)

Implemented correctly. Code is robust.

Implemented cor- rectly. Code is not robust.

Some minor bugs.

Major flaws in implementation

An attempt has been made.

 Solution Quality Task 2

(5 marks)

  Implemented correctly. Code is robust.

  Implemented cor- rectly. Code is not robust.

  Some minor bugs.

  Major flaws in implementation

  An attempt has been made

 Code Clarity (10 marks) Indentation, for- matting, naming, comments

  Code looks pro- fessional and fol- lows all style guidelines

Code looks good and mostly fol- lows style guide- lines.

 Code is mostly readable and mostly follows some of the style guidelines

 Code is hard to read and fol- lows few of the style guidelines

  Code is not legible

  No code submitted or

code does not compile

Assignment Testing without Submission

Testing via submission can take some time, especially if the server is loaded. You can run the tests without submittingyourcodebyusingtheprovidedruntests.shscript. Runningthescriptwithnoarguments will run all the tests. Running the script with the test number, i.e., 00, 01, 02, 03, ... 09, will run that specific test. Please see below for how run the script.

Get your program ready to run

If you are developing directly on the unix server,

1. SSH into the remote server and be sure you are in the xtra directory. 2. Be sure the program is compiled by running make.

If you are using CLion

1. Run your program on the remote server as described in the CLion tutorials. 2. Open a remote host terminal via Tools → Open Remote Host Terminal

If you are using VSCode

1. Run your program on the remote server as described in VSCode tutorials.

2. Click on the Terminal pane in the bottom half of the window or via Terminal → New Terminal

Run the script

3. Run the script in the terminal by using the command:

   ./runtest.sh

to run all the tests, or specify the test number to run a specific test, e.g. :

   ./runtest.sh 07

You will see the test run in the terminal window.

The X Assembler (xas)

An assembler (xas) is provided to allow you to write and compile programs for the X Architecture. To make the assembler, simply run “make xas” in the xtra directory. To run the assembler, specify the assembly and executable file on the command-line. For example,

   ./xas example.xas example.xo

Assembles the X assembly file example.xas into an X executable example.xo.

The format of the assembly files is simple.

1. Anything to the right of a # mark, including the #, is considered a comment and ignored.

2. Blank lines are ignored.

3. Each line in the assembly file that is not blank must contains a directive, a label and/or an instruc-

tion. If the line contains both a label and an instruction, the label must come first. 4. A label is of the form

identifier:

where identifier consists of any sequence of letters (A-Za-z), digits (0-9), or underscores ( ) as long the first character is not a digit. A colon (:) must terminate the label. A label represents the corre- sponding location in the program and may be used to jump to that location in the code.

5. An instruction consists of a mnemonic, such as add, loadi, push, etc., followed by zero or more operands. The operand must be separated from the mnemonic by one or more white spaces. Multiple operands are separated by a comma.

     

6. If an operand is a register, then it must be in the form r# where # ranges between 0 and 15 inclu- sively. E.g., r13.

7. If the instruction is an immediate, then the argument may either be a label, or an integer. Note: labels are case-sensitive. If a label is specified, no colon should follow the label.

8. Directives instruct the assembler to perform a specific function or behave in a specific manner. All directives begin with a period and are followed by a keyword. There are three directives: .lit- eral, .words and .glob, each of which takes an operand.

(a) The .literal directive encodes a null terminated string or an integer at the present

location in the program. E.g.,

mystring:

.literal "Hello World!" myvalue:

.literal 42

encodes a nil-terminated string followed by a 16-bit (1 word) value representing the dec- imal value 42. In this example, there are labels preceding each of the encodings so that it is easy to access these literals. That is, the label mystring represents the address (rel- ative to the start of the program) where the string is encoded, and the label myvalue represents the address (relative to the start of the program) of the value. This is used in the hello.xas example.

(b) The.wordsdirectivesetsasideaspecifiednumberofwordsofmemoryatthepresent

location in the program. E.g., myspace:

.words 6

allocates 6 words of memory at the present point in the program. This space is not initial- ized to any specific value. Just as before, the label preceding the directive represents the address of the first word, relative to the start of the program. This is used in xrt0.xas to set aside space for the program stack.

(c) The .glob directive exports the specified symbol (label) if it is defined in the file and imports the specified symbol (label) if it is used but not defined in the file. E.g.,

.glob foo .glob bar ...

loadi bar, r0 ...

foo:

.literal "Hello World!"

declares two symbols (labels) foo and bar. Symbol foo is declared in this file, so it will be exported (can be accessed) in other files. The latter symbol, bar, is used but not defined. When this file is linked, the linker looks for the symbol (label) definition in other files makes all references to the symbol refer to where it is defined.

Note: it is recommended that you place literals and all space allocations at the end of your program, so that they will not interfere with program itself. If you do place literals in the middle of your program, you will need to ensure that your code jumps around these allocations.

There are several example assembly files provided (ending in .xas). You can assemble them by invoking

the assembler, for example:

./xas example.xas example.xo

This invocation will cause the assembler to read in the file example.xas and generate an output file example.xo. Feel free to look at the code for the assembler.


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp