Project
This assignment will reinforce your knowledge of the assembly process. You will need to go through all of the steps of converting an assembly source file to object code.
Your goal is to write a two-pass assembler for a subset of the MIPS instruction set. It should be able to read an assembly file from the command line and write the object code to standard output. You can make the following assumptions:
- The code segment will precede the data segment
- The source file will contain no more than 32768 distinct instructions
- The source file will define no more than 32768B of data
- The source file will not contain comments
- There will be no whitespace between arguments in each instruction
- Each line may have a symbolic label, terminated with a colon
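These assumptions make line splitting simple. An optional label ends in a colon, the mnemonic or directive is the next whitespace-separated word, and the argument list is a single comma-separated word. The sketch below assumes exactly that. The `SourceLine` struct and `splitLine` function are hypothetical names, not a required interface. A label alone on its line (as `L1:` is in Figure 1) comes back with an empty operation.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed source line.
struct SourceLine {
    std::string label;              // empty if the line has no label
    std::string op;                 // mnemonic or directive, e.g. "addu" or ".word"
    std::vector<std::string> args;  // comma-separated arguments, e.g. {"$s0", "$zero", "$zero"}
};

// Split a line such as "L1: lw $s2,n($gp)" into label, operation, and arguments.
// Relies on the stated assumptions: no comments and no whitespace between arguments.
SourceLine splitLine(const std::string& line) {
    SourceLine out;
    std::istringstream in(line);
    std::string word;
    if (!(in >> word)) return out;           // blank line
    if (word.back() == ':') {                // optional label, terminated with a colon
        out.label = word.substr(0, word.size() - 1);
        if (!(in >> word)) return out;       // label alone on its line
    }
    out.op = word;
    std::string argField;
    if (in >> argField) {                    // the whole argument list is one word
        std::istringstream argStream(argField);
        std::string arg;
        while (std::getline(argStream, arg, ',')) out.args.push_back(arg);
    }
    return out;
}
```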
Table 1 provides a list of the assembly directives that your assembler must recognize. Table 2 provides a list of the instructions that your assembler must recognize. Be sure that you note the arguments for each instruction. It may be helpful to refer to Appendix A.10 when writing your parser.
Table 1. List of Assembly Directives
Directive          | Explanation
.text              | Place items following this directive in the user text segment
.data              | Place items following this directive in the data segment
.word w1,w2,...,wn | Store n 32b integer values in successive words in memory
.space n           | Allocate n bytes of space in memory, initialized to zero
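The sketch below shows one way to handle the data directives in Table 1. It parses the argument list of `.word` and computes how many bytes each directive reserves: 4 per `.word` value and n for `.space n`. The function names are hypothetical. The sketch does not handle alignment. How the reserved `.space` bytes are written to the object file should be checked against the reference binary.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse the argument list of ".word w1,w2,...,wn" (no whitespace, per the
// assignment's assumptions) into 32-bit integers.
std::vector<int32_t> parseWordList(const std::string& argField) {
    std::vector<int32_t> values;
    std::istringstream in(argField);
    std::string item;
    while (std::getline(in, item, ','))
        values.push_back(static_cast<int32_t>(std::stol(item)));
    return values;
}

// Bytes reserved by a directive, following Table 1: .word stores 4 bytes per
// value and .space n allocates n bytes. .text and .data reserve nothing.
uint32_t directiveSize(const std::string& directive, const std::string& argField) {
    if (directive == ".word")
        return 4 * static_cast<uint32_t>(parseWordList(argField).size());
    if (directive == ".space")
        return static_cast<uint32_t>(std::stoul(argField));
    return 0;
}
```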
Table 2. List of MIPS Instructions
Mnemonic | Format | Args               | Description
addiu    | I      | 3 (rt, rs, imm)    | Add immediate with no overflow
addu     | R      | 3 (rd, rs, rt)     | Add with no overflow
and      | R      | 3 (rd, rs, rt)     | Bitwise logical AND
beq      | I      | 3 (rs, rt, label)  | Branch when equal
bne      | I      | 3 (rs, rt, label)  | Branch when not equal
div      | R      | 2 (rs, rt)         | Signed integer divide
j        | J      | 1 (label)          | Jump
lw       | I      | 2 (rt, offset(rs)) | Load 32b word
mfhi     | R      | 1 (rd)             | Move from hi register
mflo     | R      | 1 (rd)             | Move from lo register
mult     | R      | 2 (rs, rt)         | Signed integer multiply
or       | R      | 3 (rd, rs, rt)     | Bitwise logical OR
slt      | R      | 3 (rd, rs, rt)     | Set when less than
subu     | R      | 3 (rd, rs, rt)     | Subtract with no overflow
sw       | I      | 2 (rt, offset(rs)) | Store 32b word
syscall  | R      | 0                  | System call
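Once the arguments are split, encoding is mostly bit packing. The sketch below uses field values from the MIPS ISA (opcode and funct codes, conventional register numbers). `regNumber`, `kFunct`, `kOpcode`, `encodeR`, and `encodeI` are hypothetical names. Unused register fields are left as 0: `div` and `mult` have no rd, `mfhi` and `mflo` use only rd, and `syscall` uses none.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Register number for a conventional MIPS register name such as "$t0"; -1 if unknown.
int regNumber(const std::string& name) {
    static const char* const names[32] = {
        "$zero", "$at", "$v0", "$v1", "$a0", "$a1", "$a2", "$a3",
        "$t0", "$t1", "$t2", "$t3", "$t4", "$t5", "$t6", "$t7",
        "$s0", "$s1", "$s2", "$s3", "$s4", "$s5", "$s6", "$s7",
        "$t8", "$t9", "$k0", "$k1", "$gp", "$sp", "$fp", "$ra"};
    for (int i = 0; i < 32; ++i)
        if (name == names[i]) return i;
    return -1;
}

// Field values from the MIPS ISA for the instructions in Table 2.
// R-format instructions share opcode 0 and are distinguished by funct.
const std::map<std::string, uint32_t> kFunct = {
    {"addu", 0x21}, {"and", 0x24}, {"div", 0x1a}, {"mfhi", 0x10}, {"mflo", 0x12},
    {"mult", 0x18}, {"or", 0x25}, {"slt", 0x2a}, {"subu", 0x23}, {"syscall", 0x0c}};
const std::map<std::string, uint32_t> kOpcode = {
    {"addiu", 0x09}, {"beq", 0x04}, {"bne", 0x05}, {"j", 0x02}, {"lw", 0x23}, {"sw", 0x2b}};

// R format: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6); opcode and shamt are 0 here.
uint32_t encodeR(uint32_t funct, uint32_t rs, uint32_t rt, uint32_t rd) {
    return (rs << 21) | (rt << 16) | (rd << 11) | funct;
}

// I format: opcode(6) rs(5) rt(5) immediate(16). Truncating the immediate to
// 16 bits keeps negative values in two's complement.
uint32_t encodeI(uint32_t opcode, uint32_t rs, uint32_t rt, int32_t imm) {
    return (opcode << 26) | (rs << 21) | (rt << 16) | (static_cast<uint32_t>(imm) & 0xffff);
}
```

The object code in Figure 1 makes a convenient check: `addu $s0,$zero,$zero` encodes to 0x00008021, and `sw $v0,n($gp)` with n at offset 0 encodes to 0xaf820000.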
In addition to the instructions above, your assembler must be able to resolve symbolic labels. These labels may be targets for changes in control flow (branch or jump instructions) or names for memory elements. How a label is handled depends on its usage. Targets for branch instructions should be encoded relative to the current instruction. The offset is the number of instructions from the instruction following the branch to the target (remember that the PC already points to the next instruction). For example, consider the code below:
00400400 <L4>:
  400400:   1100000c    beqz    t0,400434
  400404:   00000000    nop
  400408:   01084021    addu    t0,t0,t0
  40040c:   1100fffc    beqz    t0,400400
  400410:   00000000    nop
  400414:   01084021    addu    t0,t0,t0
  400418:   1100fff9    beqz    t0,400400
  40041c:   00000000    nop
  400420:   01084021    addu    t0,t0,t0
  400424:   1100fff6    beqz    t0,400400
  400428:   00000000    nop
  40042c:   11000001    beqz    t0,400434
  400430:   00000000    nop

00400434 <L5>:
  400434:   00000000    nop
You can see that the forward branches to L5 have offsets of 12 and 1. If you count the instructions from each of these two branches to the target, the actual distances are 13 and 2. The difference arises because the PC will have already advanced to the next instruction. The same is true for the backward branches to L4. Branch offsets are stored in two's complement, so the first backward branch, 0x1100fffc, has an offset field of 0xfffc. Interpreted as a signed 16-bit value, this is -4. That is the distance, in instructions, from the PC (0x400410) back to L4 (0x400400).
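The offset arithmetic above can be written as a small function. This sketch works with byte addresses like the listing, and `branchOffset` is a hypothetical name. If you track instruction indices instead, the same value is the target index minus (branch index + 1).

```cpp
#include <cassert>
#include <cstdint>

// Branch offset for beq/bne: the signed distance, in instructions, from PC + 4
// (the instruction after the branch) to the target, truncated to 16 bits so
// that negative distances come out in two's complement.
uint32_t branchOffset(uint32_t branchAddr, uint32_t targetAddr) {
    int32_t distanceBytes = static_cast<int32_t>(targetAddr) - static_cast<int32_t>(branchAddr + 4);
    int32_t distanceInstr = distanceBytes / 4;  // addresses are word-aligned, so this is exact
    return static_cast<uint32_t>(distanceInstr) & 0xffff;
}
```

For the listing, `branchOffset(0x40040c, 0x400400)` is 0xfffc. Combining it with the beq opcode (4) and rs = $t0 (8) gives 0x1100fffc.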
Targets for jump instructions should use the absolute location of the target. For example, assume that label L1 is located in memory at 0x400370. The instruction j L1 will resolve to j 400370.
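A J-format instruction stores the 6-bit opcode 2 followed by a 26-bit word address (the byte address divided by 4). The sketch below packs those fields, and `encodeJ` is a hypothetical name. In Figure 1, `j L1` assembles to 0x08000005 and L1 is the sixth instruction. The field there therefore appears to hold the instruction index counted from the start of the text segment. Confirm the base address against the reference binary.

```cpp
#include <cassert>
#include <cstdint>

// J format: opcode 2 in the top 6 bits, then the target's word address
// (byte address / 4) in the low 26 bits.
uint32_t encodeJ(uint32_t targetWordAddr) {
    return (0x02u << 26) | (targetWordAddr & 0x03ffffffu);
}
```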
Data labels should be referenced by their offset from the global pointer, $gp, which is assumed to point to the start of the data segment.
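Pass one of a two-pass assembler only needs to assign every label a location. The sketch below assumes lines have already been split into label, operation, and argument text. The `Line` struct, `SymbolTables`, and `passOne` are hypothetical names. Text labels record an instruction index. Data labels record a byte offset from $gp, advancing 4 bytes per `.word` value and n bytes per `.space n`, with no alignment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical pre-split line: optional label, operation (may be empty), argument text.
struct Line {
    std::string label, op, args;
};

struct SymbolTables {
    std::map<std::string, uint32_t> text;  // label -> instruction index
    std::map<std::string, uint32_t> data;  // label -> byte offset from $gp
};

// Pass one: assign a location to every label without emitting any code.
SymbolTables passOne(const std::vector<Line>& lines) {
    SymbolTables t;
    bool inData = false;
    uint32_t instrIndex = 0, dataOffset = 0;
    for (const Line& l : lines) {
        if (l.op == ".text") { inData = false; continue; }
        if (l.op == ".data") { inData = true; continue; }
        if (!l.label.empty()) {
            if (inData) t.data[l.label] = dataOffset;
            else t.text[l.label] = instrIndex;  // a label alone on a line names the next instruction
        }
        if (l.op.empty()) continue;             // label-only line occupies no space
        if (!inData) {
            ++instrIndex;                       // every instruction in Table 2 is one word
        } else if (l.op == ".word") {
            dataOffset += 4 * (1 + std::count(l.args.begin(), l.args.end(), ','));
        } else if (l.op == ".space") {
            dataOffset += std::stoul(l.args);
        }
    }
    return t;
}
```

Run over Figure 1, this gives L1 = 5, L2 = 13, n = 0, m = 4, and q = 16. Those values match the object code: the beq offset 5 is 13 - (7 + 1), the j field is 5, and the lw/sw immediates are 0. Pass two would then re-read the lines and encode each instruction using these tables.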
You should use the linprog servers for all of your compilation and testing. Your output should match mine exactly. You can determine if the results are identical by calculating the md5sum or by using diff. You must use C/C++ as your language and your solution should be a single file (e.g. ch03c.pr01.c or ch03c.pr01.cpp). You should submit this file through Blackboard. Your program should have comments inline and a header at the top. For example:
/**
* @file main.cpp
* @author hughes <>, (C) 2014, 2015, 2016
* @date 05/11/16
* @brief Simple MIPS assembler
*
* @section DESCRIPTION
* This program implements an assembler for a subset
* of the MIPS assembly language. Can compile with debug
* by including -DDEBUG in the compiler options.
************************************************************/
Please test your output against the results from the sample binary before submission. The test script uses md5sum and diff to compare your output with the baseline. Your submissions will also be checked for plagiarism. The script will use the following for compilation: g++ -Werror -mtune=generic -O0 -std=c++11
If you write it in C instead of C++, the script will use gcc -Werror -mtune=generic -O0 -std=c11
You can access my binary using the following command:
~chughes/cda3101/assembler
There is an example assembly program below in Figure 1 along with the machine code. You can access the assembly source at ~chughes/cda3101/test01.s and the object code at ~chughes/cda3101/test01.obj. You should note that the machine code is in hexadecimal.
.text
    addu $s0,$zero,$zero
    addu $s1,$zero,$zero
    addiu $v0,$zero,5
    syscall
    sw $v0,n($gp)
L1:
    lw $s2,n($gp)
    slt $t0,$s1,$s2
    beq $t0,$zero,L2
    addiu $v0,$zero,5
    syscall
    addu $s0,$s0,$v0
    addiu $s1,$s1,1
    j L1
L2:
    addu $a0,$s0,$zero
    addiu $v0,$zero,1
    syscall
    addiu $v0,$zero,10
    syscall
.data
n: .word 0
m: .word 1,9,12
q: .space 10

00008021
00008821
24020005
0000000c
af820000
8f920000
0232402a
11000005
24020005
0000000c
02028021
26310001
08000005
02002021
24020001
0000000c
2402000a
0000000c
00000000
00000001
00000009
0000000c
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000

Figure 1 - Sample source code (top) and object code (bottom)
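Figure 1 also shows the output format: one 32-bit word per line, written as eight lowercase hexadecimal digits, with the text words first and the data words after. A minimal sketch of that output step follows, and `hexWord` and `writeObject` are hypothetical names. Because grading compares outputs with md5sum and diff, details such as the final newline should be checked against the reference output.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Format one 32-bit word as Figure 1 shows it: eight lowercase hex digits.
std::string hexWord(uint32_t word) {
    char buf[9];
    std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(word));
    return buf;
}

// Write the text words followed by the data words, one per line, to standard output.
void writeObject(const std::vector<uint32_t>& text, const std::vector<uint32_t>& data) {
    for (uint32_t w : text) std::printf("%s\n", hexWord(w).c_str());
    for (uint32_t w : data) std::printf("%s\n", hexWord(w).c_str());
}
```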
A second test file, named test02.s, is included in the same directory. These are samples and are not the inputs that will be used for grading. Feel free to write your own inputs and share them via the discussion boards. If you find an error in my assembler, please let me know (extra credit)!
While you are free to use any string parsing method you choose, you may find the std::getline function helpful. It extracts characters from an input stream and stores them in a string until it finds the delimiter character or reaches the end of the stream. In the two-argument form below, the delimiter is the newline character.
istream& getline (istream& is, string& str);
For example, the code below skips any whitespace at the current position in the stream, reads a line from the input, and appends the line to a list as a string.
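// skip leading whitespace (including blank lines), then read one line;
// the loop ends as soon as a read fails, so the last line is never added twice
while(std::getline(asmFile >> std::ws, lineIn))
{
    sourceCode.push_back(lineIn); //add to the list of instructions from source
}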
You may also find the Boost tokenizer class useful. The tokenizer breaks an input sequence into tokens wherever one of a set of delimiter characters appears. The code below takes an input string, input, and separates it at the characters listed in delimiter. The for-loop then iterates through those tokens. With these delimiters, for example, lw $s2,n($gp) is split into lw, $s2, n, and $gp.
#include <boost/tokenizer.hpp>
boost::char_separator<char> delimiter(", ()");
boost::tokenizer< boost::char_separator<char> > tokens(input, delimiter);
for(boost::tokenizer< boost::char_separator<char> >::iterator it = tokens.begin(); it != tokens.end(); ++it)
{
    // *it is the current token, as a std::string
}
These are just some of the tools that I used in my solution; you are not required to use them! C/C++ has plenty of functions that you may find useful such as fgets and sscanf. Be creative!
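As an example of the sscanf approach, the sketch below splits the argument list of an lw or sw instruction, such as `$s2,n($gp)`, into the data register, the offset or label, and the base register. `parseMemArgs` is a hypothetical name. Each scanset conversion such as `%[^,]` reads up to the next delimiter, and the field widths keep the fixed-size buffers from overflowing.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Split "rt,offset(base)" such as "$s2,n($gp)" into its three parts.
// Returns false if the text does not have that shape.
bool parseMemArgs(const char* args, std::string& rt, std::string& offset, std::string& base) {
    char r[16], off[64], b[16];
    if (std::sscanf(args, "%15[^,],%63[^(](%15[^)])", r, off, b) != 3) return false;
    rt = r;
    offset = off;
    base = b;
    return true;
}
```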