In this project, you will build an 8088 assembler. The job of your assembler is to turn 8088 assembly code (i.e., the 8086 assembly language) into 8088 machine code that executes in a DOS emulator.
The wiki page lists links to two different DOS emulators. You have been given a paper copy of part of the 8088 data sheet, as well as a variety of links to documentation about the 8088, its instruction set architecture (ISA), and its execution semantics.
This kind of project is a comprehensive test of the skills and knowledge you obtain throughout the course. It is a replacement for the final exam, and should be approached with the dedication and purpose one would put into studying for a final exam. Even though it replaces the final exam as a course component, there are several due dates throughout the semester. Please be aware of them, as late work is not accepted.
This project is an opportunity for students to demonstrate understanding of a concrete, specific Instruction Set Architecture (ISA); general CPU organization concepts and design philosophies (CISC vs. RISC); instruction formats and addressing modes; the data path and fetch-decode-execute cycle; memory addressing; and number systems and encodings. It also calls for a practical and realistic knowledge of C programming, along with the other topics listed in the course syllabus.
Students must demonstrate their solution by both submitting their code and demonstrating the software to the instructor.
The instructor believes that students at this point in the CPSC curriculum can and should be able to write a basic assembler using the C programming language. The intent of the project is not for students to write a NASM, AS, or GAS replacement, but rather a simple and functional assembler that does basic assembly of instruction mnemonics into byte values. The core of this assembler should contain file I/O code that tokenizes lines of text (i.e., each line of an assembly source file), maintains a symbol and label table, and then translates these tables into a sequence of bytes that represents each instruction. That sequence should be written to a file containing the binary code, and the file should be executable as a 16-bit DOS .COM executable for the x86 platform (hint: see the nasm -hf output and the description of output formats in the NASM documentation, specifically the 'bin' format -- links are on the wiki).
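As an illustration of the tokenizing step only, here is a minimal sketch in C. The function name `tokenize_line`, the `MAX_TOKENS` limit, and the choice of separators (whitespace and commas, with `;` starting a comment) are assumptions made for this example, not requirements of the project. It does not handle a `;` or `,` inside a quoted character or string operand, and a label keeps its trailing colon.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKENS 8   /* assumed upper bound for this sketch */

/* Hypothetical helper: split one source line into tokens, in place.
 * Separators are whitespace and commas; a ';' starts a comment.
 * Returns the number of tokens stored in tokens[]. */
size_t tokenize_line(char *line, char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    char *p;

    /* Cut the line at the first comment character, if any. */
    p = strchr(line, ';');
    if (p != NULL)
        *p = '\0';

    p = line;
    while (*p != '\0' && count < max_tokens) {
        /* Skip separators before the next token. */
        while (*p != '\0' && (isspace((unsigned char)*p) || *p == ','))
            p++;
        if (*p == '\0')
            break;

        tokens[count++] = p;

        /* Advance to the end of the token and terminate it. */
        while (*p != '\0' && !isspace((unsigned char)*p) && *p != ',')
            p++;
        if (*p != '\0')
            *p++ = '\0';
    }
    return count;
}
```

Called on a writable copy of the line `start:  mov ax, bx   ; copy bx into ax`, this yields the four tokens `start:`, `mov`, `ax`, and `bx`. A first token ending in a colon could then be recorded in the label table together with the current output offset.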
A significant number of early tutorials are dedicated to the C programming language, including standard libraries and functions that students will find useful in doing the file I/O and manipulation of the data representing each instruction. Likewise, students should become comfortable with writing programs in x86 assembly language. Students should also be able to use tools like nasm, ndisasm, objdump, and udcli to compare the output of their assembler with standard tools.
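Before any output can be compared with those tools, the bytes have to reach disk as a flat binary. The sketch below shows one way to do that with standard C file I/O. The helper name `write_com_file` and the array `exit_program` are hypothetical; the five bytes are a hand-assembled `mov ax, 4C00h` followed by `int 21h`, which asks DOS to terminate the program with exit code 0.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write len bytes to path as a flat binary file.
 * Returns 0 on success, -1 on failure. */
int write_com_file(const char *path, const uint8_t *code, size_t len)
{
    FILE *fp = fopen(path, "wb");   /* binary mode: no newline translation */
    size_t written;

    if (fp == NULL)
        return -1;
    written = fwrite(code, 1, len, fp);
    if (fclose(fp) != 0 || written != len)
        return -1;
    return 0;
}

/* Hand-assembled example program (an assumption for this sketch):
 *   B8 00 4C    mov ax, 4C00h   ; AH = 4Ch (terminate), AL = 0 (exit code)
 *   CD 21       int 21h         ; call DOS
 */
const uint8_t exit_program[] = { 0xB8, 0x00, 0x4C, 0xCD, 0x21 };
```

Calling `write_com_file("MYPROG.COM", exit_program, sizeof exit_program)` produces a five-byte file. DOS loads a .COM image at offset 100h, and the NASM documentation describes passing `-o100h` to ndisasm when disassembling such a file, so that the printed addresses line up with the ones your assembler computed.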
Below is a list of anticipated learning outcomes for students, including but not limited to:
- develop fluency with the C language and libraries
- reinforce programming skills from previous CPSC courses
- acquire experience writing real assembly programs
- become familiar with a simple but real computer architecture and machine, complementing the theoretical study of the SPARC and x86 ISAs from class material
- become acquainted with the semantics of a real instruction set and formats
- create an element of a professional portfolio
- develop and reinforce number systems knowledge and translation skills; bit manipulation
- clearly understand the link between constructs expressed at the ISA level and the bit values interpreted by the machine hardware
- develop a foundation of basic parsing concepts
- practice "speaking machine" through the specification of translation routines
- gain understanding of how high-level source constructs like variables and control flow artifacts are represented and interpreted at the machine level
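To make the link between ISA-level constructs and bit values concrete, here is a small sketch of packing the 8086/8088 ModR/M byte, which holds a 2-bit mod field, a 3-bit reg field, and a 3-bit r/m field. The names `modrm`, `reg16`, and `encode_mov_r16_r16` are hypothetical and chosen for this example; only the register-to-register form of MOV with opcode 89h (MOV r/m16, r16) is covered.

```c
#include <stdint.h>

/* Register numbers used in the reg and r/m fields for 16-bit registers. */
enum reg16 { REG_AX = 0, REG_CX = 1, REG_DX = 2, REG_BX = 3,
             REG_SP = 4, REG_BP = 5, REG_SI = 6, REG_DI = 7 };

/* Pack mod (2 bits), reg (3 bits), and rm (3 bits) into one ModR/M byte. */
uint8_t modrm(unsigned mod, unsigned reg, unsigned rm)
{
    return (uint8_t)(((mod & 0x3u) << 6) | ((reg & 0x7u) << 3) | (rm & 0x7u));
}

/* Hypothetical helper: encode "mov dst, src" for two 16-bit registers
 * using opcode 89h (MOV r/m16, r16). With this opcode the source goes
 * in the reg field and the destination in the r/m field; mod = 3 means
 * the r/m field names a register rather than a memory operand.
 * Writes 2 bytes to out and returns that count. */
unsigned encode_mov_r16_r16(enum reg16 dst, enum reg16 src, uint8_t out[2])
{
    out[0] = 0x89;
    out[1] = modrm(3, (unsigned)src, (unsigned)dst);
    return 2;
}
```

For `mov ax, bx` the fields are mod = 11, reg = 011 (BX), r/m = 000 (AX), giving the bit pattern 11011000, so the instruction encodes as the two bytes 89 D8.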
1. This is an individual project. You may not receive help from other students or professionals. You may use any public reference material you wish, including those provided by the instructor. Evidence of collusion will be treated seriously and referred to the official University process for dealing with academic dishonesty. You may not simply re-purpose existing assembler code, or pass input assembly code to an already-existing assembler such as NASM or AS. Your assembler should be of your own design and conception.
2. We will provide an execution harness for executing your assembled program. In other words, you do not have to worry about packaging your assembled code as an ELF file: the output file should be a flat sequence of bytes (similar to the 'nasm -f bin' output option) and named as a DOS .COM file (e.g., MYPROG.COM). The output programs should run under DOSBOX or FreeDOS.
3. Your work on this project will be assessed at three separate times:
(1) at the midpoint of the semester, as indicated by the deadline above, you must submit a set of valid 8088 assembly programs that forms your test suite. This milestone provides you with an opportunity to practice your knowledge of assembly code and of machine instruction formats for a CISC architecture, and to get a small portion of the project done. The test suite should contain freshly written programs (i.e., you should not simply resubmit a previous HW assignment answer as a test case). For maximum marks, you should have a significant number of test cases of _significant_ complexity. Each test program should be properly formatted and every line should be commented. The file should begin with a comment describing the rationale for including it in the test suite (i.e., it should document what feature(s) of your assembler this test case is meant to assess). You should write these test cases before you write your assembler. The instructor anticipates that your assembler may not successfully assemble all of your test cases (especially if you create some really complex test cases); that's fine -- your grade for this component is based on the quality and number of test cases, not whether your assembler correctly processes them all.
(2) about 2 weeks before the end of the semester, you must submit the basic core of the assembler code, e.g., something that parses lines of assembly mnemonics and dispatches to the appropriate translation routine for the given instruction. Basically, this codebase should reflect at least the "D" level of work (see the table below). This component is for feedback purposes only; it is not graded as such, but you WILL lose 10% of the points available for this project if you do not submit something that compiles (without warnings) and "works" at the D level.
(3) during the last week, individual demonstrations to the instructor during tutorial sessions of the full capabilities of your assembler, plus the immediate submission of its code as demo'd (this version of the code is graded per the table below).
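Milestone (2) describes code that parses mnemonics and dispatches to a translation routine. One common way to organize that dispatch in C is a table of function pointers, sketched below. Every name here (`encoder_fn`, `mnemonic_entry`, `mnemonic_table`, `assemble_instruction`, and the two encoder routines) is hypothetical, the table covers only two operand-free instructions, and the lookup is case-sensitive. It illustrates the idea and is not a statement of what the "D" level requires.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each translation routine writes machine code to out and returns the
 * number of bytes written, or -1 if the operands are not supported. */
typedef int (*encoder_fn)(char *operands[], size_t n_operands, uint8_t *out);

/* Hypothetical routines for two instructions that take no operands. */
static int encode_nop(char *operands[], size_t n_operands, uint8_t *out)
{
    (void)operands;
    if (n_operands != 0)
        return -1;
    out[0] = 0x90;              /* NOP */
    return 1;
}

static int encode_ret(char *operands[], size_t n_operands, uint8_t *out)
{
    (void)operands;
    if (n_operands != 0)
        return -1;
    out[0] = 0xC3;              /* near RET */
    return 1;
}

struct mnemonic_entry {
    const char *name;           /* lower-case mnemonic text */
    encoder_fn  encode;         /* routine that emits its bytes */
};

static const struct mnemonic_entry mnemonic_table[] = {
    { "nop", encode_nop },
    { "ret", encode_ret },
};

/* Look up the mnemonic and call its routine.
 * Returns the byte count, or -1 for an unknown mnemonic or bad operands. */
int assemble_instruction(const char *mnemonic, char *operands[],
                         size_t n_operands, uint8_t *out)
{
    size_t i;

    for (i = 0; i < sizeof mnemonic_table / sizeof mnemonic_table[0]; i++) {
        if (strcmp(mnemonic, mnemonic_table[i].name) == 0)
            return mnemonic_table[i].encode(operands, n_operands, out);
    }
    return -1;
}
```

Supporting another instruction then means writing one more routine and adding one row to the table; the lookup loop stays unchanged. A linear search is enough for a small instruction subset.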