So You Want to Build a Language VM - Part 14 - Symbol Tables

Adds a symbol table to our VM

Assembler Struct

Welcome back! When we last left our intrepid readers, we were about to write an assembler struct.

But why? What does it do?

What does it all mean?! Why don’t more people love interrobangs?! Ahem.

So far, we’ve taken strings and handed them directly to the program parser. But now we’re talking about doing things that require keeping state, such as making multiple passes over the program. The Assembler struct will serve as our simple abstraction on top of all that. In src/assembler/, add this:

// The phases of assembly, tracked with an enum rather than a bare integer
pub enum AssemblerPhase {
    First,
    Second,
}

pub struct Assembler {
    pub phase: AssemblerPhase,
    pub symbols: SymbolTable,
}

impl Assembler {
    pub fn new() -> Assembler {
        Assembler {
            phase: AssemblerPhase::First,
            symbols: SymbolTable::new(),
        }
    }
}

Don’t worry about the symbols field. We’ll take care of that soon.

Simple assembler assembled! Note how we created an enum to track its phase. We could have used a u8 or something as well, but this leverages the Rust type system for clarity. Our Assembler is going to take over the following duties:

  1. Passing the raw string to the parser

  2. Constructing the symbol table

  3. Outputting a Vec<u8> that is the final bytecode that the VM can read
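To illustrate the point about preferring an enum to a u8 for the phase, here is a minimal stand-alone sketch. Only AssemblerPhase and its two variants come from this tutorial; the phase_name and phase_name_u8 helpers are hypothetical and exist purely for comparison.

```rust
// Mirrors the tutorial's enum; the derives are added here only for the sketch.
#[derive(Debug, PartialEq)]
pub enum AssemblerPhase {
    First,
    Second,
}

/// Hypothetical helper: with an enum, the match has to cover every variant,
/// so adding a third phase later is a compile error until it is handled here.
pub fn phase_name(phase: &AssemblerPhase) -> &'static str {
    match phase {
        AssemblerPhase::First => "first",
        AssemblerPhase::Second => "second",
    }
}

/// Hypothetical u8 equivalent: the compiler cannot rule out meaningless
/// values such as 7, so a catch-all arm is needed.
pub fn phase_name_u8(phase: u8) -> &'static str {
    match phase {
        1 => "first",
        2 => "second",
        _ => "invalid",
    }
}
```

The enum version has no "invalid" case at all, which is the clarity the type system buys us.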

Step 1: Parsing

For this, we’re going to add a few functions to our assembler:

pub fn assemble(&mut self, raw: &str) -> Option<Vec<u8>> {
    match program(CompleteStr(raw)) {
        Ok((_remainder, program)) => {
            // Pass one builds the symbol table, pass two emits the bytecode
            self.process_first_phase(&program);
            Some(self.process_second_phase(&program))
        },
        Err(e) => {
            println!("There was an error assembling the code: {:?}", e);
            None
        }
    }
}

fn process_first_phase(&mut self, p: &Program) {
    self.extract_labels(p);
    self.phase = AssemblerPhase::Second;
}

fn process_second_phase(&mut self, p: &Program) -> Vec<u8> {
    let mut program = vec![];
    for i in &p.instructions {
        let mut bytes = i.to_bytes(&self.symbols);
        program.append(&mut bytes);
    }
    program
}
Three new functions! What riches!

What happens is:

  1. The assemble function accepts a raw string reference

  2. Assembler gives the raw text to the program parser

  3. It uses a match statement to check that the program parsed correctly

  4. Assuming it did parse, we feed the program through each of the assembler phases

  5. The assembler phases are broken out into other functions to help keep it neat

  6. The first phase extracts all the labels and builds the symbol table

  7. It then switches the phase to second

  8. The second phase is then called, which just calls to_bytes on every AssemblerInstruction

  9. All the bytes are added to a Vec<u8> which contains the fully assembled bytecode
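The steps above can be sketched in isolation to show why two passes help: a jump may refer to a label that is declared later in the program. Everything in this sketch is hypothetical (the ToyInstruction type, the made-up byte encoding, and the assumption that every instruction is 4 bytes wide); it is not the tutorial’s AssemblerInstruction or its to_bytes.

```rust
use std::collections::HashMap;

/// Hypothetical toy instruction: an optional label declared on it and an
/// optional label it jumps to. Each one is treated as 4 bytes wide.
pub struct ToyInstruction {
    pub label: Option<&'static str>,
    pub jump_to: Option<&'static str>,
}

/// Pass one: record the byte offset of every declared label.
pub fn first_pass(program: &[ToyInstruction]) -> HashMap<&'static str, u32> {
    let mut symbols = HashMap::new();
    for (index, instruction) in program.iter().enumerate() {
        if let Some(name) = instruction.label {
            symbols.insert(name, index as u32 * 4);
        }
    }
    symbols
}

/// Pass two: emit 4 bytes per instruction, replacing label uses with offsets.
/// Returns None if a jump names a label that was never declared.
pub fn second_pass(
    program: &[ToyInstruction],
    symbols: &HashMap<&'static str, u32>,
) -> Option<Vec<u8>> {
    let mut bytes = Vec::new();
    for instruction in program {
        match instruction.jump_to {
            Some(name) => {
                let offset = *symbols.get(name)?;
                // Made-up encoding: opcode 1, 16-bit big-endian offset, padding.
                bytes.extend_from_slice(&[1, (offset >> 8) as u8, offset as u8, 0]);
            }
            // Made-up encoding: opcode 0 followed by padding.
            None => bytes.extend_from_slice(&[0, 0, 0, 0]),
        }
    }
    Some(bytes)
}

/// Runs both passes in order, like the assemble function above.
pub fn assemble_toy(program: &[ToyInstruction]) -> Option<Vec<u8>> {
    let symbols = first_pass(program);
    second_pass(program, &symbols)
}
```

Because the whole symbol table exists before any bytes are emitted, a jump on the first instruction can target a label on the last one.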

Next up, let’s look at the extract_labels function:

fn extract_labels(&mut self, p: &Program) {
    // Byte offset of the current instruction
    let mut c = 0;
    for i in &p.instructions {
        if i.is_label() {
            match i.label_name() {
                Some(name) => {
                    let symbol = Symbol::new(name, SymbolType::Label, c);
                    self.symbols.add_symbol(symbol);
                },
                None => {}
            };
        }
        c += 4;
    }
}

This function walks through every instruction looking for label declarations, that is, places where the user has typed some_name: <opcode> …. When it finds one, it adds the label to our symbol table along with the byte offset at which it was found. The counter c tracks that offset and advances by 4 for every instruction, labeled or not.
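As a worked example of that offset arithmetic, the sketch below runs the same counting over raw source text. The label_offsets function is hypothetical: it splits on whitespace instead of using our parser, and it assumes one instruction per non-empty line, each counted as 4 bytes.

```rust
/// Hypothetical helper: returns (label, byte offset) pairs for lines of the
/// form `name: <opcode> ...`, counting 4 bytes per non-empty line.
pub fn label_offsets(source: &str) -> Vec<(String, u32)> {
    let mut found = Vec::new();
    let mut offset = 0;
    for line in source.lines() {
        // Skip blank lines so they do not advance the counter.
        let first = match line.split_whitespace().next() {
            Some(token) => token,
            None => continue,
        };
        // A first token ending in ':' is treated as a label declaration.
        if let Some(name) = first.strip_suffix(':') {
            found.push((name.to_string(), offset));
        }
        offset += 4;
    }
    found
}
```

Given a program whose fourth line is test: inc $0, three instructions come before the label, so it is recorded at offset 3 × 4 = 12.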

Step 2: Symbols and Tables

We need to make three more data structures: the Symbol, the SymbolType and the SymbolTable. Put these in src/assembler/ as well. Symbol and SymbolType look like:

pub struct Symbol {
    name: String,
    offset: u32,
    symbol_type: SymbolType,
}

impl Symbol {
    pub fn new(name: String, symbol_type: SymbolType, offset: u32) -> Symbol {
        Symbol { name, symbol_type, offset }
    }
}

pub enum SymbolType {
    Label,
}

Later on, we’ll have more SymbolTypes. For now, we start with Label. SymbolTable looks like:

pub struct SymbolTable {
    symbols: Vec<Symbol>
}

impl SymbolTable {
    pub fn new() -> SymbolTable {
        SymbolTable {
            symbols: vec![]
        }
    }

    pub fn add_symbol(&mut self, s: Symbol) {
        self.symbols.push(s);
    }

    // Linear search for a symbol by name; returns its offset if present
    pub fn symbol_value(&self, s: &str) -> Option<u32> {
        for symbol in &self.symbols {
            if symbol.name == s {
                return Some(symbol.offset);
            }
        }
        None
    }
}

This would be better implemented as a hash table (a HashMap in Rust). We’ll change it later.

Right now we need basic functions (add, get symbol value), but don’t worry, it will grow. =)
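For the curious, here is a rough idea of what a hash-table-backed version might look like. This is only a sketch under assumptions: the HashedSymbolTable name and its add_symbol signature are hypothetical, it stores just the offset rather than a full Symbol, and it is not necessarily how the table will look once we change it.

```rust
use std::collections::HashMap;

/// Hypothetical HashMap-backed table; the tutorial's version keeps a Vec<Symbol>.
pub struct HashedSymbolTable {
    // Maps a symbol name to its byte offset.
    offsets: HashMap<String, u32>,
}

impl HashedSymbolTable {
    pub fn new() -> Self {
        HashedSymbolTable { offsets: HashMap::new() }
    }

    /// Stores `offset` under `name`, replacing any earlier entry with that name.
    pub fn add_symbol(&mut self, name: &str, offset: u32) {
        self.offsets.insert(name.to_string(), offset);
    }

    /// Looks a name up directly instead of scanning every symbol.
    pub fn symbol_value(&self, name: &str) -> Option<u32> {
        self.offsets.get(name).copied()
    }
}
```

One behavioral difference to keep in mind: if the same name is added twice, the Vec version returns the first offset added, while this sketch keeps the last.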

And Yet More Tests

Ha, you thought I’d forgotten, didn’t you? No such luck!

This will take care of the symbol-related tests:

#[test]
fn test_symbol_table() {
    let mut sym = SymbolTable::new();
    let new_symbol = Symbol::new("test".to_string(), SymbolType::Label, 12);
    sym.add_symbol(new_symbol);
    assert_eq!(sym.symbols.len(), 1);
    // A symbol we added resolves to its offset
    let v = sym.symbol_value("test");
    assert_eq!(true, v.is_some());
    let v = v.unwrap();
    assert_eq!(v, 12);
    // A symbol we never added resolves to None
    let v = sym.symbol_value("does_not_exist");
    assert_eq!(v.is_some(), false);
}

And for the assembler:

#[test]
fn test_assemble_program() {
    let mut asm = Assembler::new();
    let test_string = "load $0 #100\nload $1 #1\nload $2 #0\ntest: inc $0\nneq $0 $2\njmpe @test\nhlt";
    let program = asm.assemble(test_string).unwrap();
    let mut vm = VM::new();
    assert_eq!(program.len(), 21);
    // Load the assembled bytecode into the VM before checking its length
    vm.program.extend_from_slice(&program);
    assert_eq!(vm.program.len(), 21);
}


We’ll call it good for this part. Wikipedia has a good article on symbol tables. Don’t worry if they are confusing at first. Next up, we’ll be using clap to make a nicer command-line interface to our VM.
