So You Want to Build a Language VM - Part 14 - Symbol Tables
Adds in a symbol table to our VM
Assembler Struct
Welcome back! When we last left our intrepid readers, we were about to write an assembler struct.
But why? What does it do?
What does it all mean?! Why don’t more people love interrobangs?! Ahem.
So far, we’ve taken strings and handed them directly to our parser via the program parser. But now we’re talking about doing things that require keeping state, like multiple passes. The Assembler struct will serve as our simple abstraction on top of all that. In src/assembler/mod.rs, add this:
#[derive(Debug)]
pub struct Assembler {
    // Tracks which pass the assembler is currently on
    pub phase: AssemblerPhase,
    // Maps label names to their byte offsets
    pub symbols: SymbolTable,
}

impl Assembler {
    pub fn new() -> Assembler {
        Assembler {
            phase: AssemblerPhase::First,
            symbols: SymbolTable::new(),
        }
    }
}
Important: Don’t worry about the symbols field. We’ll take care of that soon.
Simple assembler assembled! Note that the phase is tracked with an enum, AssemblerPhase. We could have used a u8 or something as well, but this leverages the Rust type system for clarity. Our Assembler is going to take over the following duties:
Passing the raw string to the parser
Constructing the symbol table
Outputting a Vec<u8> that is the final bytecode that the VM can read
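The enum itself is not shown in this part, so here is a minimal sketch of what it might look like. The variant names come from the AssemblerPhase::First and AssemblerPhase::Second values used in the code here; the derives and the next() helper are assumptions added for illustration, not something the assembler requires.

```rust
// A sketch of the phase enum. The variant names are taken from the
// AssemblerPhase::First and AssemblerPhase::Second values used in this part;
// the derives and the next() helper are assumptions for illustration.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AssemblerPhase {
    First,
    Second,
}

impl AssemblerPhase {
    // Hypothetical helper: the match must cover every variant, so adding a
    // third phase later becomes a compile error here until it is handled.
    pub fn next(self) -> AssemblerPhase {
        match self {
            AssemblerPhase::First => AssemblerPhase::Second,
            AssemblerPhase::Second => AssemblerPhase::Second,
        }
    }
}

fn main() {
    let phase = AssemblerPhase::First;
    println!("{:?} -> {:?}", phase, phase.next());
}
```

With a u8, nothing would stop us from storing a phase number that means nothing. With the enum, the compiler only accepts the variants we declared.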
Step 1: Parsing
For this, we’re going to add a few functions to our assembler:
pub fn assemble(&mut self, raw: &str) -> Option<Vec<u8>> {
    match program(CompleteStr(raw)) {
        Ok((_remainder, program)) => {
            self.process_first_phase(&program);
            Some(self.process_second_phase(&program))
        },
        Err(e) => {
            println!("There was an error assembling the code: {:?}", e);
            None
        }
    }
}

fn process_first_phase(&mut self, p: &Program) {
    self.extract_labels(p);
    self.phase = AssemblerPhase::Second;
}

fn process_second_phase(&mut self, p: &Program) -> Vec<u8> {
    let mut program = vec![];
    for i in &p.instructions {
        let mut bytes = i.to_bytes(&self.symbols);
        program.append(&mut bytes);
    }
    program
}
Three new functions! What riches!
What happens is:
The assemble function accepts a raw string reference
The Assembler gives the raw text to the program parser
It uses a match statement to check that the program parsed correctly
Assuming it did parse, we feed the program through each of the assembler phases
The assembler phases are broken out into other functions to help keep it neat
The first phase extracts all the labels and builds the symbol table
It then switches the phase to second
The second phase is then called, which just calls to_bytes on every AssemblerInstruction
All the bytes are added to a Vec<u8> which contains the fully assembled bytecode
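To see why two passes are needed, here is a stripped-down sketch of the same flow that runs without the parser. Everything in it is a stand-in: the Instr struct, the fixed 4-byte layout (opcode, two operand bytes, one padding byte), and the HashMap are assumptions for illustration, not the project's real types. Note that the first instruction refers to a label that is only declared later, which a single pass could not resolve.

```rust
use std::collections::HashMap;

// Hypothetical, simplified instruction: an optional label declaration,
// an opcode byte, and an optional label reference used as the operand.
struct Instr {
    label: Option<&'static str>,
    opcode: u8,
    target: Option<&'static str>,
}

// Assumed fixed instruction width in bytes.
const WIDTH: u32 = 4;

fn assemble(instrs: &[Instr]) -> Option<Vec<u8>> {
    // First pass: record the byte offset of every declared label.
    let mut symbols: HashMap<&str, u32> = HashMap::new();
    for (n, i) in instrs.iter().enumerate() {
        if let Some(name) = i.label {
            symbols.insert(name, n as u32 * WIDTH);
        }
    }

    // Second pass: emit bytes, resolving label references through the table.
    // An unknown label makes the whole assembly fail with None.
    let mut out = Vec::new();
    for i in instrs {
        let operand = match i.target {
            Some(name) => *symbols.get(name)?,
            None => 0,
        };
        out.push(i.opcode);
        // The operand is truncated to 16 bits in this sketch.
        out.extend_from_slice(&(operand as u16).to_be_bytes());
        out.push(0); // padding byte
    }
    Some(out)
}

fn main() {
    let program = [
        Instr { label: None, opcode: 3, target: Some("end") },
        Instr { label: Some("end"), opcode: 5, target: None },
    ];
    println!("{:?}", assemble(&program));
}
```

The jump to end is written before end is declared, but by the time the second pass runs, the table already knows that end sits at offset 4.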
Next up, let’s look at the extract_labels function:
fn extract_labels(&mut self, p: &Program) {
    // c is the byte offset of the current instruction
    let mut c = 0;
    for i in &p.instructions {
        if i.is_label() {
            match i.label_name() {
                Some(name) => {
                    let symbol = Symbol::new(name, SymbolType::Label, c);
                    self.symbols.add_symbol(symbol);
                },
                None => {}
            };
        }
        // Every instruction is counted as 4 bytes
        c += 4;
    }
}
This function goes through every instruction and looks for label declarations, that is, places where the user has typed some_name: <opcode> …. When it finds one, it adds the label to our symbol table along with the byte offset at which it was found. The counter c advances by 4 for each instruction, so it assumes every instruction is 4 bytes wide.
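To make the offsets concrete, here is a small sketch that does the same counting directly on source text instead of on parsed instructions. The label_offsets function is hypothetical and only a stand-in for the real parser: it assumes one instruction per line, treats a first token ending in a colon as a label declaration, and counts 4 bytes per instruction like extract_labels does.

```rust
// Hypothetical stand-in for extract_labels that works on raw text.
// Assumes one instruction per line and 4 bytes per instruction.
fn label_offsets(source: &str) -> Vec<(String, u32)> {
    let mut found = Vec::new();
    let mut offset = 0u32;
    for line in source.lines() {
        // Blank lines carry no instruction, so they do not advance the offset.
        if let Some(first) = line.split_whitespace().next() {
            // A first token such as "test:" declares the label "test".
            if let Some(name) = first.strip_suffix(':') {
                found.push((name.to_string(), offset));
            }
            offset += 4;
        }
    }
    found
}

fn main() {
    let source = "load $0 #100\nload $1 #1\nload $2 #0\ntest: inc $0\nhlt";
    for (name, offset) in label_offsets(source) {
        println!("{} is at byte {}", name, offset);
    }
}
```

Three instructions come before the label, so test lands at byte 12.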
Step 2: Symbols and Tables
We need to make three more data structures: the Symbol, the SymbolType and the SymbolTable. Put these in src/assembler/mod.rs. Symbol and SymbolType look like:
#[derive(Debug)]
pub struct Symbol {
    name: String,
    offset: u32,
    symbol_type: SymbolType,
}

impl Symbol {
    pub fn new(name: String, symbol_type: SymbolType, offset: u32) -> Symbol {
        Symbol {
            name,
            symbol_type,
            offset
        }
    }
}

#[derive(Debug)]
pub enum SymbolType {
    Label,
}
Later on, we’ll have more SymbolTypes. For now, we start with Label. SymbolTable looks like:
#[derive(Debug)]
pub struct SymbolTable {
    symbols: Vec<Symbol>
}

impl SymbolTable {
    pub fn new() -> SymbolTable {
        SymbolTable {
            symbols: vec![]
        }
    }

    pub fn add_symbol(&mut self, s: Symbol) {
        self.symbols.push(s);
    }

    pub fn symbol_value(&self, s: &str) -> Option<u32> {
        for symbol in &self.symbols {
            if symbol.name == s {
                return Some(symbol.offset);
            }
        }
        None
    }
}
Important: This would be better implemented as a hash table (a HashMap in Rust), since looking up a symbol currently means scanning the whole Vec. We’ll change it later.
Right now we need basic functions (add, get symbol value), but don’t worry, it will grow. =)
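If you are curious what the hash table version might look like, here is a rough sketch using std::collections::HashMap. It is not the version this series ends up with. To keep it short it stores only the offset per name and takes the name and offset directly instead of a Symbol, which is a simplification assumed for this example.

```rust
use std::collections::HashMap;

// Sketch of a HashMap-backed table. Storing only name -> offset is a
// simplification for this example; the Vec version stores whole Symbols.
#[derive(Debug)]
pub struct SymbolTable {
    symbols: HashMap<String, u32>,
}

impl SymbolTable {
    pub fn new() -> SymbolTable {
        SymbolTable {
            symbols: HashMap::new(),
        }
    }

    // Inserting a name that already exists overwrites the earlier offset.
    pub fn add_symbol(&mut self, name: &str, offset: u32) {
        self.symbols.insert(name.to_string(), offset);
    }

    // Looks the name up by hash instead of scanning every entry.
    pub fn symbol_value(&self, name: &str) -> Option<u32> {
        self.symbols.get(name).copied()
    }
}

fn main() {
    let mut table = SymbolTable::new();
    table.add_symbol("test", 12);
    println!("{:?}", table.symbol_value("test"));
    println!("{:?}", table.symbol_value("does_not_exist"));
}
```

One behavioral difference to be aware of: if the same label is added twice, the Vec version returns the first offset it finds, while this sketch keeps the most recent one.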
And Yet More Tests
Ha, you thought I’d forgotten, didn’t you? No such luck!
This will take care of the symbol-related tests:
#[test]
fn test_symbol_table() {
    let mut sym = SymbolTable::new();
    let new_symbol = Symbol::new("test".to_string(), SymbolType::Label, 12);
    sym.add_symbol(new_symbol);
    assert_eq!(sym.symbols.len(), 1);
    let v = sym.symbol_value("test");
    assert_eq!(true, v.is_some());
    let v = v.unwrap();
    assert_eq!(v, 12);
    let v = sym.symbol_value("does_not_exist");
    assert_eq!(v.is_some(), false);
}
And for the assembler:
#[test]
fn test_assemble_program() {
    let mut asm = Assembler::new();
    let test_string = "load $0 #100\nload $1 #1\nload $2 #0\ntest: inc $0\nneq $0 $2\njmpe @test\nhlt";
    let program = asm.assemble(test_string).unwrap();
    let mut vm = VM::new();
    assert_eq!(program.len(), 21);
    vm.add_bytes(program);
    assert_eq!(vm.program.len(), 21);
}
End
We’ll call it good for this part. Wikipedia has a good article on symbol tables. Don’t worry if they are confusing at first. Next up, we’ll be using clap to make a nicer CLI for our VM.