So You Want to Build a Language VM - Part 03 - More Basic Opcodes

Adds more opcodes for basic math

Even More Opcodes!

Right now, our VM can do one thing: halt. An important feature to be sure, but we should probably add more opcodes to do things like, oh, load, add, multiply, etc.

LOAD

It’s about time those slacker registers start pulling their weight! In our future assembly language, the LOAD instruction will look like:

LOAD $0 #500

This will tell our VM to load the number 500 into register 0.

Breaking Up is Hard to Do

We have 32 bits to work with, and 16 are already spoken for:

  • 8 for the opcode

  • 8 for the register number

That leaves us 16 bits to store the number. For now we only care about unsigned integers, so the largest number we can store in a LOAD instruction is 2^16 - 1, or 65,535. Later on, we’ll need a way to handle larger numbers, but for now, we’ll keep our numbers at or below that.

Note
Our registers can store an i32, but expressing negative numbers as u8s requires a bit of explanation we’ll go into later.
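
To make the 8/8/16 layout concrete, here is a small sketch that packs a LOAD instruction into four bytes. The helper name encode_load is hypothetical and not part of the VM. The opcode byte is passed in by the caller, because the numeric value assigned to LOAD depends on how your Opcode enum is defined.

```rust
/// Hypothetical helper (not part of the VM): packs a LOAD instruction
/// into the 4-byte layout described above.
///
/// byte 0: opcode
/// byte 1: register number
/// bytes 2-3: the 16-bit operand, most significant byte first
fn encode_load(opcode: u8, register: u8, value: u16) -> [u8; 4] {
    // to_be_bytes splits the u16 into [high byte, low byte].
    let [high, low] = value.to_be_bytes();
    [opcode, register, high, low]
}
```

Assuming LOAD is opcode 0, encode_load(0, 0, 500) yields the bytes 0, 0, 1, 244, because 1 * 256 + 244 = 500.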

To the Code!

These are the steps we need to take to process a LOAD opcode:

  1. Decode the first 8 bits and see LOAD

  2. Decode the next 8 bits and use them to get the register

  3. Decode the next 16 bits (split into 2 u8s) into an integer

  4. Store that integer in the register

Opcode::LOAD => {
    let register = self.next_8_bits() as usize; // We cast to usize so we can use it as an index into the array
    let number = self.next_16_bits() as u16;
    self.registers[register] = number as i32; // Our registers are i32s, so we need to cast it. We'll cover that later.
    continue; // Start another iteration of the loop. The next 8 bits waiting to be read should be an opcode.
},

You’ll note the existence of some extra helper functions, next_8_bits and next_16_bits. You can add these to the VM impl block like so:

// Reads the byte at the program counter and advances past it.
fn next_8_bits(&mut self) -> u8 {
    let result = self.program[self.pc];
    self.pc += 1;
    result
}

// Reads two bytes as one u16. The first byte is the high byte (big endian).
fn next_16_bits(&mut self) -> u16 {
    let result = ((self.program[self.pc] as u16) << 8) | self.program[self.pc + 1] as u16;
    self.pc += 2;
    result
}

These are convenience functions to get the next 8 or 16 bits and increment the program counter.
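
Both helpers index directly into program, so a program that ends in the middle of an instruction will panic with an out-of-bounds error. If you would rather handle that case yourself, one option is a bounds-checked read. The sketch below is a free function rather than a VM method; the name read_u16 and the Option return type are assumptions for illustration only.

```rust
/// Hypothetical bounds-checked variant of next_16_bits.
/// Returns None (and leaves `pc` untouched) if fewer than two bytes remain.
fn read_u16(program: &[u8], pc: &mut usize) -> Option<u16> {
    // `get` with a range returns None instead of panicking when out of bounds.
    let bytes = program.get(*pc..*pc + 2)?;
    *pc += 2;
    // First byte is the high byte, matching the shift-and-or in next_16_bits.
    Some(u16::from_be_bytes([bytes[0], bytes[1]]))
}
```

The standard library's u16::from_be_bytes does the same job as the manual shift and bitwise OR.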

And that’s all there is to the LOAD opcode! Let’s write a test for it:

#[test]
fn test_load_opcode() {
  let mut test_vm = get_test_vm();
  test_vm.program = vec![0, 0, 1, 244]; // Remember, this is how we represent 500 using two u8s in big endian format (high byte first): 1 * 256 + 244
  test_vm.run();
  assert_eq!(test_vm.registers[0], 500);
}

A Brief Detour

Calling run() in our tests is fragile and pretty hacky. Let’s add in a function to let us execute one instruction. We could just copy and paste the whole run function and tweak it, but ewww. We can factor out the execution of the opcodes into a function:

/// Loops as long as instructions can be executed.
pub fn run(&mut self) {
    let mut is_done = false;
    while !is_done {
        is_done = self.execute_instruction();
    }
}

/// Executes one instruction. Meant to allow for more controlled execution of the VM
pub fn run_once(&mut self) {
    self.execute_instruction();
}


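/// Executes one instruction and returns true when the VM should stop.
fn execute_instruction(&mut self) -> bool {
    // Running off the end of the program means we are done.
    if self.pc >= self.program.len() {
        return true;
    }
    match self.decode_opcode() {
        Opcode::LOAD => {
            let register = self.next_8_bits() as usize;
            let number = self.next_16_bits() as u32;
            self.registers[register] = number as i32;
        },
        Opcode::HLT => {
            println!("HLT encountered");
            return true;
        },
    }
    // The instruction executed normally, so the VM is not done yet.
    false
}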
Note
This adds an additional function call to every iteration of the VM loop. That can cost performance if the compiler does not inline it. When we start benchmarking, we’ll want to revisit this.
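
To see how run_once helps in tests, here is a trimmed-down, standalone sketch of the VM with only LOAD and HLT. It is not the full VM from this series: the name MiniVm, the constructor, and the opcode numbering (0 for LOAD, 1 for HLT) are assumptions made so the example compiles on its own. In this sketch, execute_instruction returns true when the VM should stop.

```rust
// Assumption for this sketch: opcode byte 0 means LOAD and 1 means HLT.
#[derive(Debug, PartialEq)]
enum Opcode {
    LOAD,
    HLT,
    IGL, // any byte we do not recognize
}

impl From<u8> for Opcode {
    fn from(byte: u8) -> Self {
        match byte {
            0 => Opcode::LOAD,
            1 => Opcode::HLT,
            _ => Opcode::IGL,
        }
    }
}

struct MiniVm {
    registers: [i32; 32],
    pc: usize,
    program: Vec<u8>,
}

impl MiniVm {
    fn new(program: Vec<u8>) -> Self {
        MiniVm { registers: [0; 32], pc: 0, program }
    }

    /// Loops until an instruction reports that the VM is done.
    fn run(&mut self) {
        let mut is_done = false;
        while !is_done {
            is_done = self.execute_instruction();
        }
    }

    /// Executes exactly one instruction.
    fn run_once(&mut self) {
        self.execute_instruction();
    }

    /// Returns true when the VM should stop.
    fn execute_instruction(&mut self) -> bool {
        if self.pc >= self.program.len() {
            return true;
        }
        match self.decode_opcode() {
            Opcode::LOAD => {
                // Note: a truncated LOAD would panic on these reads.
                let register = self.next_8_bits() as usize;
                let number = self.next_16_bits();
                self.registers[register] = i32::from(number);
                false
            }
            Opcode::HLT | Opcode::IGL => true,
        }
    }

    fn decode_opcode(&mut self) -> Opcode {
        let opcode = Opcode::from(self.program[self.pc]);
        self.pc += 1;
        opcode
    }

    fn next_8_bits(&mut self) -> u8 {
        let result = self.program[self.pc];
        self.pc += 1;
        result
    }

    fn next_16_bits(&mut self) -> u16 {
        let result = ((self.program[self.pc] as u16) << 8) | self.program[self.pc + 1] as u16;
        self.pc += 2;
        result
    }
}
```

With a program containing two LOADs followed by HLT, a single run_once call executes only the first LOAD, so a test can check one register without worrying about what the rest of the program does.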

ADD

Now let’s code the ADD instruction. It has the following form: ADD $0 $1 $2. The first two operands are the registers whose values we want to add, and the third register is where the value will end up. In our assembly language, if we wanted to load two numbers and add them, it might look like:

LOAD $0 #10
LOAD $1 #15
ADD $0 $1 $2

Register 2 would have the value 25. ADD uses all 4 bytes of an instruction, so our code for it looks like:

Opcode::ADD => {
    let register1 = self.registers[self.next_8_bits() as usize];
    let register2 = self.registers[self.next_8_bits() as usize];
    self.registers[self.next_8_bits() as usize] = register1 + register2;
},
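
One thing the match arm above does not handle is overflow: adding two large i32 values with + panics in debug builds and wraps in release builds. If that matters for your VM, a checked version is one option. The sketch below is a standalone function, not the author's implementation; the name execute_add and the Option return type are assumptions for illustration.

```rust
/// Hypothetical standalone version of the ADD arm.
/// `operands` holds the three register indices that follow the opcode byte.
/// Returns None if a register index is out of range or the sum overflows;
/// in both cases the registers are left unchanged.
fn execute_add(registers: &mut [i32; 32], operands: [u8; 3]) -> Option<i32> {
    let lhs = *registers.get(operands[0] as usize)?;
    let rhs = *registers.get(operands[1] as usize)?;
    // checked_add returns None instead of panicking or wrapping.
    let sum = lhs.checked_add(rhs)?;
    *registers.get_mut(operands[2] as usize)? = sum;
    Some(sum)
}
```

Whether overflow should stop the VM, wrap around, or set a flag is a design decision this post does not settle; the sketch only shows where such a check could live.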

SUB, MUL and DIV

The SUB and MUL opcodes work just like ADD with the operator swapped, but division is different. Because our registers hold i32 values, we cannot store a fractional result from division. Instead, we store the integer quotient and keep the remainder separately. You may remember this from school: 8 / 5 = 1 remainder 3.
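
Since only the operator changes between ADD, SUB and MUL, the operand decoding can be shared. Here is a rough sketch of that idea as a standalone function. The ArithOp enum and execute_arith are hypothetical names; in the VM itself you would more likely write one match arm per opcode, each mirroring the ADD arm.

```rust
/// Hypothetical enum naming the three operations that share a shape.
#[derive(Clone, Copy)]
enum ArithOp {
    Add,
    Sub,
    Mul,
}

/// Reads two source registers, applies `op`, and writes the destination.
/// `operands` holds the three register indices that follow the opcode byte.
fn execute_arith(op: ArithOp, registers: &mut [i32; 32], operands: [u8; 3]) {
    let lhs = registers[operands[0] as usize];
    let rhs = registers[operands[1] as usize];
    registers[operands[2] as usize] = match op {
        ArithOp::Add => lhs + rhs,
        ArithOp::Sub => lhs - rhs,
        ArithOp::Mul => lhs * rhs,
    };
}
```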

To deal with this need, we’re going to add another attribute to our VM, called remainder. Our new VM struct looks like this:

#[derive(Debug)]
pub struct VM {
    registers: [i32; 32],
    pc: usize,
    program: Vec<u8>,
    remainder: u32,
}
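
Adding a field means every place that constructs a VM has to initialize it. Assuming your VM has a constructor along the lines of new() (the exact name and shape in your code may differ), the update could look something like this:

```rust
#[derive(Debug)]
pub struct VM {
    registers: [i32; 32],
    pc: usize,
    program: Vec<u8>,
    remainder: u32,
}

impl VM {
    /// Assumed constructor; adapt it to whatever your VM already uses.
    pub fn new() -> VM {
        VM {
            registers: [0; 32],
            pc: 0,
            program: vec![],
            // New field: starts at zero until the first DIV runs.
            remainder: 0,
        }
    }
}
```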

DIV

When we come across a DIV opcode, we want to divide the first operand by the second, store the quotient in the destination register, and store the remainder in the remainder attribute of the VM. This makes our code for the DIV match arm a bit different:

Opcode::DIV => {
    let register1 = self.registers[self.next_8_bits() as usize];
    let register2 = self.registers[self.next_8_bits() as usize];
    self.registers[self.next_8_bits() as usize] = register1 / register2;
    self.remainder = (register1 % register2) as u32;
},
Note
You may be wondering what the % is. That is Rust’s remainder operator (often called modulo), and it is how we get the remainder.
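
Two edge cases are worth knowing about. In Rust, integer division by zero panics, and the remainder of a negative dividend is itself negative, so casting it to u32 produces a very large number. The sketch below shows one way to surface both cases. The function name divide and the Option return type are assumptions for illustration, not part of the VM above.

```rust
/// Hypothetical helper: returns (quotient, remainder), or None when the
/// division is not representable (divisor of zero, or i32::MIN / -1).
fn divide(dividend: i32, divisor: i32) -> Option<(i32, i32)> {
    // checked_div and checked_rem return None instead of panicking.
    let quotient = dividend.checked_div(divisor)?;
    let remainder = dividend.checked_rem(divisor)?;
    Some((quotient, remainder))
}
```

For example, divide(8, 5) gives Some((1, 3)), matching the schoolbook result, while divide(-8, 5) gives Some((-1, -3)) because Rust truncates toward zero. Casting that -3 to u32 yields 4294967293, so you may want remainder to be an i32 if negative operands are possible.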

Now What?

We have a VM that can do some math! What more do you want?!

Well, ok, we should probably add some more features in. In our next post, we’ll talk about jumping!

