So You Want to Build a Language VM - Part 02 - Basic Opcodes

Instructions and decoding opcodes

July 20, 2018

Opcodes

When we last left our intrepid reader (that’s you), they had gotten to the point of writing an opcode. Let’s pick up from there.

What is an Opcode?

An integer between 0 and some upper bound. Because we are using 8 bits to represent an opcode, we could have 256 of them (0 through 255). To represent them in code, we’ll be using an enum, because Rust enums are the bee’s knees. In your source directory, make a new file called instruction.rs.
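To make the size of the opcode space concrete, here is a small sketch. The helper name opcode_capacity is hypothetical and exists only for this illustration; it is not part of the VM.

```rust
// Hypothetical helper (not part of the VM): counts how many distinct
// values a single byte, and therefore a single opcode, can take.
fn opcode_capacity() -> u32 {
    // u8::MAX is 255, and 0 is also a valid value, so add one.
    u8::MAX as u32 + 1
}

fn main() {
    println!("smallest opcode value: {}", u8::MIN);
    println!("largest opcode value: {}", u8::MAX);
    println!("distinct opcode values: {}", opcode_capacity());
}
```

We only define two opcodes in this part, so nearly all of that space is unused for now.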

Enum

In instruction.rs, put:

#[derive(Debug, PartialEq)]
pub enum Opcode {
  HLT,
  IGL
}

Instructions

Remember how our instructions are 32 bits? Let’s make a struct to represent an entire instruction:

#[derive(Debug, PartialEq)]
pub struct Instruction {
  opcode: Opcode
}

We’ll add some more fields later, but for now, this will do. We’ll also need to add in Ye Olde Impl Block:

impl Instruction {
  pub fn new(opcode: Opcode) -> Instruction {
    Instruction {
      opcode: opcode
    }
  }
}

And Yet More Tests!

In instruction.rs, if you haven’t already, add a test module:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_create_hlt() {
        let opcode = Opcode::HLT;
        assert_eq!(opcode, Opcode::HLT);
    }

    #[test]
    fn test_create_instruction() {
        let instruction = Instruction::new(Opcode::HLT);
        assert_eq!(instruction.opcode, Opcode::HLT);
    }
}

Yes, these tests are a bit simplistic, but I like to get into the habit of writing tests as early as possible in a project.

Important

Don’t forget to add pub mod instruction to the main.rs file if you haven’t!

Execution Loop

We have an opcode and an instruction and a VM. Our next critical component is the function that will execute an opcode. Head over to vm.rs, where we are going to make some changes.

First, we’ll need to add a vector to the VM struct to store our program bytecode:

#[derive(Debug)]
pub struct VM {
    registers: [i32; 32],
    pc: usize,
    program: Vec<u8>
}

Note

No, we aren’t storing a vector of Instructions, but rather a vector of bytes. The Instruction struct will be useful when we write our assembler later on.

Second, our Impl block needs to change:

impl VM {
    pub fn new() -> VM {
        VM {
            registers: [0; 32],
            program: vec![],
            pc: 0,
        }
    }
}

You’ll notice we’ve also added a field named pc. This is our program counter, and will track which byte is executing.

The Loop!

It’s time! We can make our VM actually do something! I mean, it will stop, but still! Let’s add a function to our VM impl:

pub fn run(&mut self) {
    loop {
        // If our program counter has exceeded the length of the program itself, something has
        // gone awry
        if self.pc >= self.program.len() {
            break;
        }
        match self.decode_opcode() {
            Opcode::HLT => {
                println!("HLT encountered");
                return;
            },
            _ => {
                println!("Unrecognized opcode found! Terminating!");
                return;
            }
        }
    }
}

Important

The main execution loop is often considered the most performance-critical part of a language interpreter. This is a rather naive implementation that is not optimized. We’ll be doing a lot of testing and work on this later. It involves topics like CPU branch prediction that merit their own article.

The code above won’t actually work yet. We’re storing the entire program as a vector of bytes, and our VM has no way to know that the number 0 is the HLT opcode. See the call to a function named decode_opcode? That’s what will take a u8 and turn it into an opcode. It looks like:

fn decode_opcode(&mut self) -> Opcode {
    // Convert the byte at the program counter into an Opcode
    let opcode = Opcode::from(self.program[self.pc]);
    // Advance to the next byte
    self.pc += 1;
    opcode
}

Add that to our VM’s impl. Notice how we use Opcode::from? That function comes from Rust’s From trait. We need to tell our program how to convert from a byte to a specific opcode, which we can do by implementing this trait for our enum. In instruction.rs put this in:

impl From<u8> for Opcode {
    fn from(v: u8) -> Self {
        match v {
            0 => Opcode::HLT,
            _ => Opcode::IGL
        }
    }
}

We actually define two match arms: one for HLT, and a default for all other numbers. If the VM ever encounters a number we didn’t plan to be an opcode, it will return the IGL (short for Illegal) opcode, and the VM will stop with an error.
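To see the conversion on its own, outside the VM, here is a self-contained sketch. It repeats the enum and the From implementation so it can run by itself; the main function is only for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum Opcode {
    HLT,
    IGL,
}

impl From<u8> for Opcode {
    fn from(v: u8) -> Self {
        match v {
            0 => Opcode::HLT,
            // Every byte we have not assigned maps to the illegal opcode.
            _ => Opcode::IGL,
        }
    }
}

fn main() {
    // Explicit call through the From trait.
    let halt = Opcode::from(0u8);
    // Implementing From also makes `into()` available.
    let illegal: Opcode = 200u8.into();
    println!("{:?} {:?}", halt, illegal); // prints "HLT IGL"
}
```

Because the default arm catches everything else, the conversion can never fail, which is why From (rather than a fallible conversion) is enough here.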

The last thing to note is that in the decode_opcode function, we’re incrementing the pc by 1. We do this because once we have decoded the opcode, we want to move the counter to the next byte.
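The read-then-advance pattern can be sketched in isolation. The Cursor struct and the next_byte and pc_after_reads names below are hypothetical stand-ins for the VM and decode_opcode, kept small so that only the counter movement is visible.

```rust
// Hypothetical stand-in for the VM: just a program and a program counter.
struct Cursor {
    pc: usize,
    program: Vec<u8>,
}

impl Cursor {
    // Read the byte at pc, then move pc forward by one,
    // mirroring what decode_opcode does in the VM.
    fn next_byte(&mut self) -> u8 {
        let byte = self.program[self.pc];
        self.pc += 1;
        byte
    }
}

// Hypothetical helper: perform `reads` reads and report where pc ends up.
// Assumes `reads` does not exceed the program length.
fn pc_after_reads(program: Vec<u8>, reads: usize) -> usize {
    let mut cursor = Cursor { pc: 0, program };
    for _ in 0..reads {
        cursor.next_byte();
    }
    cursor.pc
}

fn main() {
    let mut cursor = Cursor { pc: 0, program: vec![0, 200, 0, 0] };
    let first = cursor.next_byte();
    println!("read {} and pc is now {}", first, cursor.pc);
    let second = cursor.next_byte();
    println!("read {} and pc is now {}", second, cursor.pc);
}
```

After each read the counter already points at the next byte, so the following read needs no extra bookkeeping.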

To wrap up this section, guess what we should add? That’s right, tests! From the previous section, we had a basic test, which I’ll re-paste below, and we’ll add two more:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_create_vm() {
        let test_vm = VM::new();
        assert_eq!(test_vm.registers[0], 0)
    }

    #[test]
    fn test_opcode_hlt() {
        let mut test_vm = VM::new();
        let test_bytes = vec![0,0,0,0];
        test_vm.program = test_bytes;
        test_vm.run();
        assert_eq!(test_vm.pc, 1);
    }

    #[test]
    fn test_opcode_igl() {
        let mut test_vm = VM::new();
        let test_bytes = vec![200,0,0,0];
        test_vm.program = test_bytes;
        test_vm.run();
        assert_eq!(test_vm.pc, 1);
    }
}

For these tests, we manually create a vector of 4 bytes, run the loop, and check that pc was incremented. Later on, we’ll want to add a function that executes a single iteration, so that a failing test can’t loop infinitely.
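One possible shape for that single-iteration function is sketched below. The names run_once and execute_instruction are assumptions made for this example, not something defined so far, and the registers field is left out to keep the sketch short.

```rust
#[derive(Debug, PartialEq)]
pub enum Opcode {
    HLT,
    IGL,
}

impl From<u8> for Opcode {
    fn from(v: u8) -> Self {
        match v {
            0 => Opcode::HLT,
            _ => Opcode::IGL,
        }
    }
}

// Trimmed-down VM: registers omitted for brevity.
pub struct VM {
    pc: usize,
    program: Vec<u8>,
}

impl VM {
    pub fn new() -> VM {
        VM { pc: 0, program: vec![] }
    }

    fn decode_opcode(&mut self) -> Opcode {
        let opcode = Opcode::from(self.program[self.pc]);
        self.pc += 1;
        opcode
    }

    // Hypothetical helper: executes one instruction and returns true
    // when the VM should stop.
    fn execute_instruction(&mut self) -> bool {
        if self.pc >= self.program.len() {
            return true;
        }
        match self.decode_opcode() {
            Opcode::HLT => {
                println!("HLT encountered");
                true
            }
            _ => {
                println!("Unrecognized opcode found! Terminating!");
                true
            }
        }
    }

    // The full loop is now a thin wrapper around the single step.
    pub fn run(&mut self) {
        let mut is_done = false;
        while !is_done {
            is_done = self.execute_instruction();
        }
    }

    // Hypothetical entry point for tests: exactly one step, no loop.
    pub fn run_once(&mut self) {
        self.execute_instruction();
    }
}

fn main() {
    let mut vm = VM::new();
    vm.program = vec![0, 0, 0, 0];
    vm.run_once();
    println!("pc after one step: {}", vm.pc);
}
```

With this split, a test can call run_once and inspect the VM afterwards without any risk of the loop running forever. Both arms currently return true because both opcodes stop the VM; an opcode that lets execution continue would return false instead.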
