So You Want to Build a Language VM - Part 21 - Header Offset

Adds calculation of the starting PC location based on how much read-only data there is

Intro

Hello! In this article, we’re going to fix a teensy bug. =)

The Problem

An Iridium program (bytecode) always starts with a 64-byte header. The first 4 bytes are a constant magic number that lets the Iridium VM know that it’s Iridium bytecode. The rest are zeros. An example would be:

[45, 50, 49, 45, ... 0]

Then after that comes the read-only section of the program that contains things like constants. We do not know how long this section is until we have assembled the program. If someone declares a constant for the string "Hello", the read-only section would look like:

[72, 101, 108, 108, 111, 0]

In our program, we now have 64 (the header length) + 6 (the constant Hello, and remember that strings are null-terminated) bytes before any code, for a total of 70 bytes. The VM needs to start executing at the 71st byte of the program, which is index 70 because indexing starts at zero.
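Here is the same arithmetic as a tiny Rust sketch. The `PIE_HEADER_LENGTH` constant is a stand-in for the 64-byte header described above, and `code_start` is a hypothetical helper for illustration only:

```rust
// Hypothetical constant mirroring the 64-byte header described above.
const PIE_HEADER_LENGTH: usize = 64;

/// Index of the first code byte, given the length of the read-only section.
/// Indices are 0-based, so index 70 is the 71st byte of the program.
fn code_start(ro_len: usize) -> usize {
    PIE_HEADER_LENGTH + ro_len
}

fn main() {
    // "Hello" plus its null terminator is 6 bytes.
    let ro: Vec<u8> = b"Hello\0".to_vec();
    assert_eq!(ro, vec![72, 101, 108, 108, 111, 0]);
    println!("code starts at index {}", code_start(ro.len())); // prints "code starts at index 70"
}
```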

Fixit Fixit

Worry not, it is pretty easy to fix this! We need to:

  1. Calculate the final length of the read-only section during assembly

  2. Write that length into the 4 header bytes directly after the magic number

  3. Have the VM read that length and set its starting PC past both the header and the read-only section

Calculating the Length

Head over to src/assembler/mod.rs and let’s modify a function. We’re going to use the byteorder crate’s nifty features for this.

The function we’re going to modify is write_pie_header. Change it to be:

fn write_pie_header(&self) -> Vec<u8> {
    let mut header = vec![];
    for byte in &PIE_HEADER_PREFIX {
        header.push(byte.clone());
    }

    // Now we need to calculate the starting offset so that the VM knows where the RO section ends

    // First, declare an empty vector for byteorder to write into
    let mut wtr: Vec<u8> = vec![];

    // Write the length of the read-only section as a little-endian u32.
    // Casting to u32 guarantees exactly 4 bytes are written, however small the length is.
    // Writing into a Vec<u8> cannot fail, so the unwrap is safe here.
    wtr.write_u32::<LittleEndian>(self.ro.len() as u32).unwrap();

    // Append those 4 bytes to the header directly after the first four bytes
    header.append(&mut wtr);

    // Now pad the rest of the bytecode header
    while header.len() < PIE_HEADER_LENGTH {
        header.push(0 as u8);
    }

    header
}

The three new lines in the middle are the key; I’ve added comments to each one explaining what it does.

Don’t forget to add this import to your src/assembler/mod.rs file:

use byteorder::{LittleEndian, WriteBytesExt};
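If you would rather not depend on byteorder for this one write, the standard library’s `u32::to_le_bytes` produces the same four bytes. The sketch below is a hypothetical free-function variant, `build_pie_header`, that takes the read-only length as a parameter instead of reading `self.ro`. It assumes the magic-number bytes shown at the top of this post:

```rust
// Assumed values for illustration; the real constants live in src/assembler/mod.rs.
const PIE_HEADER_PREFIX: [u8; 4] = [45, 50, 49, 45];
const PIE_HEADER_LENGTH: usize = 64;

/// Hypothetical helper: magic number, little-endian RO length, then zero padding.
fn build_pie_header(ro_len: usize) -> Vec<u8> {
    let mut header = Vec::with_capacity(PIE_HEADER_LENGTH);
    header.extend_from_slice(&PIE_HEADER_PREFIX);
    // to_le_bytes always yields exactly 4 bytes for a u32, least significant first.
    // Note: `as u32` silently truncates lengths above u32::MAX.
    header.extend_from_slice(&(ro_len as u32).to_le_bytes());
    // Zero-fill the rest of the 64-byte header.
    header.resize(PIE_HEADER_LENGTH, 0);
    header
}

fn main() {
    let header = build_pie_header(6);
    assert_eq!(header.len(), 64);
    assert_eq!(&header[..8], &[45u8, 50, 49, 45, 6, 0, 0, 0]);
    // 300 is 0x012C, so it is stored low byte first: 0x2C (44), then 0x01.
    assert_eq!(&build_pie_header(300)[4..8], &[44u8, 1, 0, 0]);
}
```

The second assertion shows why the byte order matters: for lengths of 256 bytes or more, the length no longer fits in byte 4 alone.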

One More Thing

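We need to make sure we write the header after we’ve set up all the read-only data. In src/assembler/mod.rs, in the function assemble, move the call to write_pie_header to just after the body is generated. If your version of assemble doesn’t already copy the read-only data into the output, also append self.ro between the header and the body, because the VM will skip exactly that many bytes after the header:

let mut body = self.process_second_phase(&program);

// Get the header now that the read-only section is complete, so we can smush it into the bytecode later
let mut assembled_program = self.write_pie_header();

// The read-only section goes directly after the header (this drains self.ro, so it must come after write_pie_header)
assembled_program.append(&mut self.ro);

// Merge the header with the populated body vector
assembled_program.append(&mut body);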
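To see why the call order matters, here is a toy model. `ToyAssembler`, `populate_ro`, and `header_ro_len` are hypothetical stand-ins for the real assembler, its phases that fill the read-only section (such as `.asciiz`), and the length that write_pie_header stores:

```rust
// Hypothetical, stripped-down assembler used only to show why call order matters.
struct ToyAssembler {
    ro: Vec<u8>,
}

impl ToyAssembler {
    // Stands in for the assembly phases that fill the read-only section.
    fn populate_ro(&mut self) {
        self.ro.extend_from_slice(b"Hello\0");
    }

    // The value that would be written into header bytes 4..8.
    fn header_ro_len(&self) -> u32 {
        self.ro.len() as u32
    }
}

fn main() {
    // Header computed too early: the read-only section is still empty.
    let mut early = ToyAssembler { ro: Vec::new() };
    let too_soon = early.header_ro_len();
    early.populate_ro();
    assert_eq!(too_soon, 0);

    // Header computed after the read-only data exists: the length is correct.
    let mut late = ToyAssembler { ro: Vec::new() };
    late.populate_ro();
    assert_eq!(late.header_ro_len(), 6);
}
```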

Reading the Offset

Now we need to teach our VM how to read the offset. In src/vm.rs, make sure these imports are present:

use std::io::Cursor;
use byteorder::{LittleEndian, ReadBytesExt};

Then add the following function:

fn get_starting_offset(&self) -> usize {
    // We only want to read the slice containing the 4 bytes right after the magic number
    let mut rdr = Cursor::new(&self.program[4..8]);
    // Read it as a little-endian u32, cast it to a usize (since the VM's pc attribute is a usize), and return it
    rdr.read_u32::<LittleEndian>().unwrap() as usize
}

and then in the run function of the VM, replace:

self.pc = 64;
with:
self.pc = 64 + self.get_starting_offset();
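The same read can be done with only the standard library. This sketch differs from the VM method above in two ways: it takes the program as a slice rather than `&self`, and it returns `None` instead of panicking when the program is shorter than 8 bytes. Both changes are illustrative choices, not part of Iridium:

```rust
/// Reads the little-endian u32 stored in bytes 4..8 of the header.
/// Returns None if the program is too short to contain those bytes.
fn get_starting_offset(program: &[u8]) -> Option<usize> {
    let b = program.get(4..8)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize)
}

fn main() {
    // A header carrying a read-only length of 6, padded out to 64 bytes.
    let mut program: Vec<u8> = vec![45, 50, 49, 45, 6, 0, 0, 0];
    program.resize(64, 0);
    assert_eq!(get_starting_offset(&program), Some(6));

    // A truncated program yields None rather than an out-of-bounds panic.
    assert_eq!(get_starting_offset(&[45u8, 50]), None);
}
```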

Tests

Now let’s write a test to make sure it works! In src/assembler/mod.rs, add this test:

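#[test]
/// Simple test of data that goes into the read-only section
fn test_code_start_offset_written() {
    let mut asm = Assembler::new();
    let test_string = ".data\ntest1: .asciiz 'Hello'\n.code\nload $0 #100\nload $1 #1\nload $2 #0\ntest: inc $0\nneq $0 $2\njmpe @test\nhlt";
    // assemble returns a Result, so unwrap it before indexing into the bytes
    let program = match asm.assemble(test_string) {
        Ok(bytes) => bytes,
        Err(_) => panic!("test program failed to assemble"),
    };
    // Byte 4 is the low byte of the little-endian RO length: "Hello" + null terminator = 6
    assert_eq!(program[4], 6);
}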

With that test string, we should have a header that looks like:

[45, 50, 49, 45, 6, 0, 0, 0, ... ]

If we run our test, we should see:

$ cargo test test_code_start_offset_written -- --nocapture
    Finished dev [unoptimized + debuginfo] target(s) in 0.11s
     Running target/debug/deps/iridium-981657ef3cdcfc6e

running 1 test
test assembler::tests::test_code_start_offset_written ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 45 filtered out

     Running target/debug/deps/iridium-87ed8e3d062c1031

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

   Doc-tests iridium

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Yay, it works!
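The test only checks the header byte. A self-contained sketch can also check where the PC lands. It assumes the assembled program is laid out as header, then read-only data, then code, as described at the start of this post. `ToyVm` and `build_program` are hypothetical, and the two code bytes are placeholders rather than real Iridium opcodes:

```rust
// Hypothetical, minimal VM holding just the fields needed to show the start PC.
struct ToyVm {
    program: Vec<u8>,
    pc: usize,
}

impl ToyVm {
    fn get_starting_offset(&self) -> usize {
        let b = &self.program[4..8];
        u32::from_le_bytes([b[0], b[1], b[2], b[3]]) as usize
    }

    // Mirrors the start of `run`: skip the 64-byte header and the read-only section.
    fn start(&mut self) {
        self.pc = 64 + self.get_starting_offset();
    }
}

// Builds header + read-only data + code, in that order.
fn build_program(ro: &[u8], code: &[u8]) -> Vec<u8> {
    let mut program: Vec<u8> = vec![45, 50, 49, 45];
    program.extend_from_slice(&(ro.len() as u32).to_le_bytes());
    program.resize(64, 0);
    program.extend_from_slice(ro);
    program.extend_from_slice(code);
    program
}

fn main() {
    let program = build_program(b"Hello\0", &[0xAA, 0xBB]);
    let mut vm = ToyVm { program, pc: 0 };
    vm.start();
    assert_eq!(vm.pc, 70);
    // The PC points at the first code byte, not into the string data.
    assert_eq!(vm.program[vm.pc], 0xAA);
}
```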

End

That wasn’t so bad, was it? You can see the code here: https://gitlab.com/subnetzero/iridium/tags/0.0.21.

Until next time!

