So You Want to Build a Language VM - Part 18 - PIDs

Adding PIDs to the VM

Intro

Hey everyone! In this tutorial, we’ll add PID tracking to the Iridium VM. Please ensure you are starting from https://gitlab.com/subnetzero/iridium/tags/0.0.17.

PIDs

There are two components we need to uniquely identify:

  1. Iridium VMs

  2. Processes run by those VMs

This does bring up a more fundamental question, though. Is a VM long-lived, or short-lived? Should we create a VM, with its own registers and heap, for every application we want to run? Should we create a pool of VMs, each in their own threads, waiting to run any application we care to load?

Warning
There are security considerations with re-using VMs. If we do that, we have to make sure we zero out the registers and heap before allowing another application access to them. Otherwise, an application could read data left behind by whatever ran on that VM before it.
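
To make the warning concrete, here is a minimal sketch of what wiping a VM between applications could look like. The trimmed-down VM struct and the reset function are assumptions for illustration only; reset is a hypothetical name, not something from the Iridium code in this post.

```rust
/// Hypothetical, trimmed-down VM holding only the state an application can touch.
pub struct VM {
    pub registers: [i32; 32],
    heap: Vec<u8>,
    pc: usize,
}

impl VM {
    pub fn new() -> VM {
        VM {
            registers: [0; 32],
            heap: vec![],
            pc: 0,
        }
    }

    /// Hypothetical helper: clears everything the previous application left behind.
    pub fn reset(&mut self) {
        // Zero every register
        self.registers = [0; 32];
        // Overwrite the heap bytes before dropping them from the Vec
        for byte in self.heap.iter_mut() {
            *byte = 0;
        }
        self.heap.clear();
        // Start the next application from the beginning
        self.pc = 0;
    }
}
```

A pool of long-lived VMs would call something like this before handing a VM to the next application. Note that an optimizing compiler is allowed to drop writes it can prove are never read, so a real implementation would need something stronger than a plain loop to guarantee the old bytes are gone.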

Identifiers for Iridium VMs

For this, we’re going to use a random UUID. On creation, a VM will generate a random identifier for itself. This will work regardless of what we end up doing about handling multiple VMs.

For generating a random UUID, this crate is quite handy: https://github.com/uuid-rs/uuid. Add it to your Cargo.toml, and don’t forget extern crate uuid; in main.rs.

Note
Because the UUIDs will be random ones, we need to enable the v4 feature: uuid = { version = "0.7", features = ["v4"] }.

Now over in src/vm.rs, let’s add a field to our VM:

use uuid::Uuid;

/// Virtual machine struct that will execute bytecode
#[derive(Default, Clone)]
pub struct VM {
    /// Array that simulates having hardware registers
    pub registers: [i32; 32],
    /// Program counter that tracks which byte is being executed
    pc: usize,
    /// The bytecode of the program being run
    pub program: Vec<u8>,
    /// Used for heap memory
    heap: Vec<u8>,
    /// Contains the remainder of modulo division ops
    remainder: usize,
    /// Contains the result of the last comparison operation
    equal_flag: bool,
    /// Contains the read-only section data
    ro_data: Vec<u8>,
    /// Is a unique, randomly generated UUID for identifying this VM
    id: Uuid,
}

And our builder function in impl VM:

/// Creates and returns a new VM
pub fn new() -> VM {
    VM {
        registers: [0; 32],
        program: vec![],
        ro_data: vec![],
        heap: vec![],
        pc: 0,
        remainder: 0,
        equal_flag: false,
        id: Uuid::new_v4()
    }
}

(I’m not going to write a test for it, since generating the ID cannot fail.)

Processes

What we want is really just an event log: "Application X was run at <timestamp> and terminated at <timestamp> with an exit code of <code>".

In theory, a long-running VM could re-use IDs, which could be confusing. Let’s give each application a random UUID as well.

Head back to vm.rs and add in this:

use chrono::prelude::*;

#[derive(Clone, Debug)]
pub enum VMEventType {
    Start,
    GracefulStop,
    Crash
}

#[derive(Clone, Debug)]
pub struct VMEvent {
    event: VMEventType,
    at: DateTime<Utc>
}

Note the use of the chrono package: https://github.com/chronotope/chrono. This is so we can easily use dates and times.

Note
And yes, all times are going to be in UTC. I am scowling right now at everyone who uses timezones in logs.

Add the chrono crate to your Cargo.toml, and add extern crate chrono; to main.rs just as we did for uuid.

Tracking Events

For now, we’ll give the VM a list of VMEvents that we’ll keep appending to.

/// Virtual machine struct that will execute bytecode
#[derive(Default, Clone)]
pub struct VM {
    // I'm removing the other fields as we have already seen them
    events: Vec<VMEvent>
}

and…​

pub fn new() -> VM {
    VM {
        // I'm removing the other fields as we have already seen them
        events: Vec::new()
    }
}

Almost There!

Let’s modify the VM to add an event when the run() function starts, stops, or crashes:

/// Wraps execution in a loop so it will continue to run until done or there is an error
/// executing instructions.
pub fn run(&mut self) -> u32 {
    self.events.push(VMEvent{event: VMEventType::Start, at: Utc::now()});
    // TODO: Should setup custom errors here
    if !self.verify_header() {
        self.events.push(VMEvent{event: VMEventType::Crash, at: Utc::now()});
        println!("Header was incorrect");
        return 1;
    }
    // If the header is valid, skip past it: execution starts at byte 65 (index 64).
    self.pc = 64;
    let mut is_done = false;
    while !is_done {
        is_done = self.execute_instruction();
    }
    self.events.push(VMEvent{event: VMEventType::GracefulStop, at: Utc::now()});
    0
}

Note that we are assuming the application terminated gracefully as long as the while loop ends. This is because execute_instruction returns a bool, not an integer. Sigh.

Let’s change it. It will be a little painful now, but it would be much more painful to do later.

First, we have to change the return value:

fn execute_instruction(&mut self) -> u32

Then in the check if the pc has exceeded the program length:

if self.pc >= self.program.len() {
    return 1;
}

For the HLT and IGL codes:

Opcode::HLT => {
    println!("HLT encountered");
    return 0;
}
Opcode::IGL => {
    println!("Illegal instruction encountered");
    return 1;
}

And the very last line, where we used to return false after an opcode finished executing, now returns 0:

fn execute_instruction(&mut self) -> u32 {
    if self.pc >= self.program.len() {
        return 1;
    }
    match self.decode_opcode() {
        Opcode::LOAD => {
            let register = self.next_8_bits() as usize;
            let number = u32::from(self.next_16_bits());
            self.registers[register] = number as i32;
        }
        // <snip a lot of other opcodes>
    };
    0
}

And now we go to change the run function yet again:

pub fn run(&mut self) -> u32 {
    self.events.push(VMEvent{event: VMEventType::Start, at: Utc::now()});
    // TODO: Should setup custom errors here
    if !self.verify_header() {
        self.events.push(VMEvent{event: VMEventType::Crash{code: 1}, at: Utc::now()});
        println!("Header was incorrect");
        return 1;
    }
    // If the header is valid, skip past it: execution starts at byte 65 (index 64).
    self.pc = 64;
    let mut is_done = 0;
    while is_done == 0 {
        is_done = self.execute_instruction();
    }
    self.events.push(VMEvent{event: VMEventType::GracefulStop{code: is_done}, at: Utc::now()});
    0
}
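
The run function above builds Crash{code: 1} and GracefulStop{code: is_done}, which only compiles if those two variants carry a code field. A minimal sketch of the enum under that assumption follows. The u32 type is inferred from what execute_instruction returns, and exit_code is a hypothetical helper added only to show how the value can be read back out.

```rust
#[derive(Clone, Debug, PartialEq)]
pub enum VMEventType {
    Start,
    /// Assumed shape: the application ended and reported this exit code
    GracefulStop { code: u32 },
    /// Assumed shape: the VM gave up and reported this exit code
    Crash { code: u32 },
}

impl VMEventType {
    /// Hypothetical helper (not in the tutorial code): the exit code, if the event has one.
    pub fn exit_code(&self) -> Option<u32> {
        match *self {
            VMEventType::Start => None,
            VMEventType::GracefulStop { code } | VMEventType::Crash { code } => Some(code),
        }
    }
}
```

With the code stored on the event itself, the event log alone is enough to tell how an application ended.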

Crap. The problem is that the loop treats a return code of 0 as the signal to keep executing, but right now some instructions (e.g., HLT) return 0 to say they are done. So the program will continue, even when it shouldn’t.

Does this mean that HLT should return something > 0? To be honest, I don’t know. I do know I don’t want to break from the *nix convention of 0 == ok, and > 0 is an error of some sort…​

Oh, hrm, Rust has the wonderful Option<_>…​hehe…​option. Let’s try using an Option with nothing in it as the signal to keep executing.
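
Here is the idea in isolation, as a rough sketch before touching the real VM. ToyVM and its three-opcode instruction set are made up for this example; only the Option<u32> convention (None means keep going, Some(code) means stop with that exit code) is what we are about to apply to Iridium.

```rust
/// Hypothetical three-opcode instruction set, just for this sketch.
#[derive(Clone, Copy)]
enum Opcode {
    NOP,
    HLT,
    IGL,
}

/// Hypothetical stand-in for the real VM.
struct ToyVM {
    program: Vec<Opcode>,
    pc: usize,
}

impl ToyVM {
    /// Returns None to keep executing, or Some(exit_code) once the program is finished.
    fn execute_instruction(&mut self) -> Option<u32> {
        // Running off the end of the program is an error
        if self.pc >= self.program.len() {
            return Some(1);
        }
        let opcode = self.program[self.pc];
        self.pc += 1;
        match opcode {
            Opcode::NOP => None,
            Opcode::HLT => Some(0),
            Opcode::IGL => Some(1),
        }
    }

    /// Loops until an instruction hands back an exit code.
    fn run(&mut self) -> u32 {
        loop {
            if let Some(code) = self.execute_instruction() {
                return code;
            }
        }
    }
}
```

Now an exit code of 0 from HLT and "nothing to report yet" are two different values, so the *nix convention of 0 == ok survives.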

Note
I’m writing this as I write the code, so you can see my thought process.

Let’s try this as the run function in vm.rs:

/// Wraps execution in a loop so it will continue to run until done or there is an error
/// executing instructions.
pub fn run(&mut self) -> u32 {
    self.events.push(VMEvent{event: VMEventType::Start, at: Utc::now()});
    // TODO: Should setup custom errors here
    if !self.verify_header() {
        self.events.push(VMEvent{event: VMEventType::Crash{code: 1}, at: Utc::now()});
        println!("Header was incorrect");
        return 1;
    }
    // If the header is valid, skip past it: execution starts at byte 65 (index 64).
    self.pc = 64;
    let mut is_done = None;
    while is_done.is_none() {
        is_done = self.execute_instruction();
    }
    self.events.push(VMEvent{event: VMEventType::GracefulStop{code: is_done.unwrap()}, at: Utc::now()});
    0
}

Note we have to unwrap is_done when adding the stop event.

And then in the execute_instruction function:

fn execute_instruction(&mut self) -> Option<u32> {
    if self.pc >= self.program.len() {
        return Some(1);
    }
    // <snip>
}

Note the change of return type in the signature as well, and don’t forget to fix the HLT and IGL opcodes.

And at last, the end of our execute_instruction function:

fn execute_instruction(&mut self) -> Option<u32> {
    // <snip>
    None
}

Run cargo test to make sure we didn’t break anything…​

test result: ok. 44 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Yay!

Application ID

For now, let’s just use a new VM per application run. This makes the VM ID the same as the Application ID. We may want to think about building a slightly more abstract form of a program, so we can attach additional information to it.

Update VMEvent

Let’s update VMEvent to have an application_id field:

#[derive(Clone, Debug)]
pub struct VMEvent {
    event: VMEventType,
    at: DateTime<Utc>,
    application_id: Uuid
}

And then in each of the three places we generate an event, clone it from the VM id. Our run function should look like this now:

/// Wraps execution in a loop so it will continue to run until done or there is an error
/// executing instructions.
pub fn run(&mut self) -> u32 {
    self.events.push(
        VMEvent{
            event: VMEventType::Start,
            at: Utc::now(),
            application_id: self.id.clone()
        }
    );
    // TODO: Should setup custom errors here
    if !self.verify_header() {
        self.events.push(
            VMEvent{
                event: VMEventType::Crash{
                    code: 1
                },
                at: Utc::now(),
                application_id: self.id.clone()
            }
        );
        println!("Header was incorrect");
        return 1;
    }
    // If the header is valid, skip past it: execution starts at byte 65 (index 64).
    self.pc = 64;
    let mut is_done = None;
    while is_done.is_none() {
        is_done = self.execute_instruction();
    }
    self.events.push(
        VMEvent{
            event: VMEventType::GracefulStop{
                code: is_done.unwrap()
            },
            at: Utc::now(),
            application_id: self.id.clone()
        }
    );
    0
}

And…​damnit! We’re still returning 1 or 0 from the run function. So our nice collection of events vanishes.

Sigh. OK, let’s change the run function to return a list of our events, and we’ll change the 1 and 0 returns to return our entire Vector of events. The final run function should look like:

pub fn run(&mut self) -> Vec<VMEvent> {
    self.events.push(
        VMEvent{
            event: VMEventType::Start,
            at: Utc::now(),
            application_id: self.id.clone()
        }
    );
    // TODO: Should setup custom errors here
    if !self.verify_header() {
        self.events.push(
            VMEvent{
                event: VMEventType::Crash{
                    code: 1
                },
                at: Utc::now(),
                application_id: self.id.clone()
            }
        );
        println!("Header was incorrect");
        return self.events.clone();
    }
    // If the header is valid, skip past it: execution starts at byte 65 (index 64).
    self.pc = 64;
    let mut is_done = None;
    while is_done.is_none() {
        is_done = self.execute_instruction();
    }
    self.events.push(
        VMEvent{
            event: VMEventType::GracefulStop{
                code: is_done.unwrap()
            },
            at: Utc::now(),
            application_id: self.id.clone()
        }
    );
    self.events.clone()
}

cargo test and:

error[E0308]: mismatched types
  --> src/scheduler/mod.rs:21:7
   |
20 |       pub fn get_thread(&mut self, mut vm: VM) -> thread::JoinHandle<u32> {
   |                                                   ----------------------- expected `std::thread::JoinHandle<u32>` because of return type
21 | /       thread::spawn(move || {
22 | |           vm.run()
23 | |       })
   | |________^ expected u32, found struct `std::vec::Vec`
   |
   = note: expected type `std::thread::JoinHandle<u32>`
              found type `std::thread::JoinHandle<std::vec::Vec<vm::VMEvent>>`

Fine, compiler. Off we go to src/scheduler/mod.rs. Add an import:

use vm::{VM, VMEvent};

And change the signature of get_thread:

/// Takes a VM and runs it in a background thread
pub fn get_thread(&mut self, mut vm: VM) -> thread::JoinHandle<Vec<VMEvent>> {
  thread::spawn(move || {
      vm.run()
  })
}
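
Since the JoinHandle now carries the event log, whoever holds it can get the events back by joining the thread. A small sketch of that, using stand-in VM and VMEvent types so it compiles on its own; run_and_collect is a hypothetical helper, and get_thread is written as a free function here rather than as a scheduler method.

```rust
use std::thread;

/// Stand-in for the tutorial's VMEvent; only a label is kept here.
#[derive(Clone, Debug, PartialEq)]
pub struct VMEvent {
    label: String,
}

/// Stand-in VM whose run() returns its event log, like the real one now does.
pub struct VM {
    events: Vec<VMEvent>,
}

impl VM {
    pub fn new() -> VM {
        VM { events: Vec::new() }
    }

    pub fn run(&mut self) -> Vec<VMEvent> {
        self.events.push(VMEvent { label: String::from("Start") });
        self.events.push(VMEvent { label: String::from("GracefulStop") });
        self.events.clone()
    }
}

/// Same shape as the scheduler's get_thread: runs the VM in a background thread.
pub fn get_thread(mut vm: VM) -> thread::JoinHandle<Vec<VMEvent>> {
    thread::spawn(move || vm.run())
}

/// Hypothetical caller: blocks until the VM thread finishes, then returns its events.
pub fn run_and_collect(vm: VM) -> Vec<VMEvent> {
    get_thread(vm).join().expect("VM thread panicked")
}
```

join() returns a Result because the spawned thread may have panicked; the expect here simply passes that panic along to the caller.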

cargo test says everything is fine, the compiler isn’t yelling at us…​are we done?

Ha. No, of course not! We still aren’t displaying the results to the users.

Hackety Hack

For now, we’re just going to print out the event log when we call run. We have to do this in two places:

  1. When the user runs a program from the CLI, e.g., iridium myfile.iasm

  2. When the user runs a program via the REPL

We’ll format it later so that it looks nicer, but this post is already at 2033 words.

Let’s tackle them in sequence.

CLI

In main.rs, we have this section:

let program = asm.assemble(&program);
match program {
    Ok(p) => {
        vm.add_bytes(p);
        vm.run();
        std::process::exit(0);
    },
    Err(_e) => {

    }
}

Let’s assign the output of run to a variable, and then debug print it:

match program {
    Ok(p) => {
        vm.add_bytes(p);
        let events = vm.run();
        println!("VM Events");
        println!("--------------------------");
        for event in &events {
            println!("{:#?}", event);
        };
        std::process::exit(0);
    },
    Err(_e) => {

    }
}

REPL

I’ll leave getting it to display in the REPL to you. You can see mine in GitLab.

End

We’ll end here for this one, though I want to make one observation.

Coding Style

My coding style in Rust is oddly freeform for such a strict language. When writing Rust code, my goal in life becomes to appease the compiler. As long as I can do that, what I code usually works like I think it will.

See you next tutorial!

