So You Want to Build a Language VM - Part 20 - Benchmarks

Adds Criterion for benchmarking


We’ve been having so much fun, we haven’t written any benchmarks! Though they aren’t the most exciting thing to write, they are important.

Benchmarks

There are two things to understand about benchmarks:

  1. A single benchmark with no context is not helpful

  2. They are most useful when tracked over time

Anything you can benchmark, you can tune to be good at that specific benchmark, so a good score on a single benchmark tells you little about the system as a whole.

Useful Benchmarks

I find benchmarks most helpful when:

  1. You collect them from many sub-components of a large system or application

  2. You keep a historical record, so you can see trends

Suppose we are building a system that consists of multiple sub-components, such as:

  • Web frontend

  • Backend

  • Database

  • Load Balancers

Useful benchmarks for this would be things like:

  • How many concurrent connections can the load balancers sustain on server type X? I don’t really care how fast it can factor prime numbers.

  • How effective is the caching layer of the database server? I don’t care that its RAM timings are 3ns faster.

Anyway, enough rambling. Suffice to say, we should benchmark small components in isolation and track their performance over time.
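To make "track their performance over time" a bit more concrete, here is a minimal sketch of checking a new result against a historical record. Everything in it (the Sample struct, the build ids, the 10% tolerance) is made up for illustration; it is not something Criterion or Iridium provides.

```rust
/// One recorded benchmark result. The fields are illustrative only.
#[derive(Debug, Clone)]
struct Sample {
    build: &'static str, // hypothetical build id, e.g. a commit hash
    nanos: f64,          // measured time per iteration, in nanoseconds
}

/// Mean time of all earlier samples, or None if there is no history yet.
fn baseline(history: &[Sample]) -> Option<f64> {
    if history.is_empty() {
        return None;
    }
    Some(history.iter().map(|s| s.nanos).sum::<f64>() / history.len() as f64)
}

/// True when `latest` is slower than the historical mean by more than
/// `tolerance` (0.10 means 10%).
fn is_regression(history: &[Sample], latest: &Sample, tolerance: f64) -> bool {
    match baseline(history) {
        Some(mean) => latest.nanos > mean * (1.0 + tolerance),
        None => false, // a single number with no context tells us nothing
    }
}

fn main() {
    let history = vec![
        Sample { build: "a1", nanos: 100.0 },
        Sample { build: "b2", nanos: 104.0 },
        Sample { build: "c3", nanos: 96.0 },
    ];
    let latest = Sample { build: "d4", nanos: 130.0 };
    println!(
        "{} regressed: {}",
        latest.build,
        is_regression(&history, &latest, 0.10)
    );
}
```

A plain mean is the simplest possible baseline. A real setup would probably want something more robust to noise, but the idea is the same: a new number only means something next to the old ones.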

Adding in Criterion

Rust has a built-in benchmarking system (the #[bench] attribute), but it isn’t available on stable. The go-to crate for benchmarking on stable is Criterion, so let’s get going. =)


Add the following to your Cargo.toml (Criterion itself also needs an entry under [dev-dependencies]):

[[bench]]
name = "iridium"
harness = false


Next up, make a new directory at benches/. That’s right, same level as src/. This directory is where we will put our benchmark functions.

In benches/, create a file with the following contents:

// #[macro_use] brings criterion_group! and criterion_main! into scope
#[macro_use]
extern crate criterion;
extern crate iridium;

use criterion::Criterion;
use iridium::vm::VM;
use iridium::assembler::{PIE_HEADER_PREFIX, PIE_HEADER_LENGTH};

Let’s start with a simple benchmark of the VM executing arithmetic instructions. Add this next in the file:

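mod arithmetic {
    use super::*;

    fn execute_add(c: &mut Criterion) {
        let clos = {
            let mut test_vm = get_test_vm();
            test_vm.program = vec![1, 0, 1, 2];
            // NOTE: the listing is cut off around here. The call below is an
            // assumption about which VM method executes the instruction.
            test_vm.run_once();
        };

        // Register the closure with Criterion under the name "execute_add"
        c.bench_function("execute_add", move |b| b.iter(|| clos));
    }
}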

As you can see, it’s almost identical to our test for that:

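#[test]
fn test_add_opcode() {
    let mut test_vm = get_test_vm();
    test_vm.program = vec![1, 0, 1, 2];
    test_vm.program = prepend_header(test_vm.program);
    // NOTE: the line that runs the VM is missing from this listing. The call
    // below is an assumption about the method name.
    test_vm.run();
    assert_eq!(test_vm.registers[2], 15);
}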

I did not know that we could bind the result of a block of expressions to a variable! That’s so cool. All we do is replicate the tests for each of the four arithmetic instructions as benchmarks and add them in.
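If block expressions are new to you too, here is a small standalone sketch of how they behave. The names (demo, runs, read_again) are mine and not part of the benchmark code. The things to notice are that the block body runs once, at the point of the let, and that a block whose last line ends in a semicolon evaluates to ().

```rust
use std::cell::Cell;

/// Returns the value bound from a block and how many times block bodies ran.
fn demo() -> (i32, u32) {
    // Counts how many times the code inside the blocks actually executes.
    let runs = Cell::new(0u32);

    // A block is an expression: its value is the final expression in it.
    let sum = {
        runs.set(runs.get() + 1);
        let a = 5;
        let b = 10;
        a + b // no trailing semicolon, so the block evaluates to 15
    };

    // With a trailing semicolon on the last line, the block evaluates to ().
    let unit: () = {
        runs.set(runs.get() + 1);
    };

    // Reading the bound values later does not re-run either block.
    let read_again = || (sum, unit);
    for _ in 0..1000 {
        read_again();
    }

    (sum, runs.get())
}

fn main() {
    let (sum, runs) = demo();
    println!("sum = {}, block bodies ran {} times", sum, runs);
}
```

In a benchmark, this means any work done inside such a block happens once, before measurement starts. Only what runs inside the closure handed to b.iter is executed on every iteration.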

Missing Functions

You may notice we don’t have the get_test_vm function, nor do we have the prepend_header function. Since those have become useful outside of the tests, let’s make them public functions.

Head over to src/. We have a bit of re-arranging to do.

First, let’s move the prepend_header and get_test_vm functions into the impl block for VM and make them public. This way, we can call them like: VM::prepend_header(…). Doing so breaks a few things, so we need to:

  1. Change use assembler::PIE_HEADER_PREFIX; to use assembler::{PIE_HEADER_PREFIX, PIE_HEADER_LENGTH};

  2. In every test that uses prepend_header or get_test_vm, preface the function call with VM::. I did this with a search and replace, but you can use the code in the repo if you prefer. =)
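As a rough sketch of where this ends up, here is a stripped-down, standalone version of the two helpers as public associated functions. This is not the real Iridium source: the VM struct is a stand-in, the constant values and the initial register values (5 and 10) are assumptions chosen so the example is self-contained, and the body of prepend_header is one plausible way to pad a header.

```rust
// Placeholder values (assumptions); the real constants live in the assembler module.
pub const PIE_HEADER_PREFIX: [u8; 4] = [45, 50, 49, 45];
pub const PIE_HEADER_LENGTH: usize = 64;

/// Minimal stand-in for the VM, with only the fields this sketch needs.
pub struct VM {
    pub registers: [i32; 32],
    pub program: Vec<u8>,
}

impl VM {
    pub fn new() -> VM {
        VM { registers: [0; 32], program: Vec::new() }
    }

    /// Associated function (no `self`), so it is called as `VM::get_test_vm()`.
    /// The register values are assumed test fixtures.
    pub fn get_test_vm() -> VM {
        let mut vm = VM::new();
        vm.registers[0] = 5;
        vm.registers[1] = 10;
        vm
    }

    /// Pads the prefix with zeroes up to the header length, then appends the program.
    pub fn prepend_header(mut program: Vec<u8>) -> Vec<u8> {
        let mut bytes = PIE_HEADER_PREFIX.to_vec();
        bytes.resize(PIE_HEADER_LENGTH, 0);
        bytes.append(&mut program);
        bytes
    }
}

fn main() {
    let mut test_vm = VM::get_test_vm();
    test_vm.program = VM::prepend_header(vec![1, 0, 1, 2]);
    println!("program is {} bytes", test_vm.program.len());
}
```

Because neither function takes self, both the tests and the benchmarks can call them with the VM:: prefix without having a VM instance first.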

Now back in benches/, we can do:

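mod arithmetic {
    use super::*;

    fn execute_add(c: &mut Criterion) {
        let clos = {
            let mut test_vm = VM::get_test_vm();
            test_vm.program = vec![1, 0, 1, 2];
            // NOTE: the listing is cut off around here. The call below is an
            // assumption about which VM method executes the instruction.
            test_vm.run_once();
        };

        // Register the closure with Criterion under the name "execute_add"
        c.bench_function("execute_add", move |b| b.iter(|| clos));
    }
}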

Running the Benches

Almost there! Now we need to use the Criterion macros to set up our benchmark groups. At the end of the arithmetic module, put:

    criterion_group!{
        name = arithmetic;
        config = Criterion::default();
        targets = execute_add, execute_sub, execute_mul, execute_div,
    }

Yes, that is supposed to be { and }, not ( and ) in the macro. As the final line in the file, put:

criterion_main!(arithmetic::arithmetic);

And that’s it! Add benchmark functions (or look in the repo) for each of the arithmetic operators. From the root of the iridium/ directory, we can now run cargo bench and we should see this:

<snip a lot of stuff before this we don't care about>

Running target/release/deps/iridium-b5264c6303e130cb

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Running target/release/deps/iridium-59eea43f05a081ed
execute_add             time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                   change: [-35.123% +1503.3% +5337.3%] (p = 0.39 > 0.05)
                   No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
4 (4.00%) high mild
9 (9.00%) high severe

execute_sub             time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                   change: [-54.712% -12.150% +78.688%] (p = 0.73 > 0.05)
                   No change in performance detected.
Found 13 outliers among 100 measurements (13.00%)
5 (5.00%) high mild
8 (8.00%) high severe

execute_mul             time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                   change: [-50.926% -1.5089% +101.21%] (p = 0.97 > 0.05)
                   No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
4 (4.00%) high mild
8 (8.00%) high severe

execute_div             time:   [0.0000 ps 0.0000 ps 0.0000 ps]
                   change: [-48.559% -5.5472% +73.134%] (p = 0.87 > 0.05)
                   No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
3 (3.00%) high mild
8 (8.00%) high severe

If you look in target/criterion/ you will see the graphs that Criterion generates.
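The "high mild" and "high severe" lines in that output are Criterion's outlier labels. As I understand its documentation, they are based on a modified version of Tukey's method: samples more than 1.5 times the interquartile range (IQR) outside the quartiles are mild outliers, and samples more than 3 times the IQR outside are severe. Here is a minimal sketch of that idea. It is not Criterion's actual code, and the percentile function is a simple interpolation I picked for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Label {
    LowSevere,
    LowMild,
    Normal,
    HighMild,
    HighSevere,
}

/// Linearly interpolated percentile of a sorted, non-empty slice (p in 0.0..=1.0).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p * (sorted.len() - 1) as f64;
    let lo = rank.floor() as usize;
    let hi = rank.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Labels each sample using Tukey's fences: 1.5 * IQR for mild, 3 * IQR for severe.
fn classify(samples: &[f64]) -> Vec<Label> {
    if samples.is_empty() {
        return Vec::new();
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let q1 = percentile(&sorted, 0.25);
    let q3 = percentile(&sorted, 0.75);
    let iqr = q3 - q1;
    samples
        .iter()
        .map(|&x| {
            if x < q1 - 3.0 * iqr {
                Label::LowSevere
            } else if x < q1 - 1.5 * iqr {
                Label::LowMild
            } else if x > q3 + 3.0 * iqr {
                Label::HighSevere
            } else if x > q3 + 1.5 * iqr {
                Label::HighMild
            } else {
                Label::Normal
            }
        })
        .collect()
}

fn main() {
    // Made-up timings: most cluster around 10-13, two are much slower.
    let samples = [10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 17.0, 25.0];
    for (x, label) in samples.iter().zip(classify(&samples)) {
        println!("{:>5} -> {:?}", x, label);
    }
}
```

A handful of high outliers usually just means something else on the machine got in the way during a few samples, which is one more reason to look at trends rather than a single run.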

Keeping the Benchmark Graphs

If we want to keep these graphs so we can compare runs over time, we need to put them somewhere. We use AppVeyor, GitLab, and Travis to build Iridium for multiple platforms, and the benchmarks run on each of them, so every build produces its own set of results. The easiest thing to do is to mark the results as build artifacts and keep them from each build, along with the binary, for each platform.

I won’t go through all that here, but you can check out the .travis.yml, .gitlab-ci.yml and appveyor.yml to see how I did it.


I think that is a good stopping point. There are tons more benchmarks we need to write, and we’ll add more as we go.

See you next time!
