So You Want to Build a Language VM - Part 26 - Adding Floating Point Support

Extends the VM to handle floating point numbers


Hi everyone! In this tutorial, we’re going to upgrade our VM to support floating point numbers. In the previous tutorial, we added support for them to the Palladium language, so we kind of need to support them in the VM as well. =) Before we get to the implementation, let’s have a quick review of numbers.

Numbers

If you already know what a floating point number is, feel free to skip this part. If not, read on for a quick review.


Integers

These are also known as whole numbers: 1, 2, 3, 4, and so on. They also include negative numbers (-1, -2, -3, …) and 0. If we look at those numbers in terms of money, we can see a problem: how do you represent $1.50? You can’t; you can have either $1 or $2.

Floating Points

These have decimal points in them and are used when greater precision is needed. We can represent fractions like $1.50.

CPU Representation

So why don’t we use floats everywhere? Well, we could. In fact, a few languages do. So why do most languages distinguish between them? Let’s take a look at how an integer is represented inside a computer.

In binary, every bit is 1 or 0. We group 8 bits into something called a byte. We can express integers using this without issue:

[0, 0, 0, 0, 0, 0, 0, 0]

The integer 1 is represented as [0, 0, 0, 0, 0, 0, 0, 1], the integer 2 as [0, 0, 0, 0, 0, 0, 1, 0], and so on. Each bit position is a power of 2, much like each digit in our decimal system is a power of 10.
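You can see this positional layout for yourself with Rust’s binary format specifier (a standalone snippet, not part of the VM):

```rust
fn main() {
    // Each bit position is a power of 2, just as each decimal
    // digit is a power of 10.
    assert_eq!(format!("{:08b}", 1u8), "00000001");
    assert_eq!(format!("{:08b}", 5u8), "00000101"); // 4 + 1
}
```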


You may be wondering how we would represent negative integers. Good question! A simple way is to use one bit as the sign bit. In our 8-bit example, we could say the left-most bit is our sign bit. If it is 0, the number is positive. If it is 1, it is negative. The integer -1 could then be expressed as: [1, 0, 0, 0, 0, 0, 0, 1]. (This scheme is called sign-magnitude; real CPUs typically use an encoding called two’s complement instead, but the sign-bit idea is easier to picture.) This is the difference between the signed and unsigned numbers you see in programming languages.

Using 8 bits with no sign bit, we can express 2^8 = 256 distinct values: 0 to 255. If we reserve one bit for the sign, we have 2^7 = 128 magnitudes for each sign; with the two’s complement encoding CPUs actually use, our range becomes -128 to 127.
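Rust exposes these limits directly on its integer types, so we can verify the ranges:

```rust
fn main() {
    // 8 unsigned bits: 2^8 = 256 values, 0 through 255.
    assert_eq!(u8::min_value(), 0);
    assert_eq!(u8::max_value(), 255);
    // 8 signed bits (two's complement): -128 through 127.
    assert_eq!(i8::min_value(), -128);
    assert_eq!(i8::max_value(), 127);
}
```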

If we want to represent floating point numbers in binary, we have to do more work. I am not going to go into detail on it here, as excellent explanations abound online. The summary is that it takes more work for the CPU to work with floating point numbers compared to integers.


OK, on to the implementation! Iridium has 32 i32 registers; i32 means a 32-bit signed integer. We can’t just stuff a 64-bit float in there, now can we? We have a few options here:

  1. Duplicate the existing register design, but use f64 instead of i32. This is similar to a chip manufacturer adding more hardware registers.

  2. Change the existing registers to hold an array of u8s.

  3. Tinker with unsafe raw pointers to cast between types.

Option 1

This has the advantage of simplicity, but it also causes a lot of code duplication. Now we need separate instructions for the 64-bit registers. And what happens if someone tries to add a 32-bit register to a 64-bit register? Do we allow casting between them?

Option 2

This has the advantage of being a more generic solution, but it also would require us to write a lot of code to manage u8s.

Option 3

One of our primary goals is to not use unsafe Rust unless we cannot avoid it.

And the Winner is…​

Option 1! It is the easiest to code and understand. It will make some things a bit more annoying for the user, but we can do some things in the parser to help with that.


The summary of steps we’ll need to do for this is:

  1. Add an array of f64s to the VM

  2. Add f64 versions of the LOAD, ADD, SUB, MUL, and DIV instructions, plus the comparison operators

  3. Teach our parser and assembler about them

Let’s get started!

Add f64 Registers to VM

This is an easy one. In src/, add a field called float_registers.

pub struct VM {
    /// Array that simulates having hardware registers
    pub registers: [i32; 32],
    /// Array that simulates having floating point hardware registers
    pub float_registers: [f64; 32],
    /// ...

Then in the VM constructor, we need to initialize it:

impl VM {
    /// Creates and returns a new VM
    pub fn new() -> VM {
        VM {
            registers: [0; 32],
            float_registers: [0.0; 32],
    // ...

Note that we default them to 0.0 rather than 0. This is because they are f64s, not integers.

New Instructions

We need a bunch of new instructions:

  1. LOADF64

  2. ADDF64

  3. SUBF64

  4. MULF64

  5. DIVF64

  6. EQF64

  7. NEQF64

  8. GTF64

  9. GTEF64

  10. LTF64

  11. LTEF64

In src/, add each of those to the list of Opcodes, the From<u8> impl, and the From<CompleteStr<'a>> impl for Opcode.
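As a rough sketch of what those additions look like (the new variant names are real, but the numeric discriminants 21 and 22 below are made up for the example; match them to your actual opcode ordering):

```rust
// Minimal sketch, not the full enum from the source — only two of the
// new variants are shown, plus an IGL catch-all for unknown bytes.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Opcode {
    LOADF64,
    ADDF64,
    IGL, // illegal/unknown opcode
}

impl From<u8> for Opcode {
    fn from(v: u8) -> Self {
        match v {
            21 => Opcode::LOADF64, // assumed byte value
            22 => Opcode::ADDF64,  // assumed byte value
            _ => Opcode::IGL,
        }
    }
}

fn main() {
    assert_eq!(Opcode::from(21), Opcode::LOADF64);
    assert_eq!(Opcode::from(99), Opcode::IGL);
}
```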

But wait, there’s more!

We also have to add a match arm for each of the new instructions in src/. The implementation of ADDF64 looks like this:

Opcode::ADDF64 => {
    let register1 = self.float_registers[self.next_8_bits() as usize];
    let register2 = self.float_registers[self.next_8_bits() as usize];
    self.float_registers[self.next_8_bits() as usize] = register1 + register2;
},
Notice how it is the same as the ADD instruction, except that we access the float_registers array.

The One Gotcha

Let’s take a look at the EQ opcode for the 32-bit instruction:

Opcode::EQ => {
    let register1 = self.registers[self.next_8_bits() as usize];
    let register2 = self.registers[self.next_8_bits() as usize];
    self.equal_flag = register1 == register2;
},

== works fine for comparing integers, since they can be expressed exactly in binary. Two floats, however, may differ by a very small amount. For example, say we have 1.1003 and 1.1004. We may not care that one is a ten-thousandth larger, only that the two are equal within some small tolerance. This margin of error is often called epsilon, and Rust makes a machine-level value of it available via use std::f64::EPSILON;.
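Here is a quick standalone demonstration of why this matters: the classic 0.1 + 0.2 is not bitwise equal to 0.3, but it is equal within epsilon:

```rust
use std::f64::EPSILON;

fn main() {
    let a = 0.1_f64 + 0.2;
    let b = 0.3_f64;
    assert!(a != b);                  // not bitwise equal...
    assert!((a - b).abs() < EPSILON); // ...but equal within epsilon
}
```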

Check out our EQF64 instruction:

Opcode::EQF64 => {
    let register1 = self.float_registers[self.next_8_bits() as usize];
    let register2 = self.float_registers[self.next_8_bits() as usize];
    self.equal_flag = (register1 - register2).abs() < EPSILON;
},

We check whether the difference between the two values falls within epsilon, and if so, consider them equal. All of the 64-bit comparison instructions have to utilize epsilon. Give them a shot, and check out the source code if you are having trouble.
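For instance, one way the ordered comparisons can fold epsilon in is "strictly greater, or equal within epsilon". Here is that logic pulled out into a standalone function (a sketch of the idea, not the exact code from the VM):

```rust
use std::f64::EPSILON;

// GTEF64-style comparison: strictly greater, or equal within epsilon.
fn gte_f64(a: f64, b: f64) -> bool {
    a > b || (a - b).abs() < EPSILON
}

fn main() {
    assert!(gte_f64(2.0, 1.0));
    assert!(gte_f64(0.1 + 0.2, 0.3)); // equal within epsilon counts as >=
    assert!(!gte_f64(1.0, 2.0));
}
```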


The last thing we need to update is our assembler to teach it to recognize floats. In src/, add this:

named!(float_operand<CompleteStr, Token>,
    ws!(
        do_parse!(
            tag!("#") >>
            sign: opt!(tag!("-")) >>
            left_nums: digit >>
            tag!(".") >>
            right_nums: digit >>
            ({
                let mut tmp = String::from("");
                if sign.is_some() { tmp.push_str("-"); }
                tmp.push_str(&left_nums.to_string());
                tmp.push_str(".");
                tmp.push_str(&right_nums.to_string());
                let converted = tmp.parse::<f64>().unwrap();
                Token::Factor{ value: Box::new(Token::Float{value: converted}) }
            })
        )
    )
);
This parser looks for:

  1. #

  2. An optional -

  3. One or more digits

  4. A decimal point (.)

  5. One or more digits

This lets us parse positive and negative floats, as long as there is a decimal point. After that, we add the new float parser to the operand parser in the same file:

named!(pub operand<CompleteStr, Token>,
    alt!(
        integer_operand |
        float_operand |
        label_usage |
        register
    )
);


Let’s try a more succinct way of writing a test:

#[test]
fn test_parse_float_operand() {
    vec!["#100.3", "#-100.3", "#1.0", "#0.0"].iter().for_each(|i| {
        assert_eq!(float_operand(CompleteStr(i)).is_ok(), true);
    });
}
Still reasonably clear, but more compact. The same pattern can be applied to other tests as well. I’ll refactor some, but I’m not going to go over them in this tutorial.

We aren’t quite done, though, because we have a problem. A big (or not so big) problem, depending on how you look at it.

> 2^16

With our current design of fixed-width 32-bit instructions, we lose 8 bits to the opcode. In the case of LOAD, we lose another 8 to specify the register. This leaves us only 16 bits to express a number to load into that register. So even though we have i32 and f64 registers, we can’t make full use of them. What we need is a way to let the programmer input numbers larger than 2^16.
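To see where the 2^16 ceiling comes from, here is how a 16-bit operand gets rebuilt from the two remaining bytes of a LOAD instruction (roughly the shape the VM’s next_16_bits helper takes):

```rust
fn main() {
    // After the opcode byte and the register byte, only two bytes remain.
    let byte1: u8 = 0xFF;
    let byte2: u8 = 0xFF;
    let operand = ((byte1 as u16) << 8) | byte2 as u16;
    assert_eq!(operand, 65535); // 2^16 - 1 is the biggest LOAD-able value
}
```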

There are actually a lot of ways to do this. A common one is to combine bit-shifting instructions with loads.

Bit Shifting

Let’s say we want to load a 32-bit number into a 32-bit register. We could have one instruction that sets the upper half of the target register to the first 16 bits of the number, and another instruction that shifts those bits 16 places to the right.

So we go from: [0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] → [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,1,0,0,0,1,1,0,0,0,0,1,0,0,0,0]. Now we can use the same instruction to set the first 16 bits again.
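Here is the same trick in plain Rust, shifting left instead of right (the direction differs from the example above, but the idea of moving a 16-bit chunk into the other half is identical):

```rust
fn main() {
    let first_chunk: u32 = 0x4610; // first 16 bits of our target value
    let mut reg: u32 = 0;
    reg |= first_chunk; // load the chunk into the lower half
    reg <<= 16;         // shift it into the upper half
    reg |= 0x0000;      // a second load would fill the lower half
    assert_eq!(reg, 0x4610_0000);
}
```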

A Variant on Bit Shifting

We don’t even need to shift the bits if we have an instruction to set both the first 16 and the last 16. When the assembler sees that the user wants to load a number > 16 bits, it could automatically output the two instructions necessary to set the register halves. You will often see this instruction referred to as something like LU, Load Upper, LOADU, etc. These instructions are often polite and will shift the upper bits down to the lower bits before loading them.

We’ll work on implementing these in our next tutorial.

A Quick Shift

Before we end this tutorial, let’s write a quick bit-shift instruction. The new instructions are:

  1. SHR - Shifts a register’s bits 16 places to the right (by default)

  2. SHL - Shifts a register’s bits 16 places to the left (by default)

I’m not going to go through the details of adding in opcodes, other than the implementation in src/. You should know those already. =)

The syntax for these commands will be:

  * SHR $<reg_num>
  * SHR $<reg_num> #<number of bits>

The second form is more aspirational, but it would be nice to allow users to choose the number of bits to shift if desired.

SHL Implementation

Below is the implementation for the SHL instruction. The remaining one you can implement, or refer to the source code, as it is nearly identical.

Opcode::SHL => {
    let reg_num = self.next_8_bits() as usize;
    let num_bits = match self.next_8_bits() {
        0 => { 16 },
        other => { other }
    };
    self.registers[reg_num] = self.registers[reg_num].wrapping_shl(num_bits.into());
},
This code:

  1. Gets the register the user wants to shift

  2. Gets the next 8 bits, which is how many bits they want to shift

  3. If it is 0, it defaults to 16 bits

  4. If it is some other number, it shifts that amount

wrapping_shl is a function built in to the integral types (u64, i32, etc.).

And a Test…​

#[test]
fn test_shl_opcode() {
    let mut test_vm = VM::get_test_vm();
    test_vm.program = vec![33, 0, 0, 0];
    assert_eq!(5, test_vm.registers[0]);
    test_vm.run_once();
    assert_eq!(327680, test_vm.registers[0]);
}

If we print out binary representations of the zeroth register, we can see the change:

bits: 0b00000000000000000000000000000101
bits: 0b00000000000001010000000000000000
The first line is the number 5 in binary. If we shift it left 16 bits, the number becomes 327680. Cool, huh?

Did you know that you can multiply an integer by 2 by shifting it 1 bit to the left? And you can divide it by 2 by shifting it 1 bit the other way.
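A quick demonstration:

```rust
fn main() {
    let x: i32 = 6;
    assert_eq!(x << 1, 12); // shift left one bit: multiply by 2
    assert_eq!(x >> 1, 3);  // shift right one bit: divide by 2
}
```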

A Final Note on Floats

You may be wondering why there aren’t bit shift instructions for floats. This is because of how floats are represented internally. The bits are divided into sections: a sign bit, the exponent, and the significand. If you shift bits around, you risk changing the meaning of the number entirely by moving bits into a different section.
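You can see those sections by pulling an f64 apart with to_bits (1 sign bit, 11 exponent bits, 52 significand bits):

```rust
fn main() {
    // 1.5 is 1.1 in binary: significand fraction 0.5, unbiased exponent 0.
    let bits = 1.5_f64.to_bits();
    let sign = bits >> 63;
    let exponent = (bits >> 52) & 0x7FF;
    let significand = bits & ((1u64 << 52) - 1);
    assert_eq!(sign, 0);
    assert_eq!(exponent, 1023);          // 0 plus the bias of 1023
    assert_eq!(significand, 1u64 << 51); // the top fraction bit is the .5
}
```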


Yay, we made it! Next up, we’ll make some instructions to load the upper or lower halves of a register.

If you need some assistance with any of the topics in the tutorials, or just devops and application development in general, we offer consulting services. Check it out over here or click Services along the top.