So You Want to Build a Language VM - Part 25 - Extending Palladium: Part 1

Extends Palladium to handle more complex arithmetic expressions

Intro

Hey everyone! For this tutorial, we’re going to switch tracks and work on Palladium a bit. Right now, it handles simple arithmetic expressions, such as 2+1, and that’s it. Let’s see if we can get it to handle something more complex. === Goals

  1. Handle parenthetical expressions (e.g., (4-3)*2))

  2. Teach our parser to handle floating point numbers

Parser

Let’s do a quick review:

  1. A Program is composed of Expressions

  2. An Expression returns a value

  3. An Operand is the left and right sides of an Operator, e.g., in 2+1 the operands are 2 and 1

  4. An Operator is +,-,*,/

  5. Nom parses all this into Tokens (found in src/tokens.rs)

We’re going to change our definitions a bit:

  1. An Expression consists of Terms

  2. A Term consists of Factors

  3. A Term is added or subtracted

  4. A Factor is multiplied or divided

  5. A Factor can be surrounded by a ( and a )

Important

Time for some math talk! Don’t worry, it isn’t complicated. =)

Unless you’ve worked with a Lisp-style language (e.g., Clojure) or one of those old HP calculators, you’re probably accustomed to seeing all math expressions written in the following form:

  1. 3+1

  2. 2 * (1+1)

See how the operator is in the middle? This is called infix notation. This is not the only way to write expressions, though. What if we wrote 3+1 as 3 1 +? That’s called postfix notation. And what if we wrote it as + 3 1? That’s called prefix notation. Pre and postfix notation are actually much easier to parse, but it raises the barrier to entry to your language, as people tend to dislike using non-infix notation.

New Tokens

Head over to src/tokens.rs. We need to change a few things:

#[derive(Debug, PartialEq, Clone)]
pub enum Token {
    AdditionOperator,
    SubtractionOperator,
    MultiplicationOperator,
    DivisionOperator,
    Integer{ value: i64 },
    Float{ value: f64},
    Factor{ value: Box<Token> },
    Term{ left: Box<Token>, right: Vec<(Token, Token)> },
    Expression{ left: Box<Token>, right: Vec<(Token, Token)> },
    Program{ expressions: Vec<Token> }
}

We’ve added a Token for Float, Factor, and Term. We also have removed the op field from Expression, and replaced with a Vec of tuples. We’ll see why in a minute.

We also have to stub out match arms for the new tokens in visitor.rs. For now, they can just look like:

&Token::Float{ value } => {},
&Token::Factor{ ref value } => {},
&Token::Term{ ref left, ref right } => {},
&Token::Expression{ ref left, ref right } => {},
&Token::Program{ ref expressions } => {},

Factors

Now let’s teach our parser what a factor is. Make src/factor_parsers.rs, and in it put:

use nom::*;
use nom::types::CompleteStr;

use tokens::Token;
use expression_parsers::expression;

named!(float64<CompleteStr, Token>,
    ws!(
        do_parse!(
            sign: opt!(tag!("-")) >>
            left_nums: digit >>
            tag!(".") >>
            right_nums: digit >>
            (
                {
                    let mut tmp = String::from("");
                    if sign.is_some() {
                        tmp.push_str("-");
                    }
                    tmp.push_str(&left_nums.to_string());
                    tmp.push_str(".");
                    tmp.push_str(&right_nums.to_string());
                    let converted = tmp.parse::<f64>().unwrap();
                    Token::Factor{ value: Box::new(Token::Float{value: converted}) }
                }
            )
        )
    )
);
The parser does the following:

  1. Consume any whitespace

  2. Check if there is a -. Because we use opt!, it is optional. This is to handle negative floats.

  3. Consume all digits until it encounters .

  4. Consume all digits after the .

  5. Construct the final floating point string representation

  6. Convert it to a f64

  7. Return a Factor token that contains a Float token

A few examples of the floats this parser is meant to detect are:

  1. 1.0

  2. -1.0

  3. 323.8

  4. 1.453

It’s nearly identical to our integer parser. And since an integer is also a Factor, let’s move our integer parser over:

named!(integer<CompleteStr, Token>,
    ws!(
        do_parse!(
            sign: opt!(tag!("-")) >>
            reg_num: digit >>
            (
                {
                    let mut tmp = String::from("");
                    if sign.is_some() {
                        tmp.push_str("-");
                    }
                    tmp.push_str(&reg_num.to_string());
                    let converted = tmp.parse::<i64>().unwrap();
                    Token::Integer{ value: converted }
                }
            )
        )
    )
);

No changes to it, just changing its location.

And now, at last, the Factor parser! It looks like this:

named!(pub factor<CompleteStr, Token>,
    ws!(
        do_parse!(
            f: alt!(
                integer |
                float64 |
                ws!(delimited!( tag_s!("("), expression, tag_s!(")") ))
            ) >>
            (
                {
                    Token::Factor{value: Box::new(f)}
                }
            )
        )
    )
);

Here’s a breakdown of what it is meant to do:

  1. Try the integer parser that we just moved into the same file

  2. Try the float64 parser we just made

  3. Look for an Expression surrounded by ( and )

The last one is very important, because it is what allows us to expressions with nested parentheses.

Expression Parser

Open up src/expression_parsers.rs. We need to change that parser a bit to work with our new Factor parser. The new Expression parser looks like this:

use nom::types::CompleteStr;

use tokens::Token;
use term_parsers::term;
use operator_parsers::*;

named!(pub expression<CompleteStr,Token>,
    do_parse!(
        left: term >>
        right: many0!(
            tuple!(
                alt!(
                    addition_operator |
                    subtraction_operator
                ),
                term
            )
        ) >>
        (
            {
                Token::Expression{left: Box::new(left), right: right}
            }
        )
    )
);
Important

Note the use changes. We are importing the Term parser, which we haven’t written yet. We’ll get there, don’t worry!

What this parser now does is:

  1. Look for a Term

  2. Looks for zero or characters that match the following pattern: + or - then a Term

  3. Because we use the tuple! macro, we will get back a (Token, Token) result, which is why we had to change our Token definitions a bit.

  4. Construct an Expression and return it

Term Parser

Our last new parser is the Term parser. Make src/term_parsers.rs, and in it put:

use nom::types::CompleteStr;

use tokens::Token;
use factor_parsers::factor;
use operator_parsers::{multiplication_operator, division_operator};

named!(pub term<CompleteStr,Token>,
    do_parse!(
        left: factor >>
        right: many0!(
            tuple!(
                alt!(
                    multiplication_operator |
                    division_operator
                ),
                factor
            )
        ) >>
        (
            {
                Token::Term{left: Box::new(left), right: right}
            }
        )
    )
);

It does almost the same thing as the Factor parser:

  1. Look for a Factor

  2. Looks for zero or characters that match the following pattern: *or / then a Factor

  3. Because we use the tuple! macro, we will get back a (Token, Token) result, which is why we had to change our Token definitions a bit.

  4. Construct a Term and return it

Note
Note how Term looks for * and /, while Factor looks for + and -. This is how we establish operator precedence.

Sample Expression

Let’s take a look at what Nom turns 3*4 into:

(
    CompleteStr(
        ""
    ),
    Term {
        left: Factor {
            value: Integer {
                value: 3
            }
        },
        right: [
            (
                MultiplicationOperator,
                Factor {
                    value: Integer {
                        value: 4
                    }
                }
            )
        ]
    }
)

On the left hand side, we have a Term, which has a Factor, which has an Integer. On the right, we have a Vec containing the MultiplicationOperator and a Factor that contains an Integer.

The ASTs for more complex expressions can get a lot larger. =)

Tests

Now let’s write some tests!

Term Tests

In term_parsers.rs, put:

mod tests {
    use super::*;

    #[test]
    fn test_parse_term() {
        let result = term(CompleteStr("3*4"));
        assert_eq!(result.is_ok(), true);
    }

    #[test]
    fn test_parse_nested_term() {
        let result = term(CompleteStr("(3*4)*2"));
        assert_eq!(result.is_ok(), true);
    }

    #[test]
    fn test_parse_really_nested_term() {
        let result = term(CompleteStr("((3*4)*2)"));
        assert_eq!(result.is_ok(), true);
    }
}

Factor Tests

These go in factor_parsers.rs:

mod tests {
    use super::*;
    #[test]
    fn test_factor() {
        let test_program = CompleteStr("(1+2)");
        let result = factor(test_program);
        assert_eq!(result.is_ok(), true);
        let (_, tree) = result.unwrap();
        println!("{:#?}", tree);
    }

    #[test]
    fn test_parse_floats() {
        let test_floats = vec!["100.4", "1.02", "-1.02"];
        for o in test_floats {
            let parsed_o = o.parse::<f64>().unwrap();
            let result = float64(CompleteStr(o));
            assert_eq!(result.is_ok(), true);
        }
    }

    #[test]
    fn test_parse_integer() {
        let test_integers = vec!["0", "-1", "1"];
        for o in test_integers {
            let parsed_o = o.parse::<i64>().unwrap();
            let result = integer(CompleteStr(o));
            assert_eq!(result.is_ok(), true);
        }
    }
}

Expression Tests

If we want to parse something that has all the operators in it, we need to use an Expression. Let’s add a few tests to expression_parsers.rs:

#[test]
fn test_parse_expression() {
    let result = expression(CompleteStr("3*4"));
    assert_eq!(result.is_ok(), true);
}

#[test]
fn test_parse_nested_expression() {
    let result = expression(CompleteStr("(3*4)+1"));
    assert_eq!(result.is_ok(), true);
}

Program Tests

And finally, let’s do a fancy test in program_parsers.rs:

#[test]
fn test_parse_program() {
    let test_program = CompleteStr("1+2");
    let result = program(test_program);
    assert_eq!(result.is_ok(), true);
}

If you run tests, you probably got a failure similar to this:

thread 'visitor::tests::test_visit_addition_token' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21

What this means is that in our addition function:

&Token::AdditionOperator => {
    // TODO: Need to clean this up. Remove the unwraps.
    let result_register = self.free_registers.pop().unwrap();
    let left_register = self.used_registers.pop().unwrap();
    let right_register = self.used_registers.pop().unwrap();
    let line = format!("ADD ${} ${} ${}", left_register, right_register, result_register);
    self.assembly.push(line);
    self.free_registers.push(left_register);
    self.free_registers.push(right_register);
},
One of the used registers has not yet been put onto the use stack. Remember how we started returning tuples? What’s happening is that in the Expression match arm:
&Token::Expression{ ref left, ref right } => {
    self.visit_token(left);
    for expression in right {
        self.visit_token(&expression.1);
        self.visit_token(&expression.0);
    }
},
when it evaluates this section of our tree:
right: [
    (
        AdditionOperator,
        Term {
            left: Factor {
                value: Integer {
                    value: 2
                }
            },
            right: []
        }
    )
]
It is going to vist the addition operator before the Integer value. This means that the value 2 has not been put in a register yet. We want the Expression function to visit them in the reverse order. So change:

&Token::Expression{ ref left, ref right } => {
    self.visit_token(left);
    for expression in right {
      self.visit_token(&expression.0);
      self.visit_token(&expression.1);
    }
},
&Token::Expression{ ref left, ref right } => {
    self.visit_token(left);
    for expression in right {
      self.visit_token(&expression.1);
      self.visit_token(&expression.0);
    }
},

You’ll also need to do the same to the Term arm.

Now if we run it..

thread 'visitor::tests::test_nested_operators' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
GRRRR!

This one took me a bit of debugging, and I had to write some debugging functions to print free and used registers. I also added this test to visitor.rs.

#[test]
fn test_nested_operators() {
    let mut compiler = Compiler::new();
    let test_program = generate_test_program("(4*3)-1");
    compiler.visit_token(&test_program);
    let bytecode = compiler.compile();
}
The program was parsing fine up until the -; the multiplication side was fine. By printing out registers in use when it hit the subtract arm:

--------------------
|  Used Registers  |
--------------------
0
--------------------
|  Free Registers  |
--------------------
30
<snip>
1
We should have TWO registers in use: One for the result of the multiplication instruction, and one for the load instruction of 1. Clearly that register was never being put on the used register list. If we take a look at the arm handling the SUB opcode:

self.print_used_registers();
self.print_free_registers();
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("SUB ${} ${} ${}", left_register, right_register, result_register);
self.assembly.push(line);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
See the problem? We never push result_register onto the used register list. We can fix it like so:
self.print_used_registers();
self.print_free_registers();
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("SUB ${} ${} ${}", left_register, right_register, result_register);
self.assembly.push(line);
self.used_registers.push(result_register);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
This was an issue in all the operator arms, so fix it the same way in the other three.

Now if we run them, they should all pass..

running 18 tests
test operator_parsers::tests::test_parse_division_operator ... ok
test operator_parsers::tests::test_parse_addition_operator ... ok
test operator_parsers::tests::test_parse_multiplication_operator ... ok
test factor_parsers::tests::test_parse_integer ... ok
test factor_parsers::tests::test_parse_floats ... ok
test expression_parsers::tests::test_parse_expression ... ok
test expression_parsers::tests::test_parse_nested_expression ... ok
test operator_parsers::tests::test_parse_operator ... ok
test operator_parsers::tests::test_parse_subtraction_operator ... ok
test program_parsers::tests::test_parse_program ... ok
test factor_parsers::tests::test_factor ... ok
test term_parsers::tests::test_parse_nested_term ... ok
test term_parsers::tests::test_parse_term ... ok
test visitor::tests::test_visit_addition_token ... ok
test visitor::tests::test_visit_subtraction_token ... ok
test term_parsers::tests::test_parse_really_nested_term ... ok
test visitor::tests::test_nested_operators ... ok
test visitor::tests::test_visit_multiplication_token ... ok

test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

YAY!

Note
I’ve added a few more tests. See the source for details on them.

Assembly Output

Let’s add a quick change to print the assembly output of the test_nested_operators test:

".data"
".code"
"LOAD $0 #4"
"LOAD $1 #3"
"MUL $1 $0 $2"
"LOAD $0 #1"
"SUB $0 $2 $1"

That looks right! If we go through it step-by-step:

  1. Load 4 into R0

  2. Load 3 into R1

  3. Multiply R0 and R1 and store result in R2 (which has the value of 12)

  4. Load 1 into R0

  5. Subtract R0 from R2 and store result in R3 (which should have 11 at the end)

  6. Crap, that’s -11

If we run our generated code through the Iridium REPL, we see:

Welcome to Iridium! Let's be productive!
>>> LOAD $0 #4
LOAD $1 #3
>>> MUL $1 $0 $2
>>> LOAD $0 #1
>>> SUB $0 $2 $1
>>> !registers
Listing registers and all contents:

>>> [
    1,
    -11,
    12,
    0,
    0,
// <snip>
]
Damnit! In this case, the problem is with Iridium. It handles subtraction with this code:
Opcode::SUB => {
    let register1 = self.registers[self.next_8_bits() as usize];
    let register2 = self.registers[self.next_8_bits() as usize];
    self.registers[self.next_8_bits() as usize] = register1 - register2;
}
And with how we are outputting code, we would need to either change the Iridium VM to do:
Opcode::SUB => {
    let register1 = self.registers[self.next_8_bits() as usize];
    let register2 = self.registers[self.next_8_bits() as usize];
    self.registers[self.next_8_bits() as usize] = register2 - register1;
}
Or we can change our parser to output subtraction instructions in the opposite order. For now, we’ll do it in the parser, since this is a Palladium tutorial. =)

In visitor.rs, we can change:

&Token::SubtractionOperator => {
    // TODO: Need to clean this up. Remove the unwraps.
    let result_register = self.free_registers.pop().unwrap();
    let left_register = self.used_registers.pop().unwrap();
    let right_register = self.used_registers.pop().unwrap();
    let line = format!("SUB ${} ${} ${}", left_register, right_register, result_register);
    self.assembly.push(line);
    self.used_registers.push(result_register);
    self.free_registers.push(left_register);
    self.free_registers.push(right_register);
},
to:
&Token::SubtractionOperator => {
    // TODO: Need to clean this up. Remove the unwraps.
    let result_register = self.free_registers.pop().unwrap();
    let left_register = self.used_registers.pop().unwrap();
    let right_register = self.used_registers.pop().unwrap();
    let line = format!("SUB ${} ${} ${}", right_register, left_register, result_register);
    self.assembly.push(line);
    self.used_registers.push(result_register);
    self.free_registers.push(left_register);
    self.free_registers.push(right_register);
},

And now our output is:

".data"
".code"
"LOAD $0 #4"
"LOAD $1 #3"
"MUL $1 $0 $2"
"LOAD $0 #1"
"SUB $2 $0 $1"

And running it through the Iridium REPL:

Welcome to Iridium! Let's be productive!
>>> LOAD $0 #4
LOAD $1 #3
>>> MUL $1 $0 $2
>>> LOAD $0 #1
>>> SUB $0 $2 $1
>>> !registers
Listing registers and all contents:
>>> [
    1,
    11,
    12,
    0,
//    <snip>
]

OK! YAY FOR REAL NOW!

End

That’s it for this installment! See everyone next time!


If you need some assistance with any of the topics in the tutorials, or just devops and application development in general, we offer consulting services. Check it out over here or click Services along the top.