So You Want to Build a Language VM - Part 25 - Extending Palladium: Part 1
Extends Palladium to handle more complex arithmetic expressions
Intro
Hey everyone! For this tutorial, we’re going to switch tracks and work on Palladium a bit. Right now, it handles simple arithmetic expressions, such as 2+1
, and that’s it. Let’s see if we can get it to handle something more complex.
=== Goals
Handle parenthetical expressions (e.g.,
(4-3)*2)
)Teach our parser to handle floating point numbers
Parser
Let’s do a quick review:
A
Program
is composed ofExpressions
An
Expression
returns a valueAn
Operand
is the left and right sides of anOperator
, e.g., in 2+1 the operands are 2 and 1An
Operator
is+,-,*,/
Nom parses all this into
Tokens
(found insrc/tokens.rs
)
We’re going to change our definitions a bit:
An
Expression
consists ofTerms
A
Term
consists ofFactors
A
Term
is added or subtractedA
Factor
is multiplied or dividedA
Factor
can be surrounded by a(
and a)
Important | Time for some math talk! Don’t worry, it isn’t complicated. =) Unless you’ve worked with a Lisp-style language (e.g., Clojure) or one of those old HP calculators, you’re probably accustomed to seeing all math expressions written in the following form:
See how the operator is in the middle? This is called infix notation. This is not the only way to write expressions, though. What if we wrote |
New Tokens
Head over to src/tokens.rs
. We need to change a few things:
#[derive(Debug, PartialEq, Clone)]
pub enum Token {
AdditionOperator,
SubtractionOperator,
MultiplicationOperator,
DivisionOperator,
Integer{ value: i64 },
Float{ value: f64},
Factor{ value: Box<Token> },
Term{ left: Box<Token>, right: Vec<(Token, Token)> },
Expression{ left: Box<Token>, right: Vec<(Token, Token)> },
Program{ expressions: Vec<Token> }
}
We’ve added a Token for Float
, Factor
, and Term
. We also have removed the op
field from Expression
, and replaced with a Vec
of tuples. We’ll see why in a minute.
We also have to stub out match arms for the new tokens in visitor.rs
. For now, they can just look like:
&Token::Float{ value } => {},
&Token::Factor{ ref value } => {},
&Token::Term{ ref left, ref right } => {},
&Token::Expression{ ref left, ref right } => {},
&Token::Program{ ref expressions } => {},
Factors
Now let’s teach our parser what a factor is. Make src/factor_parsers.rs
, and in it put:
use nom::*;
use nom::types::CompleteStr;
use tokens::Token;
use expression_parsers::expression;
named!(float64<CompleteStr, Token>,
ws!(
do_parse!(
sign: opt!(tag!("-")) >>
left_nums: digit >>
tag!(".") >>
right_nums: digit >>
(
{
let mut tmp = String::from("");
if sign.is_some() {
tmp.push_str("-");
}
tmp.push_str(&left_nums.to_string());
tmp.push_str(".");
tmp.push_str(&right_nums.to_string());
let converted = tmp.parse::<f64>().unwrap();
Token::Factor{ value: Box::new(Token::Float{value: converted}) }
}
)
)
)
);
Consume any whitespace
Check if there is a
-
. Because we useopt!
, it is optional. This is to handle negative floats.Consume all digits until it encounters
.
Consume all digits after the
.
Construct the final floating point string representation
Convert it to a
f64
Return a
Factor
token that contains aFloat
token
A few examples of the floats this parser is meant to detect are:
1.0
-1.0
323.8
1.453
It’s nearly identical to our integer parser. And since an integer is also a Factor
, let’s move our integer parser over:
named!(integer<CompleteStr, Token>,
ws!(
do_parse!(
sign: opt!(tag!("-")) >>
reg_num: digit >>
(
{
let mut tmp = String::from("");
if sign.is_some() {
tmp.push_str("-");
}
tmp.push_str(®_num.to_string());
let converted = tmp.parse::<i64>().unwrap();
Token::Integer{ value: converted }
}
)
)
)
);
No changes to it, just changing its location.
And now, at last, the Factor
parser! It looks like this:
named!(pub factor<CompleteStr, Token>,
ws!(
do_parse!(
f: alt!(
integer |
float64 |
ws!(delimited!( tag_s!("("), expression, tag_s!(")") ))
) >>
(
{
Token::Factor{value: Box::new(f)}
}
)
)
)
);
Here’s a breakdown of what it is meant to do:
Try the
integer
parser that we just moved into the same fileTry the
float64
parser we just madeLook for an
Expression
surrounded by(
and)
The last one is very important, because it is what allows us to expressions with nested parentheses.
Expression Parser
Open up src/expression_parsers.rs
. We need to change that parser a bit to work with our new Factor
parser. The new Expression
parser looks like this:
use nom::types::CompleteStr;
use tokens::Token;
use term_parsers::term;
use operator_parsers::*;
named!(pub expression<CompleteStr,Token>,
do_parse!(
left: term >>
right: many0!(
tuple!(
alt!(
addition_operator |
subtraction_operator
),
term
)
) >>
(
{
Token::Expression{left: Box::new(left), right: right}
}
)
)
);
Important | Note the |
What this parser now does is:
Look for a
Term
Looks for zero or characters that match the following pattern:
+
or-
then aTerm
Because we use the
tuple!
macro, we will get back a(Token, Token)
result, which is why we had to change ourToken
definitions a bit.Construct an
Expression
and return it
Term Parser
Our last new parser is the Term
parser. Make src/term_parsers.rs
, and in it put:
use nom::types::CompleteStr;
use tokens::Token;
use factor_parsers::factor;
use operator_parsers::{multiplication_operator, division_operator};
named!(pub term<CompleteStr,Token>,
do_parse!(
left: factor >>
right: many0!(
tuple!(
alt!(
multiplication_operator |
division_operator
),
factor
)
) >>
(
{
Token::Term{left: Box::new(left), right: right}
}
)
)
);
It does almost the same thing as the Factor
parser:
Look for a
Factor
Looks for zero or characters that match the following pattern:
*or
/
then aFactor
Because we use the
tuple!
macro, we will get back a(Token, Token)
result, which is why we had to change ourToken
definitions a bit.Construct a
Term
and return it
Note | Note how Term looks for * and / , while Factor looks for + and - . This is how we establish operator precedence. |
Sample Expression
Let’s take a look at what Nom turns 3*4
into:
(
CompleteStr(
""
),
Term {
left: Factor {
value: Integer {
value: 3
}
},
right: [
(
MultiplicationOperator,
Factor {
value: Integer {
value: 4
}
}
)
]
}
)
On the left hand side, we have a Term
, which has a Factor
, which has an Integer
. On the right, we have a Vec
containing the MultiplicationOperator
and a Factor
that contains an Integer
.
The ASTs for more complex expressions can get a lot larger. =)
Tests
Now let’s write some tests!
Term Tests
In term_parsers.rs
, put:
mod tests {
use super::*;
#[test]
fn test_parse_term() {
let result = term(CompleteStr("3*4"));
assert_eq!(result.is_ok(), true);
}
#[test]
fn test_parse_nested_term() {
let result = term(CompleteStr("(3*4)*2"));
assert_eq!(result.is_ok(), true);
}
#[test]
fn test_parse_really_nested_term() {
let result = term(CompleteStr("((3*4)*2)"));
assert_eq!(result.is_ok(), true);
}
}
Factor Tests
These go in factor_parsers.rs
:
mod tests {
use super::*;
#[test]
fn test_factor() {
let test_program = CompleteStr("(1+2)");
let result = factor(test_program);
assert_eq!(result.is_ok(), true);
let (_, tree) = result.unwrap();
println!("{:#?}", tree);
}
#[test]
fn test_parse_floats() {
let test_floats = vec!["100.4", "1.02", "-1.02"];
for o in test_floats {
let parsed_o = o.parse::<f64>().unwrap();
let result = float64(CompleteStr(o));
assert_eq!(result.is_ok(), true);
}
}
#[test]
fn test_parse_integer() {
let test_integers = vec!["0", "-1", "1"];
for o in test_integers {
let parsed_o = o.parse::<i64>().unwrap();
let result = integer(CompleteStr(o));
assert_eq!(result.is_ok(), true);
}
}
}
Expression Tests
If we want to parse something that has all the operators in it, we need to use an Expression. Let’s add a few tests to expression_parsers.rs
:
#[test]
fn test_parse_expression() {
let result = expression(CompleteStr("3*4"));
assert_eq!(result.is_ok(), true);
}
#[test]
fn test_parse_nested_expression() {
let result = expression(CompleteStr("(3*4)+1"));
assert_eq!(result.is_ok(), true);
}
Program Tests
And finally, let’s do a fancy test in program_parsers.rs
:
#[test]
fn test_parse_program() {
let test_program = CompleteStr("1+2");
let result = program(test_program);
assert_eq!(result.is_ok(), true);
}
If you run tests, you probably got a failure similar to this:
thread 'visitor::tests::test_visit_addition_token' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21
What this means is that in our addition function:
&Token::AdditionOperator => {
// TODO: Need to clean this up. Remove the unwraps.
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("ADD ${} ${} ${}", left_register, right_register, result_register);
self.assembly.push(line);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
},
Expression
match arm:&Token::Expression{ ref left, ref right } => {
self.visit_token(left);
for expression in right {
self.visit_token(&expression.1);
self.visit_token(&expression.0);
}
},
right: [
(
AdditionOperator,
Term {
left: Factor {
value: Integer {
value: 2
}
},
right: []
}
)
]
Integer
value. This means that the value 2 has not been put in a register yet. We want the Expression
function to visit them in the reverse order. So change:&Token::Expression{ ref left, ref right } => {
self.visit_token(left);
for expression in right {
self.visit_token(&expression.0);
self.visit_token(&expression.1);
}
},
&Token::Expression{ ref left, ref right } => {
self.visit_token(left);
for expression in right {
self.visit_token(&expression.1);
self.visit_token(&expression.0);
}
},
You’ll also need to do the same to the Term
arm.
Now if we run it..
thread 'visitor::tests::test_nested_operators' panicked at 'called `Option::unwrap()` on a `None` value', libcore/option.rs:345:21
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
This one took me a bit of debugging, and I had to write some debugging functions to print free and used registers. I also added this test to visitor.rs
.
#[test]
fn test_nested_operators() {
let mut compiler = Compiler::new();
let test_program = generate_test_program("(4*3)-1");
compiler.visit_token(&test_program);
let bytecode = compiler.compile();
}
-
; the multiplication side was fine. By printing out registers in use when it hit the subtract arm:--------------------
| Used Registers |
--------------------
0
--------------------
| Free Registers |
--------------------
30
<snip>
1
1
. Clearly that register was never being put on the used register list. If we take a look at the arm handling the SUB
opcode:self.print_used_registers();
self.print_free_registers();
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("SUB ${} ${} ${}", left_register, right_register, result_register);
self.assembly.push(line);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
result_register
onto the used register list. We can fix it like so:self.print_used_registers();
self.print_free_registers();
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("SUB ${} ${} ${}", left_register, right_register, result_register);
self.assembly.push(line);
self.used_registers.push(result_register);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
Now if we run them, they should all pass..
running 18 tests
test operator_parsers::tests::test_parse_division_operator ... ok
test operator_parsers::tests::test_parse_addition_operator ... ok
test operator_parsers::tests::test_parse_multiplication_operator ... ok
test factor_parsers::tests::test_parse_integer ... ok
test factor_parsers::tests::test_parse_floats ... ok
test expression_parsers::tests::test_parse_expression ... ok
test expression_parsers::tests::test_parse_nested_expression ... ok
test operator_parsers::tests::test_parse_operator ... ok
test operator_parsers::tests::test_parse_subtraction_operator ... ok
test program_parsers::tests::test_parse_program ... ok
test factor_parsers::tests::test_factor ... ok
test term_parsers::tests::test_parse_nested_term ... ok
test term_parsers::tests::test_parse_term ... ok
test visitor::tests::test_visit_addition_token ... ok
test visitor::tests::test_visit_subtraction_token ... ok
test term_parsers::tests::test_parse_really_nested_term ... ok
test visitor::tests::test_nested_operators ... ok
test visitor::tests::test_visit_multiplication_token ... ok
test result: ok. 18 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
YAY!
Note | I’ve added a few more tests. See the source for details on them. |
Assembly Output
Let’s add a quick change to print the assembly output of the test_nested_operators
test:
".data"
".code"
"LOAD $0 #4"
"LOAD $1 #3"
"MUL $1 $0 $2"
"LOAD $0 #1"
"SUB $0 $2 $1"
That looks right! If we go through it step-by-step:
Load 4 into R0
Load 3 into R1
Multiply R0 and R1 and store result in R2 (which has the value of 12)
Load 1 into R0
Subtract R0 from R2 and store result in R3 (which should have 11 at the end)
Crap, that’s -11
If we run our generated code through the Iridium REPL, we see:
Welcome to Iridium! Let's be productive!
>>> LOAD $0 #4
LOAD $1 #3
>>> MUL $1 $0 $2
>>> LOAD $0 #1
>>> SUB $0 $2 $1
>>> !registers
Listing registers and all contents:
>>> [
1,
-11,
12,
0,
0,
// <snip>
]
Opcode::SUB => {
let register1 = self.registers[self.next_8_bits() as usize];
let register2 = self.registers[self.next_8_bits() as usize];
self.registers[self.next_8_bits() as usize] = register1 - register2;
}
Opcode::SUB => {
let register1 = self.registers[self.next_8_bits() as usize];
let register2 = self.registers[self.next_8_bits() as usize];
self.registers[self.next_8_bits() as usize] = register2 - register1;
}
In visitor.rs
, we can change:
&Token::SubtractionOperator => {
// TODO: Need to clean this up. Remove the unwraps.
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("SUB ${} ${} ${}", left_register, right_register, result_register);
self.assembly.push(line);
self.used_registers.push(result_register);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
},
&Token::SubtractionOperator => {
// TODO: Need to clean this up. Remove the unwraps.
let result_register = self.free_registers.pop().unwrap();
let left_register = self.used_registers.pop().unwrap();
let right_register = self.used_registers.pop().unwrap();
let line = format!("SUB ${} ${} ${}", right_register, left_register, result_register);
self.assembly.push(line);
self.used_registers.push(result_register);
self.free_registers.push(left_register);
self.free_registers.push(right_register);
},
And now our output is:
".data"
".code"
"LOAD $0 #4"
"LOAD $1 #3"
"MUL $1 $0 $2"
"LOAD $0 #1"
"SUB $2 $0 $1"
And running it through the Iridium REPL:
Welcome to Iridium! Let's be productive!
>>> LOAD $0 #4
LOAD $1 #3
>>> MUL $1 $0 $2
>>> LOAD $0 #1
>>> SUB $0 $2 $1
>>> !registers
Listing registers and all contents:
>>> [
1,
11,
12,
0,
// <snip>
]
OK! YAY FOR REAL NOW!
End
That’s it for this installment! See everyone next time!
If you need some assistance with any of the topics in the tutorials, or just devops and application development in general, we offer consulting services. Check it out over here or click Services along the top.