So You Want to Build a Language VM - Part 10 - Assembler 3: Assemble Harder
Teaches our assembler to recognize more instruction forms
Improving the Assembler
Right now our assembler can recognize only one opcode, load. We need to teach it to recognize the rest. There are a couple of ways we can do that:
1. We can write a parser for each opcode.
2. We can write a parser that recognizes the letters a-z and then checks whether they form a valid Opcode.
Let’s go with option #2, since it requires much less copy-paste. It also gives us an excuse to implement From<CompleteStr> for our opcodes!
The From<CompleteStr> Trait
In instruction.rs, below the block where we implemented From<u8>, put this:
impl<'a> From<CompleteStr<'a>> for Opcode {
    fn from(v: CompleteStr<'a>) -> Self {
        match v {
            CompleteStr("load") => Opcode::LOAD,
            CompleteStr("add") => Opcode::ADD,
            CompleteStr("sub") => Opcode::SUB,
            CompleteStr("mul") => Opcode::MUL,
            CompleteStr("div") => Opcode::DIV,
            CompleteStr("hlt") => Opcode::HLT,
            CompleteStr("jmp") => Opcode::JMP,
            CompleteStr("jmpf") => Opcode::JMPF,
            CompleteStr("jmpb") => Opcode::JMPB,
            CompleteStr("eq") => Opcode::EQ,
            CompleteStr("neq") => Opcode::NEQ,
            CompleteStr("gte") => Opcode::GTE,
            CompleteStr("gt") => Opcode::GT,
            CompleteStr("lte") => Opcode::LTE,
            CompleteStr("lt") => Opcode::LT,
            CompleteStr("jmpe") => Opcode::JMPE,
            CompleteStr("nop") => Opcode::NOP,
            _ => Opcode::IGL,
        }
    }
}
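If you'd like to try the lookup without pulling in nom, the same match works on a plain &str. The sketch below uses a trimmed-down, hypothetical Opcode enum with only a few variants (not the full enum from instruction.rs) and shows the catch-all arm turning anything unrecognized into IGL.

```rust
/// A trimmed-down, hypothetical stand-in for the tutorial's Opcode enum.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Opcode {
    LOAD,
    ADD,
    HLT,
    JMP,
    JMPE,
    IGL,
}

/// Same lookup as the CompleteStr version, but on a plain &str.
impl<'a> From<&'a str> for Opcode {
    fn from(v: &'a str) -> Self {
        match v {
            "load" => Opcode::LOAD,
            "add" => Opcode::ADD,
            "hlt" => Opcode::HLT,
            "jmp" => Opcode::JMP,
            "jmpe" => Opcode::JMPE,
            // Anything not listed above (including other casings) is illegal.
            _ => Opcode::IGL,
        }
    }
}
```

Implementing From also gives you the matching Into conversion for free, so let op: Opcode = "hlt".into(); works as well. The match compares exact strings, so it is case-sensitive: "LOAD" maps to IGL.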
The Parser
In src/assembler/opcode_parsers.rs, we have this:
named!(pub opcode_load<CompleteStr, Token>,
    do_parse!(
        tag!("load") >> (Token::Op{code: Opcode::LOAD})
    )
);
Now that From<CompleteStr> is implemented for our Opcode, head back to opcode_parsers.rs. Nom has a handy parser for this, alpha1, which matches one or more alphabetic characters. Let’s change our opcode parser to:
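// At the top of opcode_parsers.rs, alongside the existing imports:
use nom::alpha1;

named!(pub opcode<CompleteStr, Token>,
    do_parse!(
        // alpha1 consumes one or more letters; From<CompleteStr> maps them to an Opcode.
        opcode: alpha1 >>
        (
            Token::Op{code: Opcode::from(opcode)}
        )
    )
);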
Important: Don’t forget to add use nom::types::CompleteStr; at the top of instruction.rs, since the From impl there uses it!
Now any word the user types that isn’t a known mnemonic comes back as the IGL opcode instead of a parse error.
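To see what the new opcode parser does step by step, here is a nom-free sketch of the same idea: take the run of leading letters, then hand that word to From. The Opcode enum is a hypothetical trimmed-down version, take_alpha1 and parse_opcode are illustrative names, and Option stands in for nom's IResult (None plays the role of a parse error). This version deliberately accepts only ASCII letters.

```rust
/// Hypothetical, trimmed-down Opcode enum (not the full tutorial enum).
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Opcode {
    LOAD,
    HLT,
    JMP,
    JMPE,
    IGL,
}

impl<'a> From<&'a str> for Opcode {
    fn from(v: &'a str) -> Self {
        match v {
            "load" => Opcode::LOAD,
            "hlt" => Opcode::HLT,
            "jmp" => Opcode::JMP,
            "jmpe" => Opcode::JMPE,
            _ => Opcode::IGL,
        }
    }
}

/// Split off one or more leading ASCII letters. Returns
/// (remaining input, matched letters), the same order nom uses,
/// or None if the input does not start with a letter.
fn take_alpha1(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Plain-Rust analogue of the new `opcode` parser: grab the whole word
/// first, then let From<&str> decide which opcode (or IGL) it is.
fn parse_opcode(input: &str) -> Option<(&str, Opcode)> {
    take_alpha1(input).map(|(rest, word)| (rest, Opcode::from(word)))
}
```

Because the whole word is read before it is looked up, mnemonics that share a prefix can't shadow each other: jmpe $1 comes back as JMPE rather than JMP followed by a stray e.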
Test
We’ll need to alter our test_opcode_load in opcode_parsers.rs a bit to handle our new parser. Change it to:
#[cfg(test)]
mod tests {
    #![allow(unused_imports)]
    use super::opcode;
    use assembler::Token;
    use instruction::Opcode;
    use nom::types::CompleteStr;

    #[test]
    fn test_opcode() {
        // A known mnemonic becomes the matching opcode and consumes all input.
        let result = opcode(CompleteStr("load"));
        assert!(result.is_ok());
        let (rest, token) = result.unwrap();
        assert_eq!(token, Token::Op { code: Opcode::LOAD });
        assert_eq!(rest, CompleteStr(""));

        // Any other run of letters still parses, but as the illegal opcode.
        let result = opcode(CompleteStr("aold"));
        let (_, token) = result.unwrap();
        assert_eq!(token, Token::Op { code: Opcode::IGL });
    }
}
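If anything in instruction_parsers.rs still refers to opcode_load, point it at the new opcode parser first; after that, running cargo test should show all tests still passing.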
Updating instruction.rs
We also need to update the test_str_to_opcode test. Change it to:
#[test]
fn test_str_to_opcode() {
    let opcode = Opcode::from(CompleteStr("load"));
    assert_eq!(opcode, Opcode::LOAD);
    let opcode = Opcode::from(CompleteStr("illegal"));
    assert_eq!(opcode, Opcode::IGL);
}
More Instruction Forms
In instruction_parsers.rs, we wrote a parser for instructions of the form <opcode> <register> <integer operand>. Instructions can take other forms too, though, so let’s write parsers for those.
First, rename the parser named instruction to instruction_one and remove the pub from it.
Single Opcode
Some instructions, like HLT, take no operands. They have the form <opcode>. The parser is:
named!(instruction_two<CompleteStr, AssemblerInstruction>,
    do_parse!(
        o: opcode >>
        // Swallow any trailing whitespace (such as a newline) if present.
        opt!(multispace) >>
        (
            AssemblerInstruction{
                opcode: o,
                operand1: None,
                operand2: None,
                operand3: None,
            }
        )
    )
);
Important: You’ll need to add use nom::multispace; to the top of instruction_parsers.rs.
And a test for it…
#[test]
fn test_parse_instruction_form_two() {
    let result = instruction_two(CompleteStr("hlt\n"));
    assert_eq!(
        result,
        Ok((
            CompleteStr(""),
            AssemblerInstruction {
                opcode: Token::Op { code: Opcode::HLT },
                operand1: None,
                operand2: None,
                operand3: None
            }
        ))
    );
}
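Stripped of nom's macros, the no-operand form boils down to this: read a mnemonic, skip any trailing whitespace, and leave every operand slot empty. The following is a plain-Rust sketch of that shape. The trimmed Opcode, Token and AssemblerInstruction types and the instruction_no_operands name are simplified, hypothetical stand-ins for the tutorial's types, and Option replaces nom's IResult.

```rust
/// Hypothetical, trimmed-down Opcode enum for this sketch.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Opcode {
    HLT,
    NOP,
    IGL,
}

impl<'a> From<&'a str> for Opcode {
    fn from(v: &'a str) -> Self {
        match v {
            "hlt" => Opcode::HLT,
            "nop" => Opcode::NOP,
            _ => Opcode::IGL,
        }
    }
}

/// Simplified stand-ins for the tutorial's Token and AssemblerInstruction.
#[derive(Debug, PartialEq)]
pub enum Token {
    Op { code: Opcode },
}

#[derive(Debug, PartialEq)]
pub struct AssemblerInstruction {
    pub opcode: Token,
    pub operand1: Option<Token>,
    pub operand2: Option<Token>,
    pub operand3: Option<Token>,
}

/// Sketch of the `<opcode>` form: a word, optional whitespace, no operands.
fn instruction_no_operands(input: &str) -> Option<(&str, AssemblerInstruction)> {
    // Leading ASCII letters are the mnemonic (the nom version uses alpha1).
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let (word, rest) = input.split_at(end);
    // Roughly like opt!(multispace): eat trailing whitespace if there is any.
    let rest = rest.trim_start();
    Some((
        rest,
        AssemblerInstruction {
            opcode: Token::Op { code: Opcode::from(word) },
            operand1: None,
            operand2: None,
            operand3: None,
        },
    ))
}
```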
Using alt!()
We now have parsers for two possible instruction forms. But how do we tell our assembler to try each instruction form and parse whichever one is valid, if any? Nom has a nifty macro for that called alt!. We can give it a list of parsers, like this:
/// Will try to parse out any of the Instruction forms
named!(pub instruction<CompleteStr, AssemblerInstruction>,
    do_parse!(
        ins: alt!(
            instruction_one |
            instruction_two
        ) >>
        (
            ins
        )
    )
);
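See how it lets us try a list of parsers? alt! returns the result of the first parser that succeeds, so order matters: instruction_one must come before instruction_two, because instruction_two would happily accept just the load in load $0 #100 and leave the operands unparsed. As we add more instruction forms, we’ll add them here. Also note that this is now the pub parser, and the one the program parser should use. That means you now need to go into program_parsers.rs and change all the instruction_one references to instruction. =)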
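Conceptually, alt! just tries each parser in turn and keeps the first success. Here is a plain-Rust sketch of that idea, not nom's actual implementation: the Instruction enum, the Parser alias and the helper functions are hypothetical simplifications, Option stands in for IResult, and integer operands are unsigned only.

```rust
use std::str::FromStr;

/// Hypothetical, simplified result of parsing one instruction.
#[derive(Debug, PartialEq)]
pub enum Instruction {
    OpRegInt { op: String, reg: u8, value: u32 },
    OpOnly { op: String },
}

/// A parser takes input and returns (remaining input, instruction) on success.
type Parser = fn(&str) -> Option<(&str, Instruction)>;

/// Leading ASCII letters, returned as (rest, word).
fn word(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Optional whitespace, a prefix character such as '$' or '#', then digits.
fn prefixed_number<T: FromStr>(input: &str, prefix: char) -> Option<(&str, T)> {
    let input = input.trim_start().strip_prefix(prefix)?;
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        return None;
    }
    let n = input[..end].parse().ok()?;
    Some((&input[end..], n))
}

/// `<opcode> <register> <integer operand>`, e.g. "load $0 #100".
fn op_reg_int(input: &str) -> Option<(&str, Instruction)> {
    let (rest, op) = word(input)?;
    let (rest, reg) = prefixed_number::<u8>(rest, '$')?;
    let (rest, value) = prefixed_number::<u32>(rest, '#')?;
    let ins = Instruction::OpRegInt { op: op.to_string(), reg, value };
    Some((rest.trim_start(), ins))
}

/// `<opcode>` on its own, e.g. "hlt".
fn op_only(input: &str) -> Option<(&str, Instruction)> {
    let (rest, op) = word(input)?;
    Some((rest.trim_start(), Instruction::OpOnly { op: op.to_string() }))
}

/// Like alt!: try each parser in order and return the first success.
fn alt<'a>(parsers: &[Parser], input: &'a str) -> Option<(&'a str, Instruction)> {
    parsers.iter().find_map(|&p| p(input))
}
```

With [op_reg_int, op_only], load $0 #100 parses completely and hlt falls through to the second parser. Swapping the array to [op_only, op_reg_int] lets op_only win on load $0 #100 and leave $0 #100 behind unparsed.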
Other Instruction Forms
We’ll also need a parser for the form <opcode> <register> <register> <register>, for instructions like ADD $0 $1 $2.
As our application grows, we’ll need parsers for more forms. I’ll leave this last form to you as an exercise. If you get stuck, you can check out the code on GitLab.
End
I’m going to end this part here. In the next part, we’ll start talking about memory and strings.
Try not to get too excited. =)