So You Want to Build a Language VM - Part 10 - Assembler 3: Assemble Harder

Teaches our assembler to recognize more instruction forms

August 21, 2018

Improving the Assembler

Our assembler right now can recognize one opcode, load. We need to teach it to recognize all the rest. There are a couple of ways we can do that:

1. We can write a parser for each opcode.
2. We can write a parser that recognizes the letters a-z and then checks whether they form a valid Opcode.

Let’s go with option #2, since it will require much less copy-paste. It also gives us an excuse to implement From<CompleteStr<_>> for our opcodes!

The From<CompleteStr> Trait

In instruction.rs, below the block where we implemented From<u8>, put this:

impl<'a> From<CompleteStr<'a>> for Opcode {
    fn from(v: CompleteStr<'a>) -> Self {
        match v {
            CompleteStr("load") => Opcode::LOAD,
            CompleteStr("add") => Opcode::ADD,
            CompleteStr("sub") => Opcode::SUB,
            CompleteStr("mul") => Opcode::MUL,
            CompleteStr("div") => Opcode::DIV,
            CompleteStr("hlt") => Opcode::HLT,
            CompleteStr("jmp") => Opcode::JMP,
            CompleteStr("jmpf") => Opcode::JMPF,
            CompleteStr("jmpb") => Opcode::JMPB,
            CompleteStr("eq") => Opcode::EQ,
            CompleteStr("neq") => Opcode::NEQ,
            CompleteStr("gte") => Opcode::GTE,
            CompleteStr("gt") => Opcode::GT,
            CompleteStr("lte") => Opcode::LTE,
            CompleteStr("lt") => Opcode::LT,
            CompleteStr("jmpe") => Opcode::JMPE,
            CompleteStr("nop") => Opcode::NOP,
            _ => Opcode::IGL,
        }
    }
}
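To see how the fallback arm behaves without pulling in the whole project, here is a minimal, standalone sketch. It makes two assumptions that are not in the real code: CompleteStr is a local stand-in for nom's type (so the snippet compiles without nom), and Opcode is trimmed down to three variants. The to_opcode helper is a hypothetical name used only for illustration.

```rust
// Stand-in for nom::types::CompleteStr so this sketch compiles without nom.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct CompleteStr<'a>(pub &'a str);

// Trimmed-down Opcode for illustration only; the real enum has many more variants.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Opcode {
    LOAD,
    HLT,
    IGL,
}

impl<'a> From<CompleteStr<'a>> for Opcode {
    fn from(v: CompleteStr<'a>) -> Self {
        match v {
            CompleteStr("load") => Opcode::LOAD,
            CompleteStr("hlt") => Opcode::HLT,
            // Anything we do not recognize becomes the illegal opcode.
            _ => Opcode::IGL,
        }
    }
}

/// Hypothetical helper: implementing `From` also gives us `Into` for free.
pub fn to_opcode(name: &str) -> Opcode {
    CompleteStr(name).into()
}
```

One thing this makes visible: the match compares strings exactly, so in this sketch to_opcode("LOAD") falls through to Opcode::IGL just like a misspelled mnemonic would.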

The Parser

In src/assembler/opcode_parsers.rs, we have this:

named!(pub opcode_load<CompleteStr, Token>,
  do_parse!(
      tag!("load") >> (Token::Op{code: Opcode::LOAD})
  )
);

Now that we have From<CompleteStr<_>> implemented for our Opcode, head back to opcode_parsers.rs. Nom has a nifty parser called alpha1 that matches one or more alphabetic characters. Let’s change our opcode parser to use it:

// alpha1 is a plain parser function, not a macro: bring it in with `use nom::alpha1;`
named!(pub opcode<CompleteStr, Token>,
  do_parse!(
      opcode: alpha1 >>
      (
        Token::Op{code: Opcode::from(opcode)}
      )
  )
);

Important

Don’t forget to add use nom::types::CompleteStr at the top in instruction.rs!

Now we’ll get an IGL opcode for any illegal opcode the user types.
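The opcode parser leans on alpha1 to grab a run of letters before Opcode::from decides whether they mean anything. As a rough picture of that first step, here is a hand-rolled, nom-free approximation. The name leading_alpha is made up for this sketch, it only looks at ASCII letters, and it returns an Option rather than nom's IResult, so treat it as an illustration and not as nom's implementation.

```rust
/// Hand-rolled approximation of what an `alpha1`-style parser does:
/// split off the leading run of ASCII letters, failing if there are none.
/// Returns (rest, matched) to mirror nom's (remaining input, output) order.
pub fn leading_alpha(input: &str) -> Option<(&str, &str)> {
    // Byte index of the first character that is not a letter,
    // or the end of the input if every character is a letter.
    let end = input
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(input.len());
    if end == 0 {
        // "One or more" means an empty match is a failure.
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```

Given "load $0 #100", the sketch hands back "load" as the match and " $0 #100" as the remainder, which is why the opcode parser can sit at the front of a longer instruction parser.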

Test

We’ll need to alter our test_opcode_load in opcode_parsers.rs a bit to handle our new parser. Change it to:

#[cfg(test)]
mod tests {
    #![allow(unused_imports)]
    use super::opcode;
    use assembler::Token;
    use instruction::Opcode;
    use nom::types::CompleteStr;

    #[test]
    fn test_opcode() {
        // A known mnemonic parses to its opcode and consumes the whole input.
        let result = opcode(CompleteStr("load"));
        assert_eq!(result.is_ok(), true);
        let (rest, token) = result.unwrap();
        assert_eq!(token, Token::Op { code: Opcode::LOAD });
        assert_eq!(rest, CompleteStr(""));
        // An unknown mnemonic still parses, but yields the illegal opcode.
        let result = opcode(CompleteStr("aold"));
        let (_, token) = result.unwrap();
        assert_eq!(token, Token::Op { code: Opcode::IGL });
    }
}

cargo test should show all tests still passing.

Updating instruction.rs

Another update we need to make is to the test_str_to_opcode test. Change it to:

#[test]
fn test_str_to_opcode() {
    let opcode = Opcode::from(CompleteStr("load"));
    assert_eq!(opcode, Opcode::LOAD);
    let opcode = Opcode::from(CompleteStr("illegal"));
    assert_eq!(opcode, Opcode::IGL);
}

More Instruction Forms

In instruction_parsers.rs, we wrote a parser for instructions that follow this form: <opcode> <register> <integer operand>. We have more forms instructions can take, though, so let’s write those.

First, change the parser named instruction to instruction_one and remove the pub from it.

Single Opcode

Some instructions take no operands, like HLT. They have the form of <opcode>. The parser is:

named!(instruction_two<CompleteStr, AssemblerInstruction>,
    do_parse!(
        o: opcode >>
        opt!(multispace) >>
        (
            AssemblerInstruction{
                opcode: o,
                operand1: None,
                operand2: None,
                operand3: None,
            }
        )
    )
);

Important

You’ll need to add use nom::multispace; to the top of instruction_parsers.rs.

And a test for it…

#[test]
fn test_parse_instruction_form_two() {
    let result = instruction_two(CompleteStr("hlt\n"));
    assert_eq!(
        result,
        Ok((
            CompleteStr(""),
            AssemblerInstruction {
                opcode: Token::Op { code: Opcode::HLT },
                operand1: None,
                operand2: None,
                operand3: None
            }
        ))
    );
}

Using alt!()

We now have parsers for two possible instruction forms. But how do we tell our assembler to try each instruction form and parse whichever one is valid, if any? Nom’s alt! macro does exactly that. We can give it a list of parsers, like this:

/// Will try to parse out any of the Instruction forms
named!(pub instruction<CompleteStr, AssemblerInstruction>,
    do_parse!(
        ins: alt!(
            instruction_one |
            instruction_two
        ) >>
        (
            ins
        )
    )
);

See how it lets us try a list of parsers? It tries them in order and returns the result of the first one that succeeds. As we add more instruction forms, we’ll add them here. Also note that this is now the pub parser, and the one the Program should use, which means you now need to go into program_parsers.rs and change all the instruction_one references to instruction. =)
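If the "try each parser in turn" idea feels a bit magical, here is a small plain-Rust sketch of ordered choice. Everything in it is an assumption made for illustration: the Parser type, the two toy parsers (with_register and bare_opcode), and first_match are hypothetical names, and the toy parsers only return a label instead of a real AssemblerInstruction.

```rust
/// Simplified parser type: on success, returns (remaining input, label).
pub type Parser = fn(&str) -> Option<(&str, &'static str)>;

/// Toy stand-in for a form with operands: accepts input that mentions a `$` register.
pub fn with_register(input: &str) -> Option<(&str, &'static str)> {
    if input.contains('$') {
        Some(("", "opcode + register form"))
    } else {
        None
    }
}

/// Toy stand-in for the bare-opcode form: accepts a single alphabetic word.
pub fn bare_opcode(input: &str) -> Option<(&str, &'static str)> {
    let word = input.trim();
    if !word.is_empty() && word.chars().all(|c| c.is_ascii_alphabetic()) {
        Some(("", "bare opcode form"))
    } else {
        None
    }
}

/// Try each parser in order and return the first success, like an ordered choice.
pub fn first_match<'a>(parsers: &[Parser], input: &'a str) -> Option<(&'a str, &'static str)> {
    parsers.iter().find_map(|p| p(input))
}
```

Because the first success wins, the order of the list matters: a permissive parser placed early can shadow a more specific one that comes after it.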

Other Instruction Forms

We’ll also need a parser for the form: <opcode> <register> <register> <register> for instructions like ADD $0 $1 $2.

As we continue writing our application, we’ll have more forms we need to write parsers for. I’ll leave the last form to you to do. If you get stuck, you can check out the code on GitLab.
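As a hint for that exercise rather than a solution, here is a plain-Rust sketch of the shape a three-register line has. It does not use nom at all, and split_three_registers is a hypothetical helper that is not part of the project; the real parser would be built with named! and do_parse! like the others.

```rust
/// Hypothetical illustration of the `<opcode> <register> <register> <register>` shape.
/// Returns the mnemonic and three register indices, or None if the line does not fit.
pub fn split_three_registers(line: &str) -> Option<(&str, [u8; 3])> {
    let mut parts = line.split_whitespace();
    let mnemonic = parts.next()?;
    let mut regs = [0u8; 3];
    for slot in regs.iter_mut() {
        // Each operand must look like `$<number>`.
        let digits = parts.next()?.strip_prefix('$')?;
        *slot = digits.parse().ok()?;
    }
    // Reject trailing tokens so a line with a fourth operand does not pass.
    if parts.next().is_some() {
        return None;
    }
    Some((mnemonic, regs))
}
```

The nom version follows the same sequence: an opcode, then three register parsers in a row, each separated by whitespace.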

End

I’m going to end this part here. In the next part, we’ll start talking about memory and strings.

Try not to get too excited. =)
