So You Want to Build a Language VM - Part 00 - Computer Hardware Crash Course

Covers general elements of computer hardware useful to know before reading the rest of the tutorials

July 16, 2018

A Brief Course in Computer Hardware

Hi there! This is the prelude to a series of posts detailing how to build a language VM. If you are familiar with terms like registers, program counter and assembly, feel free to skip this post. If not, read on. Please note this is nowhere near comprehensive, but it covers enough to understand what we’re building.

What is a Language VM?

You know how you can type python script.py and magic happens? That’s the Python virtual machine, or language interpreter, reading the source code you wrote, translating it down to bytecode (instructions for an imaginary machine rather than a real CPU) that the Python VM can understand, and then executing it.

I use the terms language interpreter and language VM interchangeably. I’ll try to be consistent, but then I try to resist unattended jelly doughnuts too.

Please make sure you have a C compiler installed! GCC or Clang are good choices.

Note	Some of the code is purposefully not optimized so that we can go back later and learn about benchmarking and optimizing VMs.

What is a Program?

Like Frieza, a program has multiple forms. When you start coding one, you write text that looks like:

#include <stdio.h>
int main (void) {
  printf("Hello World!");
  return 0;
}

Your CPU has no idea what to do with that. We have to transform this text into something the CPU can understand and act upon: binary machine code. This process (or series of processes) is often called compilation, and it happens in several steps.

The Next Step Down

Every processor family has a language of its own that it understands, called machine code. Assembly code is a human-readable text form of that machine code, and like machine code it is highly specific to the processor. Machine code that your iPhone or Galaxy C4 Boom Edition can run is not comprehensible to that cheaper AMD proc you bought on Newegg over the Intel one, and you totally don’t regret that decision at all.

You can write assembly code directly, though this is rare in modern times. It’s tedious and annoying, much like an episode of Friends. Your friend the compiler can take your source code and spit out assembly code for you. Let’s take our earlier C code example and put it in a file called 01_c_hello_world.c:

#include <stdio.h>
int main (void) {
  printf("Hello World!");
  return 0;
}

The Compiler

Save that somewhere on your disk. Now, from a terminal, run:

$ gcc -S 01_c_hello_world.c

You should have a file next to the .c file that has the same name but with the .s extension. Let’s see what’s inside…

$ cat /path/to/01_c_hello_world.s

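You should see some version of the following. This listing came from macOS, where the gcc command actually runs Clang; on Linux or Windows, or with a different compiler version, the details will differ: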

	.section	__TEXT,__text,regular,pure_instructions
	.macosx_version_min 10, 13
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## BB#0:
	pushq	%rbp
Lcfi0:
	.cfi_def_cfa_offset 16
Lcfi1:
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
Lcfi2:
	.cfi_def_cfa_register %rbp
	subq	$16, %rsp
	leaq	L_.str(%rip), %rdi
	movl	$0, -4(%rbp)
	movb	$0, %al
	callq	_printf
	xorl	%ecx, %ecx
	movl	%eax, -8(%rbp)          ## 4-byte Spill
	movl	%ecx, %eax
	addq	$16, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"Hello World!"

Don’t panic! You don’t need to know what all that means, nor will we be writing this. It’s only here to show what assembly looks like.

The Assembler

Once we have the assembly code, an assembler translates it into machine code and stores the result in an object file.

To see the assembler in action, you can run:

$ gcc -c 01_c_hello_world.s -o 01_c_hello_world.o

You should now see a third file, called 01_c_hello_world.o. The directory should look like this:

$ ls
01_c_hello_world.c	01_c_hello_world.o	01_c_hello_world.s

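The object file contains both machine code and metadata. Another program, a linker, then reads the object file, links in any needed libraries (such as the C standard library that provides printf), and produces an executable. Running gcc 01_c_hello_world.o -o 01_c_hello_world invokes the linker for you, and ./01_c_hello_world then prints the greeting.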

Enter the Java JVM, .NET CLR, and Other Language VMs

One of the benefits used to market Java way back when it first lumbered onto the scene was the "write once, run anywhere" promise. The Java compiler turns your source code into bytecode rather than processor-specific machine code, so that bytecode could run, unmodified, on any hardware platform that could run the JVM. People only needed one program, the JVM, to exist for their hardware, and Sun Microsystems (later Oracle) took care of that part.

Many others follow this model: the .NET CLR, and the standard implementations of Python, Ruby, Perl, and more.

Note

Did you know that women were the first programmers? The hardware side of early computers was seen as the manly part of computing: twiddling dials, fiddling with circuits, and such. Writing the code was seen as closer to secretarial work. Our world would not exist as it does today without these women. I highly recommend reading about the following people: Ada Lovelace, Grace Hopper, and Katherine Johnson.

A Faustian Bargain

While these VMs provide useful services (hardware abstraction, garbage collection, and more), they come with a price: slower execution and higher resource consumption. As a general rule, languages that run on a VM execute more slowly than ones compiled to machine code for specific hardware, because the VM has to do extra work to decode and dispatch every bytecode instruction.

Note	Yes, there are a lot of other topics to get into here, such as JIT compilers, native code extensions, and all the rest. I’m going to skip those for now.

Registers

The last thing to cover in this post is the concept of registers. A register is a small, very fast storage location built into the CPU itself. For a more detailed explanation, I’ll steal from Wikipedia:

In computer architecture, a processor register is a quickly accessible location available to a computer’s central processing unit (CPU). Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. Registers are typically addressed by mechanisms other than main memory, but may in some cases be assigned a memory address e.g. DEC PDP-10, ICT 1900.

— Wikipedia
https://en.wikipedia.org/wiki/Processor_register

When your CPU executes code to set a variable to the number 5, that 5 is probably going to be loaded into a register somewhere. Our application, which pretends to be a CPU, will also have registers it can use.

Summary

We’re going to write an application that pretends to be a CPU and executes programs we write for it. That, of course, means we’ll have to invent a language too. But we’ll get to all that later. You should now have enough basic knowledge to go on to the next section.
