Exploring brainfuck: Crafting a good interpreter.

October 01, 2023 — matthilde

Brainfuck is a simple esoteric programming language invented by Urban Muller in the 90s. This programming language contains 8 rather low-level commands to control a tape and perform operations on it. Despite its extreme minimalism, it is a Turing-complete programming language (meaning you can in theory write any algorithm with it) and that means there is room for a lot of wild cool stuff to do on it :). In this series of blogposts (hoping it does actually turn into a series), I will explore brainfuck and see what I can make out of this simple language.

For this first blog of a potential series, I decided I will start with the very basic piece required to run brainfuck code: an interpreter. I could've used an existing one (there are in fact, a bajillion of these online) but I decided to write my own.

NOTICE: This series will not be a tutorial on writing brainfuck interpreters or writing brainfuck code in any way, they will be more or less writeups of the various experimentations I have done regarding the language. If you want to learn more about brainfuck, I suggest you read the article about it on esolangs.org.

Moving on to the serious stuff...

A few years ago I wrote BFVM, a brainfuck interpreter and JIT that uses a virtual machine to optimize code execution. I could've used that one and write about actual brainfuck programming in this blog post, considering it works just fine, but there were 3 issues:

It does not have a debugger
It contains memory safety issues
The codebase is an attrocious pile of garbage

Given that the third problem makes the two former problems even harder to solve, I just decided to entirely rewrite it.

The base design

BFVM, even though more complex than a "naive" interpreter, has a very straightforward architecture: It assembles the brainfuck program into a stream of bytecode instructions, then run it. BFVM2 will follow the exact same architecture for now.

Each instruction will be a struct that contains for now the following:

struct BFOpcode {
    // Instruction to execute
    Instruction inst;
    // The instruction's argument, if necessary
    int arg;
    // Line number and character number in the source code
    int ln, cn;
}

It contains an instruction ID, and an argument. The argument will be useful to pack multiple instructions into one like this:

++++ will compile, for example, into ADD 4
<<<< will compile into PTR -4
+++-- will compile into ADD 1
...

Branching and I/O instructions are an exception, as it isn't really necessary to reduce them together. With this information, this leads us to the creation of an instruction set, which ressembles brainfuck itself a lot. The difference between brainfuck and the VM are minimal, such as being able to pass arguments for each instruction, and that branching instructions are jumps as one would see in a typical CPU. Additionally, I will also add a special instruction that can be enabled only if debug mode is enabled. This instruction will allow us to monitor the tape, the code and manipulate memory. I'll elaborate on the debugger later on in the blog.

Considering that we can pass signed arguments, there's no need for distinct ADD/SUB operations and LEFT/RIGHT operations (note that in the first version, there were distinct operations with an unsigned argument), we can just have ADD and PTR instructions:

ADD n       - Adds n to the current cell
PTR n       - Move the pointer n to the left or right depending on the sign
RED         - Reads a character from stdin and write to the cell
WRT         - Reads from the cell and write to stdout
JZ  addr    - If cell=0, go to addr
JNZ addr    - If cell!=0, got to addr
 -- SPECIAL INSTRUCTIONS --
HLT         - Halts the program, this instruction is at the end
DBG level   - Launches the debugger in mode level

There are also registers, of course. The virtual machine has two registers; the Data Pointer and the Program Counter which both fills the same purpose as they would in a conventional brainfuck interpreter. The PC points to the current instruction to execute, the data pointer points to the current cell to process.

JZ and JNZ corresponds to [ and ] respectively. JZ has the location of JNZ as argument and vice-versa, this is much faster than having to seek for [ or ] when encountered. It uses a stack during compile time to keep track of the opcode locations.

And with that design, the interpreter can already run at much higher speed than a naive interpreter. See benchmark.txt for details.

The debugger

There's no need for a big debugger in such a small interpreter, just enough to know where you are, what is happening and eventually being able to view and modify tape memory and step through code. The BFVM2 debugger fills all these purposes with a very simple command-line interface and a few commands. The debugger is called when the interpreter is in DEBUG mode and reaches a '!' instruction. Multiple !'s can be added together to set the mode of the debugger. The modes are the following

Mode 1 (!) -- Breakpoint, stops the program and jumps in the debug console.
Mode 2 (!!) -- Displays tape and register information.
Mode 3 (!!!) -- Prints the current value of the current cell.

The modes 2 and 3 are pretty straightforward to understand, but the mode 1 is more complex as it sends the user to a debugger console where they can write commands to modify, view and control the machine's state. Each command is a character long and may consume an integer argument, provided by the user. Multiple commands can also be written in the same line, here's some examples:

ssss - go 4 steps forward in the exectuion
D3I73 - set cells[3] to 73
dt - Disassemble the program then show the current tape state

See debug.c if you wish to better understand how it works.

Going further...

This is where I'm at so far in the development of BFVM2 at the time of writing. There's room for a lot of debugger and performance enhancement.

I am considering finding other ways to compile the code to allow better optimization of the code, such as finding common patterns and "linear loops" (loops in which the number of < and > are balanced, often used for cell transfer). The software architecture might get more complex and slowly step away from BFVM's architecture.

There are also subtle optimizations that may be worth doing, such as replacing the DP and PC registers by pointers, which may also make better performances, or maybe look for where i can add code inline, make code more branch-less, etc.

BFVM2 is available on Codeberg and has reached an usable state. You can run big and complex brainfuck programs with it already. :)

The next episode of this blog series might be about brainfuck programming itself, rather than BFVM, as I am working on a macro system to ease practical brainfuck programming. So stay tuned!

Tags: esolangs, brainfuck, programming

~matthilde.blog