As an excercise I wanted to make a bf interpreter in .NET Core.

I wanted to see how fast I could make it go.

Introduction

The language we'll be interpreting is very small. It has 8 instructions that operate on an array of bytes and on a pointer to a location in that array.

  • ">" increment the data pointer (to point to the next cell to the right)
  • "<" decrement the data pointer (to point to the next cell to the left)
  • "+" increment (increase by one) the byte at the data pointer
  • "-" decrement (decrease by one) the byte at the data pointer
  • "." output the byte at the data pointer
  • "," accept one byte of input, storing its value in the byte at the data pointer
  • "[" if the byte at the data pointer is zero, then instead of moving the instruction pointer forward to the next command, jump it forward to the command after the matching ] command
  • "]" if the byte at the data pointer is nonzero, then instead of moving the instruction pointer forward to the next command, jump it back to the command after the matching [ command

Reference Implementation

Before we can optimize anything, we need a reference implementation to compare against.

We can make an interpreter that will work exactly as described in the specification of the language.

The resulting code can be found here: ReferenceInterpreter.cs

Benchmark

To Benchmark we use some bf code that visualises a mandelbrot set in the console:

Mandelbrot generated in the console in 7,2 seconds

I have no idea how it works, and we'll get back to why that is an issue later.

With the reference implementation it runs for 2 minutes, 16 seconds. This is in release mode with all optimizations on, and no debugger attached. In debug mode it runs for about 20 minutes.

By improving the interpreter step by step, eventually we'll get down to 7,5 seconds (on my machine).

In these blogposts I want to document these incremental improvements.

Optimization 1: Jump Table

In the reference implementation, the loops '[' and ']' are implemented as follows:

case '[':
	var nesting = 0;
	if (memory[pointer] == 0)
	{
		while (cr.HasCharacters())
		{
			cr.Forward();
			var tempChar = cr.GetChar();
			if (tempChar == ']' && nesting == 0) { break; }
			if (tempChar == '[') { nesting++; }
			if (tempChar == ']') { nesting--; }
		}
	}
	break;
case ']':
	nesting = 0;
	if (memory[pointer] != 0)
	{
		while (true)
		{
			cr.Back();
			var tempChar = cr.GetChar();
			if (tempChar == '[' && nesting == 0) { cr.GetChar(); break; }
			if (tempChar == ']') { nesting++; }
			if (tempChar == '[') { nesting--; }
		}
	}
	break;

To jump from one bracket to the other we cycle through each character in the source code until we find the corresponding bracket.

So every time we reach the closing bracket ']', we cycle through all characters that came before until we reach the opening bracket. And all of this happens on every iteration.

This can be done more efficiently: If we can remember the location of the opening bracket, then we can instantly jump to that location instead of scanning through the code over and over again.

Stack

To solve this, we can use a Stack structure.

A stack, is a LIFO structure. "Last In, First Out". Much like a stack of books, you take off the last book you put on it first. You "push" some things on the stack. And you can take the last item that was pushed off the stack. This is called to "pop" it.

When the opening bracket '[' is encountered, its address is "pushed" to the stack.

When the closing bracket ']' is encountered the address is "popped" from the stack. And if needed, we jump straight to the location of the opening bracket.

This makes the jump instant.

Nested loops

The advantage of using a Stack here is that it makes it easy to handle nested loops, like this snippet of bf code:

[[>>>>>>>>>]+[<<<<<<<<<]>>>>>>>>>-]

Suppose the instruction starts at position 0 in the example above.

We encounter '[' and push "0" to the stack. The next character is the second '[' at position 1, so we push "1" to the stack.

Then we encounter the inner most closing bracket ']', and we know the address we "pop" will be "1", the corresponding opening bracket.

If we then reach the end of the outer loop, only position "0" will still be on the Stack, so we pop that one and can jump back to the beginning.

Results

With these changes, the mandelbrot set was generated in 1 minute, 6 seconds. Or 48% of our reference run time.

A significant win thanks to the Stack.

The new code can be found here: Optimization01.cs

In particular, the loop implementation now looks like this:

                    case '[':
                        var nesting = 0;
                        if (memory[pointer] == 0)
                        {
                            while (cr.HasCharacters())
                            {
                                cr.Forward();
                                var tempChar = cr.GetChar();
                                if (tempChar == ']' && nesting == 0) { break; }
                                if (tempChar == '[') { nesting++; }
                                if (tempChar == ']') { nesting--; }
                            }
                        } else
                        {
                            jumpTable.Push(cr.Position);
                        }
                        break;
                    case ']':
                        nesting = 0;
                        if (memory[pointer] != 0)
                        {
                            cr.Position = jumpTable.Peek();
                        } else
                        {
                            jumpTable.Pop();
                        }
                        break;

Note that when jumping back, the value on the stack is "peeked". This means we observe the value of the item on the stack, but we don't take it off the stack.

Because we need it again if we jump back to the beginning.

The value is only "popped" when it's no longer needed. But that's really an implementation detail.

Conclusion

Brainf*** is a super small language, making it easy to experiment with interpreters, without getting into the details of language parsing.

Using a stack structure helped us improve our execution speed by 52%.

In the next posts we'll explore the things that do - or do not - improve efficiency. Eventually we'll decrease the execution time by 94% compared to the reference implementation.