I, For One, Welcome Our New Automata Overlords

21 April 2015

Introduction

From what I've learned so far about compilers (which, I admit, I'm writing this blog post series to learn more about compilers), it appears that the text-processing components of a compiler are just a series of automatons slammed together to transform text that is somewhat human editable (in a legacy code base that isn't saying much), into an Abstract Syntax Tree that is much easier to do stuff with algorithmically.

I just realized that I used a bunch of words there that don't really mean very much on their own, so let's start with a definition.

Automaton

When speaking about Automatons with respect to Compilers, it seems the two types of automatons worth mentioning are the Deterministic Finite Automaton and the Non-Deterministic Finite Automaton. A term you may be more familiar with is "state machine" rather than "automaton". I certainly was more familiar with this term from university. I do feel, however, that the term "automaton" sounds much more impressive.

Deterministic Finite Automaton (DFA)

A deterministic finite automaton is probably the simpler of the concepts to get one's head around, and describes a state machine in which each node (state) has a single edge (transition) per symbol in the DFA language to another node. Let's look at an example:

DFA

Node 1 is the start state, as denoted by the "floating" arrow pointing to it from the top and node 1 is also the accepting state, as denoted by the double circle.

You can test a string against a DFA. If you consider the string "afg" and test it against the above DFA, you can see that it is matched, transitioning through the states: 1, 2 and 5 and being accepted at state 1.

If you were to test the string "af" against the above DFA, it would fail, as it would transition through the states 1 and 2 but finish in state 5, which is not an accepting state.

Likewise, if you were to test the string "ai" against the above DFA, it would also fail, as it would transition through state 1 then be unable to transition anywhere further because there is no transition from state 2 for the symbol "i".

You might be thinking by now that this sounds very familiar, and you would be correct. Deterministic and Non-Deterministic Finite Automata are used in the implementation of Regexes. Again, more on this in a future blog post.

Implementing a Deterministic Finite State Automaton

The book, Modern Compiler Implementation in Java, specifies a method of implementing a DFA using a transition table. This is the method I have chosen to use to implement my DFA in Scala.

A transition table for our DFA above would look something like this:

a b c d e f g h
0 0 0 0 0 0 0 0 0
1 2 3 4 0 0 0 0 0
2 0 0 0 0 0 5 0 0
3 0 0 0 5 0 0 0 0
4 0 0 0 0 5 0 0 0
5 0 0 0 0 0 0 1 2

I'll discuss in a later blog post what the zero row is for and how we handle things like accepting states, but for now, let's focus on the transition between states.

Consider the successful test before, where we tested the string "afg" against the DFA.

Looking at the above table, you would start in state 1 and get input 'a', if you look at column a and row 1 you'll see the value 2, which corresponds to the state transition that one would take from the DFA diagram above.

Continuing these steps until the string is fully consumed will yield the same results as following the state diagram above.

The Beginnings of an Implementation in Scala

One thing I've noticed with courses and books on compilers is that they tend to leave out the actual programming of the DFA algorithm, instead favoring more of a pen and paper approach. This makes sense considering there is a limited time in which one can take a university course and that there are other courses that teach such algorithms in more depth (at least that is what I have been told by friends, I don't think my software engineering degree had such a course available).

I seem to learn a lot better with my fingers at the keyboard rather than holding a pen, so I've decided to get my hands dirty and implement a (probably cut down version of, we'll see) DFA myself.

The following code in Scala starts to fill out the behavior that we want from our DFA.

package io.jacobappleton.compilers.chapters.two

class DFA(val states: Seq[Seq[Int]]) {

  def addTransition(from: Int, key: Char):DFA = {
    val initialDFA = new DFA(states ++ Seq(Seq.fill(alphabet.length)(0)))
    initialDFA.addTransition(from, key, initialDFA.states.length - 1)
  }

  def addTransition(from: Int, key: Char, to: Int) = {
    new DFA(states.updated(from, states(from).updated(key.asDigit, to)))
  }
}

object DFA {
  def apply() = {
      new DFA(Seq.fill(2)(Seq.fill(alphabet.length)(0)))
  }

  val alphabet = ('0' to 'z')
}

There are two methods currently in our DFA: both of which add a transition. One creates a transition to a new state, and the other adds a transition from one existing state to a new existing state.

The DFA companion object gives us a "static" apply method which sets us up with state 0 and an initial state (again, more on state 0 in another blog post) and defines our alphabet for us. Potentially in the future we will want to provide the consumer with the ability to set their own alphabet, but for now, while we're getting our heads around DFAs, this implementation is fine.

You'll also notice that the two methods in the DFA also return new DFAs. This is a pattern that is employed in .NET's IEnumerable extension methods, and allows us to "change" an object without ever actually mutating it's internal state. I'm not entirely sure if this is idiomatic Scala, or if there is a clearer way to do this. Feel free to drop me a message in the comments below if you know of a better way I could be writing this functionality in Scala.

Useful Links

If you'd like to learn more about DFAs and their implementations, the following links are helpful.

Up Next

Coming up, we'll take a look at the code above and start to talk about some concepts in Scala along with how to test this start at a DFA and talk about start and acceptance states.

Tags: Compilers | Automata | Scala
Tweet