Automata II: Start State, Accepting States and Visualisation

29 April 2015

Introduction

In the last post on automata, I briefly discussed the definition of Deterministic Finite Automata and used a transition table to implement the most important piece of the DFA data structure - the transitions. In this post, I'll finish off the implementation of the Deterministic Finite Automata, by implementing the start state, a collection of accepting states and by visualising the DFA using GraphViz.

The Code

I'm going to do something a little strange this time, and show the code first. As I started adding features to the implementation, it became abundantly clear that the implementation I originally posted could be improved in many ways. I guess that's the whole point of learning in the open, I'm allowed to write terrible Scala, and later on admit to it and make it better. :)

package io.jacobappleton.compilers.chapters.two

import java.io.PrintWriter

class DFA[T](alphabet: Seq[T], val states: Seq[Seq[Int]], val acceptanceStates: Seq[Int], val startState: Int) {

  def this(alphabet: Seq[T]) {
    this(alphabet, Seq.fill(2)(Seq.fill(alphabet.length)(0)), Seq.empty, DFA.DefaultStartState)
  }

  def this(alphabet: Seq[T], startState: Int) {
    this(alphabet, Seq.fill(2)(Seq.fill(alphabet.length)(0)), Seq.empty, startState)
  }

  def addTransition(from: Int, key: T): DFA[T] = {
    val newStates = states ++ Seq(Seq.fill(alphabet.length)(DFA.DeadState))
    val result = new DFA(alphabet, newStates, acceptanceStates, startState)
    val firstLiveStateOfResult = result.states.length -1
    result.addTransition(from, key, firstLiveStateOfResult)
  }

  def addTransition(from: Int, key: T, to: Int): DFA[T] = {
    val keyIdx = alphabet.zipWithIndex.filter(a => a._1 == key)(DFA.DeadState)._2
    val newStates = states.updated(from, states(from).updated(keyIdx, to))
    new DFA(alphabet, newStates, acceptanceStates, startState)
  }

  def addAcceptingState(state: Int) = new DFA(alphabet, states, acceptanceStates :+ state, startState)

  def toGraphViz(path: String) {
    val out = new PrintWriter(path)
    out.print(toString())
    out.close()
  }

  override def toString() = {
    def printTransition(state1: Int, transitionIdx: Int, state2: Int) = {
      val transitionValue = alphabet(transitionIdx)
      if (state2 != DFA.DeadState) s"s$state1 -> s$state2 [label = $transitionValue]"
      else ""
    }
    def getShape(node: Int) = if(acceptanceStates.contains(node)) "doublecircle" else "circle"
    def printNode(node: Int) = "node [shape = " + getShape(node) + "] s" + node + ";"

    val config = "\n\trankdir=LR;\n\tsize=8.5;\n\t\n\t"
    val stateList = "node [shape = point] s0\n\t" + states.zipWithIndex.filter(_._2 != 0).map(s => printNode(s._2)).mkString("\n\t")
    val startNode = s"s0 -> s$startState"
    val otherNodes = states.zipWithIndex.flatMap(s => s._1.zipWithIndex.map(e => printTransition(s._2, e._2, e._1)))
    val otherNodesString = otherNodes.filter(_ != "").mkString(";\n\t")

    s"digraph nfa { $config\n\t$stateList\n\t\n\t$startNode\n\t$otherNodesString;\n}"
  }
}

object DFA {
  val DeadState = 0
  val DefaultStartState = 1
}

Woah! A lot has happened since last time. Let's look into the changes I've made.

Firstly, as I started to implement the acceptance states and the visualisation, I realised that I was again using the inherent properties of the English alphabet for the logic of my DFA. This is fine when the DFA is used for testing against strings written in English, but not particularly good when, as I suspect we'll see later on, we need to use that same DFA with a different alphabet.

Therefore, the DFA is now generic. So far I'm using generics in a relatively straight forward way. The DFA now has a type-parameter of T that can be used to change the datatype of the alphabet we're dealing with. The alphabet itself is now passed into the constructors of the DFA.

You'll also notice that there are a couple of functions defined with the name this. These functions are secondary constructors for the DFA class, and allow consumers to construct the class with less boilerplate and more smart defaults.

Finally I've done a bit of tidying up around naming, using intermediate variables to better explain what the code is supposed to do. I found myself reaching for the

// comment

keys and decided beter of it, channelling my inner Uncle Bob and rewriting my code to be clearer instead.

Start State

Now let's talk about what I changed functionally about the code.

The start state, as the name implies, specifies in which state use of the DFA starts.

You'll remember in the last post on automata, the start state was shown by the "floating" edge pointing into the DFA graph.

The implementation of the start state was probably the simplest of the changes that I made as it's really just record keeping for the time being. Eventually we'll write a function that takes the DFA and tests a string (not just of characters) against it, and that will need to know about the start state, but otherwise it doesn't really do anything special. Therefore, I just added a value primary constructor parameter to the class definition.

Acceptance States

The acceptance states are also just record keeping for the time being, and as such I provided a value constructor parameter to pass in the acceptance states when constructing the DFA. Because one will most likely not know one's acceptance states when one actually builds the DFA, I've also added a function which allows the consumer to add new acceptance states as they go.

Following the same pattern I'm using throughout the code, adding an acceptance state doesn't mutate the internal state of the DFA, but rather returns a new DFA with the new acceptance state in it's collection of acceptance states.

Visualisation

While I have an image in my head of what the DFA will look like as I start adding states and transitions to it, testing the implemented code at the moment is a little painful, as I have to read or assert against the output table, translating the sequences to a state diagram in my head.

Instead of doing that translation myself, every time I update the implementation of the DFA, I decided to let the computer do what it's good at and have it translate the DFA generated into a state diagram for us.

GraphViz

Enter GraphViz; an open source graph visualisation tool. I haven't really thought of GraphViz since university, but for our current implementation, this tool makes a lot of sense.

GraphViz uses a .dot file to build graph diagrams, so let's look at the changes I made to the DFA to output it's state to a .dot file.

Again I use some intermediate variables and functions to clean up the code, but the key interesting pieces of the implementation in my mind are the flatMap and zipWithIndex function usages.

The flatMap function works like a regular map function, except if the result of a regular map would be a sequence of sequences of type T, flatMap will flatten these sequences into a single sequence of T.

Let's look at a short example:

val s = Vector(Vector(1, 2, 3, 4, 5), Vector(6, 7, 8, 9, 10))
s.map(_.map(_ + 5)) // Vector(Vector(6, 7, 8, 9, 10), Seq(11, 12, 13, 14, 15))
s.flatMap(_.map(_ + 5)) //Vector(6, 7, 8, 9, 10, 11, 12, 13, 14, 15)

The zipWithIndex function allows us to take a sequence, which is an abstraction over a collection of items (which may or may not ) and map it to a collection which contains the index of the element and it's value.

Again, an example is probably the easiest way to show this:

val s = 'a' to 'j'
s.zipWithIndex // Vector((a,0), (b,1), (c,2), (d,3), (e,4), (f,5), (g,6), (h,7), (i,8), (j,9))

The zipWithIndex makes it easier to work with collections that have information encoded in the position of the elements within it in a functional way.

The Result

Now, with the GraphViz output code written, I created an automated test to generate a diagram each time that I update the code.

package io.jacobappleton.compilers.chapters.two

import scala.language.postfixOps
import sys.process._
import org.scalatest.FlatSpec

class DFATest extends FlatSpec {
  it should " handle multiple transitions from a single state " in {
    val d = new DFA('a' to 'z', 1).addTransition(1, 'a').addTransition(1, 'b').addTransition(1, 'c').addTransition(3, 'd').addTransition(4, 'e', 5)
           .addTransition(2, 'f', 5).addTransition(5, 'g', 1).addTransition(5, 'h', 2).addAcceptingState(5).addAcceptingState(2)
    d.toGraphViz("multi_dfa.dot")
    "dot -T png multi_dfa.dot -o multi_dfa.png" !
  }
}

Obviously I would recommend many more automated tests for this code, but considering this is non-production, experimental code, in my opinion, this is enough for now.

The test code generates the following .dot file.

digraph nfa { 
  rankdir=LR;
  size=8.5;


  node [shape = point] s0
  node [shape = circle] s1;
  node [shape = doublecircle] s2;
  node [shape = circle] s3;
  node [shape = circle] s4;
  node [shape = doublecircle] s5;

  s0 -> s1
  s1 -> s2 [label = a];
  s1 -> s3 [label = b];
  s1 -> s4 [label = c];
  s2 -> s5 [label = f];
  s3 -> s5 [label = d];
  s4 -> s5 [label = e];
  s5 -> s1 [label = g];
  s5 -> s2 [label = h];
}

Which is converted to the following diagram.

DFA

In Conclusion

I'm not super happy with the toString implementation, which can probably be improved to be a bit clearer. As I learn more about Scala I'll probably come back to this code with a future blog post and improve it.

There is still a lot to do. In future posts we can look forward to working with Regexes converted to NFAs converted to DFAs. I still have quite a bit of research to do before we get to that point though, so stay tuned.

Tags: Compilers | Automata | Scala
Tweet