Automata VI: Making Testing Easier

28 July 2015

Introduction

Despite being quite the annoying, vocal, automated test evangelist (just ask the people who have to put up with me at work every day), I've been a bit slack with setting up tests during the process of building a Regex engine in previous blog posts.

Today I want to correct that.

The reason why I have found it difficult to really keep my discipline and actually have a decent suite of automated tests, is that NFAs and DFAs are basically large two or three dimensional tables full of state transitions, and writing tests requires tedious manual reconciliation of these tables with what I expect they should be in Regex patterns that grow ever more complex as we want to test more interesting scenarios.

One could argue that this is a very good reason for more tests, and, if I were writing production code that was going to be used by a business, I would completely agree with them (please, please, please never use this code in anything that is important - I'm really only playing around and experimenting), but this code is part of a learning process and as such, I'm not super worried about it being 100% correct. I'm much more worried about it being indicative of the concepts I'm trying to learn (and teach, at the same time).

Instead of True Automated Tests

To make life easier on myself, I first took advantage of GraphViz to build state-transition diagrams, which I would then compare visually to ensure that the NFAs and DFAs were still producing the correct transition table as I continued to modify the way that they worked.

This was, however, rather naughty of me, and I was very much falling into the trap that a lot of us (developers) make when thinking about testing. As the set of possible things that require tests increased, I was less and less able to actually keep track of the state of each manual test. This meant that I was never verifying that all of the tests passed as I was writing the code.

Using ASCII-Art to Visualize an NFA

When I think of an NFA, I conceptualize a table of state transitions, so when testing, ideally I would just compare the table that exists in the form of a three-dimensional sequence with the table I conceptualize in my head.

To do this, I needed a way to output an NFA as a table. Thus I came up with this somewhat complex algorithm to take an NFA state transition three-dimensional sequence and produce an ASCII-art (and Markdown) transition table.

def toTable() = {
  val lengthOfLargestStateLabel = nfa.states.length.toString.length

  def printSet(s: Seq[Int]) = " { " + s.mkString(", ") + " }"
  def getMaxPrintedSetLength(character: Int) = nfa.states.map(x => x(character)).map(printSet).map(_.length).max
  def getExtraTransitionSpacesToPrint(a: (Seq[Int], Int)) = " " * (getMaxPrintedSetLength(a._2) - printSet(a._1).length)
  def printStateLabel(state: Int) = state + (" " * (lengthOfLargestStateLabel + 2 - state.toString.length))

  val titleLine = "|   " + (" " * lengthOfLargestStateLabel) + "|" +
    nfa.alpha.zipWithIndex.map(a => " " + a._1 + (" " * (getMaxPrintedSetLength(a._2) - 1)) + "|").mkString + "\n"

  val titleBorder = "|---" + (" " * lengthOfLargestStateLabel) + "|" +
    nfa.alpha.zipWithIndex.map(a => "--- " + (" " * (getMaxPrintedSetLength(a._2) - 3)) + "|").mkString + "\n"

  val detailRows = nfa.states.zipWithIndex.map(s => "| " + printStateLabel(s._2) + "|" +
    s._1.zipWithIndex.map(a => printSet(a._1) + getExtraTransitionSpacesToPrint(a) + " |").mkString).mkString("\n")

  titleLine + titleBorder + detailRows
}

The printSet function is where it all gets started, and basically prints the target states of a particular state and character transition, inside curly braces {}, separated by a comma and space - in the format we're used to when describing sets.

We then have the getMaxPrintedSetLength function, which takes a character from our alphabet and looks at all printed transition sets for that character and finds the largest. This means we can ensure that our row length is correct for columns with multiple target transitions.

getExtraTransitionSpacesToPrint takes this length and prints enough space characters to ensure our column borders line up.

Finally we have the printStateLabel function which prints the numeric label for each state, making sure to handle state labels with multiple characters.

We then compose a titleLine containing each of the characters in our alphabet, titleBorder which gives us a MarkDown table row border line and detailRows which are the rows which actually define our transitions.

The transition table for a simple union NFA (a|b) which has a state diagram:

a|b

Looks like this:

| | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | ε | |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- | | 0 | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | | 1 | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 2, 4 } | | 2 | { 3 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | | 3 | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 6 } | | 4 | { 0 } | { 5 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | | 5 | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 6 } | | 6 | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } |

Which is much easier to compare in a test:

it should "generate a NFA for a simple union of two characters" in {
  val nfa = new Regex("a|b").toNFA()

  val printer = new NFAPrinter(nfa)

  val expectedTable =
    "|    | a     | b     | c     | d     | e     | f     | g     | h     | i     | j     | k     | l     | m     | n     | o     | p     | q     | r     | s     | t     | u     | v     | w     | x     | y     | z     | ε        |\n" +
    "|--- |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---    |---       |\n" +
    "| 0  | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 }    |\n" +
    "| 1  | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 2, 4 } |\n" +
    "| 2  | { 3 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 }    |\n" +
    "| 3  | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 6 }    |\n" +
    "| 4  | { 0 } | { 5 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 }    |\n" +
    "| 5  | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 6 }    |\n" +
    "| 6  | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 } | { 0 }    |"

  assert(printer.toTable == expectedTable)
  // Generate the GraphViz output as well for manual debugging of test failure
  printer.outputGraphViz("output/union_nfa.dot")
  "dot -T png output/union_nfa.dot -o output/union_nfa.png" !
}

Doing the Same Thing for DFAs

Doing the same thing for a DFA is quite a bit easier, as we immediately know the length of each set of transition states; one. This allows us to remove all of the logic from the printer that worries about how many spaces to include in order to pretty print our tables.

def toTable() = {
  val lengthOfLargestStateLabel = dfa.states.length.toString.length

  def printStateLabel(state: Int) = state + (" " * (lengthOfLargestStateLabel + 2 - state.toString.length))

  val titleLine = "|   " + (" " * lengthOfLargestStateLabel) + "|" + dfa.alpha.zipWithIndex.map(a => " " + a._1 + "  |").mkString + "\n"
  val titleBorder = "|---" + (" " * lengthOfLargestStateLabel) + "|" + dfa.alpha.zipWithIndex.map(a => "--- " + "|").mkString + "\n"
  val detailRows = dfa.states.zipWithIndex.map(s => "| " + printStateLabel(s._2) + "|" + s._1.zipWithIndex.map(a => a._1 +  "   |").mkString).mkString("\n")

  titleLine + titleBorder + detailRows
}

The transition table for a DFA for the simple Regex pattern a|b Regex pattern looks like this:

| | a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z | |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- | | 0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 | | 1 |2 |3 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 | | 2 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 | | 3 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |0 |

Which again, is much easier to use in a test:

it should "generate a DFA for a simple union of two characters" in {
   val dfa = new Regex("a|b").toNFA().toDFA()
   val printer = new DFAPrinter(dfa)
   val expectedTable =
     "|    | a  | b  | c  | d  | e  | f  | g  | h  | i  | j  | k  | l  | m  | n  | o  | p  | q  | r  | s  | t  | u  | v  | w  | x  | y  | z  |\n" +
     "|--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |--- |\n" +
     "| 0  |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |\n" +
     "| 1  |2   |3   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |\n" +
     "| 2  |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |\n" +
     "| 3  |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |0   |"
   assert(printer.toTable == expectedTable)
   printer.outputGraphViz("output/union_dfa.dot")
   "dot -T png output/union_dfa.dot -o output/union_dfa.png" !
 }

In Conclusion

I'm now going through the process of taking a bunch of interesting Regex patterns and turning them into these transition table based unit tests. This puts us in a good place for future blog posts, where I will start to expand upon and optimise this rudimentary Regex engine.

Tags: Compilers | Automata | Scala
Tweet