Automata III: Testing a String Against the DFA

23 May 2015

Introduction

Let me start off by apologizing for the relatively long time between posts. I've been holidaying through Europe with some friends and while I initially thought this would provide valuable time for blogging, it appears I was a bit naive and there was almost no time for blogging. That's OK though, now that I'm back home in Vancouver, it's time to get back on the wagon and get back to blogging.

Today we're going to look at testing a string against a DFA. In the last post on automata, I tidied up the DFA code, added acceptance states, added the start state and added the ability to visualize the DFA. All that work got us to the point where we can actually test a string (not necessarily just a string of characters, but for the sake of demonstration, I'll be showing that) against our DFA.

The DFA

To begin, let's create a DFA that matches the string "Test" where we could have any number of s'. You could consider a regex for this DFA as tes+t. An example of this regex can be found here: http://refiddle.com/2blq.

The code to create the DFA, based on previous blog posts, is:

val d = new DFA('a' to 'z', 1).addTransition(1, 't').addTransition(2, 'e').addTransition(3, 's').addTransition(4, 't').addTransition(4, 's', 3).addAcceptingState(5)

The DFA produced by this code looks like:

DFA

Matching a String

To match a string, the algorithm needs to start at the start state and the beginning of the string. From there, it must check the current element of the string against the current state. If there is a state transition to anything other than the dead (0) state, then move to that state and advance the string element pointer.

Because we're using a functional language to implement this algorithm, I decided to use recursion to perform this operation. Let's look at the code I produced for this.

private def getAlphabetIndex(item: T) = alphabet.zipWithIndex.filter(a => a._1 == item)(0)._2

def testString(subject: Seq[T]): Boolean = testString(subject, 0, 1)

private def testString(subject: Seq[T], subjectIndex: Int, state: Int): Boolean = {
if(subjectIndex == subject.length) {
  return acceptanceStates contains state
} else {
  val nextState = states(state)(getAlphabetIndex(subject(subjectIndex)))
  if(nextState == DFA.DeadState) {
    return false
  } else {
    testString(subject, subjectIndex + 1, nextState)
  }
}
}

Immediately, you'll see that I have extracted a method to get the index of an element from the alphabet, as it is being used in two places now. I've also fixed a small bug in which I conflated the zeroth index with the DFA dead state.

The method testString has two overloads, the first is the main entry point into the method, which allows the consumer to use this method in an object oriented fashion. The second, private, overload recursively checks for four possible conditions:

  • We're at the end of the string and aren't in an acceptance state, in which case, the DFA does not match the string. Return false.
  • We're at the end of the string and are in an acceptance state, in which case, the DFA does match the string. Return true.
  • We're not at the end of the string and the next state to transition to is the dead state, in which case, there are no more transitions and the DFA does not match the string. Return false.
  • We're not at the end of the string and the next state transition is not to the dead state, in which case, we don't yet know if the DFA matches the string. Recursively call testString incrementing the string index and moving along to the next state.

In Conclusion

This is a quick post today, as this is a quick win, getting me back into blogging and coding after a few weeks off. In a true Regex, you would want more than just testing whether a string matches, and potentially in the future this is also something we're going to require to build the rest of our compiler. For the time being though, this is enough to show the concept of how a DFA can match a string, and clears the path for the next consideration; the DFA's cousin, the Non-Deterministic Finite Automaton (NFA).

That will come with the next blog post though, so stay tuned.

Tags: Compilers | Automata | Scala
Tweet