Automata V: The Epsilon Closure Strikes Back

09 July 2015

Introduction

From the previous post, we have a simple Regex language that we can parse into an NFA, and from an earlier post we have a DFA implementation that can be tested against a string. To have a working rudimentary Regex implementation, all that is required now is to now build something that can take an NFA, and convert it to a DFA.

In the next few blog posts, we'll cover how to do exactly that.

To get to that point though, we need to talk about a new concept that is used when undertaking the process of converting an NFA to a DFA; the epsilon-closure.

Epsilon Closure

Given an NFA, the epsilon-closure of a given state is the set of states that can be reached from that state making only epsilon (ε) moves.

Using the state transition diagram from the previous post for the Regex a*.b*, the epsilon closure for state 1 is shown below in orange:

a*.b*

Which is the set: {1, 2, 4, 5, 7}.

Note that 1 is included in the epsilon closure for state 1 and that holds true for all states. For example, state 2 has no epsilon moves, and yet its epsilon closure is the set {2}.

The Implementation

def epsilonClosure(s: Int): Seq[Int] = epsilonClosure(s, Seq[Int]())

def epsilonClosure(s: Int, seenBefore: Seq[Int]): Seq[Int] = {
    val epsilonMoves = states(s)(getAlphaIndex(epsilon)).filter(t => t != NFA.DeadState && t != s).diff(seenBefore)
    s +: epsilonMoves.flatMap(m => epsilonClosure(m, seenBefore ++ epsilonMoves)).distinct
}

If you recall from previous posts that our NFA is a sequence of sequences of sequences of integers, then you'll see that in the epsilon-closure algorithm we first take the current state states(s), before taking the sequence that corresponds to the epsilon moves for this state states(s)(getAlphaIndex(epsilon)). We then filter out the epsilon moves for the DeadState (that is, the state that represents that we have no more moves we can make) and the current state (so as not to introduce infinite loops). Finally, for each of the epsilon moves left, we recursively call ourselves again, adding the current sequence of epsilon moves to the sequence of states "seen before".

Flat Map

We use flatMap which will take a sequence of sequences and map each of the elements of each sub sequence into a single sequence. To understand how this might work, the following example (at the sbt console) is indicative:

scala> val s = Seq(Seq(1, 2, 3, 4, 5), Seq(6, 7, 8, 9, 10))
s: Seq[Seq[Int]] = List(List(1, 2, 3, 4, 5), List(6, 7, 8, 9, 10))

scala> s.flatMap(x => x)
res0: Seq[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

So, in our epsilon-closure algorithm, we call flatMap and pass a function that will compute the epsilon closure for each element of the sequence, which itself returns a sequence of integers, which will all be mapped down into a single sequence.

We also have a convenience version of the epsilonClosure function so that consumers don't have to pass in an empty sequence of already seen states to kick things off.

Some Examples

Sticking with the pattern I used in the previous post to look at some cool Regexes, let's see what epsilon-closures we get for interesting states in each of the following NFAs.

a.b|c.d

new Regex("a.b|c.d").toNFA().epsilonClosure(1) // Seq(1, 2, 5)
new Regex("a.b|c.d").toNFA().epsilonClosure(3) // Seq(3)
new Regex("a.b|c.d").toNFA().epsilonClosure(7) // Seq(7, 8)

a*.b*

new Regex("a*.b*").toNFA().epsilonClosure(1) //Seq(1, 2, 4, 5, 7)
new Regex("a*.b*").toNFA().epsilonClosure(2) //Seq(2)
new Regex("a*.b*").toNFA().epsilonClosure(3) //Seq(3, 4, 5, 7, 2)

a*|b*

new Regex("a*|b*").toNFA().epsilonClosure(1) //Seq(1, 2, 3, 5, 10, 6, 7, 9)
new Regex("a*|b*").toNFA().epsilonClosure(7) //Seq(7)
new Regex("a*|b*").toNFA().epsilonClosure(5) //Seq(5, 10)

Conclusion

Now that we have an algorithm to find the epsilon-closure of a state in the NFA, we can use that to transform our NFA to a DFA, but that is a topic for the next blog post, so stay tuned.

Tags: Compilers | Automata | Scala | Regex
Tweet