Regex V - Adding Support For Parentheses in Regex Patterns

09 August 2015

Introduction

It's been a while since my last post, a few friends have been getting married, so my normal pattern of working on code and blog posts over the weekends has taken a backseat to bachelor parties (we call them "bucks parties" in Australia - something everyone here in North America seems to find really strange) and weddings. In order to keep the momentum going, I'm sneaking a short one in today.

So far we've got a working rudimentary Regex engine, but we're quite limited in what we can actually test for. In order to give ourselves a bit more flexibility in what we can test, today we're going to add support for parentheses.

Updating the Shunting Yard Algorithm

To include support for parentheses, we need to update our Shunting Yard algorithm a little, to include support for the ( and ) characters.

For a reminder of how the Shunting Yard algorithm works, check out my previous post on the topic.

Here we've updated the algorithm to push the ( character on the operator stack when it's found and, once we find a ) character, to pop any remaining operators off the operator stack, and append them to the output. Finally, we've updated our precedence check to ignore the ( character if it's on top of the operator stack.

I also wrote some more tests to quickly verify some of the Shunting Yard and subsequent Regex behavior:

and some Regex tests as well:

produces:

a.b|c.d

parentheses_non_paren

a.(b|c).d

parentheses

produces:

j.a.((c|x)|(k|y)).o.b

nested_parentheses

Conclusion

As I said, today's post was going to be a quick one, just to keep the momentum going and keep moving on the Regex implementation. Stay tuned for more improvements to the Regex engine and moving further along the path of learning to build a compiler in Scala.

Tags: Compilers | Automata | Scala | Regex
Tweet