Dec 22, 2011
Night #3: Regular Expressions in Processing
Several years ago I became somewhat obsessed with regular expressions while reading Jeffrey Friedl’s Mastering Regular Expressions. At the time, I wrote a short tutorial about regular expressions for my course Programming from A to Z. The sad truth is that regular expressions in Java are pretty darn awkward compared to, say, Python or Perl. The good news is that Processing has some nice regex helper functions that make things a bit easier. Before we get to those, let’s start with the Java API, which centers on two classes in java.util.regex:
- Pattern — a compiled representation of a regular expression.
- Matcher — an engine that performs match operations on a character sequence (or String) by interpreting a Pattern.
An example of Pattern and Matcher in Java (which you can write directly into Processing) looks like the following:
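A minimal sketch of that Pattern/Matcher pairing in plain Java. The class name, the helper firstMatch(), the input string and the regex are all made up for illustration; in Processing, the body of main() could go inside setup().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SingleMatchDemo {
    // Hypothetical helper: returns the first substring of text that
    // matches regex, or null if nothing matches.
    static String firstMatch(String text, String regex) {
        Pattern p = Pattern.compile(regex);  // compile the regular expression
        Matcher m = p.matcher(text);         // bind the pattern to the input
        return m.find() ? m.group() : null;  // group() is the whole match
    }

    public static void main(String[] args) {
        // Illustrative input: look for the first run of digits.
        String text = "Night 3 of 8";
        System.out.println(firstMatch(text, "\\d+")); // prints 3
    }
}
```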
Of course, in most cases, you want to do something more sophisticated where you iterate over many matches.
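One common way to do that is to call find() in a loop, since each call advances to the next match. The sketch below assumes a made-up helper allMatches() and sample text; it is one possible shape, not the only one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllMatchesDemo {
    // Hypothetical helper: collects every non-overlapping match of regex.
    static List<String> allMatches(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {        // find() moves on to the next match
            found.add(m.group()); // whole text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        // Illustrative input: pull out every run of digits.
        String text = "Night 3 of 8, posted in 2011";
        System.out.println(allMatches(text, "\\d+")); // prints [3, 8, 2011]
    }
}
```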
Processing provides some regex helper functions that wrap all of this Java Pattern/Matcher stuff. They are match() and matchAll().
The match() function applies a regular expression to a piece of text and returns the matching groups (elements found inside parentheses) as a String array. Element [0] of that array holds the entire matched text, and the groups start at element [1]. If there is no match, the function returns null. If no groups are specified in the regular expression, but the sequence matches, an array of length one (with the matched text as its only element) is returned.
Here’s an example (this is straight from the reference page).
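To make that behavior concrete outside of Processing, here is a plain-Java approximation. The match() method below is a stand-in written for this sketch, not Processing's own source, and the real function may differ in details such as the regex flags it uses. It assumes, as with Java's Matcher, that index 0 holds the whole match and captured groups start at index 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchDemo {
    // Stand-in for Processing's match(): returns the whole match plus its
    // groups for the first match found, or null when nothing matches.
    static String[] match(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        if (!m.find()) {
            return null;
        }
        String[] groups = new String[m.groupCount() + 1];
        for (int i = 0; i < groups.length; i++) {
            groups[i] = m.group(i); // group(0) is the entire match
        }
        return groups;
    }

    public static void main(String[] args) {
        String s = "Inside a tag, you will find <tag>content</tag>.";
        String[] m = match(s, "<tag>(.*?)</tag>");
        if (m != null) {
            // prints: Found 'content' inside the tag.
            System.out.println("Found '" + m[1] + "' inside the tag.");
        }
    }
}
```

In a Processing sketch the helper is unnecessary, since match() is built in; only the lines inside main() would be needed.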
The matchAll() function is at first a bit confusing because it returns a two dimensional array. But if you think back to how match() works, it’s pretty simple. match() assumes you want just one match, and gives you an array: a list of all the groups for that single match. matchAll() assumes you want all the matches, so it gives you a bunch of those arrays, one for every match. What’s an array of arrays? A two dimensional array! The first dimension selects the match, and the second dimension selects the group within that match, i.e. for a result array m, m[0][1] is the first group of the first match.
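A small sketch of that indexing in plain Java. The matchAll() method here is a stand-in written for illustration, not Processing's implementation; it assumes the same conventions as match() (index 0 of each row is the whole match, and null is returned when nothing matches). The sample text and regex are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchAllDemo {
    // Stand-in for Processing's matchAll(): one row per match, one column
    // per group, with column 0 holding the whole match.
    static String[][] matchAll(String text, String regex) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String[]> rows = new ArrayList<>();
        while (m.find()) {
            String[] groups = new String[m.groupCount() + 1];
            for (int j = 0; j < groups.length; j++) {
                groups[j] = m.group(j);
            }
            rows.add(groups);
        }
        return rows.isEmpty() ? null : rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        // Illustrative input: a letter followed by a digit, each captured.
        String[][] m = matchAll("a1 b2 c3", "([a-z])(\\d)");
        for (int i = 0; i < m.length; i++) {
            // First dimension: which match. Second dimension: which group.
            System.out.println(m[i][0] + " -> " + m[i][1] + ", " + m[i][2]);
        }
    }
}
```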
This new example uses a regex that matches anything inside an HTML href attribute and draws it to the screen.
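A rough plain-Java sketch of the href idea, under a few assumptions: the HTML snippet and URLs are invented, the regex only handles double-quoted attribute values, and the drawing step (which in Processing would use something like text()) is replaced by printing. The helper name hrefs() is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefDemo {
    // Hypothetical helper: returns the value of every double-quoted href.
    static List<String> hrefs(String html) {
        // Group 1 lazily captures everything between the quotes.
        Matcher m = Pattern.compile("href\\s*=\\s*\"(.*?)\"").matcher(html);
        List<String> urls = new ArrayList<>();
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://example.com/one\">one</a> "
                    + "<a href=\"http://example.com/two\">two</a>";
        for (String url : hrefs(html)) {
            System.out.println(url); // a Processing sketch would draw this instead
        }
    }
}
```

With Processing's matchAll() and the same regex, the same values would be expected at index [i][1] of the returned array.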