1. Overview
In this tutorial, we'll see the benefits of pre-compiling a regex pattern and the new methods introduced in Java 8 and 11.
This will not be a regex how-to, but we have an excellent Guide To Java Regular Expressions API for that purpose.
2. Benefits
Reuse usually brings a performance gain, as we don't need to create and recreate instances of the same objects time after time. So, we can assume that reuse and performance are often linked.
Let's take a look at this principle as it pertains to Pattern#compile. We'll use a simple benchmark:
- We have a list with 5,000,000 numbers from 1 to 5,000,000
- Our regex will match even numbers
So, let's test matching these numbers against our regex using the following forms:
- String.matches(regex)
- Pattern.matches(regex, charSequence)
- Pattern.compile(regex).matcher(charSequence).matches()
- Pre-compiled regex with many calls to preCompiledPattern.matcher(value).matches()
- Pre-compiled regex with one Matcher instance and many calls to matcherFromPreCompiledPattern.reset(value).matches()
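The five forms above can be sketched side by side as follows. The regex `\\d*[02468]` and the class name `MatchForms` are assumptions made for illustration only; the pattern used by the actual benchmark isn't shown in this article.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchForms {
    // Hypothetical regex: a run of digits ending in an even digit
    static final String EVEN = "\\d*[02468]";

    // Compiled once and shared by forms 4 and 5
    static final Pattern PRE_COMPILED = Pattern.compile(EVEN);

    // Forms 1-3: the regex is compiled again on every call
    static boolean form1(String value) { return value.matches(EVEN); }
    static boolean form2(String value) { return Pattern.matches(EVEN, value); }
    static boolean form3(String value) { return Pattern.compile(EVEN).matcher(value).matches(); }

    // Form 4: shared Pattern, new Matcher per call
    static boolean form4(String value) { return PRE_COMPILED.matcher(value).matches(); }

    // Form 5: shared Pattern and a single Matcher re-pointed at each input
    static int form5CountEven(String[] values) {
        Matcher matcher = PRE_COMPILED.matcher("");
        int count = 0;
        for (String value : values) {
            if (matcher.reset(value).matches()) {
                count++;
            }
        }
        return count;
    }
}
```

All five forms give the same answer for a given input; they differ only in how many Pattern and Matcher objects they create along the way.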
If we look at the implementation of String#matches:
public boolean matches(String regex) {
    return Pattern.matches(regex, this);
}
And at Pattern#matches:
public static boolean matches(String regex, CharSequence input) {
    Pattern p = compile(regex);
    Matcher m = p.matcher(input);
    return m.matches();
}
Then, we can imagine that the first three expressions will perform similarly. That's because the first expression calls the second, and the second calls the third.
The second point is that these methods do not reuse the Pattern and Matcher instances they create. And, as we'll see in the benchmark, this degrades performance by a factor of up to six:
// One Pattern and one Matcher, re-pointed at each value
@Benchmark
public void matcherFromPreCompiledPatternResetMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(matcherFromPreCompiledPattern.reset(value).matches());
    }
}

// One Pattern, a new Matcher per value
@Benchmark
public void preCompiledPatternMatcherMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(preCompiledPattern.matcher(value).matches());
    }
}

// A new Pattern and a new Matcher per value
@Benchmark
public void patternCompileMatcherMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(Pattern.compile(PATTERN).matcher(value).matches());
    }
}

@Benchmark
public void patternMatches(Blackhole bh) {
    for (String value : values) {
        bh.consume(Pattern.matches(PATTERN, value));
    }
}

@Benchmark
public void stringMatchs(Blackhole bh) {
    for (String value : values) {
        bh.consume(value.matches(PATTERN));
    }
}
Looking at the benchmark results, there's no doubt that the pre-compiled Pattern with a reused Matcher is the winner: it's more than five times faster than String#matches and more than six times faster than the other two non-reusing forms:
Benchmark                                                               Mode  Cnt     Score     Error  Units
PatternPerformanceComparison.matcherFromPreCompiledPatternResetMatches  avgt   20   278.732 ±  22.960  ms/op
PatternPerformanceComparison.preCompiledPatternMatcherMatches           avgt   20   500.393 ±  34.182  ms/op
PatternPerformanceComparison.stringMatchs                               avgt   20  1433.099 ±  73.687  ms/op
PatternPerformanceComparison.patternCompileMatcherMatches               avgt   20  1774.429 ± 174.955  ms/op
PatternPerformanceComparison.patternMatches                             avgt   20  1792.874 ± 130.213  ms/op
Beyond performance times, we also have the number of objects created:
- First three forms:
  - 5,000,000 Pattern instances created
  - 5,000,000 Matcher instances created
- preCompiledPattern.matcher(value).matches()
  - 1 Pattern instance created
  - 5,000,000 Matcher instances created
- matcherFromPreCompiledPattern.reset(value).matches()
  - 1 Pattern instance created
  - 1 Matcher instance created
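The object counts above follow from how the API hands out instances, which a few identity checks can illustrate. This is a minimal sketch; the class name `InstanceReuse` and the regex are hypothetical and not part of the benchmark.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InstanceReuse {
    // Hypothetical regex: a run of digits ending in an even digit
    static final String EVEN = "\\d*[02468]";

    // Each Pattern.compile call builds a new Pattern object
    static boolean compileCreatesNewPattern() {
        return Pattern.compile(EVEN) != Pattern.compile(EVEN);
    }

    // Each Pattern.matcher call builds a new Matcher object
    static boolean matcherCreatesNewMatcher() {
        Pattern pattern = Pattern.compile(EVEN);
        return pattern.matcher("2") != pattern.matcher("4");
    }

    // reset(CharSequence) returns the same Matcher, now bound to the new input
    static boolean resetReusesMatcher() {
        Matcher matcher = Pattern.compile(EVEN).matcher("2");
        return matcher.reset("4") == matcher;
    }
}
```

One caveat when reusing instances: the Javadoc states that Pattern instances are immutable and safe for use by multiple concurrent threads, while Matcher instances are not, so a reused Matcher should stay confined to a single thread.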
So, instead of delegating our regex to String#matches or Pattern#matches, which always create new Pattern and Matcher instances, we should pre-compile our regex to gain performance and create fewer objects.
To learn more about regex performance, check out our Overview of Regular Expressions Performance in Java.
3. New Methods
Since the introduction of functional interfaces and streams, reuse has become easier.
The Pattern class has evolved in new Java versions to provide integration with streams and lambdas.
3.1. Java 8
Java 8 introduced two new methods: splitAsStream and asPredicate.
Let's look at some code for splitAsStream that creates a stream from the given input sequence around matches of the pattern:
@Test
public void givenPreCompiledPattern_whenCallSplitAsStream_thenReturnArrayWithValuesSplitByThePattern() {
    Pattern splitPreCompiledPattern = Pattern.compile("__");
    Stream<String> textSplitAsStream = splitPreCompiledPattern.splitAsStream("My_Name__is__Fabio_Silva");
    String[] textSplit = textSplitAsStream.toArray(String[]::new);

    assertEquals("My_Name", textSplit[0]);
    assertEquals("is", textSplit[1]);
    assertEquals("Fabio_Silva", textSplit[2]);
}
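Because splitAsStream returns a Stream, the pieces can also flow straight into a pipeline without an intermediate array. The following is a minimal sketch, assuming a hypothetical `SplitPipeline` helper and a comma-separated input that aren't part of the original examples:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SplitPipeline {
    // Compiled once: a comma with optional surrounding whitespace
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits the line, drops empty tokens, and upper-cases the rest
    static List<String> parse(String line) {
        return COMMA.splitAsStream(line)
            .filter(token -> !token.isEmpty())
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }
}
```

For example, `SplitPipeline.parse("red, green ,,blue")` yields a list containing RED, GREEN, and BLUE.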
The asPredicate method creates a predicate that behaves as if it creates a matcher from the input sequence and then calls find:
string -> matcher(string).find();
Let's create a pattern that finds the names in a list that contain at least a first and a last name, each with at least three letters:
@Test
public void givenPreCompiledPattern_whenCallAsPredicate_thenReturnPredicateToFindThePatternInTheListElements() {
    List<String> namesToValidate = Arrays.asList("Fabio Silva", "Mr. Silva");
    Pattern firstLastNamePreCompiledPattern = Pattern.compile("[a-zA-Z]{3,} [a-zA-Z]{3,}");

    Predicate<String> patternsAsPredicate = firstLastNamePreCompiledPattern.asPredicate();
    List<String> validNames = namesToValidate.stream()
        .filter(patternsAsPredicate)
        .collect(Collectors.toList());

    assertEquals(1, validNames.size());
    assertTrue(validNames.contains("Fabio Silva"));
}
3.2. Java 11
Java 11 introduced the asMatchPredicate method that creates a predicate that behaves as if it creates a matcher from the input sequence and then calls matches:
string -> matcher(string).matches();
Let's create a pattern that matches the names in a list that consist of only a first and a last name, each with at least three letters:
@Test
public void givenPreCompiledPattern_whenCallAsMatchPredicate_thenReturnMatchPredicateToMatchesThePatternInTheListElements() {
    List<String> namesToValidate = Arrays.asList("Fabio Silva", "Fabio Luis Silva");
    Pattern firstLastNamePreCompiledPattern = Pattern.compile("[a-zA-Z]{3,} [a-zA-Z]{3,}");

    Predicate<String> patternAsMatchPredicate = firstLastNamePreCompiledPattern.asMatchPredicate();
    List<String> validatedNames = namesToValidate.stream()
        .filter(patternAsMatchPredicate)
        .collect(Collectors.toList());

    assertTrue(validatedNames.contains("Fabio Silva"));
    assertFalse(validatedNames.contains("Fabio Luis Silva"));
}
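The difference between the two predicates is easiest to see when both are built from the same pattern and applied to the same input. Here is a minimal sketch, assuming Java 11 or later and a hypothetical `FindVersusMatch` holder class:

```java
import java.util.function.Predicate;
import java.util.regex.Pattern;

class FindVersusMatch {
    private static final Pattern FIRST_LAST = Pattern.compile("[a-zA-Z]{3,} [a-zA-Z]{3,}");

    // Java 8: true if the pattern is found anywhere in the input (find)
    static final Predicate<String> CONTAINS_NAME = FIRST_LAST.asPredicate();

    // Java 11: true only if the entire input matches the pattern (matches)
    static final Predicate<String> IS_EXACTLY_NAME = FIRST_LAST.asMatchPredicate();
}
```

With "Fabio Luis Silva", `CONTAINS_NAME` returns true because "Fabio Luis" is found inside the input, while `IS_EXACTLY_NAME` returns false because the trailing " Silva" is left unmatched. Both return true for "Fabio Silva".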
4. Conclusion
In this tutorial, we saw that using pre-compiled patterns brings far superior performance.
We also learned about three new methods introduced in JDK 8 and JDK 11 that make our lives easier.
The code for these examples is available over on GitHub in core-java-11 for the JDK 11 snippets and core-java-text for the others.