
1. Introduction
In this tutorial, we’ll learn how to create a new list from an existing one in Java by filtering the elements that match a regular expression (regex). We’ll also look at various ways to filter the list using regex in Java.
2. Regex Overview
Regular expressions (regexes) are patterns used to match specific sequences of characters within a string. They’re versatile tools that let us filter, manipulate, replace, and validate text.
Java provides a rich set of regular expression features through the java.util.regex package.
2.1. Common Special Characters
We use a set of special characters to build patterns for matching text. By combining these characters as a string, we create regular expressions:
“.”: Matches any character (except a newline)
“*”: Matches zero or more occurrences of the preceding element, e.g., “ab*” matches “a”, “ab”, and “abbb”
“+”: Matches one or more occurrences of the preceding element, e.g., “ab+” matches “ab” and “abbb”, but not “a”
“?”: Matches zero or one occurrence of the preceding element
“^”: Matches the start of a string
“$”: Matches the end of a string
“[]”: Matches any one of the characters present between the brackets, e.g., “[abc]” matches “a”, “b”, or “c”
“|”: OR operation, e.g., “a|b” matches “a” or “b”
“()”: Used for grouping
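To make the list above concrete, here is a minimal sketch that checks a few of these special characters against sample inputs. The class name SpecialCharactersDemo and the helper fullMatch() are hypothetical names chosen for illustration; Pattern.matches() is the standard JDK method, and it succeeds only when the whole input matches:

```java
import java.util.regex.Pattern;

class SpecialCharactersDemo {

    // Hypothetical helper: true only if the entire input matches the regex
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a.c", "abc"));       // true: "." stands for any single character
        System.out.println(fullMatch("ab*c", "ac"));       // true: "*" allows zero "b"s
        System.out.println(fullMatch("ab+c", "ac"));       // false: "+" needs at least one "b"
        System.out.println(fullMatch("colou?r", "color")); // true: "?" makes the "u" optional
        System.out.println(fullMatch("[abc]at", "bat"));   // true: "[abc]" accepts "a", "b", or "c"
        System.out.println(fullMatch("cat|dog", "dog"));   // true: "|" accepts either alternative
        System.out.println(fullMatch("(ab)+", "ababab"));  // true: "+" applies to the whole group
    }
}
```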
2.2. Common Regex Shorthand Notations
In a regex, a backslash “\” introduces an escaped construct. Since the backslash is also the escape character in Java string literals, we write two backslashes in the source code so that a single backslash reaches the regex engine.
The following are shorthand notations for commonly used patterns:
\\d: Matches a digit ([0-9])
\\w: Matches a word character ([a-zA-Z_0-9])
\\s: Matches a whitespace character (space, tab, newline)
\\D: Matches any non-digit character
\\W: Matches any non-word character
\\S: Matches any non-whitespace character
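As an illustration of the shorthand notations, the sketch below keeps only the strings made of a word, a single whitespace character, and a number. The class name ShorthandDemo, the method labelledNumbers(), and the sample inputs are assumptions made for this example:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ShorthandDemo {

    // In Java source, "\\w+\\s\\d+" is the regex \w+\s\d+:
    // one or more word characters, one whitespace character, one or more digits
    static final Pattern WORD_SPACE_DIGITS = Pattern.compile("\\w+\\s\\d+");

    // Hypothetical helper: keeps the inputs that match the pattern in full
    static List<String> labelledNumbers(List<String> inputs) {
        return inputs.stream()
          .filter(s -> WORD_SPACE_DIGITS.matcher(s).matches())
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> inputs = List.of("item 42", "item42", "item forty", "x_1 7");
        System.out.println(labelledNumbers(inputs)); // [item 42, x_1 7]
    }
}
```

Here “item42” is rejected because it has no whitespace, and “item forty” is rejected because it doesn’t end in digits.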
3. Different Ways to Filter a List in Java Using a Regex
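Pattern.compile() turns a regex supplied as a string into a compiled Pattern. According to the java.util.regex documentation, the engine performs traditional NFA-based matching with ordered alternation, so it doesn’t build a deterministic finite automaton (DFA). A Matcher created from the Pattern uses this compiled form to traverse and match the input string.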
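The compile-then-match flow can be sketched as follows. The class name MatcherBasicsDemo and both helper methods are hypothetical; Pattern.compile(), Pattern.matcher(), Matcher.matches(), Matcher.find(), and Matcher.group() are standard java.util.regex calls:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherBasicsDemo {

    // Compiled once; a Pattern is immutable and can be reused for many inputs
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // True only if the entire input consists of digits
    static boolean isAllDigits(String input) {
        return DIGITS.matcher(input).matches();
    }

    // Returns the first run of digits found anywhere in the input, or null if there is none
    static String firstDigits(String input) {
        Matcher matcher = DIGITS.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024"));              // true
        System.out.println(isAllDigits("order 66"));          // false
        System.out.println(firstDigits("order 66 shipped"));  // 66
    }
}
```

The distinction between matches() (whole input) and find() (any part of the input) matters when filtering, because it decides whether a partial match is enough to keep an element.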
3.1. Using Stream API With Pattern and Predicate
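The Java Stream API provides a convenient way to filter lists, and we can combine it with the Pattern class to apply regex filtering. The Pattern.asPredicate() method returns a Predicate that is true when the pattern is found in the given string: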
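List<String> filterUsingPatternAndPredicate() {
    List<String> fruits = List.of("apple", "banana", "cherry", "apricot", "avocado");
    // Compile the regex once and reuse it for every element
    Pattern pattern = Pattern.compile("^a.*");
    return fruits.stream()
      .filter(pattern.asPredicate()) // keeps the strings in which the pattern is found
      .toList(); // Stream.toList() requires Java 16 or later
}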
The filter selects the strings beginning with “a”, returning the output: [apple, apricot, avocado].
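One detail worth knowing is that asPredicate() uses find semantics, so without the “^” anchor it would also accept strings that merely contain a match. Since Java 11, Pattern.asMatchPredicate() offers the whole-string alternative. The sketch below contrasts the two; the class name PredicateFlavorsDemo, its methods, and the pattern “a\\w{4}” are assumptions picked for illustration:

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PredicateFlavorsDemo {

    static final List<String> FRUITS = List.of("apple", "banana", "cherry", "apricot", "avocado");

    // An "a" followed by exactly four word characters, with no anchors
    static final Pattern A_PLUS_FOUR = Pattern.compile("a\\w{4}");

    static List<String> filterWith(Predicate<String> predicate) {
        return FRUITS.stream().filter(predicate).collect(Collectors.toList());
    }

    // find semantics: the pattern may match anywhere inside the string
    static List<String> containing() {
        return filterWith(A_PLUS_FOUR.asPredicate());
    }

    // match semantics (Java 11+): the pattern must cover the whole string
    static List<String> matchingWhole() {
        return filterWith(A_PLUS_FOUR.asMatchPredicate());
    }

    public static void main(String[] args) {
        System.out.println(containing());    // [apple, banana, apricot, avocado]
        System.out.println(matchingWhole()); // [apple]
    }
}
```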
3.2. Using String.matches() Method
Let’s use the String.matches() method, which returns true only if the entire string matches the regex, and false otherwise:
List<String> filterUsingStringMatches() {
    List<String> list = List.of("123", "abc", "456def", "789", "xyz");
    return list.stream()
      .filter(str -> str.matches("\\d+")) // true only when the whole string is digits
      .toList();
}
This creates a new list containing only the strings that consist entirely of one or more digits. The resulting list is: [123, 789]. Note that “456def” is excluded because only part of it matches.
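The JDK documentation states that str.matches(regex) yields the same result as Pattern.matches(regex, str), which compiles the regex on every call. For longer lists, a variant that compiles the pattern once may be preferable. The following is a sketch of that idea, with the hypothetical names PrecompiledDigitsFilter and onlyDigits():

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PrecompiledDigitsFilter {

    // Compiled a single time instead of once per element
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: same whole-string semantics as String.matches("\\d+")
    static List<String> onlyDigits(List<String> input) {
        return input.stream()
          .filter(s -> DIGITS.matcher(s).matches())
          .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = List.of("123", "abc", "456def", "789", "xyz");
        System.out.println(onlyDigits(list)); // [123, 789]
    }
}
```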
3.3. Using Pattern.compile() in Combination With a Loop
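If we prefer not to use the Stream API, or we’re on a JDK version earlier than 8 where it isn’t available, we can use a loop with the Pattern.matcher() and Matcher.matches() methods. The sample uses List.of(), which needs Java 9, only to build the input list: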
List<String> filterUsingPatternCompile() {
    List<String> numbers = List.of("one", "two", "three", "four", "five");
    List<String> startWithTList = new ArrayList<>();
    Pattern pattern = Pattern.compile("^t.*");
    for (String item : numbers) {
        Matcher matcher = pattern.matcher(item);
        if (matcher.matches()) {
            startWithTList.add(item);
        }
    }
    return startWithTList;
}
The above code creates a new list with the strings that start with “t”. The result is: [two, three]. Since matches() already requires the whole string to match, the “^” anchor is optional here.
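For code that really has to run on a pre-Java 8 JDK, the same loop can be wrapped in a reusable helper that avoids List.of() and the Stream API. This is only a sketch; the class name LoopRegexFilter and the method filterByRegex() are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class LoopRegexFilter {

    // Hypothetical helper: returns the items that match the regex in full
    static List<String> filterByRegex(List<String> source, String regex) {
        Pattern pattern = Pattern.compile(regex); // compiled once, outside the loop
        List<String> result = new ArrayList<String>();
        for (String item : source) {
            if (pattern.matcher(item).matches()) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> numbers = Arrays.asList("one", "two", "three", "four", "five");
        System.out.println(filterByRegex(numbers, "t.*")); // [two, three]
    }
}
```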
3.4. Using Collectors.partitioningBy() for Conditional Grouping
We can also use the Stream API with a compiled Pattern and Collectors.partitioningBy() to split the elements into two lists, depending on whether they match the regex:
Map<Boolean, List<String>> filterUsingCollectorsPartitioningBy() {
    List<String> fruits = List.of("apple", "banana", "apricot", "berry");
    Pattern pattern = Pattern.compile("^a.*");
    return fruits.stream()
      .collect(Collectors.partitioningBy(pattern.asPredicate()));
}
This code partitions the elements by whether they start with “a”, and the expected result is:
Matches (key=true): [apple, apricot]
Non-matches (key=false): [banana, berry]
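Reading the two partitions back out of the map is a matter of calling get(true) and get(false). The sketch below also shows the two-argument overload of Collectors.partitioningBy(), which applies a downstream collector to each partition, here Collectors.counting(). The class name PartitionUsageDemo and its method names are assumptions for this example:

```java
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class PartitionUsageDemo {

    static final Pattern STARTS_WITH_A = Pattern.compile("^a.*");

    // Splits the items into matching (true) and non-matching (false) lists
    static Map<Boolean, List<String>> partition(List<String> items) {
        return items.stream()
          .collect(Collectors.partitioningBy(STARTS_WITH_A.asPredicate()));
    }

    // Counts the items in each partition instead of collecting them
    static Map<Boolean, Long> countByMatch(List<String> items) {
        return items.stream()
          .collect(Collectors.partitioningBy(STARTS_WITH_A.asPredicate(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> fruits = List.of("apple", "banana", "apricot", "berry");
        Map<Boolean, List<String>> parts = partition(fruits);
        System.out.println(parts.get(true));      // [apple, apricot]
        System.out.println(parts.get(false));     // [banana, berry]
        System.out.println(countByMatch(fruits)); // {false=2, true=2}
    }
}
```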
4. Conclusion
In this article, we saw several techniques for filtering a list using regular expressions. Among the several options we saw, using the Stream API stands out for its readability and concise syntax.
Additionally, combining it with Pattern and Predicate avoids repeated work, which matters when dealing with larger lists. This is because the Pattern is compiled only once and then reused for every element, whereas String.matches() compiles the regex on each call.
Moreover, the Stream API lets us chain multiple operations in a single pipeline. Of course, other methods can be used depending on the specific requirements of the situation, but the Stream API often strikes a good balance between clarity and performance.
As always, the source code for the article is available over on GitHub.
The post Filtering a List With Regular Expressions in Java first appeared on Baeldung.