1. Introduction
When working with text data in Java, it’s common to encounter situations where we need to clean strings by removing non-alphabetic characters. This task can be essential for text processing, user input validation, and data sanitization.
In this tutorial, we’ll look at some of the most effective and straightforward methods to remove non-alphabetic characters from strings in an array. We’ll discuss techniques using regular expressions, the Stream API, StringBuilder, and libraries like Apache Commons Lang. Each method will be illustrated with simple code examples and tests.
2. Using Regular Expressions
Let’s start with regular expressions to remove all non-alphabetic characters from the String array. They let us define patterns for matching and replacing non-alphabetic characters in a string.
This approach is both flexible and easy to implement:
public static String removeNonAlphabeticUsingRegex(String input) {
    return input.replaceAll("[^a-zA-Z]", "");
}
We use regex to identify and remove non-alphabetic characters from the input string. The key method here is replaceAll(), which takes two arguments. The first argument “[^a-zA-Z]” is a regex pattern that matches any character that isn’t an ASCII letter (uppercase or lowercase), while the second argument “” replaces those matched characters with an empty string, effectively removing them. Because the pattern covers only the ranges a-z and A-Z, accented letters such as é are removed as well.
Next, let’s write the test:
@Test
void givenMixedString_whenRemoveNonAlphabeticUsingRegex_thenReturnsOnlyAlphabeticCharacters() {
    String[] inputArray = { "Hello123", "Java@Code", "Stack#Overflow" };
    String[] expectedArray = { "Hello", "JavaCode", "StackOverflow" };
    for (int i = 0; i < inputArray.length; i++) {
        assertEquals(expectedArray[i], StringManipulator.removeNonAlphabeticUsingRegex(inputArray[i]));
    }
}
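The method above cleans one string at a time, and String.replaceAll() compiles its pattern on every call. When cleaning a whole array, one option is to compile the pattern once and reuse it. The following is a minimal sketch of that idea; the class name RegexArrayCleaner, the constant NON_ALPHABETIC, and the method removeNonAlphabetic() are hypothetical names chosen for illustration and aren't part of the article's StringManipulator class:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical helper, shown only to illustrate reusing a compiled pattern.
final class RegexArrayCleaner {

    // Compiled once and reused for every element of the array.
    private static final Pattern NON_ALPHABETIC = Pattern.compile("[^a-zA-Z]");

    // Returns a new array; the input array is left unmodified.
    static String[] removeNonAlphabetic(String[] inputs) {
        return Arrays.stream(inputs)
          .map(s -> NON_ALPHABETIC.matcher(s).replaceAll(""))
          .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] cleaned = removeNonAlphabetic(new String[] { "Hello123", "Java@Code", "Stack#Overflow" });
        System.out.println(Arrays.toString(cleaned)); // [Hello, JavaCode, StackOverflow]
    }
}
```

For a handful of short strings the difference is unlikely to matter, so this is mainly worth considering for large arrays or frequently called code.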
3. Using Character Filtering With Streams
The Java Stream API provides a functional programming approach for processing sequences of elements. We can use it to filter a string's characters and keep only the letters:
// Requires: import java.util.stream.Collectors;
public static String removeNonAlphabeticUsingStreams(String input) {
    return input.chars()
      .filter(Character::isLetter)
      .mapToObj(c -> String.valueOf((char) c))
      .collect(Collectors.joining());
}
Here, we use the Stream API to process the input string character by character. First, input.chars() converts the string into an IntStream of its char values. Then, filter(Character::isLetter) keeps only the letters, as determined by the isLetter() method of the Character class. Unlike the regex pattern [a-zA-Z], isLetter() also accepts non-ASCII letters such as é. Next, mapToObj(c -> String.valueOf((char) c)) converts each remaining value back into a one-character string. Finally, Collectors.joining() concatenates these strings into the result.
Let’s verify the approach with a test:
@Test
void givenMixedString_whenRemoveNonAlphabeticCharactersUsingStreams_thenReturnsOnlyAlphabeticCharacters() {
    String[] inputArray = { "Stream123", "Functional!", "Lambda$Syntax" };
    String[] expectedArray = { "Stream", "Functional", "LambdaSyntax" };
    for (int i = 0; i < inputArray.length; i++) {
        assertEquals(expectedArray[i], StringManipulator.removeNonAlphabeticUsingStreams(inputArray[i]));
    }
}
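One caveat of chars() is that it iterates over UTF-16 code units, so a letter outside the Basic Multilingual Plane arrives as two surrogate values, which isLetter() rejects. If such input is possible, a variant based on codePoints() may be preferable. The sketch below illustrates the idea; the class CodePointLetterFilter and the method removeNonLetters() are hypothetical names, not part of the article's code:

```java
// Hypothetical helper, shown only to illustrate code point based filtering.
final class CodePointLetterFilter {

    static String removeNonLetters(String input) {
        return input.codePoints()                 // one int per Unicode code point
          .filter(Character::isLetter)            // resolves to Character.isLetter(int)
          .collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
          .toString();
    }

    public static void main(String[] args) {
        // The accented letter is kept; the space, digits, and punctuation are dropped.
        System.out.println(removeNonLetters("Caf\u00e9 42!"));
    }
}
```

Collecting directly into a StringBuilder also avoids creating a temporary one-character String for every letter.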
4. Using StringBuilder
StringBuilder provides a mutable sequence of characters for efficient string construction. It’s useful for building strings while filtering characters manually:
public static String removeNonAlphabeticUsingStringBuilder(String input) {
    StringBuilder sb = new StringBuilder();
    for (char c : input.toCharArray()) {
        if (Character.isLetter(c)) {
            sb.append(c);
        }
    }
    return sb.toString();
}
This method takes a manual approach to filtering characters using StringBuilder. Appending to a StringBuilder is more efficient than repeatedly using the + operator in a loop, because each concatenation with + creates a new String object.
First, we convert the input string into a character array using toCharArray(). Then, we loop through each character, and if it's a letter, we append it to the StringBuilder using sb.append(c). Finally, we return the result as a string using sb.toString().
Let’s write a unit test to verify the method:
@Test
void givenMixedString_whenRemoveNonAlphabeticCharactersUsingStringBuilder_thenReturnsOnlyAlphabeticCharacters() {
    String[] inputArray = { "Builder@Example", "Remove123Chars", "Efficient*Code" };
    String[] expectedArray = { "BuilderExample", "RemoveChars", "EfficientCode" };
    for (int i = 0; i < inputArray.length; i++) {
        assertEquals(
          expectedArray[i], StringManipulator.removeNonAlphabeticUsingStringBuilder(inputArray[i]));
    }
}
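Since Character.isLetter() accepts any Unicode letter, the loop above keeps characters such as é that the regex pattern [^a-zA-Z] would remove. If the two approaches need to agree, the check can be narrowed to the ASCII ranges. The following is a minimal sketch under that assumption; the class AsciiLetterFilter and the method removeNonAsciiLetters() are hypothetical names used only for illustration:

```java
// Hypothetical helper, shown only to illustrate an ASCII-only letter check.
final class AsciiLetterFilter {

    static String removeNonAsciiLetters(String input) {
        // The result can never be longer than the input, so size the buffer up front.
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Same character set as the regex class [a-zA-Z].
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(removeNonAsciiLetters("Builder@Example")); // BuilderExample
    }
}
```

Which behavior is appropriate depends on the data: the ASCII check suits identifiers and similar tokens, while isLetter() is usually the better fit for natural-language text.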
5. Using Apache Commons Lang
Apache Commons Lang is a utility library providing useful methods for common tasks. It provides the StringUtils class with the replacePattern() method for regex-based string manipulation:
// Requires: import org.apache.commons.lang3.StringUtils;
public static String removeNonAlphabeticWithApacheCommons(String input) {
    return StringUtils.replacePattern(input, "[^a-zA-Z]", "");
}
We use the replacePattern() method of Apache Commons Lang, which expects the input string, the regex pattern to look for, and the replacement string. It identifies non-alphabetic characters using the regex pattern and replaces them with an empty string. Note that recent Commons Lang 3 releases deprecate StringUtils.replacePattern() in favor of the equivalent method in the RegExUtils class.
Finally, let’s test this approach:
@Test
void givenMixedString_whenRemoveNonAlphabeticCharactersUsingApacheCommons_thenReturnsOnlyAlphabeticCharacters() {
    String[] inputArray = { "Commons123Lang", "Apache@Utility", "String#Utils" };
    String[] expectedArray = { "CommonsLang", "ApacheUtility", "StringUtils" };
    for (int i = 0; i < inputArray.length; i++) {
        assertEquals(
          expectedArray[i], StringManipulator.removeNonAlphabeticWithApacheCommons(inputArray[i]));
    }
}
6. Conclusion
In this article, we explored several ways to remove non-alphabetic characters from strings. These options include using regex, filtering with the Stream API, iterating over the string with a StringBuilder, and using the Apache Commons Lang library. The regex-based approaches keep only ASCII letters, while the approaches based on Character.isLetter() keep any Unicode letter.
As always, the full implementation of this article can be found over on GitHub.