Channel: Baeldung

Introduction to JavaParser


1. Introduction

In this article, we’ll look at the JavaParser library: what it is, what we can do with it, and how to use it.

2. What Is JavaParser?

JavaParser is an open-source library for working with Java sources. It allows us to parse Java source code into an abstract syntax tree (AST). Once we’ve done this, we can analyze the parsed code, manipulate it, and even write new code.

Using JavaParser, we can parse source code written in Java up to Java 18. This includes all stable language features but may not include any preview features.

3. Dependencies

Before we can use JavaParser, we need to include the latest version in our build, which is 3.25.10 at the time of writing.

The main dependency that we need to include is javaparser-core. If we’re using Maven, we can include this dependency in our pom.xml file:

<dependency>
    <groupId>com.github.javaparser</groupId>
    <artifactId>javaparser-core</artifactId>
    <version>3.25.10</version>
</dependency>

Or if we’re using Gradle, we can include it in our build.gradle file:

implementation("com.github.javaparser:javaparser-core:3.25.10")

At this point, we’re ready to start using it in our application.

Two additional dependencies are available as well. The dependency com.github.javaparser:javaparser-symbol-solver-core provides a means to analyze the parsed AST to find the relationships between Java elements and their declarations. The dependency com.github.javaparser:javaparser-core-serialization provides a means to serialize the parsed AST to and from JSON.

4. Parsing Java Code

Once we have the dependencies set up in our application, we’re ready to go. Parsing of Java code always starts with the StaticJavaParser class. This gives us several different mechanisms for parsing code, depending on what we’re parsing and where it’s coming from.

4.1. Parsing Source Files

The first thing we’ll look at is parsing entire source files. We can do this with the StaticJavaParser.parse() method. Several overloaded alternatives allow us to provide the source code in different ways – directly as a string, as a File on the local filesystem, or as an InputStream or Reader for some resource. All of these work the same way and are simply convenient ways to provide the code to be parsed.

Let’s see it in action. Here, we’ll attempt to parse the provided source code and generate a CompilationUnit as a result:

CompilationUnit parsed = StaticJavaParser.parse("class TestClass {}");

This represents our AST and lets us inspect and manipulate the parsed code.

4.2. Parsing Statements

Individual statements are at the other end of the spectrum of the code that we can parse. We do this using the StaticJavaParser.parseStatement() method. Unlike with source files, there’s only a single version of this that takes a single string containing the statement to parse.

This method returns a Statement object that represents the parsed statement:

Statement parsed = StaticJavaParser.parseStatement("final int answer = 42;");

4.3. Parsing Other Constructs

JavaParser can also parse many other constructs, covering the entire Java language up to Java 18. Each construct has a separate, dedicated parse method and returns an appropriate type representing the parsed code. For example, we can use parseAnnotation() for parsing annotations, parseImport() for import statements, parseBlock() for parsing blocks of statements, and many more.

Internally, JavaParser will use the exact same code for parsing the various parts of our code. For example, when parsing a block using parseBlock(), JavaParser will ultimately end up in the same code as is called directly by parseStatement(). This means we can rely on these different parsing methods working the same for the same subsets of code.

We do need to know exactly what type of code we’re parsing in order to select the correct parsing method. For example, using the parseStatement() method to parse a class definition will fail.

4.4. Malformed Code

If parsing fails, JavaParser throws a ParseProblemException indicating exactly what was wrong with the code. For example, if we attempt to parse a malformed class definition, then we’ll get something like:

ParseProblemException parseProblemException = assertThrows(ParseProblemException.class,
    () -> StaticJavaParser.parse("class TestClass"));
assertEquals(1, parseProblemException.getProblems().size());
assertEquals("Parse error. Found <EOF>, expected one of  \"<\" \"extends\" \"implements\" \"permits\" \"{\"", 
    parseProblemException.getProblems().get(0).getMessage());

We can see from this error message that the class declaration is incomplete. In Java, the class name must be followed by a “<” (to declare type parameters), one of the extends, implements, or permits keywords, or a “{” to start the body of the class.

5. Analyzing Parsed Code

Once we’ve parsed some code, we can start analyzing it to learn from it. This is similar to reflection within a running application, only on the parsed source code instead of the currently running code.

5.1. Accessing Parsed Elements

Once we’ve parsed some source code, we can query the AST to access individual elements. Exactly how we do this varies depending on the elements we want to access and what we’ve parsed.

For example, if we’ve parsed a source file into a CompilationUnit, then we can access a class that we expect to be present using getClassByName():

Optional<ClassOrInterfaceDeclaration> cls = compilationUnit.getClassByName("TestClass");

Note that this returns an Optional<ClassOrInterfaceDeclaration>. Optional is used because we can’t guarantee the type is present in this compilation unit. In other cases, we might be able to guarantee the presence of elements. For example, a class will always have a name, so ClassOrInterfaceDeclaration.getName() doesn’t need to return an Optional.

At every stage, we can only directly access elements that are at the outermost level of what we’re currently working with. For example, if we’ve got a CompilationUnit from parsing a source file, then we can access the package declaration, the import statements, and the top-level types, but we can’t access the members within those types. However, once we access one of these types, we can then access the members within it.

5.2. Iterating Parsed Elements

In some cases, we may not know exactly what elements are present in our parsed code, or else we simply want to work with all of a certain type of element instead of only one.

Each of our AST types can access an entire range of appropriate nested elements. Exactly how this works depends on what we want to work with. For example, we can extract all of the import statements out of a CompilationUnit using:

NodeList<ImportDeclaration> imports = compilationUnit.getImports();

No Optional is needed, as this is guaranteed to return a result. However, if no imports were present, this result might be an empty list.

Once we’ve done this, we can treat this as any collection. The NodeList type implements java.util.List correctly so we can work with it exactly as any other list.

5.3. Iterating the Entire AST

In addition to extracting exactly one type of element from our parsed code, we can also iterate over the entire parsed tree. All AST types from JavaParser implement the visitor pattern, allowing us to visit every element in our parsed source code with a custom visitor:

compilationUnit.accept(visitor, arg);

There are then two standard types of visitors that we can use with this. Both of these have a visit() method for each possible AST type, which takes a state argument that’s passed into the accept() call.

The simplest of these is VoidVisitor<A>. This has one visit() method for every AST type, none of which returns a value. We then have an adapter type – VoidVisitorAdapter – that provides a default implementation of each method, which simply visits the node’s children so that the entire tree is traversed.

We then only need to implement the methods that we’re interested in – for example:

compilationUnit.accept(new VoidVisitorAdapter<Object>() {
    @Override
    public void visit(MethodDeclaration n, Object arg) {
        super.visit(n, arg);
        System.out.println("Method: " + n.getName());
    }
}, null);

This will output a log message for every method name in the source file, regardless of where they are. The fact that this recurses over the entire tree structure means that these methods can be in top-level classes, inner classes, or even anonymous classes within other methods.
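To make the role of super.visit() and of the state argument more concrete, here’s a minimal, self-contained sketch of the adapter idea. It uses hypothetical node types (TypeNode, MethodNode) and a hypothetical Adapter class rather than JavaParser’s real classes, so it only illustrates the pattern and isn’t the library’s implementation. The adapter’s default visit() methods descend into the children, and the overriding visitor collects names into the list passed through accept():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mini-AST for illustration only; none of these are JavaParser types.
class VisitorSketch {

    // One visit() overload per node type, plus a caller-supplied state argument.
    interface Visitor<A> {
        void visit(TypeNode n, A arg);
        void visit(MethodNode n, A arg);
    }

    interface Node {
        <A> void accept(Visitor<A> v, A arg);
    }

    static final class TypeNode implements Node {
        final String name;
        final List<Node> children;

        TypeNode(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        @Override
        public <A> void accept(Visitor<A> v, A arg) {
            v.visit(this, arg); // double dispatch: selects the TypeNode overload
        }
    }

    static final class MethodNode implements Node {
        final String name;
        final List<Node> children; // e.g. classes declared inside the method body

        MethodNode(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        @Override
        public <A> void accept(Visitor<A> v, A arg) {
            v.visit(this, arg);
        }
    }

    // Plays the role of an adapter: by default, every visit() walks into the children.
    static class Adapter<A> implements Visitor<A> {
        @Override
        public void visit(TypeNode n, A arg) {
            for (Node child : n.children) child.accept(this, arg);
        }

        @Override
        public void visit(MethodNode n, A arg) {
            for (Node child : n.children) child.accept(this, arg);
        }
    }

    // Collects every method name, using the state argument as the accumulator.
    static List<String> methodNames(Node root) {
        List<String> names = new ArrayList<>();
        root.accept(new Adapter<List<String>>() {
            @Override
            public void visit(MethodNode n, List<String> out) {
                super.visit(n, out); // without this, nested declarations are skipped
                out.add(n.name);
            }
        }, names);
        return names;
    }

    static Node sample() {
        return new TypeNode("TestClass",
            new MethodNode("doSomething"),
            new TypeNode("Inner", new MethodNode("other")));
    }

    public static void main(String[] args) {
        System.out.println(methodNames(sample()));
    }
}
```

In this sketch, leaving out the super.visit() call would stop the traversal at each method, so anything declared inside a method body would never be visited.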

The alternative is GenericVisitor<R, A>. This works similarly to VoidVisitor, except that its visit() methods have a return value. We also have adapter classes here, depending on how we want to collect the return values from each method. For example, GenericListVisitorAdapter makes each visit() method return a List<R> and merges the lists returned for the child nodes into a single result:

List<String> allMethods = compilationUnit.accept(new GenericListVisitorAdapter<String, Object>() {
    @Override
    public List<String> visit(MethodDeclaration n, Object arg) {
        List<String> result = super.visit(n, arg);
        result.add(n.getName().asString());
        return result;
    }
}, null);

This will return a list that contains the names of every method in the entire tree.
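The merging behaviour can be pictured with another small, self-contained sketch. Again, the node types and the ListAdapter class here are hypothetical stand-ins and not JavaParser’s implementation. The point is only that each default visit() concatenates the lists returned for its children, so an override can call super.visit() and append its own value to the result:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical node and visitor types for illustration only; not JavaParser classes.
class ListVisitorSketch {

    // A visitor whose visit() methods return a value of type R.
    interface Visitor<R> {
        R visit(TypeNode n);
        R visit(MethodNode n);
    }

    interface Node {
        <R> R accept(Visitor<R> v);
    }

    static final class TypeNode implements Node {
        final String name;
        final List<Node> children;

        TypeNode(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        @Override
        public <R> R accept(Visitor<R> v) {
            return v.visit(this);
        }
    }

    static final class MethodNode implements Node {
        final String name;
        final List<Node> children;

        MethodNode(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }

        @Override
        public <R> R accept(Visitor<R> v) {
            return v.visit(this);
        }
    }

    // List-collecting adapter: the default result is the merged results of the children.
    static class ListAdapter<R> implements Visitor<List<R>> {
        private List<R> mergeChildren(List<Node> children) {
            List<R> merged = new ArrayList<>(); // always a fresh, mutable list
            for (Node child : children) merged.addAll(child.accept(this));
            return merged;
        }

        @Override
        public List<R> visit(TypeNode n) {
            return mergeChildren(n.children);
        }

        @Override
        public List<R> visit(MethodNode n) {
            return mergeChildren(n.children);
        }
    }

    static List<String> methodNames(Node root) {
        return root.accept(new ListAdapter<String>() {
            @Override
            public List<String> visit(MethodNode n) {
                List<String> result = super.visit(n); // names found below this method
                result.add(n.name);                   // then this method's own name
                return result;
            }
        });
    }

    public static void main(String[] args) {
        Node root = new TypeNode("TestClass",
            new MethodNode("doSomething"),
            new TypeNode("Inner", new MethodNode("other")));
        System.out.println(methodNames(root));
    }
}
```

Compared with passing an accumulator as the state argument, this style keeps the visitor free of shared mutable state, at the cost of allocating a list per node.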

6. Outputting Parsed Code

In addition to parsing and analyzing our code, we can also output it again as a string. This can be useful for many reasons – for example, if we want to extract and output only specific sections of the code.

The easiest way to achieve this is to simply use the standard toString() method. All of our AST types correctly implement this and will produce formatted code. Note that this might not be formatted exactly as it was when we parsed the code, but it’ll still follow relatively standard conventions.

For example, if we parse the following code:

package com.baeldung.javaparser;
import java.util.List;
class TestClass {
private List<String> doSomething()  {}
private class Inner {
private String other() {}
}
}

When we format it, we’ll get this as the output:

package com.baeldung.javaparser;
import java.util.List;
class TestClass {
    private List<String> doSomething() {
    }
    private class Inner {
        private String other() {
        }
    }
}

The other way to format code is with DefaultPrettyPrinterVisitor. This is a standard visitor class that handles formatting, and it lets us configure some aspects of how the output is formatted. For example, if we wanted to indent with two spaces instead of four, we could write:

DefaultPrinterConfiguration printerConfiguration = new DefaultPrinterConfiguration();
printerConfiguration.addOption(new DefaultConfigurationOption(DefaultPrinterConfiguration.ConfigOption.INDENTATION,
    new Indentation(Indentation.IndentType.SPACES, 2)));
DefaultPrettyPrinterVisitor visitor = new DefaultPrettyPrinterVisitor(printerConfiguration);
compilationUnit.accept(visitor, null);
String formatted = visitor.toString();

7. Manipulating Parsed Code

Once we’ve parsed some code into an AST, we’re also able to make changes to it. Since this is now just a Java object model, we can treat it as any other object model, and JavaParser gives us the ability to change most aspects of it freely.

Combining this with the ability to output our AST back as working source code means that we can then manipulate parsed code, make changes to it, and provide the output in some form. This could be useful for IDE plugins, code compilation steps, and much more.

We can apply changes wherever we have access to the appropriate AST elements – whether we reach them directly, by iterating with a visitor, or by any other means.

For example, if we wanted to uppercase every single method name in a piece of code, then we could do something like:

compilationUnit.accept(new VoidVisitorAdapter<Object>() {
    @Override
    public void visit(MethodDeclaration n, Object arg) {
        super.visit(n, arg);
        
        String oldName = n.getName().asString();
        n.setName(oldName.toUpperCase());
    }
}, null);

This uses a simple visitor to visit every method declaration in our source tree and uses the setName() method to give each method a new name. The new name is then simply the old name in uppercase.

Once this is done, the AST is updated in place. We can then format it however we wish, and the newly formatted code will reflect our changes.

8. Summary

We’ve seen here a quick introduction to JavaParser. We’ve shown how to get started with it and some of the things we can achieve using it. Next time you need to manipulate some Java code, why not try it out?

All of the examples are available over on GitHub.

       
