
1. Introduction
In this tutorial, we’ll discuss different options for generating Avro schemas from existing Java classes. Although this isn’t the standard workflow, we may occasionally need to convert in this direction, and it’s good to know the simplest way to do it with existing libraries.
2. What’s Avro?
Before we delve into the nuances of transforming existing classes back into schemas, let’s review what Avro is.
According to the documentation, it’s a data serialization system that serializes and deserializes data according to a predefined schema, which is the core of the system. The schema itself is expressed in JSON. More about Avro can be found in the already-published guide.
3. Motivation to Generate Avro Schema From Existing Java Classes
The standard workflow when working with Avro consists of defining the schema followed by generating classes in the chosen language. Even though it’s the most popular way, it’s also possible to go backward and generate the Avro schema from classes present in the project.
Let’s imagine a scenario where we’re working with a legacy system and want to emit data over a message broker, and we’ve decided to use Avro for (de)serialization. While going through the code, we find that we can quickly meet the new requirements by emitting the data already modeled by existing classes.
It would be tedious to translate the Java code to Avro JSON schemas manually. Instead, we can use available libraries to do that for us and save time.
4. Generating Avro Schema Using Avro Reflection API
The first option for quickly transforming an existing Java class into an Avro schema is the Avro Reflection API. To use this API, our project needs to depend on the Avro library:
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.12.0</version>
</dependency>
4.1. Simple Records
Let’s assume we want to use the ReflectData API for a simple Java record:
record SimpleBankAccount(String bankAccountNumber) {
}
We can use ReflectData’s singleton instance to generate an org.apache.avro.Schema object for any given Java class. Then, we can call the toString() method of the Schema instance to get the Avro schema as a JSON String.
To validate the generated string against our expectations, we can use JsonUnit:
@Test
void whenConvertingSimpleRecord_thenAvroSchemaIsCorrect() {
    Schema schema = ReflectData.get()
      .getSchema(SimpleBankAccount.class);
    String jsonSchema = schema.toString();
    assertThatJson(jsonSchema).isEqualTo("""
        {
          "type" : "record",
          "name" : "SimpleBankAccount",
          "namespace" : "com.baeldung.apache.avro.model",
          "fields" : [ {
            "name" : "bankAccountNumber",
            "type" : "string"
          } ]
        }
        """);
}
Even though we used a Java record for simplicity, this will work equally well with a plain Java object.
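To build some intuition for what a reflection-based generator does, here is a minimal sketch that uses only the JDK. It is not Avro’s implementation: the ReflectSchemaSketch class, its tiny type table, and the toAvroJson() helper are assumptions made for illustration, and the namespace attribute is left out for brevity. The sketch walks the record components and emits one schema field per component:

```java
import java.lang.reflect.RecordComponent;
import java.util.Map;
import java.util.StringJoiner;

final class ReflectSchemaSketch {

    // Hypothetical, deliberately tiny Java-to-Avro type table (an assumption of this sketch).
    private static final Map<Class<?>, String> AVRO_TYPES = Map.of(
        String.class, "string",
        int.class, "int",
        long.class, "long",
        boolean.class, "boolean",
        double.class, "double");

    record SimpleBankAccount(String bankAccountNumber) {
    }

    // Builds a record schema as compact JSON from the components of a Java record.
    static String toAvroJson(Class<? extends Record> type) {
        StringJoiner fields = new StringJoiner(",", "[", "]");
        for (RecordComponent component : type.getRecordComponents()) {
            String avroType = AVRO_TYPES.get(component.getType());
            if (avroType == null) {
                throw new IllegalArgumentException("Unsupported type: " + component.getType());
            }
            fields.add("{\"name\":\"" + component.getName() + "\",\"type\":\"" + avroType + "\"}");
        }
        return "{\"type\":\"record\",\"name\":\"" + type.getSimpleName() + "\",\"fields\":" + fields + "}";
    }

    public static void main(String[] args) {
        // Prints the compact JSON schema derived from the record above.
        System.out.println(toAvroJson(SimpleBankAccount.class));
    }
}
```

The real ReflectData class handles far more cases (nested types, collections, enums, namespaces), which is exactly why we prefer the library over hand-written code like this.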
4.2. Nullable Fields
Let’s add another String field to our Java record. We can mark it optional using the @org.apache.avro.reflect.Nullable annotation:
record BankAccountWithNullableField(
    String bankAccountNumber,
    @Nullable String reference
) {
}
If we repeat the test, we can expect reference’s nullability to be reflected:
@Test
void whenConvertingRecordWithNullableField_thenAvroSchemaIsCorrect() {
    Schema schema = ReflectData.get()
      .getSchema(BankAccountWithNullableField.class);
    String jsonSchema = schema.toString(true);
    assertThatJson(jsonSchema).isEqualTo("""
        {
          "type" : "record",
          "name" : "BankAccountWithNullableField",
          "namespace" : "com.baeldung.apache.avro.model",
          "fields" : [ {
            "name" : "bankAccountNumber",
            "type" : "string"
          }, {
            "name" : "reference",
            "type" : [ "null", "string" ],
            "default" : null
          } ]
        }
        """);
}
As we can see, applying the @Nullable annotation to the new field turned the type of the reference field in the generated schema into a union of null and string, with null as the default value.
4.3. Ignored Fields
The Avro library also gives us the option to ignore certain fields when generating schemas. For example, we might not want to transmit sensitive information over the wire. To achieve this, we only need to put the @AvroIgnore annotation on the particular field:
record BankAccountWithIgnoredField(
    String bankAccountNumber,
    @AvroIgnore String reference
) {
}
Consequently, apart from the record name, the generated schema will match the one from our first example: it contains only the bankAccountNumber field.
4.4. Overriding Field Names
By default, fields in generated schemas take their names directly from the Java field names. This behavior can be changed with the @AvroName annotation:
record BankAccountWithOverriddenField(
    String bankAccountNumber,
    @AvroName("bankAccountReference") String reference
) {
}
The schema generated from this version of our record uses bankAccountReference instead of reference:
{
  "type" : "record",
  "name" : "BankAccountWithOverriddenField",
  "namespace" : "com.baeldung.apache.avro.model",
  "fields" : [ {
    "name" : "bankAccountNumber",
    "type" : "string"
  }, {
    "name" : "bankAccountReference",
    "type" : "string"
  } ]
}
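The annotations from sections 4.2 to 4.4 all follow the same pattern: a marker on a field changes how that field is emitted. The sketch below imitates the pattern with the JDK only. The @SketchNullable, @SketchIgnore, and @SketchName annotations, the Account record, and the fieldsOf() helper are hypothetical stand-ins, not Avro’s annotations, and only String components are handled:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.RecordComponent;
import java.util.ArrayList;
import java.util.List;

final class AnnotationDrivenSketch {

    // Hypothetical stand-ins for @Nullable, @AvroIgnore and @AvroName.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface SketchNullable {
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface SketchIgnore {
    }

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.RECORD_COMPONENT)
    @interface SketchName {
        String value();
    }

    record Account(
        String bankAccountNumber,
        @SketchNullable String reference,
        @SketchIgnore String pin,
        @SketchName("holder") String holderName
    ) {
    }

    // Returns one JSON field definition per component that is not ignored.
    // Assumption: every component is a String, to keep the sketch short.
    static List<String> fieldsOf(Class<? extends Record> type) {
        List<String> fields = new ArrayList<>();
        for (RecordComponent component : type.getRecordComponents()) {
            if (component.isAnnotationPresent(SketchIgnore.class)) {
                continue; // ignored components never reach the schema
            }
            SketchName rename = component.getAnnotation(SketchName.class);
            String name = rename != null ? rename.value() : component.getName();
            if (component.isAnnotationPresent(SketchNullable.class)) {
                // nullable components become a union with null and default to null
                fields.add("{\"name\":\"" + name + "\",\"type\":[\"null\",\"string\"],\"default\":null}");
            } else {
                fields.add("{\"name\":\"" + name + "\",\"type\":\"string\"}");
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        fieldsOf(Account.class).forEach(System.out::println);
    }
}
```

For the Account record, the sketch yields three field definitions: the pin component is skipped, reference becomes a union with null, and holderName is emitted as holder.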
4.5. Fields with Multiple Implementations
Sometimes, our class might contain a field whose declared type is an interface or an abstract class with several concrete implementations.
Let’s assume AccountReference is an interface with two implementations. We’ll stick to Java records for brevity:
interface AccountReference {
    String reference();
}
record PersonalBankAccountReference(
    String reference,
    String holderName
) implements AccountReference {
}
record BusinessBankAccountReference(
    String reference,
    String businessEntityId
) implements AccountReference {
}
In our BankAccountWithAbstractField, we indicate the supported implementations of the AccountReference field using the @org.apache.avro.reflect.Union annotation:
record BankAccountWithAbstractField(
    String bankAccountNumber,
    @Union({ PersonalBankAccountReference.class, BusinessBankAccountReference.class })
    AccountReference reference
) {
}
As a result, the generated Avro schema will contain a union allowing the assignment of either of these two classes, rather than limiting us to just one:
{
  "type" : "record",
  "name" : "BankAccountWithAbstractField",
  "namespace" : "com.baeldung.apache.avro.model",
  "fields" : [ {
    "name" : "bankAccountNumber",
    "type" : "string"
  }, {
    "name" : "reference",
    "type" : [ {
      "type" : "record",
      "name" : "PersonalBankAccountReference",
      "namespace" : "com.baeldung.apache.avro.model.BankAccountWithAbstractField",
      "fields" : [ {
        "name" : "holderName",
        "type" : "string"
      }, {
        "name" : "reference",
        "type" : "string"
      } ]
    }, {
      "type" : "record",
      "name" : "BusinessBankAccountReference",
      "namespace" : "com.baeldung.apache.avro.model.BankAccountWithAbstractField",
      "fields" : [ {
        "name" : "businessEntityId",
        "type" : "string"
      }, {
        "name" : "reference",
        "type" : "string"
      } ]
    } ]
  } ]
}
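Listing every implementation matters because of how Avro encodes unions: according to the Avro specification, a union value is written as the zero-based position of its branch, followed by the value itself. The JDK-only sketch below illustrates that lookup. The UnionBranchSketch class, the branchIndex() helper, and the extra AnonymousReference record are assumptions for illustration and don’t mirror Avro’s internal resolution code:

```java
import java.util.List;

final class UnionBranchSketch {

    interface AccountReference {
        String reference();
    }

    record PersonalBankAccountReference(String reference, String holderName) implements AccountReference {
    }

    record BusinessBankAccountReference(String reference, String businessEntityId) implements AccountReference {
    }

    // Hypothetical implementation that is NOT listed in the union.
    record AnonymousReference(String reference) implements AccountReference {
    }

    // Same order as in the @Union annotation of the example above.
    static final List<Class<?>> UNION_BRANCHES = List.of(
        PersonalBankAccountReference.class,
        BusinessBankAccountReference.class);

    // Finds the zero-based branch position for a value, or fails if its class isn't listed.
    static int branchIndex(List<Class<?>> branches, Object datum) {
        for (int i = 0; i < branches.size(); i++) {
            if (branches.get(i) == datum.getClass()) {
                return i;
            }
        }
        throw new IllegalArgumentException("Not in union: " + datum.getClass().getSimpleName());
    }

    public static void main(String[] args) {
        System.out.println(branchIndex(UNION_BRANCHES, new PersonalBankAccountReference("ref-1", "Jane")));
        System.out.println(branchIndex(UNION_BRANCHES, new BusinessBankAccountReference("ref-2", "B-42")));
    }
}
```

A value whose class is missing from the list, such as AnonymousReference here, has no branch position, so it can’t be written against this schema.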
4.6. Logical Types
Avro supports logical types. At the schema level, these are ordinary primitive or complex types that carry an additional logicalType attribute, which tells Avro implementations and code generators which class should represent the particular field.
For example, we can leverage the logical types feature if our model uses temporal fields or UUIDs:
record BankAccountWithLogicalTypes(
    String bankAccountNumber,
    UUID reference,
    LocalDateTime expiryDate
) {
}
Additionally, we’ll configure our ReflectData instance, adding the Conversion objects we need. We can create our own Conversions or use the ones coming out of the box:
@Test
void whenConvertingRecordWithLogicalTypes_thenAvroSchemaIsCorrect() {
    // ReflectData.get() returns a shared instance, so conversions registered here stay registered
    ReflectData reflectData = ReflectData.get();
    reflectData.addLogicalTypeConversion(new Conversions.UUIDConversion());
    reflectData.addLogicalTypeConversion(new TimeConversions.LocalTimestampMillisConversion());
    String jsonSchema = reflectData.getSchema(BankAccountWithLogicalTypes.class).toString();
    // verify schema
}
Consequently, when we generate and validate the schema, we’ll notice that the types of the new fields include a logicalType attribute:
{
  "type" : "record",
  "name" : "BankAccountWithLogicalTypes",
  "namespace" : "com.baeldung.apache.avro.model",
  "fields" : [ {
    "name" : "bankAccountNumber",
    "type" : "string"
  }, {
    "name" : "expiryDate",
    "type" : {
      "type" : "long",
      "logicalType" : "local-timestamp-millis"
    }
  }, {
    "name" : "reference",
    "type" : {
      "type" : "string",
      "logicalType" : "uuid"
    }
  } ]
}
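To see what the two logical types in this schema mean for the stored values, here is a JDK-only sketch of the mappings. It assumes the definitions from the Avro specification: local-timestamp-millis is a long counting milliseconds from 1970-01-01T00:00 on a local timeline, and uuid is the canonical string form. The LogicalTypeSketch class and its helper methods are hypothetical and aren’t Avro’s Conversion classes. UTC is used only as an arithmetic device to count milliseconds, not as a statement about the time zone of the data:

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.UUID;

final class LogicalTypeSketch {

    // LocalDateTime -> long: milliseconds since 1970-01-01T00:00 on the local timeline.
    static long toLocalTimestampMillis(LocalDateTime value) {
        return value.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // long -> LocalDateTime: the inverse mapping (sub-millisecond precision is not representable).
    static LocalDateTime fromLocalTimestampMillis(long millis) {
        return LocalDateTime.ofInstant(Instant.ofEpochMilli(millis), ZoneOffset.UTC);
    }

    // UUID -> string and back, matching the "uuid" logical type on a string.
    static String toUuidString(UUID value) {
        return value.toString();
    }

    static UUID fromUuidString(String value) {
        return UUID.fromString(value);
    }

    public static void main(String[] args) {
        LocalDateTime expiryDate = LocalDateTime.of(2030, 12, 31, 23, 59, 59);
        long stored = toLocalTimestampMillis(expiryDate);
        System.out.println(stored + " -> " + fromLocalTimestampMillis(stored));

        UUID reference = UUID.fromString("123e4567-e89b-12d3-a456-426614174000");
        System.out.println(toUuidString(reference));
    }
}
```

On the wire, the fields are still a plain long and a plain string. The logicalType attribute only tells readers and writers which Java class to convert to and from.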
5. Generating Avro Schema Using Jackson
Although the Avro Reflection API is useful and should be able to address different, even complex needs, it’s always worth knowing alternatives.
In our case, the alternative to the library we just experimented with is the Jackson Dataformats Binary library, specifically its Avro-related submodule.
First, let’s add the jackson-core and jackson-dataformat-avro dependencies to our pom.xml:
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-core</artifactId>
    <version>2.17.2</version>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.dataformat</groupId>
    <artifactId>jackson-dataformat-avro</artifactId>
    <version>2.17.2</version>
</dependency>
5.1. Simple Conversions
Let’s start exploring what Jackson has to offer by writing a simple converter. This implementation has the advantage of using well-known Java APIs: Jackson is one of the most widely used Java libraries, whereas direct use of the Avro APIs is rather niche.
We’ll create AvroMapper and AvroSchemaGenerator instances and use them to retrieve an org.apache.avro.Schema instance.
From there, we simply call the toString() method, like in the previous examples:
@Test
void whenConvertingRecord_thenAvroSchemaIsCorrect() throws JsonMappingException {
    AvroMapper avroMapper = new AvroMapper();
    AvroSchemaGenerator avroSchemaGenerator = new AvroSchemaGenerator();
    avroMapper.acceptJsonFormatVisitor(SimpleBankAccount.class, avroSchemaGenerator);
    Schema schema = avroSchemaGenerator.getGeneratedSchema().getAvroSchema();
    String jsonSchema = schema.toString();
    assertThatJson(jsonSchema).isEqualTo("""
        {
          "type" : "record",
          "name" : "SimpleBankAccount",
          "namespace" : "com.baeldung.apache.avro.model",
          "fields" : [ {
            "name" : "bankAccountNumber",
            "type" : [ "null", "string" ]
          } ]
        }
        """);
}
5.2. Jackson Annotations
If we compare the two schemas generated for SimpleBankAccount, we’ll notice a key difference: the schema generated with Jackson marks the bankAccountNumber field as nullable. This is because Jackson works differently from Avro Reflect.
Jackson discovers properties mainly through accessors rather than by inspecting fields directly, so the class needs accessors for every field that should end up in the schema. It’s also important to remember that, by default, Jackson assumes a field is nullable. If we don’t want the field to be nullable in the schema, we need to annotate it with @JsonProperty(required = true).
Let’s create a different variation of the class and leverage this annotation:
record JacksonBankAccountWithRequiredField(
    @JsonProperty(required = true) String bankAccountNumber
) {
}
Since all Jackson annotations applied to the original Java class are still enforced, we need to carefully check the results of the conversion.
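The accessor requirement mentioned in this section can be illustrated with a small JDK-only sketch. It contrasts a field-based scan with a getter-based scan of the same class. This is a simplification made for illustration, not Jackson’s or Avro’s actual discovery logic, and the DiscoverySketch class, the LegacyAccount class, and both helper methods are hypothetical:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

final class DiscoverySketch {

    // Hypothetical legacy class: two private fields, but only one getter.
    static class LegacyAccount {
        private String bankAccountNumber = "n/a";
        private String internalNote = "n/a";

        public String getBankAccountNumber() {
            return bankAccountNumber;
        }
    }

    // Field-based discovery: every non-static, non-transient declared field counts.
    static Set<String> byFields(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (!Modifier.isStatic(modifiers) && !Modifier.isTransient(modifiers) && !field.isSynthetic()) {
                names.add(field.getName());
            }
        }
        return names;
    }

    // Accessor-based discovery: only public zero-argument getXxx() methods count.
    static Set<String> byGetters(Class<?> type) {
        Set<String> names = new TreeSet<>();
        for (Method method : type.getMethods()) {
            String name = method.getName();
            boolean isGetter = name.startsWith("get") && name.length() > 3
                && method.getParameterCount() == 0
                && method.getDeclaringClass() != Object.class; // skips getClass()
            if (isGetter) {
                names.add(Character.toLowerCase(name.charAt(3)) + name.substring(4));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("fields:  " + byFields(LegacyAccount.class));
        System.out.println("getters: " + byGetters(LegacyAccount.class));
    }
}
```

In this sketch, the field scan finds both bankAccountNumber and internalNote, while the getter scan finds only bankAccountNumber. A similar gap can appear between the two libraries when a legacy class lacks accessors, which is one more reason to check the generated schema carefully.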
5.3. Logical Types Aware Converter
Jackson, like Avro Reflection, doesn’t consider logical types by default. So, we need to explicitly enable this feature. Let’s do this by introducing small adjustments to the AvroMapper and AvroSchemaGenerator objects:
@Test
void whenConvertingRecordWithLogicalTypes_thenAvroSchemaIsCorrect() throws JsonMappingException {
    AvroMapper avroMapper = AvroMapper.builder()
      .addModule(new AvroJavaTimeModule())
      .build();
    AvroSchemaGenerator avroSchemaGenerator = new AvroSchemaGenerator()
      .enableLogicalTypes();
    avroMapper.acceptJsonFormatVisitor(BankAccountWithLogicalTypes.class, avroSchemaGenerator);
    Schema schema = avroSchemaGenerator.getGeneratedSchema()
      .getAvroSchema();
    String jsonSchema = schema.toString();
    // verify schema
}
With these modifications, the generated Avro schemas use logical types for temporal fields such as LocalDateTime.
6. Conclusion
In this article, we’ve showcased different approaches for generating an Avro schema from an existing Java class. We can use the standard Avro Reflection API, as well as Jackson with its Avro dataformat module.
Although Avro’s own APIs are less widely known, the Reflection API seems to be the more predictable solution. With Jackson, annotations that already exist on our classes for other purposes also influence the generated schema, which can easily lead to mistakes in the main project we’re working on.
Examples in this article are not exhaustive presentations of the possibilities provided by either Avro or Jackson. Please check the code on GitHub to see examples of less commonly used features or refer to the official documentation of one of these two libraries.
All the code presented in this article is available over on GitHub.
The post Generate Avro Schema From Certain Java Class first appeared on Baeldung.