1. Introduction
When working with Apache Avro in Java applications, we often need to convert Plain Old Java Objects (POJOs) into their Avro equivalents. While it's perfectly acceptable to do this manually by setting each field individually, a generic conversion that works for any class is usually a better and more maintainable approach.
In this article, we'll explore several ways to convert POJOs into Avro records, focusing on approaches that stay robust when the structure of the original Java class changes.
2. The Straightforward Approach
Let's say we have a POJO that we want to convert to an Avro object.
Here's our POJO:
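import java.time.LocalDateTime;
import java.time.ZoneId;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

public class Pojo {
    private final Map<String, String> aMap;
    private final long uid;
    private final long localDateTime;

    public Pojo() {
        aMap = new HashMap<>();
        uid = ThreadLocalRandom.current().nextLong();
        // current local date-time, stored as epoch milliseconds
        localDateTime = LocalDateTime.now().atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
        aMap.put("mapKey", "mapValue");
    }

    // standard getters: getaMap(), getUid(), getLocalDateTime()
}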
Then, we have the method that performs the mapping by setting each field explicitly:
public static GenericData.Record mapPojoToRecordStraightForward(Pojo pojo) {
    // Derive the Avro schema from the POJO class
    Schema schema = ReflectData.get().getSchema(pojo.getClass());
    GenericData.Record avroRecord = new GenericData.Record(schema);
    // Every field is copied by hand, with its name hard-coded
    avroRecord.put("uid", pojo.getUid());
    avroRecord.put("localDateTime", pojo.getLocalDateTime());
    avroRecord.put("aMap", pojo.getaMap());
    return avroRecord;
}
As we can see, the straightforward approach sets each field explicitly, with every field name hard-coded as a string. This makes the solution brittle: whenever we add, remove, or rename a field in the POJO, we have to remember to update the mapping method, and the compiler won't warn us if we forget. That makes it a poor fit for classes that change over time.
Note that the schema doesn't have to come from the POJO itself; for example, we could also look it up by schema version.
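To make the brittleness concrete, here's a minimal sketch, assuming the Avro library (org.apache.avro) is on the classpath. The BrittleMappingDemo class and its hand-built schema are hypothetical: the schema is assembled with SchemaBuilder rather than derived from our Pojo. It shows two failure modes of hard-coded field names: a misspelled name is rejected only at runtime, and a forgotten field silently leaves the record invalid:

```java
import org.apache.avro.AvroRuntimeException;
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;

// Hypothetical demo class, not part of the article's code
class BrittleMappingDemo {

    // Hand-built schema with two required long fields, mirroring part of Pojo
    static final Schema SCHEMA = SchemaBuilder.record("Pojo").fields()
        .requiredLong("uid")
        .requiredLong("localDateTime")
        .endRecord();

    // Returns true if putting a misspelled field name is rejected at runtime
    static boolean typoIsRejected() {
        GenericData.Record record = new GenericData.Record(SCHEMA);
        try {
            record.put("uuid", 42L); // typo: the schema field is "uid"
            return false;
        } catch (AvroRuntimeException e) {
            return true; // Avro refuses names that are not in the schema
        }
    }

    // Returns whether a record that never had "localDateTime" set is valid
    static boolean forgottenFieldIsValid() {
        GenericData.Record record = new GenericData.Record(SCHEMA);
        record.put("uid", 42L);
        // "localDateTime" stays null, which a required long field doesn't accept
        return GenericData.get().validate(SCHEMA, record);
    }
}
```

Neither mistake is caught by the compiler, and the second one only surfaces later, for example when we validate or serialize the record.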
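As a rough illustration of looking up a schema by version, here's a sketch that keeps schema JSON strings in an in-memory map keyed by version number and parses them with Avro's Schema.Parser. The SchemaByVersion class and its contents are hypothetical (a real system might use a schema registry or schema files instead), and Avro must be on the classpath:

```java
import java.util.Map;
import org.apache.avro.Schema;

// Hypothetical in-memory schema store: version number -> Avro schema JSON
class SchemaByVersion {

    static final Map<Integer, String> SCHEMAS = Map.of(
        1, "{\"type\":\"record\",\"name\":\"Pojo\",\"fields\":["
            + "{\"name\":\"uid\",\"type\":\"long\"}]}",
        2, "{\"type\":\"record\",\"name\":\"Pojo\",\"fields\":["
            + "{\"name\":\"uid\",\"type\":\"long\"},"
            + "{\"name\":\"localDateTime\",\"type\":\"long\"},"
            + "{\"name\":\"aMap\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}");

    static Schema lookup(int version) {
        String json = SCHEMAS.get(version);
        if (json == null) {
            throw new IllegalArgumentException("Unknown schema version: " + version);
        }
        // A fresh Parser per call: a Parser remembers the named types it has
        // already defined, so parsing "Pojo" twice with one instance would fail
        return new Schema.Parser().parse(json);
    }
}
```

The mapping method could then create its record with new GenericData.Record(SchemaByVersion.lookup(2)) instead of deriving the schema through ReflectData.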
3. Generic Conversion Using Reflection
Another approach is to use Java reflection. This method iterates over every field declared in the POJO and sets the corresponding field in the Avro record, so no field name is hard-coded.
Here’s what this would look like:
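public static GenericData.Record mapPojoToRecordReflection(Object pojo) throws IllegalAccessException {
    Class<?> pojoClass = pojo.getClass();
    Schema schema = ReflectData.get().getSchema(pojoClass);
    GenericData.Record avroRecord = new GenericData.Record(schema);

    for (Field field : pojoClass.getDeclaredFields()) {
        // ReflectData leaves static and transient fields out of the schema,
        // so putting them into the record would fail
        if (Modifier.isStatic(field.getModifiers()) || Modifier.isTransient(field.getModifiers())) {
            continue;
        }
        field.setAccessible(true);
        avroRecord.put(field.getName(), field.get(pojo));
    }
Afterwards, it walks up the class hierarchy and copies the fields declared in each superclass as well, skipping the same kinds of fields:
    // Handle superclass fields
    Class<?> superClass = pojoClass.getSuperclass();
    while (superClass != null && superClass != Object.class) {
        for (Field field : superClass.getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || Modifier.isTransient(field.getModifiers())) {
                continue;
            }
            field.setAccessible(true);
            avroRecord.put(field.getName(), field.get(pojo));
        }
        superClass = superClass.getSuperclass();
    }
    return avroRecord;
}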
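This method adapts automatically when fields are added, removed, or renamed, since it never hard-codes a field name. However, reflective field access is slower than calling getters directly, which can become noticeable for large objects or when the conversion runs frequently.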
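One way to reduce part of that overhead is to look up the fields once per class and cache them. Here's a minimal JDK-only sketch of the idea using ClassValue; the CachedFieldReader, BaseEntity, and Customer classes are hypothetical, and to keep the sketch runnable without Avro, the values go into a LinkedHashMap where the article's method would call avroRecord.put:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CachedFieldReader {

    // Computes the relevant fields once per class; later lookups hit the cache
    private static final ClassValue<List<Field>> FIELDS = new ClassValue<List<Field>>() {
        @Override
        protected List<Field> computeValue(Class<?> type) {
            List<Field> fields = new ArrayList<>();
            // Walk the class and its superclasses, stopping before Object
            for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
                for (Field field : c.getDeclaredFields()) {
                    int mods = field.getModifiers();
                    // Skip static and transient fields, which ReflectData also omits
                    if (Modifier.isStatic(mods) || Modifier.isTransient(mods)) {
                        continue;
                    }
                    field.setAccessible(true);
                    fields.add(field);
                }
            }
            return Collections.unmodifiableList(fields);
        }
    };

    static List<Field> fieldsOf(Class<?> type) {
        return FIELDS.get(type);
    }

    // Copies every cached field into a map keyed by field name
    // (a stand-in for the Avro record in this JDK-only sketch)
    static Map<String, Object> read(Object pojo) {
        Map<String, Object> values = new LinkedHashMap<>();
        for (Field field : fieldsOf(pojo.getClass())) {
            try {
                values.put(field.getName(), field.get(pojo));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return values;
    }
}

// Hypothetical classes used to exercise the reader
class BaseEntity {
    private static int counter = 0; // static: skipped by the reader
    private final long id = 7L;
}

class Customer extends BaseEntity {
    private final String name = "Ada";
}
```

Reading a Customer yields the inherited id and its own name, while the static counter is skipped. The field lookup and setAccessible calls happen only on the first use of each class, although each field.get call still goes through reflection.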
4. Using Avro’s ReflectDatumWriter Class
Avro has built-in functionality for this scenario: the ReflectDatumWriter class. First, we generate an Avro schema from the POJO class. Next, we create a ReflectDatumWriter to serialize the POJO. Then, we set up a ByteArrayOutputStream and a BinaryEncoder for writing:
public static GenericData.Record mapPojoToRecordReflectDatumWriter(Object pojo) throws IOException {
    Schema schema = ReflectData.get().getSchema(pojo.getClass());
    ReflectDatumWriter<Object> writer = new ReflectDatumWriter<>(schema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
Next, we serialize the POJO to binary format:
    writer.write(pojo, encoder);
    encoder.flush();
Finally, we create a BinaryDecoder to read the serialized data and use a GenericDatumReader to deserialize the binary data into a GenericData.Record:
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericDatumReader<GenericData.Record> reader = new GenericDatumReader<>(schema);
    return reader.read(null, decoder);
}
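This method uses Avro's own serialization and deserialization to convert the POJO into an Avro record. Because the data makes a full round trip through Avro's binary format, nested objects, collections, and maps are all converted according to the schema, which makes this approach well suited to complex objects. For simple, flat objects, however, the extra encoding and decoding step adds overhead that may not pay off.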
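To illustrate the point about nested objects, here's a sketch, assuming Avro is on the classpath, that runs the same round trip on a hypothetical Employee with a nested Address. The RoundTripConverter class is also hypothetical; it repeats the steps of mapPojoToRecordReflectDatumWriter but wraps the IOException:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectData;
import org.apache.avro.reflect.ReflectDatumWriter;

// Hypothetical POJOs: an Employee with a nested Address
class Address {
    String city = "Berlin";
}

class Employee {
    long id = 1L;
    Address address = new Address();
}

class RoundTripConverter {

    // Serialize with ReflectDatumWriter, then read back as generic records
    static GenericData.Record toRecord(Object pojo) {
        Schema schema = ReflectData.get().getSchema(pojo.getClass());
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            new ReflectDatumWriter<Object>(schema).write(pojo, encoder);
            encoder.flush();

            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
            return new GenericDatumReader<GenericData.Record>(schema).read(null, decoder);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here the address field of the result is itself a GenericData.Record, and its city comes back as an Avro string (a Utf8 by default), so we call toString() on it. The reflection-based method from the previous section would instead copy the raw Address object into the record as is.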
5. Conclusion
In this article, we explored different ways to convert POJOs into Avro records in Java. We started with a straightforward approach which, although simple, has disadvantages when it comes to maintainability and flexibility. Next, we analyzed a solution using Java reflection. It's more robust and adapts to changes in the class structure without extra work, but it can be slower for large objects or frequent calls.
Finally, we looked at a solution that uses Avro's ReflectDatumWriter class. This class is designed for exactly this purpose, which makes it the most appropriate choice for our needs. It also benefits from Avro's internal optimizations and is a good fit for complex scenarios.
To sum up, it’s important to evaluate the specific context of our needs. This way, we can choose the approach that best fits our criteria for performance, maintainability, and scalability.
As always, the code is available over on GitHub.