Convert JSON to Avro Object

1. Introduction

In this tutorial, we’ll explore how to convert JSON data to Apache Avro objects in Java. Avro is a data serialization framework that provides rich data structures and a compact binary data format. In addition, unlike many other serialization frameworks, Avro uses schemas defined in JSON and doesn’t require code generation for serialization.

One of its key strengths is support for schema evolution, which makes Avro particularly suitable for applications that handle data structures that change over time. Furthermore, thanks to its compact data format, it’s useful for applications that process high volumes of data.

2. JSON to Avro Conversion

In Avro, converting JSON to objects requires two things: a schema that establishes the data structure, and a conversion mechanism. In our case, the conversion mechanism is the convertJsonToAvro() method.

The schema defines the format of the data (including field names and types), while the method uses this schema to transform the JSON into Avro objects.

2.1. Implementing the Conversion Method

First, let’s add the necessary dependency to our pom.xml:

<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>1.12.0</version>
</dependency>

Next, let’s create a schema that defines the structure our JSON should follow:

private static final String SCHEMA_JSON = """
    {
        "type": "record",
        "name": "Customer",
        "namespace": "com.baeldung.avro",
        "fields": [
            {"name": "name", "type": "string"},
            {"name": "age", "type": "int"},
            {"name": "email", "type": ["null", "string"], "default": null}
        ]
    }""";

Next, let’s create a converter method that handles the JSON to Avro transformation. The conversion process involves three main components: the schema, a decoder that reads the JSON data according to the schema, and a DatumReader that creates the Avro objects.

Let’s create the method:

GenericRecord convertJsonToAvro(String json) throws IOException {
    try {
        // The DatumReader builds GenericRecord instances based on our schema
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        // The JSON decoder parses the input according to the schema's structure
        Decoder decoder = DecoderFactory.get().jsonDecoder(schema, json);
        return reader.read(null, decoder);
    } catch (IOException e) {
        throw new IOException("Error converting JSON to Avro", e);
    }
}

Using the decoder we’ve instantiated, we read the JSON input and make sure it matches our schema structure. Then, the DatumReader uses both the schema and the decoder to create GenericRecord objects. This way, Avro represents the data without needing generated classes.

When converting JSON to Avro, we normally follow these steps:

  • The JSON input is validated against the schema
  • The decoder parses the JSON according to the schema’s structure
  • The DatumReader creates a GenericRecord containing the data
  • Any fields not present in the JSON but defined in the schema are assigned their default values

One important aspect to note is how Avro handles union types. When we define a field as a union (for example, ["null", "string"]), the JSON representation must explicitly specify which type is being used. For instance, we must wrap a string value in a JSON object with the type as the key: {"string": "value"}. This differs from regular JSON, where we’d just use the value directly.
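
To make the distinction concrete, here are the two encodings side by side as Java text blocks; only the first one matches what the JSON decoder expects for our Customer schema, while the second is plain JSON and is rejected:

// Avro's JSON encoding: the union member type is stated explicitly
String avroStyleJson = """
    {"name":"John Doe","age":30,"email":{"string":"john@example.com"}}
    """;

// Plain JSON: rejected by Avro's JSON decoder for the union field
String plainJson = """
    {"name":"John Doe","age":30,"email":"john@example.com"}
    """;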

2.2. Testing the JSON to Avro Conversion

Now, let’s test our implementation:

@Test
void whenValidJsonInput_thenConvertsToAvro() throws IOException {
    
    JsonToAvroConverter converter = new JsonToAvroConverter();
    String json = "{\"name\":\"John Doe\",\"age\":30,\"email\":{\"string\":\"john@example.com\"}}";
    GenericRecord record = converter.convertJsonToAvro(json);
    assertEquals("John Doe", record.get("name").toString());
    assertEquals(30, record.get("age"));
    assertEquals("john@example.com", record.get("email").toString());
}

This test verifies that our converter correctly handles a complete JSON object, with every field populated. Furthermore, let’s note the special format of the email field, which uses Avro’s union-type syntax.

As we can see from the test, all types are correctly converted and accessible in the resulting GenericRecord.

Let’s take a look at our next test, where we transform a JSON with a null field into a GenericRecord:

@Test
void whenJsonWithNullableField_thenConvertsToAvro() throws IOException {
    
    JsonToAvroConverter converter = new JsonToAvroConverter();
    String json = "{\"name\":\"John Doe\",\"age\":30,\"email\":null}";
    GenericRecord record = converter.convertJsonToAvro(json);
    assertEquals("John Doe", record.get("name").toString());
    assertEquals(30, record.get("age"));
    assertNull(record.get("email"));
}

This test confirms that we properly convert the null value of the optional email field. Since we’ve defined the email field as a ["null", "string"] union in our schema, it accepts null values.

3. Advanced Usage

The basic conversion from JSON to Avro objects already covers many common cases. However, real-world applications often require more complex operations.

Let’s look at two scenarios: processing JSON arrays, which is essential for handling data sets, and binary serialization, which prepares the data for storage or transmission.

3.1. Processing JSON Arrays

Sometimes, we’ll need to process multiple JSON objects at once. Considering this, let’s extend our converter to handle JSON arrays. For this, let’s create a new method:

List<GenericRecord> convertJsonArrayToAvro(String jsonArray) throws IOException {
    // Wrap our record schema in an array schema
    Schema arraySchema = Schema.createArray(schema);

    // Decode the JSON array against the array schema and read all records at once
    Decoder decoder = DecoderFactory.get().jsonDecoder(arraySchema, jsonArray);
    DatumReader<List<GenericRecord>> reader = new GenericDatumReader<>(arraySchema);

    return reader.read(null, decoder);
}

Now, let’s analyze our method. First, we create a schema for an array of our existing record schema. Next, we use Avro’s built-in JSON decoder with this arraySchema to verify that our JSON respects the structure defined in the schema, convert each field to its schema equivalent, and handle special cases, such as union types, accordingly.

Finally, we use the DatumReader to read the entire array at once.

3.2. Testing the Processing of JSON Arrays

Now, let’s create a test to verify this method:

@Test
void whenJsonArray_thenConvertsToAvroList() throws IOException {
    
    JsonToAvroConverter converter = new JsonToAvroConverter();
    String jsonArray = """
        [
            {"name":"John Doe","age":30,"email":{"string":"john@example.com"}},
            {"name":"Jane Doe","age":28,"email":{"string":"jane@example.com"}}
        ]""";
    List<GenericRecord> records = converter.convertJsonArrayToAvro(jsonArray);
    assertEquals(2, records.size());
    assertEquals("John Doe", records.get(0).get("name").toString());
    assertEquals("jane@example.com", records.get(1).get("email").toString());
}

Let’s briefly analyze the test. One thing worth mentioning is that for fields defined as unions (like our email field), we need to maintain the proper Avro JSON format, even within arrays.

3.3. Binary Serialization

While converting JSON to Avro objects is useful for data processing, most applications also need to store or transfer this data in an efficient format. As such, Avro’s binary serialization offers significant advantages over JSON or XML.

Some of these advantages are a more compact format, better serialization/deserialization performance, and built-in support for schema evolution. Now, let’s write a method that helps us with this. Our serializeAvroRecord() method demonstrates how to convert a GenericRecord into its binary equivalent, ready for storage or transfer:

byte[] serializeAvroRecord(GenericRecord record) throws IOException {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    // The DatumWriter uses the schema to write the record in Avro's binary format
    DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);

    writer.write(record, encoder);
    // Flush the encoder so all buffered bytes reach the stream
    encoder.flush();
    return outputStream.toByteArray();
}
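
While the article’s converter focuses on serialization, reading the bytes back is the mirror operation. Here’s a minimal sketch of what a deserialization method could look like, assuming the same schema field; the deserializeAvroRecord() name is our own and not part of the converter above:

GenericRecord deserializeAvroRecord(byte[] bytes) throws IOException {
    // Binary decoder reads the compact encoding produced by serializeAvroRecord()
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
    // The same schema drives the reconstruction of the GenericRecord
    DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
    return reader.read(null, decoder);
}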

3.4. Testing the Binary Serialization

Now, let’s create a test to verify this method:

@Test
void whenSerializingAvroRecord_thenProducesByteArray() throws IOException {
    String json = """
        {"name":"John Doe","age":30,"email":{"string":"john@example.com"}}
        """;
    JsonToAvroConverter converter = new JsonToAvroConverter();
    GenericRecord record = converter.convertJsonToAvro(json);
    byte[] bytes = converter.serializeAvroRecord(record);
    assertNotNull(bytes);
    assertTrue(bytes.length > 0);
}

Let’s briefly analyze the test. First, we convert a JSON string into a GenericRecord. Next, we serialize this record into binary format using our serializeAvroRecord() method. Finally, we check that our method produces a non-null, non-empty byte array.

4. Conclusion

In this article, we’ve explored how to convert JSON data to Avro objects in Java. We’ve discussed basic conversion, handling arrays, serialization, and validation.

We’ve also talked about the advantages of Avro’s binary serialization over other options. Our solution provides a robust foundation for working with JSON and Avro in Java applications.

As always, the code is available over on GitHub.
