Turn Unstructured Data Into Structured Data With Camel Quarkus and Quarkus LangChain4j

1. Overview

Data extraction is a common challenge when working with unstructured content. We can use a Large Language Model to address this challenge.

In this article, we’ll learn how to build integration pipelines using Apache Camel. We’ll integrate HTTP endpoints with the LLM using LangChain4j and use Quarkus as the framework to run all our components together.

We’ll also review how to create integration routes that use an LLM as one of the components to structure the data.

2. Introduction to the Components

Let’s review each component that will help us handle the integration pipeline.

2.1. Quarkus

Quarkus is a Kubernetes-native Java framework optimized for building and deploying cloud-native applications. We can use it to develop high-performance, lightweight applications that start quickly and consume minimal memory. We’ll use Quarkus as the framework to run our integration application.

2.2. LangChain4j

LangChain4j is a Java library designed to work with large language models in applications. We’ll use it to send prompts to the LLM to structure the content. Additionally, the Quarkus LangChain4j extension integrates the library with Quarkus, which lets us declare AI services as regular beans.

2.3. OpenAI

OpenAI is an AI research and development company focused on creating and advancing artificial intelligence technology. We can use OpenAI’s models, like GPT, to perform tasks such as language generation, data analysis, and conversational AI. We’ll use it to extract the data from unstructured content.

2.4. Apache Camel

Apache Camel is an integration framework that simplifies connecting different systems and applications. We can use it to build complex workflows by defining routes to move and transform data across various endpoints.

3. Integration of HTTP Source With Synchronous Response

Let’s build an integration application that will handle HTTP calls with unstructured content, extract data, and return a structured response.

3.1. Dependencies

We’ll start by adding the dependencies. First, we add the camel-quarkus-jsonpath dependency, which lets us extract values from JSON payloads in our integration pipeline:

<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-jsonpath</artifactId>
    <version>${camel-quarkus.version}</version>
</dependency>

Next, we add the camel-quarkus-langchain4j dependency to support LangChain4j handlers in our routes:

<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-langchain4j</artifactId>
    <version>${quarkus-camel-langchain4j.version}</version>
</dependency>

Finally, we add the camel-quarkus-platform-http dependency to support the HTTP endpoint as a data input for our routes:

<dependency>
    <groupId>org.apache.camel.quarkus</groupId>
    <artifactId>camel-quarkus-platform-http</artifactId>
    <version>${camel-quarkus.version}</version>
</dependency>

3.2. Structurizing Service

Now, let’s create a StructurizingService where we’ll add the prompting logic:

@RegisterAiService
@ApplicationScoped
public interface StructurizingService {
    String EXTRACT_PROMPT = """
      Extract information about a patient from the text delimited by triple backticks: ```{text}```.
      The patientBirthday field should be formatted as {dateFormat}.
      The visitReason field should concisely relate the patient visit reason.
      The expected fields are: patientName, patientBirthday, visitReason, allergies, medications.
      Return only a data structure without format name.
      """;
    @UserMessage(EXTRACT_PROMPT)
    @Handler
    String structurize(@JsonPath("$.content") String text, @Header("expectedDateFormat") String dateFormat);
}

We’ve added the structurize() method, which builds the chat model request from the EXTRACT_PROMPT template. The @JsonPath("$.content") annotation extracts the content field from the JSON message body and binds it to the text parameter, while @Header("expectedDateFormat") binds the message header of that name to the dateFormat parameter. Both values replace the matching placeholders in the prompt. We also marked the method as an Apache Camel @Handler, so we can use the bean in our route builders without specifying the method name.
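To make the placeholder substitution more concrete, here’s a minimal plain-Java sketch of the idea. It isn’t how LangChain4j implements templating internally; the PromptTemplateSketch class, its render() method, and the shortened template are hypothetical names used only for illustration:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: LangChain4j handles prompt templates itself.
class PromptTemplateSketch {

    // Matches placeholders such as {text} or {dateFormat}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{(\\w+)\\}");

    // Replaces every {name} placeholder in a single pass with the value registered under that name
    static String render(String template, Map<String, String> variables) {
        return PLACEHOLDER.matcher(template).replaceAll(match -> {
            String value = variables.get(match.group(1));
            if (value == null) {
                throw new IllegalArgumentException("Missing value for placeholder: " + match.group(1));
            }
            // Quote the value so that '$' and '\' in the text are treated literally
            return Matcher.quoteReplacement(value);
        });
    }

    public static void main(String[] args) {
        // Shortened, hypothetical template that reuses the placeholder names from EXTRACT_PROMPT
        String template = "Extract information about a patient from the text: {text}. "
          + "Format dates as {dateFormat}.";
        System.out.println(render(template, Map.of(
          "text", "Patient: Hello, My name is Sara Connor.",
          "dateFormat", "YYYY-MM-DD")));
    }
}
```

In the real service, the text value comes from the request body and the dateFormat value from the message header, so the route decides what ends up in the prompt.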

3.3. Route Builder

We use routes to specify our integration pipelines. We can create the route using the XML configuration or Java DSL with RouteBuilder.

Let’s use RouteBuilder to configure our pipeline:

@ApplicationScoped
public class Routes extends RouteBuilder {
    @Inject
    StructurizingService structurizingService;
    @Override
    public void configure() {
        from("platform-http:/structurize?produces=application/json")
          .log("A document has been received by the camel-quarkus-http extension: ${body}")
          .setHeader("expectedDateFormat", constant("YYYY-MM-DD"))
          .bean(structurizingService)
          .transform()
          .body();
    }
}

In our route configuration, we added the HTTP endpoint as a data source. We set the expectedDateFormat header to a constant date format and attached the StructurizingService bean to handle the requests. Finally, we return the bean’s output body as the route response.
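If we’d like to call the route by hand, we could build the request with the JDK’s HttpClient. The following is only a sketch: the StructurizeClientSketch class and its methods are hypothetical, and the base URL assumes the application runs locally on port 8080, which may differ in our environment:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class StructurizeClientSketch {

    // Assumption: the application listens on localhost:8080
    static final String BASE_URL = "http://localhost:8080";

    // Wraps the unstructured text into the {"content": "..."} payload that the route reads with $.content
    static String toJsonBody(String content) {
        StringBuilder json = new StringBuilder("{\"content\":\"");
        for (char c : content.toCharArray()) {
            if (c == '"') {
                json.append("\\\"");
            } else if (c == '\\') {
                json.append("\\\\");
            } else if (c == '\n') {
                json.append("\\n");
            } else if (c == '\r') {
                json.append("\\r");
            } else if (c == '\t') {
                json.append("\\t");
            } else if (c < 0x20) {
                // Remaining control characters must be escaped in JSON strings
                json.append(String.format("\\u%04x", (int) c));
            } else {
                json.append(c);
            }
        }
        return json.append("\"}").toString();
    }

    // Builds the POST request without sending it
    static HttpRequest buildRequest(String content) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/structurize"))
          .header("Content-Type", "application/json")
          .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(content)))
          .build();
    }

    // Sends the request and returns the response body produced by the route
    static String send(String content) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
          .send(buildRequest(content), HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

In a real project, we’d likely let a JSON library such as Jackson build the payload instead of escaping the text manually.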

3.4. Testing the Route

Now, let’s call our new endpoint and check how it handles unstructured data:

@QuarkusTest
class CamelStructurizeAPIResourceLiveTest {
    Logger logger = LoggerFactory.getLogger(CamelStructurizeAPIResourceLiveTest.class);
    String questionnaireResponses = """
      Operator: Could you provide your name?
      Patient: Hello, My name is Sara Connor.
      //The rest of the conversation...           
      """;
    @Test
    void givenHttpRouteWithStructurizingService_whenSendUnstructuredDialog_thenExpectedStructuredDataIsPresent() throws JsonProcessingException {
        ObjectWriter writer = new ObjectMapper().writer();
        String requestBody = writer.writeValueAsString(Map.of("content", questionnaireResponses));
        Response response = RestAssured.given()
          .when()
          .contentType(ContentType.JSON)
          .body(requestBody)
          .post("/structurize");
        logger.info(response.prettyPrint());
        response
          .then()
          .statusCode(200)
          .body("patientName", containsString("Sara Connor"))
          .body("patientBirthday", containsString("1986-07-10"))
          .body("visitReason", containsString("Declaring an accident on main vehicle"));
    }
}

We’ve called the structurize endpoint with a conversation between a patient and a healthcare service operator. In the response, we’ve obtained the structured data and verified that the expected fields contain the information about the patient.

Additionally, we’ve logged the entire response, so let’s take a look at the output:

{
    "patientName": "Sara Connor",
    "patientBirthday": "1986-07-10",
    "visitReason": "Declaring an accident on main vehicle",
    "allergies": "Allergic to penicillin; mild reactions to certain over-the-counter antihistamines",
    "medications": "Lisinopril 10 mg, multivitamin, Vitamin D occasionally"
}

As we can see, all the content was structured and returned in JSON format.
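Since the date format is only requested in the prompt, the model isn’t guaranteed to follow it, so we may want to validate the returned value ourselves. Here’s a minimal sketch with a hypothetical BirthdayFormatCheck class. Note that it uses the java.time pattern uuuu-MM-dd, because in a DateTimeFormatter pattern the letters Y and D mean week-based year and day of year, unlike in the human-readable YYYY-MM-DD we pass to the model:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class BirthdayFormatCheck {

    // java.time equivalent of the human-readable "YYYY-MM-DD"; STRICT rejects impossible dates
    private static final DateTimeFormatter EXPECTED_FORMAT = DateTimeFormatter.ofPattern("uuuu-MM-dd")
      .withResolverStyle(ResolverStyle.STRICT);

    // Returns true only if the value is a valid calendar date in the expected format
    static boolean isExpectedFormat(String value) {
        try {
            LocalDate.parse(value, EXPECTED_FORMAT);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isExpectedFormat("1986-07-10"));
        System.out.println(isExpectedFormat("10/07/1986"));
    }
}
```

We could apply such a check to the patientBirthday field before passing the structured data further down the pipeline.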

4. Conclusion

In this article, we discussed how to structure content using Quarkus, Apache Camel, and LangChain4j. With Apache Camel, we gain access to a wide range of data sources, allowing us to create transformation pipelines for our content. Using LangChain4j, we can implement data structuring processes and integrate them into our pipeline.

As always, the code is available over on GitHub.
