1. Overview
Sometimes, we need to convert an XML (Extensible Markup Language) string into an actual XML document that we can process. This is common in tasks like web crawling or retrieving XML data stored in a database.
In this article, we’ll discuss how to convert an XML string to an XML document. We’ll cover two approaches to the problem.
2. Example String
We’ll use a simple XML document that contains data about blog posts:
<posts>
    <post postId="1">
        <title>Parsing XML as a String in Java</title>
        <author>John Doe</author>
    </post>
</posts>
Here, posts is the root element, and it contains a single post child element.
3. Parsing XML from a String
In this section, we’ll cover two methods to parse XML from our example string.
3.1. InputSource with StringReader
When we parse an XML document, we first obtain an instance of the DocumentBuilder class. Then, we invoke the parse method on that instance, which expects an input source to read the XML from.
In our case, the XML is a string. However, if we pass the string directly to the parse method, it throws an exception. That’s because the parse(String) overload expects the string to be a URI (Uniform Resource Identifier) that points to the XML resource, not the XML content itself.
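To see this failure for ourselves, here’s a minimal sketch. The class and method names (UriParseDemo, failsWhenParsedAsUri) are hypothetical, and the exact exception type may depend on the parser implementation, so the sketch catches both checked exceptions that parse declares:

```java
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, used only for illustration
class UriParseDemo {

    // Returns true if parse(String) rejects raw XML content
    static boolean failsWhenParsedAsUri(String xmlString) throws ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            // parse(String) treats its argument as a URI, not as XML content
            builder.parse(xmlString);
            return false;
        } catch (IOException | SAXException e) {
            // Typically an IOException, since no resource exists at that "URI"
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String xmlString = "<posts><post postId=\"1\"/></posts>";
        System.out.println(failsWhenParsedAsUri(xmlString));
    }
}
```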
Fortunately, we can create a custom input source for our XML string. We can use the StringReader class to create a stream of characters from the XML string. Then, we create a new input source out of that stream:
String xmlString = "<posts>...</posts>";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputSource inputSource = new InputSource(new StringReader(xmlString));
Document document = builder.parse(inputSource);
Let’s break this down:
- xmlString is our example XML string
- DocumentBuilder is a helper class that lets us parse and create XML documents
- InputSource is a wrapper that tells the parser where to read the XML from: a character stream, a byte stream, or a system identifier (URI)
- StringReader turns the XML string into a readable stream of characters
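Putting these pieces together, a self-contained version with the required imports might look like the sketch below. The XmlStringParser class and its parse helper are hypothetical names introduced only for this example:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for illustration
class XmlStringParser {

    static Document parse(String xmlString)
      throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Wrap the string in a character stream so it's read as XML content
        InputSource inputSource = new InputSource(new StringReader(xmlString));
        return builder.parse(inputSource);
    }

    public static void main(String[] args) throws Exception {
        String xmlString = "<posts><post postId=\"1\">"
          + "<title>Parsing XML as a String in Java</title>"
          + "<author>John Doe</author></post></posts>";
        Document document = parse(xmlString);
        // Print the root element name and the postId of the first post
        Element post = (Element) document.getElementsByTagName("post").item(0);
        System.out.println(document.getDocumentElement().getNodeName());
        System.out.println(post.getAttribute("postId"));
    }
}
```

This sketch creates a new factory and builder on every call for simplicity. If we parse many strings, we may prefer to reuse a builder, keeping in mind that DocumentBuilder instances aren’t guaranteed to be thread-safe.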
Essentially, InputSource has a constructor that accepts a Reader as the character stream to parse:
public InputSource(Reader characterStream)
Similarly, the parse method takes an InputSource:
public Document parse(InputSource is)
Finally, the parse method creates an XML document out of the string. We can test it by passing in our example document as a string:
@Test
public void givenXmlString_whenConvertToDocument_thenSuccess() throws Exception {
    ...
    assertNotNull(document);
    assertEquals("posts", document.getDocumentElement().getNodeName());
    Element rootElement = document.getDocumentElement();
    var childElements = rootElement.getElementsByTagName("post");
    assertNotNull(childElements);
    assertEquals(1, childElements.getLength());
}
We can expect the test to pass as long as the XML string is well-formed.
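Conversely, when the string isn’t well-formed, the parse method reports the problem by throwing a SAXException. The following sketch illustrates this with a hypothetical MalformedXmlDemo class and isWellFormed helper, neither of which belongs to any library. Note that the default error handler may also print a diagnostic message to standard error:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, used only for illustration
class MalformedXmlDemo {

    // Returns true if the string parses without errors, false otherwise
    static boolean isWellFormed(String xmlString)
      throws ParserConfigurationException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(xmlString)));
            return true;
        } catch (SAXException e) {
            // Thrown when the parser hits a parse error, such as an unclosed tag
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isWellFormed("<posts><post/></posts>"));
        // The post element is never closed here
        System.out.println(isWellFormed("<posts><post></posts>"));
    }
}
```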
3.2. InputStream of Byte Array
Using an InputStream is another common approach to parsing XML. It’s useful when the XML arrives as bytes, for example, from a network stream or a file stream. It’s also simpler to use than InputSource, although InputSource gives us more control over the parsing process, such as specifying the encoding or a system identifier.
First, we create an instance of ByteArrayInputStream from our XML string. Then, we feed it to the parse method:
InputStream inputStream = new ByteArrayInputStream(xmlString.getBytes(StandardCharsets.UTF_8));
Document document = builder.parse(inputStream);
In the code, we convert the string to an array of bytes, explicitly specifying the character encoding, which is UTF-8 in this case.
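The encoding matters as soon as the XML contains non-ASCII characters. When a document has no XML declaration and no byte order mark, a conforming parser assumes UTF-8, so the bytes we produce should match. Here’s a minimal sketch of the byte-based approach, where the XmlBytesParser class and the sample author name are assumptions made for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, used only for illustration
class XmlBytesParser {

    static Document parse(String xmlString)
      throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Encode explicitly as UTF-8 instead of relying on the platform default
        byte[] bytes = xmlString.getBytes(StandardCharsets.UTF_8);
        try (InputStream inputStream = new ByteArrayInputStream(bytes)) {
            return builder.parse(inputStream);
        }
    }

    public static void main(String[] args) throws Exception {
        // \u00e9 is a non-ASCII character (e with an acute accent)
        String xmlString = "<posts><post postId=\"1\">"
          + "<author>Jos\u00e9</author></post></posts>";
        Document document = parse(xmlString);
        System.out.println(document.getElementsByTagName("author").item(0).getTextContent());
    }
}
```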