Refreshing Apache XML Infrastructure
At my company (Evolved Binary) we recently had to address a series of bugs in Elemental that involved the Serialization and Deserialization of XDM (XQuery and XPath Data Model) values. The issues occurred when transferring the values of certain XDM types over the REST and XML-RPC APIs.
The issues in the REST API were relatively easy for us to address as we control almost all of the code used there. However, Elemental inherited the XML-RPC API from eXist-db, and that relies heavily on the Apache XML-RPC library. The last official release of Apache XML-RPC was version 3.1.3 back in February of 2010 - some 15+ years ago! Since then it has not been maintained by the Apache project.
In fact, we had already visited the topic of what to do with Apache XML-RPC in the past...
In June 2018, we identified that there were known security issues in Apache XML-RPC. I noted that a number of Linux distributions were shipping security patches to protect against those issues, but that those patches were not included in the official Apache XML-RPC release. At that time we decided to create our own fork of the Apache XML-RPC project because:
- The Apache XML-RPC project appeared to have been archived and was unmaintained.
- We had a direct dependency on it.
- It would not be easy to replace as we needed to maintain 100% forwards compatibility.
At that time I imported the Apache XML-RPC code from Subversion to our GitHub, fixed the issues, and cut our own release. We did this work publicly and in accordance with the original license. We chose a new major version number to try and signal that this was a major departure from 3.1.3, i.e. a new fork. We published a new release of Apache XML-RPC version 4.0.0 (in our own public namespace). This enabled anyone to use Apache XML-RPC with the security fixes already included.
In September 2022, I needed to switch a project over from Javax Servlet to Jakarta Servlet. Unfortunately such a change in the Java world is not simple. Java Servlets (or those from any libraries you use) that utilise the Javax Servlet API have to be updated to compile against the newer Jakarta Servlet API. Apache XML-RPC was an example of one such library that delivers its Web Endpoint(s) using Javax Servlet. This time it was easier, as we already had our own fork of Apache XML-RPC. From there we made the necessary changes, chose a new major version number to indicate the breaking API change, and then publicly released Apache XML-RPC version 5.0.0, which is compatible with Jakarta Servlet.
More recently, for Elemental, we wanted to be able to serialize and deserialize all XDM types over XML-RPC. If you squint a little, then the XDM Node types are a subset of the XML DOM (Document Object Model) types, and so we thought it might be nice if you could send and receive any XML over XML-RPC.
Now you might be forgiven for some confusion here if you are thinking: Wait a minute, this is XML-RPC! Why can't he send XML over his XML-RPC? Let's step back for a moment. The XML in XML-RPC is just a wire-format for RPC. The focus is still on RPC (Remote Procedure Call); the XML is only used to describe function calls, their parameters, and their results.
For example, imagine we had a Java function like: public String sayHello(String name), then an XML-RPC call to that function would produce the following XML document request that is sent from the client to the server:
<methodCall>
<methodName>sayHello</methodName>
<params>
<param>
<value>Adam</value>
</param>
</params>
</methodCall>
The server would then execute the sayHello function, and all being well, might send this XML response document back to the client:
<methodResponse>
<params>
<param>
<value>Hello Adam!</value>
</param>
</params>
</methodResponse>
So far so good, but our use-case is more complicated as we want to be able to send any XML DOM type (e.g. org.w3c.dom.Document, org.w3c.dom.Element, org.w3c.dom.Text, etc.) as either function parameters or the function result type. Imagine that we had another Java function like: public Document createInvoice(Attr id, Element address, Element[] items). We now need to be able to put XML inside our XML-RPC request and/or response document. By default, XML-RPC does not support that. XML-RPC has a quite limited type system that supports just:
- 32-bit signed integers.
- 64-bit double-precision signed floating point numbers.
- Boolean values.
- Strings.
- ISO 8601 date/time.
- Base64-encoded binary data.
- Arrays and Structures composed of the prior types.
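To make the limits of that type system concrete, here is a minimal Java sketch that maps a handful of Java types onto the basic XML-RPC value elements. It is a hypothetical helper, not part of Apache XML-RPC: the element names (int, double, boolean, string, base64, array, struct) come from the XML-RPC specification, whilst the class and method names are made up for illustration, and date/time handling is left out for brevity. Anything outside the list, such as an org.w3c.dom.Element, simply has nowhere to go:

```java
import java.util.Base64;
import java.util.List;
import java.util.Map;

/** Illustrative only: maps a few Java types onto the basic XML-RPC value types. */
final class XmlRpcValueSketch {

    /** Returns a <value> element for the given Java object (hypothetical helper). */
    static String encode(Object o) {
        return "<value>" + encodeTyped(o) + "</value>";
    }

    private static String encodeTyped(Object o) {
        if (o == null) {
            throw new IllegalArgumentException("Basic XML-RPC has no null type");
        } else if (o instanceof Integer) {
            return "<int>" + o + "</int>";                       // 32-bit signed integer
        } else if (o instanceof Double) {
            return "<double>" + o + "</double>";                 // double-precision float
        } else if (o instanceof Boolean) {
            return "<boolean>" + (((Boolean) o) ? 1 : 0) + "</boolean>";
        } else if (o instanceof String) {
            return "<string>" + escape((String) o) + "</string>";
        } else if (o instanceof byte[]) {
            return "<base64>" + Base64.getEncoder().encodeToString((byte[]) o) + "</base64>";
        } else if (o instanceof List) {
            // An array is a <data> element holding one <value> per item.
            StringBuilder sb = new StringBuilder("<array><data>");
            for (Object item : (List<?>) o) {
                sb.append(encode(item));
            }
            return sb.append("</data></array>").toString();
        } else if (o instanceof Map) {
            // A struct is a list of named members.
            StringBuilder sb = new StringBuilder("<struct>");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) o).entrySet()) {
                sb.append("<member><name>").append(escape(String.valueOf(e.getKey())))
                  .append("</name>").append(encode(e.getValue())).append("</member>");
            }
            return sb.append("</struct>").toString();
        }
        // e.g. org.w3c.dom.Element ends up here: there is no basic type for it.
        throw new IllegalArgumentException("No basic XML-RPC type for " + o.getClass().getName());
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling XmlRpcValueSketch.encode("Adam") yields a string-typed value element, whereas passing a DOM node raises an IllegalArgumentException, which is exactly the gap an extension type has to fill.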
Apache XML-RPC allows you to add extensions in the form of custom XML serializers/deserializers for your own types. However, as our goal here was the serialization/deserialization of general XML and not some custom type, and furthermore as we found some incomplete support for this in Apache XML-RPC, we decided it was best to extend that prior work to completion. After our additions, an XML-RPC call to the createInvoice function now produces the following XML document request that is sent from the client to the server:
<methodCall xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions" xmlns:dom="http://ws.apache.org/xmlrpc/namespaces/extensions/dom">
<methodName>createInvoice</methodName>
<params>
<param>
<value>
<ex:dom dom:type="2"><dom:attribute my-identifier="id1"/></ex:dom>
</value>
</param>
<param>
<value>
<ex:dom dom:type="1"><address><number>99</number><street>Via Medail</street><city>Bardonecchia</city><province>TO</province></address></ex:dom>
</value>
</param>
<param>
<value>
<array>
<data>
<value><ex:dom dom:type="1"><item><name>Sprockets</name><quantity>5</quantity><unit-cost currency="GBP">7.50</unit-cost></item></ex:dom></value>
<value><ex:dom dom:type="1"><item><name>Springets</name><quantity>12</quantity><unit-cost currency="EUR">19.20</unit-cost></item></ex:dom></value>
</data>
</array>
</value>
</param>
</params>
</methodCall>
The server would then execute the createInvoice function, and all being well, might send this XML response document back to the client:
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions" xmlns:dom="http://ws.apache.org/xmlrpc/namespaces/extensions/dom">
<params>
<param>
<value>
<ex:dom dom:type="9"><invoice> ... </invoice></ex:dom>
</value>
</param>
</params>
</methodResponse>
We wrapped that nice new feature up, and to signal that it could cause backward incompatibilities with previous versions, we released it as a major version 6.0.0.
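As an aside, the dom:type values in the examples above (1, 2, and 9) coincide with the ELEMENT_NODE, ATTRIBUTE_NODE, and DOCUMENT_NODE constants of org.w3c.dom.Node. Assuming that is indeed the encoding (an inference from the examples, not something stated by the library), the wrapping could be sketched in Java roughly as follows. DomValueSketch and its methods are hypothetical names, only the JDK's own XML APIs are used, and attribute nodes, which the example shows need a special dom:attribute wrapper, are deliberately left out:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Illustrative only: wraps a DOM element or document in an ex:dom element. */
final class DomValueSketch {

    /** Parses a string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /**
     * Serializes the node and wraps it in ex:dom, using the DOM node type
     * constant as the dom:type value (ASSUMPTION: inferred from the examples).
     */
    static String wrap(Node node) {
        short type = node.getNodeType();
        if (type != Node.ELEMENT_NODE && type != Node.DOCUMENT_NODE) {
            throw new IllegalArgumentException("This sketch only handles elements and documents");
        }
        try {
            // An identity transform is the JDK's standard way to serialize a DOM node.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return "<ex:dom dom:type=\"" + type + "\">" + out + "</ex:dom>";
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, DomValueSketch.wrap(DomValueSketch.parse("<item><name>Sprockets</name></item>").getDocumentElement()) produces an ex:dom wrapper with dom:type="1" around the serialized item element.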
Unfortunately, during testing of 6.0.0 within our system, we found occasional exceptions being thrown with particular XML documents that we wanted to use as parameters to our functions. This initially puzzled me, as these were completely valid XML documents in their own right, and should have been handled correctly by our changes. Here is an example of such a document that caused an exception when used as a parameter (or function return type) within our Apache XML-RPC 6.0.0:
<c:Site xmlns="urn:content" xmlns:c="urn:content">
<config xmlns="urn:config">123</config>
<serverconfig xmlns="urn:config">123</serverconfig>
</c:Site>
Trying to use such a document within XML-RPC raised an exception like:
java.lang.IllegalStateException: The prefix isn't the prefix, which has been defined last.
at org.apache.ws.commons.util.NamespaceContextImpl.endPrefixMapping(NamespaceContextImpl.java:95)
at org.apache.ws.commons.util.test.NamespaceContextIT$NamespaceContextHandler.endPrefixMapping(NamespaceContextIT.java:90)
...
A quick look into org.apache.ws.commons.util.NamespaceContextImpl#endPrefixMapping(String) and we find this little nugget of Javadoc:
* @throws IllegalStateException The prefix is not the prefix, which
* has been defined last. In other words, the calls to
* {@link #startPrefixMapping(String, String)}, and
* {@link #endPrefixMapping(String)} aren't in LIFO order.
This and a further study of the complete code in NamespaceContextImpl confirmed that it expects the interleaved calls to startPrefixMapping and endPrefixMapping to happen in LIFO (Last In, First Out) order. There's one big problem with that though: it is not necessarily how prefix mapping works in the real world! Typically such start/end prefix mapping methods are fired by an XML Parser or Serializer, and if we look at SAX (Simple API for XML), which is an approach used in many XML parsers and serializers, we see that its documentation explicitly states:
Note that start/endPrefixMapping events are not guaranteed to be properly nested relative to each other...
So, in pushing Apache XML-RPC further by working with XML Elements that can have complex namespace arrangements, we have found another bug. Unfortunately for us, this bug was not actually in Apache XML-RPC, but was in a library that it depends on - ws-commons-util (Apache Web Services Common Utilities). The Apache ws-commons-util library is even older than Apache XML-RPC, with its last official release, version 1.0.2, being published back in August 2007 - some 18+ years ago! Since then it has not been maintained by the Apache project.
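The ordering that SAX permits can be observed with a small Java sketch that records the prefix mapping events fired for the problematic document shown earlier, and then checks whether they happen to be in LIFO order. The class and method names are hypothetical, only the JDK's built-in SAX parser is used, and the exact order of events depends on the parser in use, so no particular output is claimed here:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Illustrative only: records SAX prefix mapping events and tests them for LIFO order. */
final class PrefixMappingOrderSketch {

    /** The document from this article that triggered the exception. */
    static final String SAMPLE =
        "<c:Site xmlns=\"urn:content\" xmlns:c=\"urn:content\">"
        + "<config xmlns=\"urn:config\">123</config>"
        + "<serverconfig xmlns=\"urn:config\">123</serverconfig>"
        + "</c:Site>";

    /** Returns events such as "start:c" and "end:c" in the order the parser fired them. */
    static List<String> record(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startPrefixMapping(String prefix, String uri) {
                events.add("start:" + prefix);
            }

            @Override
            public void endPrefixMapping(String prefix) {
                events.add("end:" + prefix);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    /** True only if every end event closes the most recently started prefix. */
    static boolean isLifo(List<String> events) {
        Deque<String> open = new ArrayDeque<>();
        for (String event : events) {
            if (event.startsWith("start:")) {
                open.push(event.substring("start:".length()));
            } else if (!event.substring("end:".length()).equals(open.poll())) {
                return false;
            }
        }
        return true;
    }
}
```

Printing PrefixMappingOrderSketch.record(PrefixMappingOrderSketch.SAMPLE) shows the sequence your parser produces; whenever isLifo returns false for that sequence, a consumer that insists on LIFO order will fail on an otherwise valid document.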
So we need to fix this bug too of course, but how should we do that?
With Apache XML-RPC we had felt comfortable forking the project as we had a direct dependency on it ourselves, but this is another level removed, where we have an indirect dependency on Apache ws-commons-util through Apache XML-RPC. In addition, whilst Apache XML-RPC is quite application specific, ws-commons-util is known to be used as a general utility library in lots of other XML projects too. In the Java world, it is effectively XML infrastructure. Initially, forking ws-commons-util felt like we were overreaching, and that any fix that we produced would be better upstreamed to Apache for all users to receive easily. So we first contacted two of the main authors of ws-commons-util, and we received a very kind response from Jochen Wiedmann, who apologised that it was unmaintained and explained that whilst it may not be the response I hoped for, my best option was probably to fork it. Thank you very much Jochen :-)
Well, with the blessing of one of the original authors in hand, and without any other option in sight, we forked the ws-commons-util project. I imported the Apache code from Subversion to our GitHub, fixed the bug by writing a completely new specification-compliant implementation of NamespaceContextImpl, and cut our own release. We did this work publicly and in accordance with the original license. We bumped the feature version number to try and signal that this was an incremental release, i.e. a new implementation of NamespaceContextImpl that, whilst maintaining the existing API, also adds a new and improved API. We published a new release of Apache ws-commons-util version 1.1.0 (in our own public namespace).
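To illustrate the general idea of tolerating out-of-order events, here is a minimal Java sketch of a namespace context that does not require endPrefixMapping calls to arrive in LIFO order. To be clear, this is not the implementation released in ws-commons-util 1.1.0; the class name and its behaviour are assumptions made purely for illustration. Instead of insisting that the ended prefix is the last one declared, it removes the most recent binding for that particular prefix:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: a prefix-to-URI context that tolerates non-LIFO end events. */
final class TolerantNamespaceContextSketch {

    /** Bindings as {prefix, uri} pairs, oldest first. */
    private final List<String[]> bindings = new ArrayList<>();

    void startPrefixMapping(String prefix, String uri) {
        bindings.add(new String[] {prefix, uri});
    }

    /** Ends the most recent binding of this prefix, wherever it sits in the list. */
    void endPrefixMapping(String prefix) {
        for (int i = bindings.size() - 1; i >= 0; i--) {
            if (bindings.get(i)[0].equals(prefix)) {
                bindings.remove(i);
                return;
            }
        }
        throw new IllegalStateException("Prefix was never declared: '" + prefix + "'");
    }

    /** Returns the URI currently bound to the prefix, or null if it is unbound. */
    String getNamespaceURI(String prefix) {
        for (int i = bindings.size() - 1; i >= 0; i--) {
            if (bindings.get(i)[0].equals(prefix)) {
                return bindings.get(i)[1];
            }
        }
        return null;
    }
}
```

With this approach, starting the default prefix and then the c prefix, and ending them in that same order, no longer raises an exception. Bindings of the same prefix are still unwound newest-first, because redeclarations of one prefix are always nested by element scope.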
Finally, we then released a new version of Apache XML-RPC version 6.1.0, where we changed its dependency from the official Apache ws-commons-util 1.0.2 to our newer fork: ws-commons-util 1.1.0.
If you are a user of Apache XML-RPC or ws-commons-util, we would love to hear from you if this has been helpful for you too.
It seems that at my company, Evolved Binary, we are creating a bit of a pattern for forking and maintaining XML infrastructure. I recognise that this is not without its own risks, and I hope to discuss those on this blog in detail soon.
In the meantime, for your perusal, Evolved Binary now maintain the following XML infrastructure projects:
- Builds of Apache Xerces, with and without XML Schema 1.1 support, and with and without Java 14+ support - https://central.sonatype.com/artifact/com.evolvedbinary.thirdparty.xerces/xercesImpl
- Apache XML APIs - https://central.sonatype.com/artifact/com.evolvedbinary.thirdparty.xml-apis/xml-apis
- Eclipse XPath 2 Engine - https://central.sonatype.com/artifact/com.evolvedbinary.thirdparty.org.eclipse.wst.xml/xpath2
- Milton WebDAV library - https://central.sonatype.com/namespace/org.exist-db.thirdparty.com.ettrema
- Builds of XML Mind DITAC - https://central.sonatype.com/artifact/com.evolvedbinary.thirdparty.com.xmlmind/ditac
- JVNet JAXB Maven Plugin - https://github.com/evolvedbinary/jvnet-jaxb-maven-plugin
- Mojohaus JAXB Maven Plugin - https://github.com/evolvedbinary/mojohaus-jaxb-maven-plugin