To get started, add the following dependency:
<dependency>
    <groupId>com.michelin</groupId>
    <artifactId>avro-xml-mapper</artifactId>
    <version>${avro-xml-mapper.version}</version>
</dependency>
The xpath attribute of each AVSC record and field specifies the path of the corresponding element in the XML file.
A single element is represented as follows:
<objectRoot>
    <element>content</element>
</objectRoot>
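To make the role of the xpath attribute concrete, here is a minimal sketch that uses only the JDK's javax.xml.xpath API; it is not the mapper's internal code. It assumes, as in the namespace example further down, that the record's xpath (objectRoot) is resolved first and each field's xpath (element) is evaluated relative to it. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SingleElementXPathSketch {
    /** Resolves a record-level XPath, then a field XPath relative to it (hypothetical helper). */
    public static String extract(String xml, String recordXpath, String fieldXpath) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Locate the record root first, e.g. "objectRoot"
            Node recordNode = (Node) xpath.evaluate(recordXpath, document, XPathConstants.NODE);
            // Then evaluate the field path relative to the record root, e.g. "element"
            return xpath.evaluate(fieldXpath, recordNode);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate " + fieldXpath, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<objectRoot><element>content</element></objectRoot>";
        System.out.println(extract(xml, "objectRoot", "element")); // prints content
    }
}
```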
Lists (Avro arrays) can be mapped to any repeating element in the XML file. The xpath attribute should point to the repeating element.
<objectRoot>
    <child>content1</child>
    <child>content2</child>
</objectRoot>
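The following plain-JDK sketch shows what pointing an xpath at a repeating element means: selecting it as a node set yields one value per occurrence, in document order. The class and method names are hypothetical and the code is not part of the library.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ListXPathSketch {
    /** Collects the text of every node matched by an XPath that targets a repeating element. */
    public static List<String> extractList(String xml, String repeatingXpath) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // NODESET returns every occurrence of the repeating element
            NodeList nodes = (NodeList) xpath.evaluate(repeatingXpath, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not extract list from XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<objectRoot><child>content1</child><child>content2</child></objectRoot>";
        System.out.println(extractList(xml, "/objectRoot/child")); // prints [content1, content2]
    }
}
```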
Complex types, such as a list of records, can also be defined as follows:
<objectRoot>
    <recordList>
        <listItem>
            <subStringField>item1</subStringField>
            <subIntField attribute="attribute1">1</subIntField>
        </listItem>
        <listItem>
            <subStringField>item2</subStringField>
            <subIntField attribute="attribute2">2</subIntField>
        </listItem>
        <listItem>
            <subStringField>item3</subStringField>
            <subIntField attribute="attribute3">3</subIntField>
        </listItem>
    </recordList>
</objectRoot>
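As a rough illustration of how a list of records lines up with this XML, the sketch below selects each listItem and evaluates the sub-field paths relative to it, including an attribute through the @ syntax. It relies only on the JDK; ListItem and the other names are hypothetical, and the mapper itself produces Avro records rather than plain Java objects.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RecordListXPathSketch {
    /** Hypothetical holder for one listItem; field names mirror the XML example. */
    public static final class ListItem {
        public final String subStringField;
        public final int subIntField;
        public final String attribute;

        ListItem(String subStringField, int subIntField, String attribute) {
            this.subStringField = subStringField;
            this.subIntField = subIntField;
            this.attribute = attribute;
        }
    }

    public static final String SAMPLE_XML = "<objectRoot><recordList>"
            + "<listItem><subStringField>item1</subStringField><subIntField attribute=\"attribute1\">1</subIntField></listItem>"
            + "<listItem><subStringField>item2</subStringField><subIntField attribute=\"attribute2\">2</subIntField></listItem>"
            + "<listItem><subStringField>item3</subStringField><subIntField attribute=\"attribute3\">3</subIntField></listItem>"
            + "</recordList></objectRoot>";

    public static List<ListItem> extractItems(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList items = (NodeList) xpath.evaluate(
                    "/objectRoot/recordList/listItem", document, XPathConstants.NODESET);
            List<ListItem> result = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                Node item = items.item(i);
                // Sub-field paths are evaluated relative to each repeating listItem node
                String text = xpath.evaluate("subStringField", item);
                int number = Integer.parseInt(xpath.evaluate("subIntField", item).trim());
                String attribute = xpath.evaluate("subIntField/@attribute", item);
                result.add(new ListItem(text, number, attribute));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not extract records from XML", e);
        }
    }
}
```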
Maps have two accepted formats:
- A list of elements with a key attribute
<objectRoot>
    <element key="key1">content1</element>
    <element key="key2">content2</element>
</objectRoot>
- A list of nodes with a key element and a value element
<objectRoot>
    <element>
        <key>key1</key>
        <value>content1</value>
    </element>
    <element>
        <key>key2</key>
        <value>content2</value>
    </element>
</objectRoot>
In both cases, the rootXpath attribute always points to the repeating element of the list.
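Both map layouts can be read with the same approach: select the repeating element through rootXpath, then read the key and the value from each node. The sketch below hard-codes the key attribute and the key/value element names from the examples above; whether the library lets you configure these names is not covered here. Only JDK APIs are used, and all names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MapXPathSketch {
    // XPath objects are not thread-safe; a shared instance is fine for this single-threaded sketch
    private static final XPath XPATH = XPathFactory.newInstance().newXPath();

    /** Selects every occurrence of the repeating element targeted by rootXpath. */
    private static NodeList selectRepeating(String xml, String rootXpath) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (NodeList) XPATH.evaluate(rootXpath, document, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not select " + rootXpath, e);
        }
    }

    /** Format 1: <element key="key1">content1</element> */
    public static Map<String, String> fromKeyAttribute(String xml, String rootXpath) {
        NodeList nodes = selectRepeating(xml, rootXpath);
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            map.put(element.getAttribute("key"), element.getTextContent());
        }
        return map;
    }

    /** Format 2: <element><key>key1</key><value>content1</value></element> */
    public static Map<String, String> fromKeyValueNodes(String xml, String rootXpath) {
        NodeList nodes = selectRepeating(xml, rootXpath);
        Map<String, String> map = new LinkedHashMap<>();
        try {
            for (int i = 0; i < nodes.getLength(); i++) {
                // key and value are evaluated relative to each repeating node
                map.put(XPATH.evaluate("key", nodes.item(i)), XPATH.evaluate("value", nodes.item(i)));
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid key/value path", e);
        }
        return map;
    }
}
```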
Only the timestamp-millis logical type (on the long type) is handled. It accepts multiple formats:
- ISO8601 date-time
- ISO8601 date
- Flat date (yyyyMMddz) which gets the UTC 12:00:00.000 time to avoid timezone issues
- Flat date-time (yyyyMMddHHmmssz) which gets the UTC timezone assigned
- ISO8601 date-time without offset
- ISO8601 date without offset
- Flat date without offset (yyyyMMdd) which gets the UTC 12:00:00.000 time to avoid timezone issues
- Flat date-time without offset (yyyyMMdd HHmmss) which gets the UTC timezone assigned
- Flat date-time without offset and without timezone (yyyy-MM-dd HH:mm:ss) which gets the UTC timezone assigned
- Flat date-time with offset (yyyy-MM-dd'T'HH:mm:ss'T'00:00)
All of these formats are converted to the Java Instant type.
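The sketch below shows how some of these formats could be normalized to an Instant with java.time, including the 12:00:00.000 UTC rule for date-only values. It covers ISO 8601 date-time with offset, ISO 8601 date-time and date without offset, yyyy-MM-dd HH:mm:ss, yyyyMMddHHmmss and yyyyMMdd; the zone-suffixed flat formats (yyyyMMddz, yyyyMMddHHmmssz) and the remaining variants are left out. It is an illustration under those assumptions, not the library's parser, and the class name is hypothetical.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.function.Function;

public class TimestampParsingSketch {
    private static final DateTimeFormatter FLAT_DATE = DateTimeFormatter.ofPattern("yyyyMMdd");
    private static final DateTimeFormatter FLAT_DATE_TIME = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final DateTimeFormatter SPACED_DATE_TIME = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Candidate parsers, tried in order; each returns an Instant or throws DateTimeParseException
    private static final List<Function<String, Instant>> PARSERS = List.of(
            // ISO 8601 date-time with offset, e.g. 2024-01-15T10:30:00+02:00
            value -> OffsetDateTime.parse(value).toInstant(),
            // ISO 8601 date-time without offset: UTC is assigned
            value -> LocalDateTime.parse(value).toInstant(ZoneOffset.UTC),
            // ISO 8601 date without offset: 12:00:00.000 UTC avoids day shifts across timezones
            value -> LocalDate.parse(value).atTime(12, 0).toInstant(ZoneOffset.UTC),
            // yyyy-MM-dd HH:mm:ss: UTC is assigned
            value -> LocalDateTime.parse(value, SPACED_DATE_TIME).toInstant(ZoneOffset.UTC),
            // yyyyMMddHHmmss: UTC is assigned
            value -> LocalDateTime.parse(value, FLAT_DATE_TIME).toInstant(ZoneOffset.UTC),
            // yyyyMMdd: 12:00:00.000 UTC
            value -> LocalDate.parse(value, FLAT_DATE).atTime(12, 0).toInstant(ZoneOffset.UTC));

    public static Instant parse(String value) {
        for (Function<String, Instant> parser : PARSERS) {
            try {
                return parser.apply(value.trim());
            } catch (DateTimeParseException ignored) {
                // Not this format: try the next one
            }
        }
        throw new IllegalArgumentException("Unsupported timestamp format: " + value);
    }

    /** timestamp-millis stores milliseconds since the Unix epoch. */
    public static long toTimestampMillis(String value) {
        return parse(value).toEpochMilli();
    }
}
```

For example, parse("20240115") yields 2024-01-15T12:00:00Z, and toTimestampMillis turns the result into the epoch milliseconds that timestamp-millis stores.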
Only the decimal logical type (on the bytes type) is handled. It is converted to the Java BigDecimal type.
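For reference, Avro's decimal logical type stores the unscaled value as a big-endian two's-complement integer in the bytes, with the scale taken from the schema. The sketch below shows that round trip with BigInteger and BigDecimal; it assumes the scale is known and is not the library's own conversion code. The class name is hypothetical.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

public class DecimalBytesSketch {
    /** Decodes Avro decimal bytes (big-endian two's-complement unscaled value) using the schema's scale. */
    public static BigDecimal decode(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    /** Encodes a BigDecimal with the schema's scale; throws ArithmeticException instead of silently rounding. */
    public static byte[] encode(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.UNNECESSARY).unscaledValue().toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode(new BigDecimal("12.34"), 2);
        System.out.println(decode(bytes, 2)); // prints 12.34
    }
}
```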
The xmlNamespaces attribute, defined at the root of the AVSC file, specifies the namespaces used in the XML file.
It is used differently depending on the conversion direction, as described below.
For XML to Avro conversion, the namespaces are used to unify the XML file: if several namespace declarations refer to the same URI, only the prefix defined in the xmlNamespaces attribute is kept during conversion.
For instance, given the following AVSC and XML:
{
  "name": "Object",
  "type": "record",
  "namespace": "com.example",
  "xpath": "objectRoot",
  "xmlNamespaces": {
    "null": "http://namespace.uri/default",
    "ns1": "http://namespace.uri/1"
  },
  "fields": [
    {"name": "element", "type": "string", "xpath": "element"},
    {"name": "secondElement", "type": "string", "xpath": "ns1:secondElement"},
    {"name": "thirdElement", "type": "string", "xpath": "ns1:thirdElement"}
  ]
}
<objectRoot xmlns="http://namespace.uri/default"
            xmlns:ns1="http://namespace.uri/1">
    <element>content</element>
    <ns1:secondElement>second element content</ns1:secondElement>
    <ns2:thirdElement xmlns:ns2="http://namespace.uri/1">third element content</ns2:thirdElement>
</objectRoot>
Before conversion to Avro, the initial document is rewritten as follows:
<noprefixns:objectRoot xmlns:noprefixns="http://namespace.uri/default"
                       xmlns:ns1="http://namespace.uri/1">
    <noprefixns:element>content</noprefixns:element>
    <ns1:secondElement>second element content</ns1:secondElement>
    <ns1:thirdElement>third element content</ns1:thirdElement>
</noprefixns:objectRoot>
The default namespace declaration (xmlns) on the root is replaced with xmlns:noprefixns, and the ns1 prefix is preserved.
The ns2 declaration is removed because it refers to the same URI as ns1, so the element that used it is renamed with the ns1 prefix.
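The unification step can be approximated with the JDK's namespace-aware DOM, as in the sketch below. It assumes a prefix-to-URI map shaped like xmlNamespaces (with "null" for the default namespace), renames elements with Document#renameNode, drops the existing declarations and re-declares the unified ones on the root. It only handles element names, not prefixed attributes; the class and method names are hypothetical and this is not the mapper's implementation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceUnificationSketch {
    /** Prefix given to the AVSC "null" (default) namespace, taken from the example output. */
    static final String DEFAULT_PREFIX = "noprefixns";

    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getNamespaceURI() and renameNode()
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    /** prefixToUri mirrors xmlNamespaces, where the key "null" denotes the default namespace. */
    public static void unify(Document document, Map<String, String> prefixToUri) {
        // Invert to URI -> prefix so that every URI ends up with exactly one prefix
        Map<String, String> uriToPrefix = new HashMap<>();
        prefixToUri.forEach((prefix, uri) ->
                uriToPrefix.put(uri, "null".equals(prefix) ? DEFAULT_PREFIX : prefix));

        Element root = document.getDocumentElement();
        unifyElement(document, root, uriToPrefix);
        // Re-declare the unified namespaces once, on the root element
        uriToPrefix.forEach((uri, prefix) ->
                root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + prefix, uri));
    }

    private static void unifyElement(Document document, Element element, Map<String, String> uriToPrefix) {
        // Drop existing declarations (xmlns, xmlns:ns2, ...); the unified ones are added on the root
        NamedNodeMap attributes = element.getAttributes();
        List<Attr> declarations = new ArrayList<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            Attr attr = (Attr) attributes.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attr.getNamespaceURI())) {
                declarations.add(attr);
            }
        }
        declarations.forEach(element::removeAttributeNode);

        // Rename the element so its prefix is the one configured for its namespace URI
        Element current = element;
        String uri = element.getNamespaceURI();
        if (uri != null && uriToPrefix.containsKey(uri)) {
            current = (Element) document.renameNode(
                    element, uri, uriToPrefix.get(uri) + ":" + element.getLocalName());
        }

        NodeList children = current.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                unifyElement(document, (Element) child, uriToPrefix);
            }
        }
    }
}
```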
If xmlNamespaces is not provided for XML to Avro conversion, the namespace prefixes used in the XPath attributes must be consistent with the prefixes used in the XML file.
For Avro to XML conversion, the namespaces are declared on the root element of the produced document.
If xmlNamespaces is not provided for Avro to XML conversion, the XPath attributes must not contain any namespace prefix: the prefixes would be left undeclared and the produced XML would be invalid.
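The following JDK-only sketch mimics declaring the xmlNamespaces entries on the root element when building a document. It also shows why prefixes must be declared: a prefix that was never declared, such as ns2 here, cannot be resolved to a URI. The class and method names are hypothetical and this is not the mapper's code.

```java
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RootNamespaceSketch {
    /**
     * Builds a document whose root is in the "null" (default) namespace of the map, if any,
     * and declares every entry of the map on that root element.
     */
    public static Document createRoot(String rootName, Map<String, String> xmlNamespaces) {
        Document document;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            document = factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        // A null namespace URI is allowed and means "no namespace"
        Element root = document.createElementNS(xmlNamespaces.get("null"), rootName);
        xmlNamespaces.forEach((prefix, uri) -> {
            String declaration = "null".equals(prefix) ? "xmlns" : "xmlns:" + prefix;
            root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, declaration, uri);
        });
        document.appendChild(root);
        return document;
    }
}
```

On the result, getDocumentElement().lookupNamespaceURI("ns1") returns http://namespace.uri/1, while lookupNamespaceURI("ns2") returns null, so an element written as ns2:thirdElement would have no namespace binding.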
The keepEmptyTag attribute indicates that, during Avro to XML conversion, the tag must be kept as an empty element when the original Avro field is null:
<objectRoot>
    <element />
</objectRoot>
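A minimal sketch of the keepEmptyTag behavior with the JDK DOM API: a null value either produces no element at all or, when the flag is set, an empty one. The appendField helper and its boolean parameter are hypothetical stand-ins for the AVSC attribute, not the library's API.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class KeepEmptyTagSketch {
    public static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Appends a child for a field value; a null value yields an empty tag only when keepEmptyTag is true. */
    public static void appendField(Document document, Element parent, String name, String value,
                                   boolean keepEmptyTag) {
        if (value == null && !keepEmptyTag) {
            return; // null field without keepEmptyTag: the tag is omitted entirely
        }
        Element child = document.createElement(name);
        if (value != null) {
            child.setTextContent(value);
        }
        // A null value with keepEmptyTag leaves the element without children, i.e. an empty tag
        parent.appendChild(child);
    }
}
```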
The AvroToXmlMapper#convertAvroToXmlDocument method produces the XML document rather than a String, which allows custom processing and editing of the document before it is serialized.
The conversion can then be finalized with the GenericUtils#documentToString method.
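The exact signatures of AvroToXmlMapper#convertAvroToXmlDocument and GenericUtils#documentToString are not shown here, so the sketch below only illustrates the general pattern with the JDK, assuming the mapper hands back an org.w3c.dom.Document: edit the document, then serialize it with a Transformer. The parsed input stands in for the mapper's output, the version attribute is a hypothetical edit, and GenericUtils#documentToString may use different output settings.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class DocumentEditingSketch {
    /** Stand-in for a Document produced by the mapper; here it is simply parsed from a string. */
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    /** Example custom edit: add a hypothetical attribute to the root element before serialization. */
    public static void addVersionAttribute(Document document, String version) {
        document.getDocumentElement().setAttribute("version", version);
    }

    /** Plain JDK serialization of the edited document, without the XML declaration. */
    public static String toXmlString(Document document) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }
}
```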
We welcome contributions from the community! Before you get started, please take a look at our contribution guide to learn about our guidelines and best practices. We appreciate your help in making Avro XML Mapper a better tool for everyone.