/smooks-edi-cartridge

Smooks EDI & EDIFACT cartridges for reading as well as writing EDI

Primary LanguageJavaOtherNOASSERTION

Smooks EDI Cartridge

Maven Central Sonatype Nexus (Snapshots) Build Status

The EDI cartridge provides a reader for parsing EDI documents, and a visitor for serialising the event stream into EDI.

In the following pass-through configuration, Smooks parses an EDI document and then serialises, or unparses in DFDL lingo, the generated event stream to produce an EDI document identical to the parsed document.

smooks-config.xml
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:edi="https://www.smooks.org/xsd/smooks/edi-2.0.xsd">

    <!-- Configure the reader to parse the EDI stream into a stream of SAX events. -->
    <edi:parser schemaURI="/edi-to-xml-mapping.dfdl.xsd" // (1)
                segmentTerminator="%NL;" // (2)
                compositeDataElementSeparator="^"/> // (3)

    <!-- Configure the writer to serialise the XML stream into EDI. -->
    <edi:unparser schemaURI="/edi-to-xml-mapping.dfdl.xsd" // (1)
                  segmentTerminator="%NL;" // (2)
                  compositeDataElementSeparator="^" // (3)
                  unparseOnNode="/Order"/> // (4)

</smooks-resource-list>

Config attributes common to the parser and unparser resources are:

  1. schemaURI: the DFDL schema describing the structure of the EDI document to be parser or unparsed.

  2. segmentTerminator: the terminator for groups of related data elements. DFDL interprets %NL; as a newline.

  3. compositeDataElementSeparator: the delimiter for compound data elements.

The unparseOnNode attribute is exclusive to the unparser visitor. It tells the unparser which event to intercept and serialise. Consult with the EDI cartridge’s XSD documentation for the complete list of config attributes and elements.

EDI DFDL Schema

The user-defined DFDL schema supplied to the parser and unparser config elements drives the event mapping, whether it is EDI to SAX or SAX to EDI. This schema must import the bundled IBM_EDI_Format.dfdl.xsd DFDL schema which defines common EDI constructs like segments and data elements.

The following figure illustrates the mapping process:

Image:Edi-mapping.svg

  • input-message.edi is the input/output EDI document.

  • edi-to-xml-order-mapping.dfdl.xsd describes the EDI to SAX, or SAX to EDI, event mapping.

  • expected.xml is the result event stream from applying the mapping.

Segments

The next snippet shows a segment declaration in DFDL:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format" schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element dfdl:initiator="HDR" // (1)
                 name="header" // (2)
                 dfdl:ref="ibmEdiFmt:EDISegmentFormat"> // (3)
        <xsd:complexType>
            ...
        </xsd:complexType>
    </xsd:element>
</xsd:schema>
  1. dfdl:initiator identifies the segment code .

  2. name attribute specifies the segment’s XML mapping.

  3. ibmEdiFmt:EDISegmentFormat holds the segment structure definition; it is important to reference it from within the dfdl:ref attribute.

Segment Cardinality

What is not demonstrated in the previous section is the segment element’s optional attributes minOccurs and maxOccurs (default value of 1 in both cases). These attributes can be used to control the optional and required characteristics of a segment. An unbounded maxOccurs indicates that the segment can repeat any number of times in that location of the EDI document.

Segment Groups

You implicitly create segment groups when:

  1. Setting the "maxOccurs" in a segment element to more than one, and

  2. Adding within the segment element other segment elements

The HDR segment in the next example is a segment group because it is unbounded, and it encloses the CUS and ORD segments:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format" schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element dfdl:initiator="HDR" name="order" maxOccurs="unbounded">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:sequence dfdl:ref="ibmEdiFmt:EDISegmentFormat">
                    ...
                </xsd:sequence>
                <xsd:element dfdl:initiator="CUS" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="customer-details">
                    <xsd:complexType>
                        ...
                    </xsd:complexType>
                </xsd:element>
                <xsd:element dfdl:initiator="ORD" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="order-item"
                             maxOccurs="unbounded">
                    <xsd:complexType>
                        ...
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Data Elements

Segment data elements are children within a sequence element referencing the DFDL format ibmEdiFmt:EDISegmentSequenceFormat:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format" schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element dfdl:initiator="HDR" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="header">
        <xsd:complexType>
            <xsd:sequence dfdl:ref="ibmEdiFmt:EDISegmentSequenceFormat">
                <xsd:element name="order-id" type="xsd:string"/>
                <xsd:element name="status-code" type="xsd:string"/>
                <xsd:element name="net-amount" type="xsd:string"/>
                <xsd:element name="total-amount" type="xsd:string"/>
                <xsd:element name="tax" type="xsd:string"/>
                <xsd:element name="date" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Each child xsd:element within xsd:sequence represents an EDI data element. The name attribute is the name of the target XML element capturing the data element’s value.

Composite Data Elements

Data elements made up of components are yet another xsd:sequence referencing the DFDL format ibmEdiFmt:EDICompositeSequenceFormat:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format" schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element dfdl:initiator="CUS" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="customer-details">
        <xsd:complexType>
            <xsd:sequence dfdl:ref="ibmEdiFmt:EDISegmentSequenceFormat">
                <xsd:element name="username" type="xsd:string"/>
                <xsd:element name="name">
                    <xsd:complexType>
                        <xsd:sequence dfdl:ref="ibmEdiFmt:EDICompositeSequenceFormat">
                            <xsd:element name="firstname" type="xsd:string"/>
                            <xsd:element name="lastname" type="xsd:string"/>
                        </xsd:sequence>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="state" type="xsd:string"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Imports

Many EDI messages use the same segment definitions. Being able to define these segments once and import them into a top-level configuration saves on duplication. A simple configuration demonstrating the import feature would be as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format"
            xmlns:def="def">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format" schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>
    <xsd:import namespace="def" schemaLocation="example/edi-segment-definition.xml"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element name="Order">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:sequence dfdl:initiatedContent="yes">
                    <xsd:element dfdl:initiator="HDR" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="header" type="def:HDR"/>
                    <xsd:element dfdl:initiator="CUS" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="customer-details" type="def:CUS"/>
                    <xsd:element dfdl:initiator="ORD" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="order-item" maxOccurs="unbounded" type="def:ORD"/>
                </xsd:sequence>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

The above schema demonstrates the use of the import element, where just about anything can be moved into its own file for reuse.

Type Support

The type attribute on segment data elements allows datatype specification for validation. The following example shows type support in action:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
            xmlns:ibmEdiFmt="http://www.ibm.com/dfdl/EDI/Format">

    <xsd:import namespace="http://www.ibm.com/dfdl/EDI/Format" schemaLocation="/EDIFACT-Common/IBM_EDI_Format.dfdl.xsd"/>

    <xsd:annotation>
        <xsd:appinfo source="http://www.ogf.org/dfdl/">
            <dfdl:format ref="ibmEdiFmt:EDIFormat"/>
        </xsd:appinfo>
    </xsd:annotation>

    <xsd:element dfdl:initiator="HDR" dfdl:ref="ibmEdiFmt:EDISegmentFormat" name="header">
        <xsd:complexType>
            <xsd:sequence dfdl:ref="ibmEdiFmt:EDISegmentSequenceFormat">
                <xsd:element name="order-id" type="xsd:string"/>
                <xsd:element name="status-code" type="xsd:int" dfdl:textNumberPattern="0"/>
                <xsd:element name="net-amount" type="xsd:decimal" dfdl:textNumberPattern="0"/>
                <xsd:element name="total-amount" type="xsd:decimal" dfdl:textNumberPattern="#.#"/>
                <xsd:element name="tax" type="xsd:decimal" dfdl:textNumberPattern="#.#"/>
                <xsd:element name="date" type="xsd:date"/>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

Maven Coordinates

pom.xml
<dependency>
    <groupId>org.smooks.cartridges.edi</groupId>
    <artifactId>smooks-edi-cartridge</artifactId>
    <version>2.0.0-RC1</version>
</dependency>

XML Namespaces

xmlns:edi="https://www.smooks.org/xsd/smooks/edi-2.0.xsd"

Smooks EDIFACT Cartridge

Maven Central Sonatype Nexus (Snapshots) Build Status

Smooks 2 provides out-of-the-box support for UN/EDIFACT interchanges in terms of pre-generated EDI DFDL schemas derived from the official UN/EDIFACT message definition zip directories. This allows you to easily convert a UN/EDIFACT message interchange into a consumable XML document. Specialised edifact:parser and edifact:unparser resources support UN/EDIFACT interchanges as shown in the next example:

smooks-config.xml
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd">

    <edifact:parser schemaURI="/d03b/EDIFACT-Messages.dfdl.xsd"/>

    <edifact:unparser schemaURI="/d03b/EDIFACT-Messages.dfdl.xsd" unparseOnNode="/Interchange"/>

</smooks-resource-list>

The edifact:parser and edifact:unparser, analogous to the edi:parser and edi:unparser, convert the stream according to the pre-generated DFDL schema referenced in the schemaURI attribute. Given that an EDIFACT schema can be very big compared to your average EDI schema, it may take minutes for the parser to compile it. Even having the cacheOnDisk attribute enabled may not be sufficient to meet your compilation time needs. For such situations, you can mitigate this problem by declaring ahead of time which message types the parser will process:

smooks-config.xml
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd">

    <edifact:parser schemaURI="/d03b/EDIFACT-Messages.dfdl.xsd">
        <edifact:messageTypes>
            <edifact:messageType>ORDERS</edifact:messageType>
            <edifact:messageType>INVOIC</edifact:messageType>
        </edifact:messageTypes>
    </edifact:parser>

    <edifact:unparser schemaURI="/d03b/EDIFACT-Messages.dfdl.xsd" unparseOnNode="/Interchange">
       <edifact:messageTypes>
            <edifact:messageType>ORDERS</edifact:messageType>
            <edifact:messageType>INVOIC</edifact:messageType>
        </edifact:messageTypes>
    </edifact:unparser>
</smooks-resource-list>

The schema compilation time is directly proportional to the number of declared message types. The EDIFACT resources will reject any message which does not have its message type declared within the messageTypes child element. Apart from XML configuration, it is also possible to programmatically control the EDIFACT parser message types via a EdifactReaderConfigurator instance:

Smooks smooks = new Smooks();
smooks.setReaderConfig(new EdifactReaderConfigurator("/d03b/EDIFACT-Messages.dfdl.xsd").setMessageTypes(Arrays.asList("ORDERS", "INVOIC")));

etc...

Schema Packs

In an effort to simplify the processing of UN/EDIFACT Interchanges, we have created tools to generate EDIFACT schema packs from the official UN/EDIFACT message definition zip directories. The generated schema packs are deployed to a public Maven repository from where users can easily access the EDIFACT schemas for the UN/EDIFACT message sets they need to support.

Schema packs are available for most of the UN/EDIFACT directories. These are available from the Maven Snapshot and Central repositories and can be added to your application using standard Maven dependency management.

As an example, to add the D93A DFDL schema pack to your application classpath, add the following dependency to your application’s POM:

pom.xml
<!-- The mapping model sip set for the D93A directory... -->
<dependency>
    <groupId>org.smooks.cartridges.edi</groupId>
    <artifactId>edifact-schemas</artifactId>
    <classifier>d93a</classifier>
    <version>2.0.0-RC1</version>
</dependency>

Once you add an EDIFACT schema pack set to the application’s classpath, you configure Smooks to use the schemas by referencing the root schema in schemaURI attribute of the edifact:parser or edifact:unparser configuration (<version>/EDIFACT-Messages.dfdl.xsd):

smooks-config.xml
<?xml version="1.0"?>
<smooks-resource-list xmlns="https://www.smooks.org/xsd/smooks-2.0.xsd"
                      xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-1.0.xsd">

    <edifact:parser schemaURI="/d03b/EDIFACT-Messages.dfdl.xsd">
        <edifact:messages>
            <edifact:message>ORDERS</edifact:message>
            <edifact:message>INVOIC</edifact:message>
        </edifact:messages>
    </edifact:parser>

</smooks-resource-list>

See the EDIFACT examples for further reference.

Maven Coordinates

pom.xml
<dependency>
    <groupId>org.smooks.cartridges.edi</groupId>
    <artifactId>smooks-edifact-cartridge</artifactId>
    <version>2.0.0-RC1</version>
</dependency>

XML Namespaces

xmlns:edifact="https://www.smooks.org/xsd/smooks/edifact-2.0.xsd"

LICENSE

Smooks EDI & EDIFACT Cartridges are open source and licensed under the terms of the Apache License Version 2.0, or the GNU Lesser General Public License version 3.0 or later. You may use Smooks EDI & EDIFACT Cartridges according to either of these licenses as is most appropriate for your project.

SPDX-License-Identifier: Apache-2.0 OR LGPL-3.0-or-later