
TypeScript port of the node-edifact project

Primary language: TypeScript. License: Apache License 2.0 (Apache-2.0).


ts-edifact

ts-edifact is an Edifact parsing library written in TypeScript. The implementation was initially based on node-edifact but has diverged since the start of the project. It can now build a full object tree based on the general message structure defined in the Edifact documentation, and it updates the tokenizer with the charset defined in the UNB segment.

By default ts-edifact ships with a selection of D:01B message structure definitions as well as their respective segment and element definition files. For convenience, a parser was recently added that generates such definitions from the UNECE web page. There are plans to generate these definition files from the Edifact Directory as well, but due to time limitations this feature has not been added yet.

Currently supported functionality:

  • An ES6 streaming parser reading UN/EDIFACT messages.
  • Provide your own event listeners to get the parser to do something useful.
  • Construct structured JavaScript objects from UN/EDIFACT messages.
  • Support for the UNA header and custom separators.
  • Validating data elements and components accepted by a given segment.
  • Parsing and checking standard UN/EDIFACT messages with segment tables.
  • Support for envelopes.
  • Check for well-formed Edifact documents according to the defined message type, version and revision within the UNH message header.
  • Generation of Edifact specification definition files (e.g. D01B_INVOIC.struct.json, D01B_INVOIC.segments.json and D01B_INVOIC.elements.json) obtained directly from the UNECE page.
  • When the Reader class is used, the charset is updated to the one specified in the UNB segment of the interchange.
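
To make the "UNA header and custom separators" item more concrete, here is a minimal sketch of how the six service characters of a UNA header map to separators. The names `UnaSeparators` and `readUnaSeparators` are assumptions made for this illustration and are not part of the ts-edifact API; the library's own handling may differ.

```typescript
// Separators of an interchange, in the order the UNA header lists them.
interface UnaSeparators {
  component: string; // separates components inside a composite element
  element: string;   // separates data elements inside a segment
  decimal: string;   // decimal mark
  release: string;   // escape character
  segment: string;   // segment terminator
}

// Defaults that apply when a document does not start with a UNA header.
const DEFAULT_SEPARATORS: UnaSeparators = {
  component: ":",
  element: "+",
  decimal: ".",
  release: "?",
  segment: "'",
};

// Hypothetical helper: reads the service characters that follow "UNA".
function readUnaSeparators(doc: string): UnaSeparators {
  if (doc.substring(0, 3) !== "UNA" || doc.length < 9) {
    return DEFAULT_SEPARATORS;
  }
  return {
    component: doc.charAt(3),
    element: doc.charAt(4),
    decimal: doc.charAt(5),
    release: doc.charAt(6),
    // index 7 is the fifth service character, normally a space; it is skipped here
    segment: doc.charAt(8),
  };
}

// A document that declares "|" and "^" as separators and "~" as segment terminator.
const customSeparators = readUnaSeparators("UNA|^,\\ ~UNB^UNOA|1~");
console.log(customSeparators.element, customSeparators.segment);
```

With the parser itself, the separators actually in use are available through the separators() method described in the Overview.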

Current status

We switched from ts-edifact to a currently proprietary Java-based implementation I developed in the past year. During this process I learned a lot about parsing Edifact files and, sadly, the current ts-edifact version has limitations, briefly summarized as follows:

  • The syntax version has no impact on how the Edifact document is processed. Charsets are not correctly restricted, timestamps are not correctly validated, and service segments are not initialized for the appropriate syntax version
  • Charsets (v3, v4) in UNB segments are not correctly supported, which can lead to issues for UNOX and UNOY encodings that may use up to 4 bytes per character
  • Code-list values are not parsed and therefore not included in the validation process
  • Objects generated for the object tree do not respect the specifications in different Edifact directories
  • segments.ts and elements.ts only support basic service segments (UNB, UNH, ...), which more or less mix definitions from different syntax versions, and are missing many others, especially ones defined in syntax version 4
  • Parsing and validation should be decoupled
  • More configuration options are needed
  • More sample files are needed, especially ones with uncommon charsets
  • A better mechanism for loading message structure, segment and element definition tables is needed to reduce the size of this artifact and to load definitions only when really needed. Any help from more experienced JS/TS developers on what such a mechanism might look like is more than appreciated

Changing these things unfortunately takes time, which I currently do not have much of. I will try to port the necessary changes to this project as soon as possible, but my schedule for the upcoming months looks pretty exhausting, to be honest.

Usage

This example parses a document and translates it to a JavaScript array result containing segments. Each segment is an object containing a name and an elements array. An element is an array of components.

import { Parser, Validator, ResultType, Configuration, ValidatorImpl } from 'ts-edifact';

const enc: string = "...";
const doc: string = "...";

const parser: Parser = new Parser(new Configuration({ validator: new ValidatorImpl() }));

// Provide some segment and element definitions.
// parser.configuration.validator.define({...});

// Parsed segments will be collected in the result array.
let result: ResultType[] = [];
let elements: string[][];
let components: string[];

parser.onOpenSegment = (segment: string): void => {
  // Started a new segment.
  elements = [];
  result.push({ name: segment, elements: elements });
};

parser.onElement = (): void => {
  // Parsed a new element.
  components = [];
  elements.push(components);
};

parser.onComponent = (value: string): void => {
  // Got a new component.
  components.push(value);
};

// By default the parser's tokenizer uses the UNOA charset.
// If characters outside the UNOA charset should be accepted, the parser needs to be told
// which charset to allow for alpha(numeric) values.
parser.updateCharset("UNOC");
// Send the Edifact data to the parser. This method supports streaming: it can be invoked whenever
// new data is available and continues from where it left off.
parser.write(doc);
// tell the parser that no more data is expected to be received
parser.end();

Or, more streamlined, using the Reader utility class:

import { Reader, ResultType } from "ts-edifact";

const document: string = "...";
const specDir: string = "...";

const reader: Reader = new Reader(specDir);
const result: ResultType[] = reader.parse(document);

// ...

Or, if a custom set of segment and element definitions should be used:

import { Reader, ResultType, Dictionary, SegmentEntry, ElementEntry } from "ts-edifact";

import * as segmentsData from ".../segments.json";
import * as elementsData from ".../elements.json";

const document: string = "...";
const specDir: string = "...";

const segments: Dictionary<SegmentEntry> = new Dictionary<SegmentEntry>(segmentsData);
const elements: Dictionary<ElementEntry> = new Dictionary<ElementEntry>(elementsData);

const reader: Reader = new Reader(specDir);
reader.define(segments);
reader.define(elements);

const result: ResultType[] = reader.parse(document);
// ...

Finally, you can get the interchange (see InterchangeBuilder):

import { Separators, EdifactSeparatorsBuilder, InterchangeBuilder, Edifact } from "ts-edifact";

// ...

const separators: Separators = new EdifactSeparatorsBuilder().build();

const builder: InterchangeBuilder = new InterchangeBuilder(result, separators, "./definitions/");

const interchange: Edifact = builder.interchange;

Installation

The module can be installed through:

npm install ts-edifact

Keep in mind that this is an ES6 library. It currently can be used with node 4.0 or higher.

Overview

This module is built around a central Parser class which provides the core UN/EDIFACT parsing functionality. It exposes only four methods:

  • the updateCharset(string) method updates the charset used by the Tokenizer class to determine valid data values. If a charset is provided which the parser does not recognize, this method throws an error.
  • the write() method writes data to the parser.
  • the separators() method returns the separators used by the parser.
  • the end() method closes an EDI interchange.

Data read by the parser can be accessed through hooks, which are called on specific parsing events.
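
The hook mechanism can be pictured with a toy tokenizer. The sketch below is only an illustration: `ToyParser` is a hypothetical class that is far simpler than the library's Parser, it assumes the default separators (+, :, ' and ? as release character) and performs no validation. It reuses the hook names from the Usage section to show when each hook fires and how state carries over between write() calls.

```typescript
// Toy illustration only: ToyParser is hypothetical and not the library's Parser.
class ToyParser {
  // Hooks default to no-ops; callers overwrite the ones they care about.
  onOpenSegment: (segment: string) => void = () => {};
  onElement: () => void = () => {};
  onComponent: (value: string) => void = () => {};
  onCloseSegment: () => void = () => {};

  private buffer = "";       // characters of the token currently being read
  private inSegment = false; // true once the segment tag has been read
  private escaped = false;   // true right after the release character "?"

  // Can be called repeatedly; parsing continues where the last chunk ended.
  write(chunk: string): void {
    for (let i = 0; i < chunk.length; i++) {
      const c = chunk.charAt(i);
      if (this.escaped) {
        this.buffer += c; // escaped characters are always plain data
        this.escaped = false;
      } else if (c === "?") {
        this.escaped = true;
      } else if (c === "+" || c === ":" || c === "'") {
        this.flush(c);
      } else if (!this.inSegment && this.buffer === "" && (c === "\n" || c === "\r")) {
        // ignore line breaks between segments
      } else {
        this.buffer += c;
      }
    }
  }

  private flush(separator: string): void {
    const token = this.buffer;
    this.buffer = "";
    if (!this.inSegment) {
      this.inSegment = true;
      this.onOpenSegment(token); // the first token of a segment is its tag
    } else {
      this.onComponent(token);
    }
    if (separator === "+") {
      this.onElement(); // a new data element starts
    } else if (separator === "'") {
      this.inSegment = false;
      this.onCloseSegment();
    }
  }
}

// Collect segments the same way the Usage example does.
interface ToySegment { name: string; elements: string[][]; }
const collected: ToySegment[] = [];
let currentElements: string[][] = [];
let currentComponents: string[] = [];

const toy = new ToyParser();
toy.onOpenSegment = (segment: string): void => {
  currentElements = [];
  collected.push({ name: segment, elements: currentElements });
};
toy.onElement = (): void => {
  currentComponents = [];
  currentElements.push(currentComponents);
};
toy.onComponent = (value: string): void => {
  currentComponents.push(value);
};

// Two chunks, split in the middle of a component.
toy.write("DTM+137:2024");
toy.write("0101:102'FTX+AAI+++Rock ?'n?' roll'");
console.log(JSON.stringify(collected));
```

Assigning plain functions to hook properties keeps the toy small; it mirrors the style of the Usage example rather than any particular internal design of the library.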

Segment and element definitions

Definitions can be provided to describe the structure of segments and elements. An example of a segment definition:

{
  "BGM": {
    "requires": 0,
    "elements": ["C002", "C106", "1225", "4343"]
  }
}

A corresponding definition from the UN/EDIFACT D01A spec for the above mentioned BGM segment can be seen following this link.

Note: It may also help to use the archives to navigate the older version of unece.org using this link.

The requires property indicates the number of elements which are required to obtain a valid segment. The elements array contains the names of the elements which should be provided. Definitions can also be provided for these elements:

{
  "C002": {
    "requires": 4,
    "components": ["an..3", "an..17", "an..3", "an..35"]
  },
  "C106": {
    "requires": 3,
    "components": ["an..35", "an..9", "an..6"]
  }
}
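
As a rough illustration of how such definitions can drive validation, the sketch below interprets format strings like "an..35" and checks one element against its definition. It assumes the usual EDIFACT notation (a = alphabetic, n = numeric, an = alphanumeric, ".." meaning "up to") and deliberately simplifies the rules, for instance by ignoring signs and decimal marks in numeric values. All names (`ComponentFormat`, `parseComponentFormat`, `componentMatches`, `checkElement`) are hypothetical and not taken from the library.

```typescript
// Illustrative only: these helpers are assumptions, not part of the ts-edifact API.
interface ComponentFormat {
  kind: "a" | "n" | "an"; // alphabetic, numeric, alphanumeric
  minLength: number;
  maxLength: number;
}

interface ElementDefinition {
  requires: number;     // number of mandatory components
  components: string[]; // one format string per component
}

// Parses strings such as "an..35" (up to 35 characters) or "n3" (exactly 3 digits).
function parseComponentFormat(format: string): ComponentFormat {
  const match = /^(an|a|n)(\.\.)?(\d+)$/.exec(format);
  if (match === null) {
    throw new Error("Unsupported format string: " + format);
  }
  const max = parseInt(match[3], 10);
  return {
    kind: match[1] as "a" | "n" | "an",
    minLength: match[2] ? 0 : max, // ".." makes the length variable
    maxLength: max,
  };
}

// Simplified check: length first, then a coarse character class test.
function componentMatches(value: string, format: ComponentFormat): boolean {
  if (value.length < format.minLength || value.length > format.maxLength) {
    return false;
  }
  if (format.kind === "n") {
    return /^[0-9]*$/.test(value);
  }
  if (format.kind === "a") {
    return /^[^0-9]*$/.test(value);
  }
  return true;
}

// Returns a list of problems; an empty list means the element passed.
function checkElement(components: string[], definition: ElementDefinition): string[] {
  const problems: string[] = [];
  if (components.length < definition.requires) {
    problems.push("expected at least " + definition.requires + " components, got " + components.length);
  }
  if (components.length > definition.components.length) {
    problems.push("expected at most " + definition.components.length + " components, got " + components.length);
  }
  const checked = Math.min(components.length, definition.components.length);
  for (let i = 0; i < checked; i++) {
    const optionalAndEmpty = i >= definition.requires && components[i] === "";
    const format = parseComponentFormat(definition.components[i]);
    if (!optionalAndEmpty && !componentMatches(components[i], format)) {
      problems.push("component " + (i + 1) + " does not match " + definition.components[i]);
    }
  }
  return problems;
}

// The C106 definition from above.
const c106: ElementDefinition = { requires: 3, components: ["an..35", "an..9", "an..6"] };
console.log(checkElement(["INV-0001", "1", "0"], c106)); // no problems expected
console.log(checkElement(["INV-0001"], c106));           // too few components
```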

An incomplete set of D01B definition files can be found in the src/messageSpec folder. The *.struct.json files contain the general structure definition of an Edifact message; for example, INVOIC.struct.json contains the message structure specification of a D01B Edifact invoice. The *.segments.json files contain the admissible segments of the message type being processed, and the *.elements.json files contain the component definitions of the elements used by those segments.

As of version 0.0.7 such definition files can be generated via the UNECEMessageStructureParser class, provided the definition is available online at the unece.org page. This parser generates an EdifactMessageSpecification object structure that holds the message type structure definition as well as the segment and element tables needed to validate the document being processed.

The persist utility function in src/util allows the generated definitions to be persisted to files for later use.

import { UNECEMessageStructureParser } from "ts-edifact";
import { UNECELegacyMessageStructureParser } from "ts-edifact/lib/edi/legacyMessageStructureParser";
import { EdifactMessageSpecification, MessageStructureParser } from "ts-edifact/lib/edi/messageStructureParser";
import { persist } from "ts-edifact/lib/util";

const legacy = ["93a", "94a", "94b", "95a", "95b", "96a", "96b", "97a", "97b", "98a", "98b", "99a"];

async function storeSpecFiles(specDir: string, type: string, version: string, revision: string): Promise<void> {
    const structureParser: MessageStructureParser = legacy.includes(revision)
        ? new UNECELegacyMessageStructureParser(version + revision, type) // You can use legacy for the oldest versions
        : new UNECEMessageStructureParser(version + revision, type);

    const response: EdifactMessageSpecification = await structureParser.loadTypeSpec();

    persist(response, specDir);
}
    
// ...

storeSpecFiles("/home/SomeUser/edifact", "invoic", "d", "01b")
    .then(() => console.log("Downloaded"))
    .catch((error: unknown) => console.error(error));

When the Reader class is used, it attempts to read such specification files from the provided directory or, if none was provided, from the local directory. The *.segments.json and *.elements.json files are used while parsing the Edifact document to validate that only admissible values are provided for the respective segments and elements. By default, the ValidatorImpl class ignores any segments or elements for which no definition is found. To perform a strict validation that throws an error when the document contains an unknown segment or element, initialize the validator with the optional throwOnMissingDefinitions parameter set to true.

// use strict validation; will throw an error if unknown segments and elements are found
const validator: Validator = new ValidatorImpl(true);
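
The difference between the tolerant default and the strict mode can be sketched with a small stand-in. `DefinitionLookup` below is a hypothetical class written for this illustration; it only mimics the behavior described above and says nothing about how ValidatorImpl is implemented internally.

```typescript
// Hypothetical stand-in used only to illustrate the two modes.
class DefinitionLookup {
  private readonly known: { [segment: string]: string[] };
  private readonly throwOnMissingDefinitions: boolean;

  constructor(known: { [segment: string]: string[] }, throwOnMissingDefinitions: boolean = false) {
    this.known = known;
    this.throwOnMissingDefinitions = throwOnMissingDefinitions;
  }

  // Returns the element names of a known segment, or undefined when an
  // unknown segment is tolerated and should pass through unvalidated.
  lookup(segment: string): string[] | undefined {
    if (Object.prototype.hasOwnProperty.call(this.known, segment)) {
      return this.known[segment];
    }
    if (this.throwOnMissingDefinitions) {
      throw new Error("No definition found for segment " + segment);
    }
    return undefined;
  }
}

const knownSegments = { BGM: ["C002", "C106", "1225", "4343"] };

const tolerantLookup = new DefinitionLookup(knownSegments);
console.log(tolerantLookup.lookup("XYZ")); // undefined: unknown segment is ignored

const strictLookup = new DefinitionLookup(knownSegments, true);
try {
  strictLookup.lookup("XYZ");
} catch (error) {
  console.log("strict mode rejected the unknown segment");
}
```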

The *.struct.json file is only used in case an object structure should be generated via the InterchangeBuilder class.

Performance

Parsing speed including validation but without matching against a segment table is around 20Mbps. Around 30% of the time spent seems to be needed for the validation part.

If performance is critical the event callbacks can also be directly defined as methods on the Parser instance. Defining an event callback onOpenSegment(callback) then becomes:

const parser: Parser = new Parser();
const callback = (segment: string): void => { /* ... */ };

parser.onOpenSegment = callback;

Keep in mind that this prevents any openSegment events from being produced and, as such, also avoids their associated overhead.

Classes

Class Description
Parser The Parser class encapsulates an online parsing algorithm. By itself it doesn't do anything useful, however the parser can be extended through several event callbacks.
Reader A convenience class which assigns default callbacks to the respective event callbacks on the parser and returns an array of name- and elements entries, where name is a string referencing the segment name and elements is a multidimensional array where the outer array represents an element of the segment and the inner array will contain the respective components of an element.
Tracker A utility class which validates segment order against a given message structure.
Validator The Validator can be used as an add-on to the Parser class to enable validation of segments, elements and components. This class implements a tolerant validator: only segments and elements for which definitions are provided will be validated. Other segments or elements will pass through untouched. Validation includes:
  • Checking data element counts, including mandatory elements.
  • Checking component counts, including mandatory components.
  • Checking components against their required format.
Counter The Counter class can be used as a validator for the Parser class. However it doesn't perform any validation, it only keeps track of segment, element and component counts. Component counts are reset when starting a new element, just like element counts are reset when closing the segment.
InterchangeBuilder The InterchangeBuilder class uses the parsed result obtained from either the reader or the parser and converts the array of segments, using a corresponding message version definition, into a JavaScript object structure. That structure contains the messages found in the parsed Edifact document as well as their segment groups, which are further subgrouped by the iteration count of the respective segments. For example, if multiple LIN and accompanying segments are found, each is placed in its own subgroup, and any accompanying segment belonging to that segment group is added to that subgroup as well.
UNECEMessageStructureParser A helper class to parse the online version of the UNECE homepage for the respective Edifact message type structure as well as the admissible segments and elements for the respective message type
SegmentTableBuilder A builder for segment definition objects used by the validator and tracker classes
ElementTableBuilder A builder for element definition objects used by the validator and tracker classes

Reference

Parser

A parser capable of accepting data formatted as an UN/EDIFACT interchange. The constructor accepts a Configuration instance as an optional argument:

new Parser();
new Parser(configuration: Configuration);

The first constructor will initialize a NullValidator, which does not perform any validation and therefore does not throw any errors.

Function Description
onOpenSegment(segment: string): void Add a listener for a specific open segment event.
onCloseSegment(): void Add a listener for a close segment event.
onElement(): void Add a listener for starting processing an element within a segment.
onComponent(data: string): void Add a listener for a parsed component part of an Edifact element.
updateCharset(charset: string): void Specifies the character set to use while parsing the Edifact document. By default UNOA will be used. This method throws an error if an unknown charset is provided.
write(chunk) Write a chunk of data to the parser
separators() Since v0.0.7 Returns an object of the identified and used separators
end() Terminate the EDI interchange

Reader

A convenience class which already implements default callbacks for common parsing tasks.

new Reader(specDir?: string);

The optional specDir parameter should point to the location where the Edifact specification files can be found. If none was provided the reader tries to find them in the local directory.

Function Description
define(definitions: (Dictionary<SegmentEntry> | Dictionary<ElementEntry>)): void Feeds the validator used inside the reader with the set of known segment and element definitions. Parsed segments which do not adhere to the segments or elements defined in these tables will lead to a failure being thrown and therefore fail the parsing of the Edifact document.
parse(document: string): ResultType[] Attempts to parse the document into an array of segment objects, where each segment object contains a name and a multidimensional array of strings representing the elements in the outer array and the respective components of an element in the inner array. Any validation error encountered while reading the Edifact document leads to an error being thrown, which ends the parsing process prematurely.

Since 0.0.12 the encoding(string) function was removed as the charset is now set once the UNB header is processed. By default the parser will use the UNOA charset and update to the specified one once the respective charset was extracted.
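
For orientation, the charset travels in the first composite element of the UNB segment, which holds the syntax identifier (for example UNOC) followed by the syntax version number. The sketch below shows one way to pick it out of a parsed result; `ParsedSegment` and `syntaxFromUnb` are hypothetical names for this illustration, and the library performs the equivalent step internally in its own way.

```typescript
// Same shape as the parse result described above: a name plus elements of components.
interface ParsedSegment {
  name: string;
  elements: string[][];
}

// Hypothetical helper: returns the syntax identifier and version found in UNB, if any.
function syntaxFromUnb(segments: ParsedSegment[]): { charset: string; version: string } | undefined {
  for (let i = 0; i < segments.length; i++) {
    const segment = segments[i];
    if (segment.name === "UNB" && segment.elements.length > 0 && segment.elements[0].length >= 2) {
      return { charset: segment.elements[0][0], version: segment.elements[0][1] };
    }
  }
  return undefined; // no UNB found, so the UNOA default would stay in effect
}

// Parsed form of: UNB+UNOC:3+SENDER+RECEIVER'
const sampleSegments: ParsedSegment[] = [
  { name: "UNB", elements: [["UNOC", "3"], ["SENDER"], ["RECEIVER"]] },
];
console.log(syntaxFromUnb(sampleSegments));
```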

Tracker

A utility class which validates segment order against a given message structure. The constructor accepts a segment table as its first argument:

new Tracker(table: MessageType[]);

Function Description
accept(segment: string | MessageType): void Match a segment to the message structure and update the current position of the tracker.
reset(): void Reset the tracker to the initial position of the current segment table.
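
The idea of matching segments against a message structure can be sketched on a flat table. This is a deliberately reduced model: `FlatEntry` and `FlatTracker` are hypothetical, segment groups and nesting are left out, and the real MessageType structure is not reproduced here.

```typescript
// Hypothetical, flattened table entry: no segment groups, no nesting.
interface FlatEntry {
  name: string;
  mandatory: boolean;
  repetition: number; // maximum number of consecutive occurrences
}

class FlatTracker {
  private readonly table: FlatEntry[];
  private position = 0; // index of the table entry currently being matched
  private count = 0;    // occurrences seen for the current entry

  constructor(table: FlatEntry[]) {
    this.table = table;
  }

  // Matches a segment against the table and advances the current position.
  accept(segment: string): void {
    while (this.position < this.table.length) {
      const entry = this.table[this.position];
      if (entry.name === segment && this.count < entry.repetition) {
        this.count++;
        return;
      }
      if (entry.mandatory && this.count === 0) {
        throw new Error("Mandatory segment " + entry.name + " is missing before " + segment);
      }
      // Entry is optional or already satisfied: move on to the next one.
      this.position++;
      this.count = 0;
    }
    throw new Error("Segment " + segment + " is not allowed at this position");
  }

  // Returns to the initial position of the table.
  reset(): void {
    this.position = 0;
    this.count = 0;
  }
}

const flatTable: FlatEntry[] = [
  { name: "UNH", mandatory: true, repetition: 1 },
  { name: "BGM", mandatory: true, repetition: 1 },
  { name: "DTM", mandatory: false, repetition: 35 },
  { name: "UNT", mandatory: true, repetition: 1 },
];

const flatTracker = new FlatTracker(flatTable);
["UNH", "BGM", "DTM", "DTM", "UNT"].forEach((tag: string): void => flatTracker.accept(tag));
console.log("segment order accepted");
```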

Validator

The Validator can be used to validate segments, elements and components. It keeps track of element and component counts and checks if the component types match those in the segment definition.

new ValidatorImpl(throwOnMissingDefinitions: boolean = false);

Since v0.0.7 a strict validation can be performed by setting the throwOnMissingDefinitions parameter to true, which leads to failures if unknown segments or elements are used within the document under validation. By default, the current implementation ignores any unknown segments or elements.

Function Description
disable(): void Disable validation.
enable(): void Enable validation.
define(definitions: (Dictionary<SegmentEntry> | Dictionary<ElementEntry>)): void Provision the validator with an array of segment and element definitions.
format(formatString: string): FormatType | undefined Requests a component definition associated with a format string
onOpenSegment(segment: string): void Start validation of a new segment
onElement(): void Add an element
onOpenComponent(buffer: Tokenizer): void Open a component
onCloseComponent(buffer: Tokenizer): void Close a component
onCloseSegment(segment: string): void Finish the segment

The buffer argument to both onOpenComponent() and onCloseComponent() should provide three methods alpha(), alphanumeric(), and numeric() allowing the mode of the buffer to be set. It should also expose a length() method to check the length of the data currently in the buffer.

InterchangeBuilder

The InterchangeBuilder uses the result of the Parser or Reader to convert the array of segments into a JavaScript/TypeScript object structure representing the interchange envelope and the respective Edifact messages parsed.

It attempts to use the specific message structure definition for the concrete version defined in the message header (UNH) to check whether a segment that is mandatory according to the Edifact specification is missing. For a D96A invoice it therefore attempts to use the D96A_INVOIC message structure specification and, if that cannot be obtained, falls back to a common INVOIC message structure specification.
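
The lookup order described here can be sketched as follows. The file names follow the D01B_INVOIC.struct.json and INVOIC.struct.json patterns mentioned earlier in this document; the helper names `structFileCandidates` and `pickStructFile`, and the injected `exists` check, are assumptions for this illustration and do not reflect the builder's actual code.

```typescript
// Hypothetical helper: lists candidate structure files from most to least specific.
function structFileCandidates(version: string, release: string, messageType: string): string[] {
  const type = messageType.toUpperCase();
  return [
    (version + release).toUpperCase() + "_" + type + ".struct.json", // e.g. D96A_INVOIC.struct.json
    type + ".struct.json",                                          // common fallback
  ];
}

// Picks the first candidate that exists; `exists` is injected so the sketch
// does not depend on the file system.
function pickStructFile(candidates: string[], exists: (file: string) => boolean): string | undefined {
  for (let i = 0; i < candidates.length; i++) {
    if (exists(candidates[i])) {
      return candidates[i];
    }
  }
  return undefined;
}

// Values as they would appear in a header such as UNH+1+INVOIC:D:96A:UN'
const candidateFiles = structFileCandidates("D", "96A", "INVOIC");
const availableFiles = ["INVOIC.struct.json", "D01B_INVOIC.struct.json"];
const chosenFile = pickStructFile(candidateFiles, (file: string): boolean => availableFiles.indexOf(file) >= 0);
console.log(chosenFile); // falls back to the common INVOIC definition
```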

The constructor expects, besides the parsing result of the Edifact document and the separators, a base path where the Edifact message structure definition files are located. An incomplete list can be found using the ./node_modules/ts-edifact/lib/messages/ base path.

new InterchangeBuilder(parsingResult: ResultType[], separators: Separators, basePath: string);

UNECEMessageStructureParser

Since v0.0.7: This class parses the unece.org website in order to generate message structure definitions as well as segment and element definitions for a requested message type.

new UNECEMessageStructureParser(version: string, type: string);

Function Description
loadTypeSpec(): Promise<EdifactMessageSpecification> Downloads the message structure definition page of a given Edifact message type, e.g. INVOIC, and, for all specified segments, the respective segment definition pages. These pages are parsed and converted to an object structure supporting the lookup of the respective entries.

The generated EdifactMessageSpecification object will hold the parsed values for the message type structure as well as the segments- and elements used by this message type.

Note that the respective segments and elements are mapped to their own TypeScript objects, which are defined in src/edifact.ts. The current set of classes is probably incomplete and may fall victim to differences between multiple versions and release tags of the Edifact specification. Consider further work to be done here in the future.

SegmentTableBuilder

A helper class to load the respective segment definition files, from either the local directory or a specified one, and convert them to a usable data structure needed by the validator and tracker classes.

new SegmentTableBuilder(type: string);

Function Description
forVersion(version: string): TableBuilder<SegmentEntry> Sets the version of the Edifact document this builder should fetch.
specLocation(location: string): TableBuilder<SegmentEntry> Sets the path where to look for the segment definition files.
build(): Dictionary<SegmentEntry> Attempts to load the *.segments.json definition file of the specified message type (and version if specified) and returns a dictionary object with the loaded data. If no file could be found only the basic UNB, UNH, UNS, UNT and UNZ segment definitions are loaded.

ElementTableBuilder

A helper class to load the respective element definition files, from either the local directory or a specified one, and convert them to a usable data structure needed by the validator and tracker classes.

new ElementTableBuilder(type: string);

Function Description
forVersion(version: string): TableBuilder<ElementEntry> Sets the version of the Edifact document this builder should fetch.
specLocation(location: string): TableBuilder<ElementEntry> Sets the path where to look for the element definition files.
build(): Dictionary<ElementEntry> Attempts to load the *.elements.json definition file of the specified message type (and version if specified) and returns a dictionary object with the loaded data. If no file could be found only the basic element definitions used by the UNB, UNH, UNS, UNT and UNZ segments are loaded.