nerdocs/pydifact

How to implement EDIFACT Syntax elements

nerdoc opened this issue · 25 comments

I just want to gather a few ideas on how to implement interchange syntax elements like UNB, UNH, UNZ etc.: Messages, Control Header, and other things.
There are syntax implementation guidelines from the UNECE, and other sources on the internet, like here

Best thing would be to write some sort of "syntax description language" which describes these syntax elements, and parse them. There are a few possibilities for the notation of this "syntax description":

  • external files (YAML, TOML, CSV etc.)
  • python classes

I had the idea of writing these descriptions as classes, like Django does in its ORM, describing a syntax element by a class using attributes:

from enum import Enum


class CharType(Enum):
    ALPHA = 0
    ALPHANUMERIC = 1
    NUMERIC = 2


class SyntaxElement:
    """Common description of an EDIFACT syntax element"""

    def __init__(
        self,
        id: str,
        mandatory: bool = False,
        type: CharType = None,
        length: int = None,
        max_length: int = None,
    ):
        # checks...

        self.id = id
        self.mandatory = mandatory
        self.type = type
        # `length` is an exact length, unless `max` is True: then it is an upper bound
        self.max = False
        self.length = 0
        if length:
            self.length = length
        elif max_length:
            self.length = max_length
            self.max = True

The first bytes of an Interchange Control Header would be then:

class InterchangeControlHeader:
    syntax_identifier = SyntaxElement(
        id="0001", mandatory=True, type=CharType.ALPHA, length=4
    )
    syntax_version_number = SyntaxElement(
        id="0002", mandatory=True, type=CharType.ALPHANUMERIC, length=1
    )
    service_code_list_directory_version_number = SyntaxElement(
        id="0080", type=CharType.ALPHANUMERIC, max_length=6
    )
    character_encoding = SyntaxElement(
        id="0133", type=CharType.ALPHANUMERIC, max_length=3
    )

@JocelynDelalande what do you think about that?

LGTM ! (good idea to implement it as python classes IMHO, that will be the simplest way).

A simple note on wording: SyntaxElement must be called Component to respect EDIFACT wording.

Do you want to use those classes to build the API (replacing the current dict-list approach), or merely to do data validation, leaving an API with the same style at the Component and Element levels?

Yeah, I just created a buzz-word - could be named better. Component - hm dunno. I found a list of "service codes" here: https://www.stylusstudio.com/edifact/40102/codelist.htm
Maybe ServiceCode could be appropriate too?
In the first place I just thought about data validation, not API. OMG, EDIFACT is complicated. Why did I start this project again ;-)
I think the API should be done via other structures. #19 seems a better place to discuss that, thanks for that. So, let's just keep this here in mind for validation.
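
To make the validation idea concrete, here is a minimal sketch of how one element description might check a single value. `ElementSpec` and its `validate` method are hypothetical names made up for this example, not part of pydifact, and the character checks are deliberately simplified (ALPHA means letters only, NUMERIC means digits only), so they do not cover the real EDIFACT character-set rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CharType(Enum):
    ALPHA = 0
    ALPHANUMERIC = 1
    NUMERIC = 2


@dataclass(frozen=True)
class ElementSpec:
    """Hypothetical stand-in for the SyntaxElement description above."""

    id: str
    mandatory: bool = False
    type: Optional[CharType] = None
    length: int = 0       # exact length, or an upper bound if is_max is set
    is_max: bool = False

    def validate(self, value: Optional[str]) -> list:
        """Return a list of error strings; an empty list means the value is OK."""
        if value is None or value == "":
            if self.mandatory:
                return [f"{self.id}: mandatory element is missing"]
            return []
        errors = []
        if self.is_max and len(value) > self.length:
            errors.append(f"{self.id}: longer than {self.length} characters")
        elif not self.is_max and self.length and len(value) != self.length:
            errors.append(f"{self.id}: expected exactly {self.length} characters")
        # Simplified character checks (an assumption, not the full EDIFACT rules)
        if self.type is CharType.ALPHA and not value.isalpha():
            errors.append(f"{self.id}: expected letters only")
        elif self.type is CharType.NUMERIC and not value.isdigit():
            errors.append(f"{self.id}: expected digits only")
        return errors


syntax_identifier = ElementSpec("0001", mandatory=True, type=CharType.ALPHA, length=4)
print(syntax_identifier.validate("UNOC"))  # passes all checks
print(syntax_identifier.validate("UNO1"))  # fails the letters-only check
print(syntax_identifier.validate(None))    # fails the mandatory check
```

Returning a list of errors instead of raising on the first problem would let a caller report everything that is wrong with a header in one pass.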

It would be good to create something like SegmentTables which define how messages are structured. Here we have to include segment groups (for loops) too, which can be nested. I am still not sure if doing that using Python classes is the best way, or if plain lists of e.g. tuples would be better...

From your link, "example of segment groups for the Extended Payment Order (PAYEXT)" - first group.

[
    ("UNH", "Message header", "M", 1),
    ("BGM", "Beginning of Message", "M", 1),
    ("BUS", "Business function", "C", 1),
    # ...
]

This looks a bit cleaner for creating these descriptions. But we would have to write another parser which handles nested groups etc. This is easier done in Python classes, but it is more boilerplate code to write when defining structures.
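
As a rough idea of what that extra parser could look like, here is a sketch that walks a nested tuple table against a flat list of segment tags. Everything in it is an assumption for illustration: the group entry layout `(name, status, max_repeat, [children])`, the `match` function, and the `SG1` group itself, which is made up and not taken from PAYEXT. It matches greedily without backtracking, and it assumes the first child of a group is the segment that triggers the group.

```python
# Segment entry: (tag, description, status, max_repeat)
# Group entry:   (group_name, status, max_repeat, [nested entries])
TABLE = [
    ("UNH", "Message header", "M", 1),
    ("BGM", "Beginning of message", "M", 1),
    ("SG1", "C", 2, [
        ("RFF", "Reference", "M", 1),
        ("DTM", "Date/time/period", "C", 1),
    ]),
    ("UNT", "Message trailer", "M", 1),
]


def match(table, tags, pos=0):
    """Greedily consume tags[pos:] against table and return the new position.

    Raises ValueError when a mandatory entry does not occur at all.
    """
    for entry in table:
        is_group = isinstance(entry[-1], list)
        if is_group:
            name, status, max_repeat, children = entry
        else:
            name, _description, status, max_repeat = entry
        count = 0
        while count < max_repeat and pos < len(tags):
            if is_group:
                # A group repetition starts with its first (trigger) segment
                if tags[pos] != children[0][0]:
                    break
                pos = match(children, tags, pos)
            else:
                if tags[pos] != name:
                    break
                pos += 1
            count += 1
        if status == "M" and count == 0:
            raise ValueError(f"mandatory {name} missing at position {pos}")
    return pos


tags = ["UNH", "BGM", "RFF", "DTM", "RFF", "UNT"]
# True when every tag was consumed by the table
print(match(TABLE, tags) == len(tags))
```

The recursion is short, but it is one more thing to maintain and test, which is the trade-off against the class-based notation.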
I like Django very much, and came up with this idea from how Django creates database tables in its ORM just from Python class declarations which describe the data object. This would be similar here.
I don't know, @JocelynDelalande what do you think?

I do not really have any strong opinion on that, sorry.

Do you know if there is an official downloadable data format for the implementation of those fields available anywhere? Something like an XML or even CSV file which could be parsed?

Using Django-style classes is a good idea. An alternative could be a YAML file where we specify the options. The nesting could be handled well by the YAML files.

YAML would be an option, but I definitely want to keep pydifact as clean (few dependencies) as possible. So "Django"-like classes will be the way to go.
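
For what it's worth, the "Django-like" part needs nothing beyond the standard library. The sketch below shows one possible way to collect declared attributes in declaration order using `__set_name__` and `__init_subclass__`. The names `Element`, `Description` and `InterchangeHeader` are hypothetical and only serve the example; this version also only collects attributes declared directly on the subclass, not inherited ones.

```python
class Element:
    """Hypothetical marker object for one declared syntax element."""

    def __init__(self, id, mandatory=False):
        self.id = id
        self.mandatory = mandatory
        self.name = None

    def __set_name__(self, owner, name):
        # Python calls this while creating the owning class,
        # so each element learns its attribute name.
        self.name = name


class Description:
    """Base class that collects Element attributes in declaration order."""

    elements = ()

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Class namespaces keep insertion order, so this follows the source order.
        cls.elements = tuple(
            value for value in vars(cls).values() if isinstance(value, Element)
        )


class InterchangeHeader(Description):
    syntax_identifier = Element("0001", mandatory=True)
    syntax_version_number = Element("0002", mandatory=True)
    character_encoding = Element("0133")


for element in InterchangeHeader.elements:
    print(element.name, element.id, element.mandatory)
```

A validator or parser can then iterate over `InterchangeHeader.elements` without a metaclass and without any third-party dependency.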

Where can I ask a question?

@srinirokz Here, if it's related to this issue. Otherwise just start a new issue.

I asked www.stylusstudio.com, where they have a good overview of EDIFACT data, whether I may parse their site using beautifulsoup and extract Service Code data from it. This would make it much easier to get a bigger amount of data for Service codes.
If they don't agree (which I think will be the case), I'll ask the official UN source by mail too. I can't see anywhere whether the EDIFACT data are protected by patents etc. or if one can use them freely.

Hi people 👋 , I’ve been poking at pydifact today and am I right in reading this issue is about adding support for definitions, which would describe things like segment groups, etc and smartly reading/writing? Am I correct that without this it’s up to users or the library to split groups?

Yes, that's right. I just don't have time ATM to implement this, so development has slowed down a bit. But nevertheless, PRs are more than welcome... I'd like to create a high level API to read and write documents, including groups.
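
To illustrate what "splitting groups yourself" means today, here is a minimal sketch that works on plain `(tag, elements)` pairs instead of pydifact objects, so it stays self-contained. The `split_on_trigger` helper and the sample data are made up for this example. Note its limitation: without a definition of the message structure, trailing segments such as UNS end up in the last group.

```python
def split_on_trigger(segments, trigger):
    """Split a flat list of (tag, elements) pairs into a head and groups.

    A new group starts at every segment whose tag equals `trigger`.
    """
    head, groups = [], []
    for tag, elements in segments:
        if tag == trigger:
            groups.append([(tag, elements)])
        elif groups:
            groups[-1].append((tag, elements))
        else:
            head.append((tag, elements))
    return head, groups


# Made-up sample data, not a valid ORDERS message
segments = [
    ("BGM", ["220", "PO-1"]),
    ("LIN", ["1"]),
    ("QTY", [["21", "5"]]),
    ("LIN", ["2"]),
    ("QTY", [["21", "3"]]),
    ("UNS", ["S"]),
]
head, lines = split_on_trigger(segments, "LIN")
print(len(head), len(lines))
print([tag for tag, _ in lines[-1]])  # the trailing UNS lands in the last group
```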

Cool, I just wanted to check that I wasn't fundamentally misunderstanding something :)

I spent far longer than I would like to admit yesterday getting a handle on a project, dealing with malformed sample edifact files, reading specs, etc. and by the end of the day I was going a bit cross-eyed 😅

Practically I wasn't "budgeting" for time to add support, but I may be down the path far enough that a PR or two will come this way over the next few days.

Oh, and malformed sample edifact files? Please, just correct them, any help welcome. I stumbled into creating that library just by the need of a good lib in Python, and lack of that. I found good code in PHP, and transcoded it, learning EDIFACT on-the-fly along. So don't expect that this lib is from a company with many years of experience in EDIFACT. I'm a medical doctor, and coding is done in my free time ;-)
But: I'm committed to using this in a medical project, and therefore quality has to come first.

I've had a bit of time to work on this today. I'm not super happy with it, but it's a rough proof of concept.

I've tried to avoid touching the core library so far; Component is effectively just a wrapper for Segment. SegmentGroup, and SegmentLoop are higher level classes used to describe the file format.

For the samples I've got it working relatively well to read, and it looks a little something like this at the moment.

class OrderLine(SegmentGroup):
    line_id = Component("LIN", mandatory=True)
    description = Component("IMD", mandatory=True)
    quantity = Component("QTY", mandatory=True)
    moa = Component("MOA")
    pri = Component("PRI")
    rff = Component("RFF", mandatory=True)


class Order(SegmentGroup):
    purchase_order_id = Component("BGM", mandatory=True)
    date = Component("DTM", mandatory=True)
    delivery_date = Component("DTM", mandatory=True)
    delivery_instructions = Component("FTX", mandatory=True)
    supplier_id_gln = Component("NAD", mandatory=True)
    supplier_id_tprg = Component("NAD", mandatory=True)
    ref = Component("RFF", mandatory=True)
    ship_to = Component("NAD", mandatory=True)

    ship_to_contact = Component("CTA", mandatory=True)
    ship_to_phone = Component("COM", mandatory=True)
    ship_to_email = Component("COM", mandatory=True)
    cux = Component("CUX", mandatory=True)
    tdt = Component("TDT", mandatory=True)

    lines = SegmentLoop(
        OrderLine,
        max=99,
        mandatory=True
    )

    uns = Component("UNS", mandatory=True)
    cnt = Component("CNT", mandatory=True)


TYPE_TO_PARSER_DICT = {
    "ORDERS": Order
}


for message in interchange.get_messages():
    cls = TYPE_TO_PARSER_DICT.get(message.type)
    if not cls:
        raise NotImplementedError("Unsupported message type '{}'".format(message.type))

    obj = cls()
    obj.from_message(message)
    print(obj)

I'm still tinkering, so any input over the API, or suggestions on whether or not I should touch the core library would be appreciated.

I asked www.stylusstudio.com, where they have a good overview of EDIFACT data, whether I may parse their site using beautifulsoup and extract Service Code data from it. This would make it much easier to get a bigger amount of data for Service codes.
If they don't agree (which I think will be the case), I'll ask the official UN source by mail too. I can't see anywhere whether the EDIFACT data are protected by patents etc. or if one can use them freely.

The UN publishes its work openly and it's free to use. From what I can get out of the tooling used to maintain the libraries, the EDIFACT outputs are a little limited in comparison to their reference data models.

I coordinate the Transport and Logistics domain in UN/CEFACT, so if you need some things to support this work, maybe I can help? Just give me the wishlist and I'll see what I have.

Keep up the good work on Pydifact. I've used it for some work and find it very good!

Hey @cmsdroff, thanks for the words. ATM Pydifact is just low level. At some point I want to get a higher level API, but am uncertain how to do this.
I need EDIFACT for medical data exchange, which is still quite common in e.g. Austria, mainly used for doctors' reports exchanges from laboratory or specialists to general practitioners.
I am very busy ATM, so pydifact is just held at its current status from my side for the moment, and I'm happily accepting PRs. In the long term it will definitely be used by myself, so there's no way I'll abandon it.

What would be helpful is a list of higher level definitions that are used in the world - everything I found so far is a bit clumsy and not really helpful for me at least...

sabas commented

@nerdoc I maintain https://github.com/php-edifact and I just started using your library to check if I can do something like the work I am doing in PHP.
If you need something you can ask me as well!
For example I converted the UN/CEFACT schemas in XML (edifact-mapping project), and I experimented in converting into json-schemas...

nerdoc commented

Interesting. Yes, this would be cool. How did you implement/convert the schemas? Manually?

sabas commented

I wrote a converter from the XML (https://github.com/php-edifact/edifact-mapping) to a schema, and I am thinking of releasing it somehow... If you want to write me an email I can send you something :-)

nerdoc commented

Sent you a mail - but it came back ;-)
Remote host said: 550-5.7.26 This mail has been blocked because the sender is unauthenticated.

nerdoc commented

I got your message, but when I replied, same error.

64.233.167.26 failed after I sent the message.
Remote host said: 550-5.7.26 This mail has been blocked because the sender is unauthenticated.
550-5.7.26 Gmail requires all senders to authenticate with either SPF or DKIM.

sabas commented

@nerdoc I sent you my contact details via mail

nerdoc commented

I didn't get any... sorry.

sabas commented

Tried from the work mail now :)