nerdocs/pydifact

Confusing naming of `Message`

Closed this issue · 13 comments

What pydifact calls Message is not a message in the EDIFACT sense of the term.

This is really confusing.

Also, pydifact does not currently handle Message, Interchange and Functional Group objects.

Here are some ways pydifact.message.Message could be renamed, IMHO:

  • pydifact.document.Document : the word "document" has no meaning in EDIFACT
  • pydifact.SegmentCollection would be the most accurate

I know this is painful (because it breaks backward compatibility).

I would be interested in implementing those containers in pydifact (I have my own reasons for doing so), but this bug is a blocker for that.

This is a good idea. Backwards compatibility is no problem: we are at v0.0.5, and it's stated as "eats-your-dog" software. So yes, Document is a great idea.
I don't have time just now - if you want, you can try that, @JocelynDelalande? I would need maybe a few weeks until I find time again for that change. If you get to it sooner, just tell me when you start on it, so we don't both work on it in parallel... ;-)
Oh, and implementing those containers - no problem - please tell me a bit about them (and maybe your interests?). I had a vague concept for implementing containers, and I just want to check whether yours is compatible with mine...

Yes, I will definitely give it a shot; I already have a data structure scheme in mind :).

But thinking about it, I wonder whether it is a good idea to keep the Document (current Message) notion at all. Rationale: the Interchange and Message wrapping is mandatory in EDIFACT, so there will never be a group of segments "out in the wild".

WDYT on this point @nerdoc ?

Message/Document is not a necessity, no. As stated, I just "transcoded" pydifact from a PHP lib which was IMHO very structured, so I took the structures too. AFAIK the original PHP library moved away from this abstracted OOP syntax and went to a (faster) procedural parsing.
I am not inclined to follow this approach: although it may be a bit faster, speed is not the main purpose of a Python lib. If I needed speed on a low-end device, I'd have to use C/C++ etc. anyway, not a language with a GC and a runtime engine.

But, again, Message (Document) is not strictly necessary as an object.
OTOH: I read the standard you linked above - and (please correct me) as far as I understand, this means that an (official) "message" means the segments between the message header (i.e. UNA) and the message trailer (UNT). So, e.g.

UNA...
  UNH
    {Message}
  UNS
UNT

meaning "message" is the inner part, not the whole thing, right?

An interchange of data in the context of EDIFACT, is composed of
one or more messages containing segments which in turn are made
up of data elements.

Maybe "Interchange" would be the best name then?

In OSI terms, a connection could include one or more EDIFACT
interchanges, each separated from the other by control service
segments which identify the start and end of each interchange.
Within each interchange, there is then a hierarchical structure
which allows for both control and identification of data for
processing. This structure is shown in Section 6 of ISO 9735.

In an UN/EDIFACT interchange,
everything from the first character of the Interchange Control
Header segment to the last character of the Interchange Control
Trailer segment, is user data,...

I found another implementation doc from Microsoft here:

Interchange  
   Group  
      Transaction set/message  
         Segment  
            Data Element  
               Sub Element
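
To make the nesting above concrete, here is a rough sketch of how such a hierarchy could be modelled in Python. The class and attribute names are only assumptions for illustration (they are not pydifact's API), and representing sub-elements as nested lists is likewise just one possible choice.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    tag: str  # segment tag, e.g. "UNH" or "BGM"
    # Data elements; in this sketch a nested list stands for sub-elements.
    elements: List[object] = field(default_factory=list)


@dataclass
class Message:
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Group:
    messages: List[Message] = field(default_factory=list)


@dataclass
class Interchange:
    # Functional groups are optional in EDIFACT, so this sketch also
    # allows messages to sit directly under the interchange.
    groups: List[Group] = field(default_factory=list)
    messages: List[Message] = field(default_factory=list)

    def all_messages(self) -> List[Message]:
        """Return ungrouped messages followed by the messages of every group."""
        grouped = [m for g in self.groups for m in g.messages]
        return list(self.messages) + grouped


msg = Message([Segment("UNH", ["1"]), Segment("BGM", [["220", "B"]]), Segment("UNT", ["3", "1"])])
interchange = Interchange(groups=[Group([msg])])
print(len(interchange.all_messages()))
```

Whether a separate Group level is worth having at all is a design question the sketch leaves open.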

So Interchange IMHO would be a good name for the whole thing?

So Interchange IMHO would be a good name for the whole thing?

👍 It is mandatory, and there should not be more than one, so Interchange is the right top-level element.

meaning "message" is the inner part, not the whole thing, right?

Yes
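
As an illustration of "message is the inner part", here is a minimal sketch that groups a flat run of segment tags into messages delimited by UNH and UNT. The function name `split_messages` is hypothetical, and working on bare tag strings instead of real segment objects is a simplification. Whether the UNH/UNT segments themselves should be stored on the message object is a design choice; this sketch keeps only the body.

```python
from typing import Iterable, List, Optional


def split_messages(tags: Iterable[str]) -> List[List[str]]:
    """Collect the segment tags found between each UNH and its UNT."""
    messages: List[List[str]] = []
    current: Optional[List[str]] = None  # None while outside any message
    for tag in tags:
        if tag == "UNH":
            current = []  # message header opens a new message
        elif tag == "UNT":
            if current is not None:
                messages.append(current)
            current = None  # message trailer closes it
        elif current is not None:
            current.append(tag)  # body segment inside the message
        # Tags outside UNH..UNT (e.g. UNB, UNZ) belong to the interchange.
    return messages


tags = ["UNB", "UNH", "BGM", "DTM", "UNT", "UNH", "BGM", "UNT", "UNZ"]
print(split_messages(tags))  # [['BGM', 'DTM'], ['BGM']]
```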

And speaking of references, I found http://www.gxs.co.uk/wp-content/uploads/tutorial_edifact.pdf to be a good introduction, and it has nice schematics. For example:

[image: schematic from the tutorial]

(but be careful: even if it is OK for the most part, it does not always use the exact words to name things)

This is a great picture, I never saw this so clearly. I didn't even know that there could be more than one message in a group until a few days ago.

Hm. I'm thinking about testing.

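from pydifact.message import Message
from pydifact.segments import Segment

def test_get_segment():
    # get_segment() is expected to return the first segment with the given tag
    message = Message.from_segments([Segment("36CF", 1), Segment("36CF", 2)])
    segment = message.get_segment("36CF")
    assert Segment("36CF", 1) == segment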

Here Envelope or Interchange makes no sense, even if things are always a bit different under the testing umbrella.

But here it doesn't need to be an Envelope, nor a complete Interchange. In this context Message is just a sequence of Segments - which in fact never exists in real life.

Under these circumstances your idea, SegmentCollection, would be a good name, or Sequence.
@JocelynDelalande your opinion?

Yes, let's use SegmentCollection: Sequence would give the false impression that it is a word with a meaning in the EDIFACT world, which is untrue. Btw, this should also be the same as Envelope or AbstractSegmentContainer from #19, no?

I renamed it to SegmentCollection.
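
For readers following along, the idea of a container that is "just a sequence of Segments" can be pictured roughly like this. It is a standalone sketch, not pydifact's actual code: the `Segment` stand-in below is hypothetical, and only the method names `from_segments` and `get_segment` are taken from the test quoted earlier in the thread.

```python
from typing import Iterable, List, NamedTuple, Optional


class Segment(NamedTuple):
    # Hypothetical stand-in for a real segment class: a tag plus one value.
    tag: str
    value: object


class SegmentCollection:
    """A plain ordered list of segments, with no envelope semantics."""

    def __init__(self) -> None:
        self.segments: List[Segment] = []

    @classmethod
    def from_segments(cls, segments: Iterable[Segment]) -> "SegmentCollection":
        collection = cls()
        collection.segments = list(segments)
        return collection

    def get_segment(self, tag: str) -> Optional[Segment]:
        """Return the first segment with the given tag, or None if absent."""
        for segment in self.segments:
            if segment.tag == tag:
                return segment
        return None


collection = SegmentCollection.from_segments([Segment("36CF", 1), Segment("36CF", 2)])
assert collection.get_segment("36CF") == Segment("36CF", 1)
```

Nothing in this class knows about UNH, UNT or any other envelope segment, which is why the name fits a test fixture that never exists in a real interchange.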

Ah, yes, #19 should be the same. Let's talk there further.