Confusing naming of `Message`
Closed this issue · 13 comments
What is called `Message` in pydifact is not a message in the EDIFACT sense of the term. This is really confusing, as pydifact does not currently handle `Message`, `Interchange` and `Functional group` objects.
Here are some ways `pydifact.message.Message` could be renamed, IMHO:
- `pydifact.document.Document`: the word "document" has no meaning in EDIFACT
- `pydifact.SegmentCollection`: would be the most accurate
I know this is painful (because it breaks backward compatibility).
I would be interested in implementing those containers in pydifact (I have my own reasons for doing that), but this bug is a blocker for that.
This is a good idea. Backwards compatibility is no problem: we are at v0.0.5, and it's stated as "eats-your-dog" software. So yes, Document is a great idea.
I don't have time just now. If you want, you can try that, @JocelynDelalande? I would need maybe a few weeks until I find time again for that change. If you get to it sooner, just tell me when you start on it, so we don't both do it at the same time... ;-)
Oh, and implementing those containers: no problem. Please tell me a bit about them (and maybe your interests?). I had a vague concept for implementing containers, and I just want to check whether yours is compatible with mine...
Yes, I will definitely give it a shot; I already have a data structure scheme in mind :).
But thinking about it, I wonder whether it is a good idea to keep the `Document` (current `Message`) notion or not. Rationale: the Interchange and Message wrapping is mandatory in EDIFACT, so there will never be a group of segments "out in the wild".
WDYT on this point @nerdoc ?
Message/Document is not a necessity, no. As stated, I just "transcoded" pydifact from a PHP lib which was IMHO very structured, so I took the structures too. The original PHP library AFAIK moved away from this abstracted OOP syntax and went to a (faster) procedural parsing.
I am not inclined to follow this approach, as I think that, although it may be a bit faster, speed is not the main purpose of a Python lib. If I need speed on a low-end device, I'd have to use C/C++ etc. anyway, and not a language with a GC and a runtime engine.
But, again, Message (Document) is not strictly necessary as an object.
OTOH: I read the standard you linked above and (please correct me) as far as I understand, an (official) "message" means the segments between the message header (i.e. UNH) and the message trailer (UNT). So, e.g.
UNA...
UNH
{Message}
UNS
UNT
meaning "message" is the inner part, not the whole thing, right?
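To make that reading concrete, here is a minimal sketch that pulls the "inner part" out of a flat list of segments. It assumes a segment is just a `(tag, elements)` tuple, and `split_messages` is a hypothetical helper, not something pydifact provides. Whether UNH and UNT themselves count as part of the message is left aside here; the sketch only collects what sits between them.

```python
from typing import List, Tuple

# Illustrative stand-in for a segment: (tag, elements).
# This is NOT pydifact's Segment class.
RawSegment = Tuple[str, list]


def split_messages(segments: List[RawSegment]) -> List[List[RawSegment]]:
    """Return the segments found between each UNH/UNT pair (pair excluded)."""
    messages = []
    current = None  # None means "currently outside any message"
    for tag, elements in segments:
        if tag == "UNH":
            current = []  # a message header opens a new message
        elif tag == "UNT":
            if current is None:
                raise ValueError("UNT without a preceding UNH")
            messages.append(current)  # a message trailer closes it
            current = None
        elif current is not None:
            current.append((tag, elements))
        # segments outside UNH/UNT (UNA, UNB, UNZ, ...) are skipped
    if current is not None:
        raise ValueError("UNH without a closing UNT")
    return messages
```

Everything outside a UNH/UNT pair is ignored by this sketch, which is the part that would belong to the enclosing interchange.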
An interchange of data in the context of EDIFACT, is composed of
one or more messages containing segments which in turn are made
up of data elements.
Maybe "Interchange" would be the best name then?
In OSI terms, a connection could include one or more EDIFACT
interchanges, each separated from the other by control service
segments which identify the start and end of each interchange.
Within each interchange, there is then a hierarchical structure
which allows for both control and identification of data for
processing. This structure is shown in Section 6 of ISO 9735.
In an UN/EDIFACT interchange,
everything from the first character of the Interchange Control
Header segment to the last character of the Interchange Control
Trailer segment, is user data,...
I found another implementation doc from Microsoft here:
Interchange
Group
Transaction set/message
Segment
Data Element
Sub Element
So `Interchange` IMHO would be a good name for the whole thing?
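The hierarchy listed above could be sketched as plain nested containers. This is only an illustration: all class and field names below are assumptions made for the example, not pydifact's actual API, and allowing both groups and direct messages in one interchange is a simplification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    tag: str
    elements: list = field(default_factory=list)  # data elements / sub-elements


@dataclass
class Message:
    segments: List[Segment] = field(default_factory=list)


@dataclass
class FunctionalGroup:
    messages: List[Message] = field(default_factory=list)


@dataclass
class Interchange:
    # Functional groups are optional, so messages may also sit
    # directly inside the interchange (simplified here).
    messages: List[Message] = field(default_factory=list)
    groups: List[FunctionalGroup] = field(default_factory=list)

    def all_messages(self) -> List[Message]:
        """Direct messages first, then the messages of each group in order."""
        result = list(self.messages)
        for group in self.groups:
            result.extend(group.messages)
        return result
```

With this shape, `Interchange` is the single top-level object and a message is always reached through it.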
> So `Interchange` IMHO would be a good name for the whole thing?

👍 That is mandatory, and there should not be more than one, so Interchange is the right first-level element.
> meaning "message" is the inner part, not the whole thing, right?

Yes
And speaking of references, I found http://www.gxs.co.uk/wp-content/uploads/tutorial_edifact.pdf to be a good introduction, and it has nice schematics. Like:
(But be careful: even if it is OK for the most part, it does not always use the exact terms to name things.)
This is a great picture, I never saw this so clearly. I didn't even know that there could be more than one message in a group until a few days ago.
Hm. I'm thinking about testing.
def test_get_segment():
    message = Message.from_segments([Segment("36CF", 1), Segment("36CF", 2)])
    segment = message.get_segment("36CF")
    assert Segment("36CF", 1) == segment
Here `Envelope` or `Interchange` makes no sense, even if everything is a bit different under the testing umbrella.
But here it doesn't need to be an Envelope, nor a complete Interchange. In this context `Message` is just a sequence of `Segment`s, which in fact never exists in real life.
Under these circumstances your idea `SegmentCollection` would be a good name, or `Sequence`.
@JocelynDelalande your opinion?
Yes, let's use `SegmentCollection`: `Sequence` would give the false impression that it is a word with a meaning in the EDIFACT world, which is untrue. Btw, this should also be the same as `Envelope` or `AbstractSegmentContainer` from #19, no?
I renamed it to SegmentCollection.
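For readers landing here later, a minimal sketch of what a container with this name could look like, modelled on the test quoted earlier in the thread. The `Segment` and `SegmentCollection` classes below are simplified stand-ins written for this example (including the `get_segments` helper); they are assumptions, not the actual pydifact implementation.

```python
from typing import Iterable, Iterator, List, Optional


class Segment:
    """Simplified stand-in: a tag plus its elements, comparable by value."""

    def __init__(self, tag: str, *elements):
        self.tag = tag
        self.elements = list(elements)

    def __eq__(self, other) -> bool:
        return (
            isinstance(other, Segment)
            and self.tag == other.tag
            and self.elements == other.elements
        )

    def __repr__(self) -> str:
        return f"Segment({self.tag!r}, {self.elements!r})"


class SegmentCollection:
    """A bare, ordered collection of segments with no EDIFACT envelope."""

    def __init__(self) -> None:
        self.segments: List[Segment] = []

    @classmethod
    def from_segments(cls, segments: Iterable[Segment]) -> "SegmentCollection":
        collection = cls()
        collection.segments.extend(segments)
        return collection

    def get_segments(self, tag: str) -> Iterator[Segment]:
        """Yield every segment with the given tag, in order."""
        return (s for s in self.segments if s.tag == tag)

    def get_segment(self, tag: str) -> Optional[Segment]:
        """Return the first segment with the given tag, or None."""
        return next(self.get_segments(tag), None)
```

The name carries no EDIFACT meaning, so the same class could back both test fixtures and the real containers (Interchange, Message) discussed above.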