Jefferson49/ExtendedImportExport

checking syntax when using Gedcom Validator

Closed this issue · 11 comments

When exporting a GEDCOM file and checking its syntax with Gedcom Validator, I see several warnings that the GEDCOM file is not compatible with GEDCOM 7...

Thank you for reporting the issue!

To understand the background, it would help to get some information about the related GEDCOM data and errors. Could you post some examples of the warning messages from Gedcom Validator? Could you also post some GEDCOM snippets related to the warnings?

If possible from a privacy point of view, you could also send me your exported GEDCOM file (or the parts that cause errors) by email to webmaster(at)familienforschung-hemprich.de

I think the easiest way to check the syntax of the Gedcom file is to install Gedcom Validator from https://chronoplexsoftware.com/gedcomvalidator/
Some error messages:
0 @M502087@ OBJE
1 FILE import/photo.emf
2 FORM EMF -> should be: 2 FORM image/emf
3 TYPE PHOTO

Also
2 FORM GIF
or
2 FORM HTM
are not valid.

All custom Gedcom tags (starting with "_") should be declared:
1 SCHMA
2 TAG _AKA https://website.com/_AKA.html

For now, it's too much to give you more examples; maybe later.

Best regards,
Marianne

Regarding the image types, the following types are currently supported:
bmp|BMP, gif|GIF, jpg|JPG, tif|TIF, pdf|PDF

Therefore, GIF should already work.

It is no problem to add further types, like EMF or HTML. The allowed media types in GEDCOM 7 can be found in the following source: https://www.iana.org/assignments/media-types/media-types.xhtml

Just post a list with the missing types to include, e.g.
EMF => image/emf
HTM|HTML => text/html

It would be great if you could also check in the IANA list linked above whether each media type is registered.
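To illustrate the kind of mapping discussed here, a minimal Python sketch that turns legacy GEDCOM 5.5.1 FORM values (file extensions such as EMF or HTM) into media types. The function name and the table are hypothetical, not the module's actual code (which is PHP). The media type names are the IANA registrations as far as I know, but they should be double-checked against the IANA list linked above.

```python
# Hypothetical table: legacy FORM value (file extension) -> IANA media type.
LEGACY_FORM_TO_MEDIA_TYPE = {
    "bmp": "image/bmp",
    "gif": "image/gif",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "tif": "image/tiff",
    "tiff": "image/tiff",
    "pdf": "application/pdf",
    "emf": "image/emf",
    "htm": "text/html",
    "html": "text/html",
}


def convert_form(value: str) -> str:
    """Return the GEDCOM 7 media type for a legacy FORM value.

    Matching is case-insensitive (EMF, emf). Values that already contain
    a '/' (i.e. look like a media type) or are unknown are returned unchanged.
    """
    if "/" in value:
        return value
    return LEGACY_FORM_TO_MEDIA_TYPE.get(value.strip().lower(), value)
```

Unknown values are passed through unchanged, so a validator still reports them instead of the export silently guessing a type.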

All custom Gedcom tags (starting with "_") should be declared:
1 SCHMA
2 TAG _AKA https://website.com/_AKA.html

This is more difficult, since I do not have a list of all custom tags and schema definitions and I do not even know if such a list exists.

What I could do is scan the whole GEDCOM text for certain custom tags and include the schema for each custom tag that is found somewhere.

If you can provide me with a list of custom tags and schemas, I can try to include it. I would need something like the following examples from GEDCOM-L:
'_GODP' => 'https://genealogy.net/GEDCOM/',
'_GOV' => 'https://genealogy.net/GEDCOM/',
'_GOVTYPE' => 'https://genealogy.net/GEDCOM/',
'_LOC' => 'https://genealogy.net/GEDCOM/',
'_NAME' => 'https://genealogy.net/GEDCOM/',
'_POST' => 'https://genealogy.net/GEDCOM/',
'_RUFNAME' => 'https://genealogy.net/GEDCOM/',
'_STAT' => 'https://genealogy.net/GEDCOM/',
'_UID' => 'https://genealogy.net/GEDCOM/',
'_WITN' => 'https://genealogy.net/GEDCOM/',
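The scan-and-include approach described above could look roughly like the following Python sketch. It assumes one GEDCOM line per text line and uses a simplified form of the GEDCOM 7 line grammar (level, optional xref, tag, optional value). KNOWN_SCHEMAS is only an excerpt of the GEDCOM-L list above, and all function names are hypothetical.

```python
import re

# Simplified GEDCOM line grammar: level, optional @xref@, tag, optional value.
GEDCOM_LINE = re.compile(r"^(\d+) (?:@[^@ ]+@ )?(\S+)(?: (.*))?$")

# Excerpt of the GEDCOM-L list quoted above (not the complete list).
KNOWN_SCHEMAS = {
    "_GODP": "https://genealogy.net/GEDCOM/",
    "_GOV": "https://genealogy.net/GEDCOM/",
    "_RUFNAME": "https://genealogy.net/GEDCOM/",
    "_UID": "https://genealogy.net/GEDCOM/",
}


def custom_tags_in(gedcom_text: str) -> set[str]:
    """Return every extension tag (tag starting with '_') used in the text."""
    tags = set()
    for line in gedcom_text.splitlines():
        match = GEDCOM_LINE.match(line.strip())
        # Only the tag position counts; '_' inside a value is ignored.
        if match and match.group(2).startswith("_"):
            tags.add(match.group(2))
    return tags


def schemas_to_declare(gedcom_text: str, known: dict[str, str]) -> dict[str, str]:
    """Keep only the known schemas whose tag actually occurs in the text."""
    used = custom_tags_in(gedcom_text)
    return {tag: uri for tag, uri in known.items() if tag in used}
```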

During testing, I identified that GIF and JPG are not always exported correctly. I fixed this issue and also added export support for EMF, HTM, and HTML.

In the attachment, I added an updated GedcomSevenExportService.php file. You can unzip it, replace the existing file in your installation, and check whether the export now works for EMF, GIF, etc.

GedcomSevenExportService.zip

All custom Gedcom tags (starting with "_") should be declared:

At https://wiki.genealogy.net/GEDCOM/_Nutzerdef-Tag#Tabelle_1, I found a list of GEDCOM custom tags, which seems to cover a lot of the known custom tags.

In the latest code of the module, I generate SCHMA structures based on this custom tag list. If a custom tag from this list is detected during the download, a SCHMA structure is included in the export:

Example:

1 SCHMA
2 TAG _NOTH https://wiki.genealogy.net/GEDCOM/_Nutzerdef-Tag#Tabelle_1

In the attachment, you can find a module version that includes this functionality.

Can you test whether it works for your purposes?

download_gedcom_with_url_v3.2.3_238f1278.zip
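Once the matching schemas are known, they have to end up in the header. A hypothetical Python sketch that appends a `1 SCHMA` structure with one `2 TAG` line per extension tag at the end of the HEAD record. It assumes the file starts with `0 HEAD` (no byte order mark) and does not already contain a SCHMA, since HEAD may hold at most one.

```python
def add_schma(gedcom_lines: list[str], schemas: dict[str, str]) -> list[str]:
    """Insert HEAD.SCHMA with one 'TAG <extTag> <URI>' line per schema.

    The SCHMA lines are placed at the end of the HEAD record, i.e. just
    before the next level-0 line. Existing lines are left unchanged.
    """
    if not schemas:
        return list(gedcom_lines)
    schma = ["1 SCHMA"] + [f"2 TAG {tag} {uri}" for tag, uri in schemas.items()]
    result: list[str] = []
    in_head = False
    inserted = False
    for line in gedcom_lines:
        if line.startswith("0 "):
            # A new record starts: if we were in HEAD, close it with SCHMA first.
            if in_head and not inserted:
                result.extend(schma)
                inserted = True
            in_head = line.strip() == "0 HEAD"
        result.append(line)
    if in_head and not inserted:  # the file ended inside HEAD
        result.extend(schma)
    return result
```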

In my tree are several custom tags. A lot of these tags were not found by your module. Do you want to receive a list of these tags?

Since the simple approach with the custom tag list from https://wiki.genealogy.net/GEDCOM/_Nutzerdef-Tag#Tabelle_1 did not cover all of your custom tags, I started to rethink this issue.

I read the GEDCOM 7 specification for extensions and found the following:

  • "Each extTag is either a documented extension tag or an undocumented extension tag"
  • "An extension tag that is not given a URI in the schema structure is called an undocumented extension tag. The meaning of an undocumented extension tag is identified by its superstructure type and its tag."

In my opinion, the specification text implies that undocumented extension tags are also part of the standard and may be included in a GEDCOM file.

Therefore, GEDCOM Validator is too strict about the SCHMA structure, and these error messages should be warnings or informational messages.

I created an issue at GEDCOM Validator asking them to change this error. Hopefully, this will be changed.

Regarding the DownloadGedcomWithURL module, I will wait to see what happens with the issue at GEDCOM Validator. My summary for the moment is that I will only create SCHMA structures for custom tags where a dedicated URI with a specific description of the custom tag is available. This seems to be in line with the intention of the GEDCOM 7 specification.
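The distinction between documented and undocumented extension tags quoted from the specification can be checked mechanically. A minimal Python sketch, assuming a simplified line grammar and that declarations appear as `2 TAG <extTag> <URI>` lines under `1 SCHMA`: tags with such a declaration are documented, every other tag starting with "_" is undocumented. Function and variable names are illustrative only.

```python
import re

# Simplified GEDCOM line grammar: level, optional @xref@, tag, optional value.
GEDCOM_LINE = re.compile(r"^(\d+) (?:@[^@ ]+@ )?(\S+)(?: (.*))?$")


def classify_extension_tags(gedcom_text: str) -> tuple[set[str], set[str]]:
    """Split the extension tags used in a file into (documented, undocumented)."""
    declared: set[str] = set()
    used: set[str] = set()
    in_schma = False
    for raw in gedcom_text.splitlines():
        match = GEDCOM_LINE.match(raw.strip())
        if not match:
            continue
        level, tag, value = int(match.group(1)), match.group(2), match.group(3) or ""
        if level <= 1:
            # Any level-0/1 line ends a previous SCHMA structure.
            in_schma = level == 1 and tag == "SCHMA"
        elif in_schma and level == 2 and tag == "TAG":
            # Payload is "<extTag> <URI>": this tag is documented.
            parts = value.split()
            if len(parts) == 2:
                declared.add(parts[0])
        if tag.startswith("_"):
            used.add(tag)
    return used & declared, used - declared
```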

Also, I duplicated some tags by using both the regular tag and the GEDCOM 7 tag, e.g. RELA and ROLE.

Your module changed the tag RELA to ROLE, so when I use your module and check the syntax of the Gedcom with GedcomValidator, it reports a duplicate ROLE error.

Well, GEDCOM 7 removed the RELA tag, so all related structures need to be converted to ROLE.

Can you provide me with a GEDCOM snippet (from webtrees, i.e. GEDCOM 5.5.1) showing your usage of RELA/ROLE that creates an error after conversion to GEDCOM 7? I will check whether I can change the code to support the conversion.
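One possible conversion that avoids the duplicate ROLE is sketched below in Python. It is a hypothetical illustration, not the module's implementation: GEDCOM 7 allows exactly one ROLE under ASSO, so a free-text RELA becomes `ROLE OTHER` with a PHRASE, and an ASSO that already has a ROLE simply loses its RELA line. Dropping the RELA text is an assumption; keeping it as a PHRASE under the existing ROLE would be an alternative.

```python
import re

# Simplified GEDCOM line grammar: level, optional @xref@, tag, optional value.
GEDCOM_LINE = re.compile(r"^(\d+) (?:@[^@ ]+@ )?(\S+)(?: (.*))?$")


def parse(line: str):
    """Return (level, tag, value), or (None, None, None) for unparsable lines."""
    match = GEDCOM_LINE.match(line.strip())
    if not match:
        return None, None, None
    return int(match.group(1)), match.group(2), match.group(3) or ""


def convert_rela(lines: list[str]) -> list[str]:
    """Hypothetical GEDCOM 5.5.1 -> 7 conversion of ASSO.RELA into ASSO.ROLE."""
    result: list[str] = []
    i = 0
    while i < len(lines):
        level, tag, _ = parse(lines[i])
        if tag != "ASSO":
            result.append(lines[i])
            i += 1
            continue
        # The ASSO structure ends at the next line whose level is <= the ASSO level.
        j = i + 1
        while j < len(lines) and (parse(lines[j])[0] or 0) > level:
            j += 1
        substructures = lines[i + 1:j]
        has_role = any(parse(sub)[:2] == (level + 1, "ROLE") for sub in substructures)
        result.append(lines[i])
        for sub in substructures:
            sub_level, sub_tag, value = parse(sub)
            if sub_level == level + 1 and sub_tag == "RELA":
                if not has_role:
                    # Free-text relationship: ROLE OTHER plus a PHRASE with the text.
                    result.append(f"{level + 1} ROLE OTHER")
                    if value:
                        result.append(f"{level + 2} PHRASE {value}")
                # If a ROLE already exists, the RELA line is dropped (assumption).
            else:
                result.append(sub)
        i = j
    return result
```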

I created an issue at GEDCOM Validator asking them to change this error. Hopefully, this will be changed.

GEDCOM Validator did not agree to change the validation.

At https://wiki.genealogy.net/GEDCOM/_Nutzerdef-Tag#Tabelle_1, I found a list of GEDCOM custom tags, which seems to cover a lot of the known custom tags.

The latest release generates the schemas for the custom tags from the list above. As described above, it is not an error to have further custom tags without a schema. If you want to add a schema for those tags, you can add the following GEDCOM lines for each of your custom tags without a schema. The idea is to refer to the GEDCOM 7 specification if no other URL describing the custom tag is available.

1 SCHMA
2 TAG _TAG https://gedcom.io/specifications/FamilySearchGEDCOMv7.pdf
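The fallback suggested above could be automated along these lines: a hypothetical Python helper that writes one `2 TAG` line per extension tag, using a specific URI where one is known and the GEDCOM 7 specification URL otherwise. The function name and parameters are illustrative, not part of the module.

```python
SPEC_URL = "https://gedcom.io/specifications/FamilySearchGEDCOMv7.pdf"


def schma_lines(used_tags: set[str], known: dict[str, str], fallback: str = SPEC_URL) -> list[str]:
    """Build a HEAD.SCHMA structure for the given extension tags."""
    if not used_tags:
        return []  # nothing to declare, so no SCHMA at all
    lines = ["1 SCHMA"]
    for tag in sorted(used_tags):
        # Prefer a dedicated URI; otherwise point to the GEDCOM 7 specification.
        lines.append(f"2 TAG {tag} {known.get(tag, fallback)}")
    return lines
```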

Release 3.2.4 addresses most of the issues reported above.