LibreCat/Catmandu-MARC

closing collection tag is missing

Closed this issue · 4 comments

  • Catmandu::MARC 1.254
  • Catmandu 1.2015

I'm converting a huge MARCXML file. The input file has both an opening and a closing collection tag, but the output only has the opening tag.

$ catmandu convert MARC --type XML to MARC --type XML < test.marc.xml > test.fixed.marc.xml

input

<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.loc.gov/MARC21/slim
http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<record>
...
</record>
</collection>

output

<?xml version="1.0" encoding="UTF-8"?>
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim">
<marc:record>
...
</marc:record>

jorol commented

I can't reproduce this on my machine with same versions of modules. Could you please check the output of this conversion?

$ curl -s 'https://raw.githubusercontent.com/LibreCat/Catmandu-MARC/dev/t/marc.xml' | catmandu convert MARC --type XML to MARC --type XML

You can try to force the collection tags with the option --collection 1:

$ catmandu convert  MARC --type XML to MARC --type XML --collection 1 < test.marc.xml > test.fixed.marc.xml

See https://metacpan.org/pod/Catmandu::Exporter::MARC::XML

I just shortened the import file from 142k records to 200 records, and now I do see the closing collection tag.

But with the large file it is still missing, and catmandu doesn't complain or throw an error.

Something with the input file causes catmandu to silently create an invalid output file.
What could be the reason for that behaviour?
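
Since catmandu reported no error here, one way to confirm that an output file was cut short is to check it for well-formedness after the conversion. The following is a minimal sketch, not part of Catmandu: the helper name `is_well_formed` is hypothetical, and it only uses Python's standard `xml.etree.ElementTree` module. It checks well-formedness only, not validity against the MARC21slim schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source):
    """Return (True, None) if `source` parses as XML, else (False, message).

    `source` is a file name or a binary file object. The file is read as a
    stream, so a large export does not have to fit into memory.
    (Hypothetical helper for illustration.)
    """
    try:
        for _event, elem in ET.iterparse(source):
            elem.clear()  # drop parsed content to keep memory usage low
    except ET.ParseError as err:
        # A missing closing </marc:collection> tag ends up here.
        return False, str(err)
    return True, None


# Example usage (file name taken from the command above):
# ok, message = is_well_formed("test.fixed.marc.xml")
# print("well-formed" if ok else f"broken: {message}")
```

A file that stops after the last record, as in the output shown above, should be reported as broken because the root element is never closed.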

jorol commented

Could you please validate your XML import file?

$ xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd test.marc.xml
$ yaz-marcdump -n -i marcxml test.marc.xml

There was an error in the input file: a datafield element without any subfield.

xmllint --noout --schema http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd test.xml
test.xml:127779: element datafield: Schemas validity error : Element '{http://www.loc.gov/MARC21/slim}datafield': Missing child element(s). Expected is ( {http://www.loc.gov/MARC21/slim}subfield ).
test.xml fails to validate

The offending line:

<datafield tag="245" ind1="0" ind2="0">^M<!-- Feld3500 -->  </datafield>

After removing this line and re-running catmandu convert, the closing tag is there.

Thanks for the help!
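
For other large files, the kind of problem found above (a datafield with no subfield child) can also be located before running the conversion. This is a rough sketch under stated assumptions: the function name `find_empty_datafields` is made up for illustration, the input is assumed to use the MARC21 slim namespace as in the example input, and only Python's standard library is used. It does not replace the schema validation with xmllint.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"


def find_empty_datafields(source):
    """Yield (record_number, tag) for each datafield without a subfield.

    `source` is a file name or a binary file object. Records are counted
    from 1 in document order. (Hypothetical helper for illustration.)
    """
    record = f"{{{MARC_NS}}}record"
    datafield = f"{{{MARC_NS}}}datafield"
    subfield = f"{{{MARC_NS}}}subfield"
    record_number = 0
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == record:
                record_number += 1
        elif elem.tag == datafield:
            # On the "end" event all children have been parsed. XML comments
            # are skipped by the default parser, so a datafield holding only
            # a comment and whitespace has no subfield child.
            if elem.find(subfield) is None:
                yield record_number, elem.get("tag")
        elif elem.tag == record:
            elem.clear()  # free the finished record to limit memory usage


# Example usage (file name taken from the thread):
# for number, tag in find_empty_datafields("test.marc.xml"):
#     print(f"record {number}: datafield {tag} has no subfield")
```

The sketch streams the file with `iterparse` instead of loading it whole, which seems the safer choice for an input with 142k records.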