michaelbrunnbauer/rdf2rdb

Invalid line: -Latn-pinyin or -US


When I try to run RDF2RDB, I get a parse error that seems to be related to a language tag.
This is the output:

[root@terma_aat rdf2rdb]# ./rdf2rdb.py ../aat/AATOut_2Terms.nt
parsing ../aat/AATOut_2Terms.nt
Traceback (most recent call last):
  File "./rdf2rdb.py", line 56, in <module>
    p=parser(source,output_triples=options.output_triples)
  File "/tmp/rdf2rdb/parser.py", line 19, in __init__
    graph.parse(source,format=settings.filenameextensions[format])
  File "build/bdist.linux-x86_64/egg/rdflib/graph.py", line 1352, in parse
    location=location, file=file, data=data, **args)
  File "build/bdist.linux-x86_64/egg/rdflib/graph.py", line 1002, in parse
    parser.parse(source, self, **args)
  File "build/bdist.linux-x86_64/egg/rdflib/plugins/parsers/nt.py", line 26, in parse
    parser.parse(f)
  File "build/bdist.linux-x86_64/egg/rdflib/plugins/parsers/ntriples.py", line 131, in parse
    raise ParseError("Invalid line: %r" % self.line)
ParseError: Invalid line: '-Latn-pinyin .'
saving state to rdf2rdb.pickle

How can I prevent this parse error? In my settings I have added nl as a desired language.

It looks like AATOut_2Terms.nt contains a line "-Latn-pinyin .", which would be a syntax error in N-Triples.

If you disagree, please send the contents of AATOut_2Terms.nt.

I can't find -Latn-pinyin. in the file, but I can find it without the dot char, like so:
<http://vocab.getty.edu/aat/term/1000645078-zh-Latn-pinyin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2008/05/skos-xl#Label> .

Should I string replace -Latn-pinyin to make the syntax valid?

I downloaded the AATOut_2Terms.nt file from http://vocab.getty.edu/ (it is part of AAT explicit.zip).

I was able to reproduce the problem with rdflib 4.0.1 and AATOut_2Terms.nt from vocab.getty.edu. It is a bug in rdflib, which wrongly treats language tags containing uppercase characters as errors, e.g.:

<http://vocab.getty.edu/aat/term/1000309631-zh-Latn-pinyin> <http://vocab.getty.edu/ontology#term> "yueqin"@zh-Latn-pinyin .
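The truncated text in the error message ('-Latn-pinyin .') is what you would expect if the parser accepts only lowercase characters in a language tag: it consumes "zh", stops at the uppercase "L", and is left with a remainder it cannot interpret. The sketch below illustrates this with two hypothetical patterns (STRICT_LANG and LENIENT_LANG are names made up for this example, not rdflib's actual source):

```python
import re

# Hypothetical lowercase-only pattern, illustrating the suspected bug.
STRICT_LANG = re.compile(r'@([a-z]+(?:-[a-z0-9]+)*)')
# Hypothetical pattern that also accepts uppercase characters.
LENIENT_LANG = re.compile(r'@([A-Za-z]+(?:-[A-Za-z0-9]+)*)')

def split_lang_tag(text, pattern):
    """Return (language tag, unparsed remainder) for text starting at '@'."""
    match = pattern.match(text)
    if match is None:
        return None, text
    return match.group(1), text[match.end():]

# The part of the example line that follows the closing quote of "yueqin".
tail = '@zh-Latn-pinyin .'

print(split_lang_tag(tail, STRICT_LANG))   # ('zh', '-Latn-pinyin .')
print(split_lang_tag(tail, LENIENT_LANG))  # ('zh-Latn-pinyin', ' .')
```

With the lowercase-only pattern the remainder is exactly the string reported in the ParseError, which is why that text cannot be found as a line of its own in the file.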

The newest version of rdflib does not seem to have this bug. I installed it with "easy_install --upgrade rdflib", but you may want to do it differently depending on your OS/setup.
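If upgrading rdflib is not possible, one possible workaround is to lowercase the language tags before parsing. Language tags are case-insensitive (BCP 47), so this should not change their meaning. The following is only a minimal sketch, assuming Python 3 and one triple per line with the literal as the object; the function names and the regular expression are made up for this example and are not part of rdf2rdb or rdflib:

```python
import re

# Matches the end of a line of the form: "literal"@tag .
# Group 1: the quoted literal plus '@', group 2: the tag, group 3: the final dot.
LANG_TAGGED_LITERAL = re.compile(
    r'("(?:[^"\\]|\\.)*"@)([A-Za-z]+(?:-[A-Za-z0-9]+)*)(\s*\.\s*)$'
)

def lowercase_lang_tag(line):
    """Return the line with the language tag of its object literal lowercased."""
    return LANG_TAGGED_LITERAL.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), line
    )

def rewrite_file(src_path, dst_path):
    """Copy an N-Triples file line by line, lowercasing language tags."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(lowercase_lang_tag(line))
```

Only the tag after the literal is changed. URIs such as <http://vocab.getty.edu/aat/term/1000309631-zh-Latn-pinyin> are left untouched, which a plain string replace of -Latn-pinyin would not guarantee.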