thomsonreuters/AtomOntologyCompleter

Numbers not matched in predicateRegex?


Sorry for bothering you again. I'm on version 0.5.1 and am trying to convert the file doremus.ttl with

cat doremus.ttl | python convert.py "http://data.doremus.org/ontology#" mus > doremus.json 

but the resulting json looks like this:

[
   {
      "description": "CLU198i should be container of", 
      "descriptionMoreURL": "http://data.doremus.org/ontology#CLI", 
      "text": "CLI"
   }, 
   {
      "description": "CLU197_should_have_binding", 
      "descriptionMoreURL": "http://data.doremus.org/ontology#CLU", 
      "text": "CLU"
   }, 
   {
      "description": "CLU197i should be binding of", 
      "descriptionMoreURL": "http://data.doremus.org/ontology#CLU", 
      "text": "CLU"
   }, 
…

As you can see, the URL and text values are cut off at the first digit, resulting in a long list of just CLU's, U's, and M's instead of the full names.

Maybe this is somehow connected to issue #2, or, more probably, I have missed something again?

I just tried changing the second group of the predicateRegex from

compile_regex(r'^([^:\s]*):([a-zA-Z\-_]+)')

to

compile_regex(r'^([^:\s]*):([\w]+)')

which gives the expected result:

[
   {
      "description": "CLU198i should be container of", 
      "descriptionMoreURL": "http://data.doremus.org/ontology#CLI198i_should_be_container_of", 
      "text": "CLI198i_should_be_container_of"
   }, 
   {
      "description": "CLU197_should_have_binding", 
      "descriptionMoreURL": "http://data.doremus.org/ontology#CLU197_should_have_binding", 
      "text": "CLU197_should_have_binding"
   }, 
   {
      "description": "CLU197i should be binding of", 
      "descriptionMoreURL": "http://data.doremus.org/ontology#CLU197i_should_be_binding_of", 
      "text": "CLU197i_should_be_binding_of"
   }, 
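The difference between the two patterns can be reproduced in isolation. This is only a minimal sketch: `re.compile` stands in for the project's `compile_regex` helper, and `local_name` and the `mus:` prefixed sample term are hypothetical, chosen to mirror the names in the output above.

```python
import re

# The two patterns quoted above. re.compile is used here as a stand-in
# for the project's compile_regex helper (an assumption).
OLD_PREDICATE_REGEX = re.compile(r'^([^:\s]*):([a-zA-Z\-_]+)')
NEW_PREDICATE_REGEX = re.compile(r'^([^:\s]*):([\w]+)')


def local_name(pattern, term):
    """Return the second capture group (the part after 'prefix:'), or None."""
    match = pattern.match(term)
    return match.group(2) if match else None


# Hypothetical prefixed name modelled on the DOREMUS output above.
term = "mus:CLU197_should_have_binding"

# The old character class has no digits, so matching stops at the "1".
print(local_name(OLD_PREDICATE_REGEX, term))  # CLU

# \w covers letters, digits, and underscore, so the full name is captured.
print(local_name(NEW_PREDICATE_REGEX, term))  # CLU197_should_have_binding
```

One thing to keep in mind: `\w` does not match `-`, which the old character class did accept, so with the new pattern a name containing a hyphen would be cut at the hyphen.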

No bother at all! I'm glad it's useful for you and grateful for the feedback. Your regex is correct. I had never encountered an ontology with digits in its predicate names, so I used an overly restrictive regex.

This is fixed in commit dfd29d4.

Thank you, runs perfectly now.