RMLio/rmlmapper-java

StringIndexOutOfBoundsException error with rmlmapper-6.0.0

JBPressac opened this issue · 4 comments

Hello,
Using rmlmapper-6.0.0-r363-all.jar with the following RML file, I get the error message below, whereas rmlmapper-5.0.0-r362-all.jar successfully generates the triples:

The error message:

09:37:19.737 [main] ERROR be.ugent.rml.cli.Main               .main(436) - begin 0, end -1, length 9
java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 9
        at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
        at java.base/java.lang.String.substring(String.java:1874)
        at be.ugent.rml.cli.Main.main(Main.java:197)
        at be.ugent.rml.cli.Main.main(Main.java:45)

The RML file:

prefixes:
 # The validity of these prefixes, as well as the URIs of the properties and classes used, remains to be verified
 # For example, crm:P98i_was_born or crm:P98_was_born?
 bibo: http://purl.org/ontology/bibo/
 bnf-onto: http://data.bnf.fr/ontology/bnf-onto/
 crm: http://www.cidoc-crm.org/cidoc-crm/
 crm-sup: https://ontome.net/ns/sdh-crm-supplement/
 foaf: http://xmlns.com/foaf/0.1/
 frbroo: http://iflastandards.info/ns/fr/frbr/frbroo/
 grel: http://users.ugent.be/~bjdmeest/function/grel.ttl# 
 idlab-fn: http://example.com/idlab/function/
 idref: https://www.idref.fr/
 isni: http://isni.org/ontology#
 lexvo: http://lexvo.org/id/iso639-3/
 owl: http://www.w3.org/2002/07/owl#  
 prelib: http://mshb.huma-num.fr/prelib/
 sdh: https://ontome.net/ns/sdhss/
 sdh-int: https://ontome.net/ns/intellectual-literary-life
 sdh-so: https://ontome.net/ns/social-life-specific/
 wd: http://www.wikidata.org/entity/ 
 xsd: http://www.w3.org/2001/XMLSchema#

variables:
  access: &dbHost http://localhost/database
  type: &dbType mysql
  referenceFormulation: &csvReferenceFormulation csv
  credentials: &credentials
    username: $(_username)
    password: $(_password)

sources:
 prelib_airegeographique:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : SELECT * FROM prelib_airegeographique; 
 prelib_appellationpersonne:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : > 
   SELECT * FROM prelib_appellationpersonne 
   JOIN prelib_langue ON prelib_langue.id = prelib_appellationpersonne.langue_id; 
 prelib_collectif:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : SELECT * FROM prelib_collectif;
 prelib_collectifecritoeuvre:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query: >
   SELECT (CASE WHEN date_ecriture REGEXP '^\\d+$' THEN date_ecriture END) AS annee, 
   prelib_collectifecritoeuvre.* FROM prelib_collectifecritoeuvre 
   JOIN prelib_fonctionecritoeuvre ON prelib_fonctionecritoeuvre.id = prelib_collectifecritoeuvre.fonction_id  
   ORDER BY prelib_collectifecritoeuvre.id
 prelib_ecritoeuvre:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  # date_ecriture is of type VARCHAR and sometimes contains years, sometimes year ranges, sometimes dates in French format,
  # sometimes comments. For now, we only retrieve the entries that match the REGEXP '^\\d+$'.
  # Since prelib_ecritoeuvre contains relations with roles other than authors, date_ecriture should be considered
  # the date on which the role was exercised rather than the date of writing strictly speaking.
  # TODO: Consider whether this information should simply be removed from the triples.
  # Creation of the following fields:
  # - annee
  query: >
   SELECT (CASE WHEN date_ecriture REGEXP '^\\d+$' THEN date_ecriture END) AS annee, 
   prelib_ecritoeuvre.* FROM prelib_ecritoeuvre 
   JOIN prelib_fonctionecritoeuvre ON prelib_fonctionecritoeuvre.id = prelib_ecritoeuvre.fonction_id  
   ORDER BY prelib_ecritoeuvre.id   
 prelib_editeoeuvre:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  query : SELECT * FROM prelib_editeoeuvre;
 prelib_editerevue:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  query : SELECT * FROM prelib_editerevue;  
 prelib_edition:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  query : SELECT * FROM prelib_edition;
 prelib_oeuvre:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : SELECT * FROM prelib_oeuvre;
 prelib_oeuvreedition:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : SELECT * FROM prelib_oeuvreedition;  
 prelib_oeuvrelangue:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : > 
   SELECT * FROM prelib_oeuvrelangue JOIN prelib_langue 
   WHERE prelib_langue.id = prelib_oeuvrelangue.langue_id;
 prelib_paraitdans:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : SELECT * FROM prelib_paraitdans;    
 prelib_personne:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  # Creation of the following fields:
  # - date_naissance = full date of birth, yyyy-mm-dd or yyyy
  # - date_naissance_dtype = date or gYear
  # - likewise for date_deces and date_deces_dtype
  # - FRBNF = BnF record number
  # - gender = male or female
  query: >
   SELECT (CASE WHEN (jour_naissance IS NOT NULL AND mois_naissance IS NOT NULL AND annee_naissance IS NOT NULL) 
   THEN CONCAT_WS('-', LPAD(`annee_naissance`, 4, 0), LPAD(`mois_naissance`, 2, 0), LPAD(`jour_naissance`, 2, 0)) 
   WHEN (annee_naissance IS NOT NULL) THEN annee_naissance END) AS date_naissance, 
   (CASE WHEN (jour_naissance IS NOT NULL AND mois_naissance IS NOT NULL AND annee_naissance IS NOT NULL) 
   THEN 'date' WHEN (annee_naissance IS NOT NULL) THEN 'gYear' END) AS date_naissance_dtype, 
   (CASE WHEN (jour_deces IS NOT NULL AND mois_deces IS NOT NULL AND annee_deces IS NOT NULL) 
   THEN CONCAT_WS('-', LPAD(`annee_deces`, 4, 0), LPAD(`mois_deces`, 2, 0), LPAD(`jour_deces`, 2, 0)) 
   WHEN (annee_deces IS NOT NULL) THEN annee_deces END) AS date_deces, 
   (CASE WHEN (jour_deces IS NOT NULL AND mois_deces IS NOT NULL AND annee_deces IS NOT NULL) 
   THEN 'date' WHEN (annee_deces IS NOT NULL) THEN 'gYear' END) AS date_deces_dtype,   
   SUBSTRING(ark_bnf, 9, CHAR_LENGTH(ark_bnf) - 9) AS FRBNF, 
   IF(sexe = 'M', REPLACE(sexe, 'M', 'male'),IF(sexe = 'F', REPLACE(sexe, 'F', 'female'),'')) AS gender, 
   prelib_personne.* FROM prelib_personne ORDER BY id ;
 prelib_profession:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  query : SELECT * FROM prelib_profession;
 prelib_revue:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation
  query : SELECT * FROM prelib_revue;
 prelib_ville:
  access: *dbHost
  credentials: *credentials
  type: *dbType
  referenceFormulation: *csvReferenceFormulation   
  query : SELECT * FROM prelib_ville;    

# TODO:
# - Put the triples into graphs, generate N-Quads
# - Modify PRELIB to add the Geonames identifiers of the cities
# - Edit PRELIB to move the 'intitule' field of prelib_profession into qualite_id (prelib_qualiteparticipecollectif)
# - Express the information from prelib_ecritoeuvre with the ontologies used by the BnF and Abes

# Shouldn't the PRELIB record be distinguished from what the record is about, as IdRef
# and the BnF do? E.g. on data.bnf.fr: DESCRIBE <http://data.bnf.fr/ark:/12148/cb16623239r#about>
# which includes the triple:
# <http://data.bnf.fr/ark:/12148/cb16623239r> <http://xmlns.com/foaf/0.1/focus> <http://data.bnf.fr/ark:/12148/cb16623239r#about>

mappings:
# Inspired by data.bnf.fr: DESCRIBE <http://data.bnf.fr/ark:/12148/cb16623239r#about>
# See also Théodore Hersart de la Villemarqué http://data.bnf.fr/ark:/12148/cb119103377#about and http://www.idref.fr/026956845/id
# See also Michel Serres http://www.idref.fr/02713329X/id
# Issues: https://github.com/RMLio/rmlmapper-java/issues/159 (RML Mapper does not take empty VARCHAR columns into account)
 PersonneMapping:
  sources: prelib_personne
  subjects: prelib:personne/$(id)
  predicateobjects:
   - [a, crm:E21_Person]
   - [rdfs:label, $(nom_usuel)]
   - [foaf:name, $(nom_usuel)]
   - [foaf:familyName, $(nom_etat_civil)]
   - [foaf:givenName, $(prenom_etat_civil)]
   - [bnf-onto:FRBNF, $(FRBNF)]
   - [isni:identifierValid, $(isni)]
   - [owl:sameAs, http://www.wikidata.org/entity/$(wikidata)~iri]
   - [owl:sameAs, http://www.idref.fr/$(idref)~iri]
   - [owl:sameAs, http://data.bnf.fr/ark:/$(ark_bnf)#foaf:Person~iri]
   - [owl:sameAs, http://isni.org/isni/$(isni)~iri]
   - [owl:sameAs, http://viaf.org/viaf/$(viaf)~iri]
   - [crm-sup:P20_same_as_URI, http://www.wikidata.org/entity/$(wikidata)~iri]
   - [crm-sup:P20_same_as_URI, http://www.idref.fr/$(idref)~iri]
   - [crm-sup:P20_same_as_URI, http://data.bnf.fr/ark:/$(ark_bnf)#foaf:Person~iri]
   - [crm-sup:P20_same_as_URI, http://isni.org/isni/$(isni)~iri]
   - [crm-sup:P20_same_as_URI, http://viaf.org/viaf/$(viaf)~iri]
   - [foaf:gender, $(gender), en~lang]
   - p: crm:P1_is_identified_by
     o:
      - mapping: AppellationPersonneMapping
        condition:
         function: equal
         parameters:
          - [str1, $(id)]
          - [str2, $(personne_id)]
   - p: crm:P98_was_born
     o:
      - mapping: BirthMapping
        condition:
         function: equal
         parameters:
          - [str1, $(id)]
          - [str2, $(id)]
   - p: crm:P100_died_in 
     o:
      - mapping: DeathMapping
        condition:
         function: equal
         parameters:
          - [str1, $(id)]
          - [str2, $(id)]           

 # With Vincent and Francesco's proposal, there is no distinction between preferred forms and rejected forms
 # blank nodes (anonymous resources)
 AppellationPersonneMapping:
  sources: prelib_appellationpersonne
  predicateobjects:
  - [a, crm:E41_Appellation]
  - [crm-sup:P21_has_value, $(appellation), $(code_iso_639_3)~lang]

 AppellationRetenuePersonneMapping:
  sources: prelib_appellationpersonne
  subjects: prelib:personne/$(personne_id)
  predicateobjects:
   - [skos:prefLabel, $(appellation), $(code_iso_639_3)~lang]
  condition:
    function: idlab-fn:equal
    parameters:
     - [grel:valueParameter, $(forme)]
     - [grel:valueParameter2, "2"]
   
 AppellationRejetteePersonneMapping:
  sources: prelib_appellationpersonne
  subjects: prelib:personne/$(personne_id)
  predicateobjects:
   - [skos:altLabel, $(appellation), $(code_iso_639_3)~lang]
  condition:
    function: idlab-fn:equal
    parameters:
     - [grel:valueParameter, $(forme)]
     - [grel:valueParameter2, "1"]

 DeathMapping:
  sources: prelib_personne
  predicateobjects:
   - [a, crm:E69_Death]
   - p: crm:P7_took_place_at
     o:
      - mapping: VilleMapping
        condition:
         function: equal
         parameters:
          - [str1, $(ville_deces_id)]
          - [str2, $(id)]
   - p: crm:P4_has_time_span
     o:
      - mapping: TimeSpanDeathMapping
        condition:
         function: equal
         parameters:
          - [str1, $(id)]
          - [str2, $(id)]    

 BirthMapping:
  sources: prelib_personne
  predicateobjects:
   - [a, crm:E67_Birth]
   - p: crm:P7_took_place_at
     o:
      - mapping: VilleMapping
        condition:
         function: equal
         parameters:
          - [str1, $(ville_naissance_id)]
          - [str2, $(id)]
   - p: crm:P4_has_time_span
     o:
      - mapping: TimeSpanBirthMapping
        condition:
         function: equal
         parameters:
          - [str1, $(id)]
          - [str2, $(id)] 

 # yarrrml-parser does not take into account the use of a reference to build the datatype
 # issue created: https://github.com/RMLio/yarrrml-parser/issues/162#issuecomment-1104892337
 TimeSpanBirthMapping:
  sources: prelib_personne
  predicateobjects:
   - [a, crm:E52_Time-Span]
   - p: crm:P82_at_some_time_within
     o: $(date_naissance)
     datatype: xsd:gYear
     condition:
      function: equal
      parameters:
       - [str1, $(date_naissance_dtype)]
       - [str2, 'gYear']
   - p: crm:P82_at_some_time_within
     o: $(date_naissance)
     datatype: xsd:date
     condition:
      function: equal
      parameters:
       - [str1, $(date_naissance_dtype)]
       - [str2, 'date']

 TimeSpanDeathMapping:
  sources: prelib_personne
  predicateobjects:
   - [a, crm:E52_Time-Span]
   - p: crm:P82_at_some_time_within
     o: $(date_deces)
     datatype: xsd:gYear
     condition:
      function: equal
      parameters:
       - [str1, $(date_deces_dtype)]
       - [str2, 'gYear']
   - p: crm:P82_at_some_time_within
     o: $(date_deces)
     datatype: xsd:date
     condition:
      function: equal
      parameters:
       - [str1, $(date_deces_dtype)]
       - [str2, 'date']

 CollectifMapping:
  sources: prelib_collectif
  subjects: prelib:collectif/$(id)
  predicateobjects:
   - [a, crm:E74_Group]
   - [rdfs:label, $(nom), fr~lang]

 EditionMapping:
  sources: prelib_edition
  subjects: prelib:edition/$(id)
  predicateobjects:
   - [a, frbroo:F3_Manifestation_Product_Type]
   - [rdfs:label, $(titre)]
   - [crm:P102_has_title, $(titre)]
   - [owl:sameAs, http://www.wikidata.org/entity/$(wikidata)~iri]
   - [owl:sameAs, http://www.sudoc.org/$(sudoc)~iri]
   - [owl:sameAs, http://data.bnf.fr/ark:/$(ark_bnf)~iri]
   - [crm-sup:P20_same_as_URI, http://www.wikidata.org/entity/$(wikidata)~iri]
   - [crm-sup:P20_same_as_URI, http://www.sudoc.org/$(sudoc)~iri]
   - [crm-sup:P20_same_as_URI, http://data.bnf.fr/ark:/$(ark_bnf)~iri]
   - [bibo:isbn10, $(isbn_10)]
   - [bibo:isbn13, $(isbn_13)]   

 VilleMapping:
  sources: prelib_ville
  subjects: prelib:ville/$(id)
  predicateobjects:
   - [a, sdh:C13_Geographical_Place]
   - [rdfs:label, $(nom), fr~lang]
   - [owl:sameAs, wd:$(wikidata)~iri]
   - [crm-sup:P20_same_as_URI, wd:$(wikidata)~iri]       

Thank you,

Dear @JBPressac, there's a small regression in v6 that causes problems with command-line arguments that use relative paths without a leading ./. For example, -m mapping.rml.ttl could fail, whereas -m ./mapping.rml.ttl should work. A fix is on the way, but in the meantime I hope this helps.

@bjdmeest thank you for your suggestion. Alas, it does not solve my problem.

Hmm, have you tried that with all CLI path arguments, including, e.g., the output path? If so, could you send us the CLI arguments you're using?

OK, you are right: I forgot to add ./ to the output path. Sorry.
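
For readers who hit the same trace: the message "begin 0, end -1" is what String.substring reports when its end index comes from a lookup that found nothing, for example lastIndexOf('/') on a path that contains no slash. The sketch below reproduces that pattern under stated assumptions. The class RelativePathDemo, the helpers naiveParent and safeParent, and the file name output.nt are all hypothetical and chosen for illustration; none of this is taken from the rmlmapper source, and the actual code at Main.java:197 may differ.

```java
public class RelativePathDemo {

    // Hypothetical helper (not rmlmapper code): returns the directory part of a
    // path by cutting at the last '/'. If the path has no '/', lastIndexOf
    // returns -1 and substring(0, -1) throws StringIndexOutOfBoundsException.
    public static String naiveParent(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    // Defensive variant: treats a bare file name as living in the current
    // directory instead of passing -1 to substring.
    public static String safeParent(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? "." : path.substring(0, slash);
    }

    public static void main(String[] args) {
        // With a leading "./" both variants agree.
        System.out.println(naiveParent("./output.nt"));
        System.out.println(safeParent("./output.nt"));

        // Without it, only the defensive variant survives.
        System.out.println(safeParent("output.nt"));
        try {
            naiveParent("output.nt");
        } catch (StringIndexOutOfBoundsException e) {
            // The exact message wording depends on the JDK version.
            System.out.println("naiveParent failed: " + e.getMessage());
        }
    }
}
```

This would also explain why prefixing only the -m argument was not enough: in this sketch, any path argument without a slash triggers the same failure, so the output path needs the ./ prefix as well.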