StringIndexOutOfBoundsException error with rmlmapper-6.0.0
JBPressac opened this issue · 4 comments
JBPressac commented
Hello,
Using rmlmapper-6.0.0-r363-all.jar with the following RML file, I get the error message below, whereas rmlmapper-5.0.0-r362-all.jar successfully generates the triples:
The error message:
09:37:19.737 [main] ERROR be.ugent.rml.cli.Main .main(436) - begin 0, end -1, length 9
java.lang.StringIndexOutOfBoundsException: begin 0, end -1, length 9
at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
at java.base/java.lang.String.substring(String.java:1874)
at be.ugent.rml.cli.Main.main(Main.java:197)
at be.ugent.rml.cli.Main.main(Main.java:45)
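The trace shows String.substring being called from be.ugent.rml.cli.Main.main with begin 0 and end -1 on a 9-character string. One common way to get exactly that is cutting a path at lastIndexOf('/') when the path has no directory part, because lastIndexOf then returns -1. The Java sketch below reproduces that failure mode under this assumption. It is not the actual code at Main.java:197, and output.nq is a hypothetical 9-character argument.

```java
import java.nio.file.Path;

// Hypothetical reconstruction of the failure mode suggested by the stack trace;
// this is NOT the actual rmlmapper code at Main.java:197.
public class SubstringRepro {

    // Naive "directory part of a path": everything before the last '/'.
    // For a bare file name, lastIndexOf returns -1 and substring(0, -1) throws.
    static String naiveParent(String path) {
        return path.substring(0, path.lastIndexOf('/'));
    }

    // Safer variant: java.nio returns null from getParent() for a single-element
    // relative path, so fall back to the current directory.
    static String safeParent(String path) {
        Path parent = Path.of(path).getParent();
        return parent == null ? "." : parent.toString();
    }

    public static void main(String[] args) {
        String bare = "output.nq";      // hypothetical: 9 characters, no directory part
        String dotted = "./output.nq";  // same file with an explicit current directory

        System.out.println(naiveParent(dotted)); // prints "."
        System.out.println(safeParent(bare));    // prints "."
        try {
            naiveParent(bare);
        } catch (StringIndexOutOfBoundsException e) {
            // Same kind of message as in the report (begin 0, end -1, length 9)
            System.out.println(e.getMessage());
        }
    }
}
```

This also explains why adding ./ in front of the file name avoids the crash: the path then contains a '/' and the computed end index is no longer negative.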
The mapping file (YARRRML syntax):
prefixes:
  # The validity of these prefixes, and of the URIs of the properties and classes used, remains to be checked
  # For example, crm:P98i_was_born or crm:P98_was_born?
  bibo: http://purl.org/ontology/bibo/
  bnf-onto: http://data.bnf.fr/ontology/bnf-onto/
  crm: http://www.cidoc-crm.org/cidoc-crm/
  crm-sup: https://ontome.net/ns/sdh-crm-supplement/
  foaf: http://xmlns.com/foaf/0.1/
  frbroo: http://iflastandards.info/ns/fr/frbr/frbroo/
  grel: http://users.ugent.be/~bjdmeest/function/grel.ttl#
  idlab-fn: http://example.com/idlab/function/
  idref: https://www.idref.fr/
  isni: http://isni.org/ontology#
  lexvo: http://lexvo.org/id/iso639-3/
  owl: http://www.w3.org/2002/07/owl#
  prelib: http://mshb.huma-num.fr/prelib/
  sdh: https://ontome.net/ns/sdhss/
  sdh-int: https://ontome.net/ns/intellectual-literary-life
  sdh-so: https://ontome.net/ns/social-life-specific/
  wd: http://www.wikidata.org/entity/
  xsd: http://www.w3.org/2001/XMLSchema#
variables:
  access: &dbHost http://localhost/database
  type: &dbType mysql
  referenceFormulation: &csvReferenceFormulation csv
  credentials: &credentials
    username: $(_username)
    password: $(_password)
sources:
  prelib_airegeographique:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_airegeographique;
  prelib_appellationpersonne:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: >
      SELECT * FROM prelib_appellationpersonne
      JOIN prelib_langue ON prelib_langue.id = prelib_appellationpersonne.langue_id;
  prelib_collectif:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_collectif;
  prelib_collectifecritoeuvre:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: >
      SELECT (CASE WHEN date_ecriture REGEXP '^\\d+$' THEN date_ecriture END) AS annee,
      prelib_collectifecritoeuvre.* FROM prelib_collectifecritoeuvre
      JOIN prelib_fonctionecritoeuvre ON prelib_fonctionecritoeuvre.id = prelib_collectifecritoeuvre.fonction_id
      ORDER BY prelib_collectifecritoeuvre.id
  prelib_ecritoeuvre:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    # date_ecriture is a VARCHAR and contains sometimes years, sometimes ranges of years, sometimes dates in French format,
    # sometimes comments. For now, we only keep the entries that match the REGEXP '^\\d+$'.
    # Since prelib_ecritoeuvre contains relations with roles other than authors, date_ecriture should be considered
    # the date on which the role was exercised, not the date of writing proper.
    # TODO: consider simply removing this information from the triples.
    # Creation of the following fields:
    # - annee
    query: >
      SELECT (CASE WHEN date_ecriture REGEXP '^\\d+$' THEN date_ecriture END) AS annee,
      prelib_ecritoeuvre.* FROM prelib_ecritoeuvre
      JOIN prelib_fonctionecritoeuvre ON prelib_fonctionecritoeuvre.id = prelib_ecritoeuvre.fonction_id
      ORDER BY prelib_ecritoeuvre.id
  prelib_editeoeuvre:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_editeoeuvre;
  prelib_editerevue:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_editerevue;
  prelib_edition:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_edition;
  prelib_oeuvre:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_oeuvre;
  prelib_oeuvreedition:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_oeuvreedition;
  prelib_oeuvrelangue:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: >
      SELECT * FROM prelib_oeuvrelangue JOIN prelib_langue
      WHERE prelib_langue.id = prelib_oeuvrelangue.langue_id;
  prelib_paraitdans:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_paraitdans;
  prelib_personne:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    # Creation of the following fields:
    # - date_naissance = full date of birth yyyy-mm-dd, or yyyy
    # - date_naissance_dtype = date or gYear
    # - same for date_deces and date_deces_dtype
    # - FRBNF = BnF record number
    # - gender = male or female
    query: >
      SELECT (CASE WHEN (jour_naissance IS NOT NULL AND mois_naissance IS NOT NULL AND annee_naissance IS NOT NULL)
      THEN CONCAT_WS('-', LPAD(`annee_naissance`, 4, 0), LPAD(`mois_naissance`, 2, 0), LPAD(`jour_naissance`, 2, 0))
      WHEN (annee_naissance IS NOT NULL) THEN annee_naissance END) AS date_naissance,
      (CASE WHEN (jour_naissance IS NOT NULL AND mois_naissance IS NOT NULL AND annee_naissance IS NOT NULL)
      THEN 'date' WHEN (annee_naissance IS NOT NULL) THEN 'gYear' END) AS date_naissance_dtype,
      (CASE WHEN (jour_deces IS NOT NULL AND mois_deces IS NOT NULL AND annee_deces IS NOT NULL)
      THEN CONCAT_WS('-', LPAD(`annee_deces`, 4, 0), LPAD(`mois_deces`, 2, 0), LPAD(`jour_deces`, 2, 0))
      WHEN (annee_deces IS NOT NULL) THEN annee_deces END) AS date_deces,
      (CASE WHEN (jour_deces IS NOT NULL AND mois_deces IS NOT NULL AND annee_deces IS NOT NULL)
      THEN 'date' WHEN (annee_deces IS NOT NULL) THEN 'gYear' END) AS date_deces_dtype,
      SUBSTRING(ark_bnf, 9, CHAR_LENGTH(ark_bnf) - 9) AS FRBNF,
      IF(sexe = 'M', REPLACE(sexe, 'M', 'male'),IF(sexe = 'F', REPLACE(sexe, 'F', 'female'),'')) AS gender,
      prelib_personne.* FROM prelib_personne ORDER BY id;
  prelib_profession:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_profession;
  prelib_revue:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_revue;
  prelib_ville:
    access: *dbHost
    credentials: *credentials
    type: *dbType
    referenceFormulation: *csvReferenceFormulation
    query: SELECT * FROM prelib_ville;
# TODO:
# - Put the triples in named graphs, generate N-Quads
# - Modify PRELIB to add the Geonames identifiers of the cities
# - Edit PRELIB to move the 'intitule' field of prelib_profession into qualite_id (prelib_qualiteparticipecollectif)
# - Express the information in prelib_ecritoeuvre with the ontologies used by the BnF and Abes
# Shouldn't we distinguish the PRELIB record from what the record is about, as IdRef
# and the BnF do? e.g. on data.bnf.fr: DESCRIBE <http://data.bnf.fr/ark:/12148/cb16623239r#about>
# which includes the triple:
# <http://data.bnf.fr/ark:/12148/cb16623239r> <http://xmlns.com/foaf/0.1/focus> <http://data.bnf.fr/ark:/12148/cb16623239r#about>
mappings:
  # Inspired by data.bnf.fr: DESCRIBE <http://data.bnf.fr/ark:/12148/cb16623239r#about>
  # See also Théodore Hersart de la Villemarqué http://data.bnf.fr/ark:/12148/cb119103377#about and http://www.idref.fr/026956845/id
  # See also Michel Serres http://www.idref.fr/02713329X/id
  # Problems: https://github.com/RMLio/rmlmapper-java/issues/159 (the RMLMapper ignores empty VARCHAR columns)
  PersonneMapping:
    sources: prelib_personne
    subjects: prelib:personne/$(id)
    predicateobjects:
      - [a, crm:E21_Person]
      - [rdfs:label, $(nom_usuel)]
      - [foaf:name, $(nom_usuel)]
      - [foaf:familyName, $(nom_etat_civil)]
      - [foaf:givenName, $(prenom_etat_civil)]
      - [bnf-onto:FRBNF, $(FRBNF)]
      - [isni:identifierValid, $(isni)]
      - [owl:sameAs, http://www.wikidata.org/entity/$(wikidata)~iri]
      - [owl:sameAs, http://www.idref.fr/$(idref)~iri]
      - [owl:sameAs, http://data.bnf.fr/ark:/$(ark_bnf)#foaf:Person~iri]
      - [owl:sameAs, http://isni.org/isni/$(isni)~iri]
      - [owl:sameAs, http://viaf.org/viaf/$(viaf)~iri]
      - [crm-sup:P20_same_as_URI, http://www.wikidata.org/entity/$(wikidata)~iri]
      - [crm-sup:P20_same_as_URI, http://www.idref.fr/$(idref)~iri]
      - [crm-sup:P20_same_as_URI, http://data.bnf.fr/ark:/$(ark_bnf)#foaf:Person~iri]
      - [crm-sup:P20_same_as_URI, http://isni.org/isni/$(isni)~iri]
      - [crm-sup:P20_same_as_URI, http://viaf.org/viaf/$(viaf)~iri]
      - [foaf:gender, $(gender), en~lang]
      - p: crm:P1_is_identified_by
        o:
          - mapping: AppellationPersonneMapping
            condition:
              function: equal
              parameters:
                - [str1, $(id)]
                - [str2, $(personne_id)]
      - p: crm:P98_was_born
        o:
          - mapping: BirthMapping
            condition:
              function: equal
              parameters:
                - [str1, $(id)]
                - [str2, $(id)]
      - p: crm:P100_died_in
        o:
          - mapping: DeathMapping
            condition:
              function: equal
              parameters:
                - [str1, $(id)]
                - [str2, $(id)]
  # With Vincent and Francesco's proposal, there is no distinction between accepted and rejected forms
  # blank nodes (anonymous resources)
  AppellationPersonneMapping:
    sources: prelib_appellationpersonne
    predicateobjects:
      - [a, crm:E41_Appellation]
      - [crm-sup:P21_has_value, $(appellation), $(code_iso_639_3)~lang]
  AppellationRetenuePersonneMapping:
    sources: prelib_appellationpersonne
    subjects: prelib:personne/$(personne_id)
    predicateobjects:
      - [skos:prefLabel, $(appellation), $(code_iso_639_3)~lang]
    condition:
      function: idlab-fn:equal
      parameters:
        - [grel:valueParameter, $(forme)]
        - [grel:valueParameter2, "2"]
  AppellationRejetteePersonneMapping:
    sources: prelib_appellationpersonne
    subjects: prelib:personne/$(personne_id)
    predicateobjects:
      - [skos:altLabel, $(appellation), $(code_iso_639_3)~lang]
    condition:
      function: idlab-fn:equal
      parameters:
        - [grel:valueParameter, $(forme)]
        - [grel:valueParameter2, "1"]
  DeathMapping:
    sources: prelib_personne
    predicateobjects:
      - [a, crm:E69_Death]
      - p: crm:P7_took_place_at
        o:
          - mapping: VilleMapping
            condition:
              function: equal
              parameters:
                - [str1, $(ville_deces_id)]
                - [str2, $(id)]
      - p: crm:P4_has_time_span
        o:
          - mapping: TimeSpanDeathMapping
            condition:
              function: equal
              parameters:
                - [str1, $(id)]
                - [str2, $(id)]
  BirthMapping:
    sources: prelib_personne
    predicateobjects:
      - [a, crm:E67_Birth]
      - p: crm:P7_took_place_at
        o:
          - mapping: VilleMapping
            condition:
              function: equal
              parameters:
                - [str1, $(ville_naissance_id)]
                - [str2, $(id)]
      - p: crm:P4_has_time_span
        o:
          - mapping: TimeSpanBirthMapping
            condition:
              function: equal
              parameters:
                - [str1, $(id)]
                - [str2, $(id)]
  # yarrrml-parser ignores a reference used to build the datatype
  # created an issue: https://github.com/RMLio/yarrrml-parser/issues/162#issuecomment-1104892337
  TimeSpanBirthMapping:
    sources: prelib_personne
    predicateobjects:
      - [a, crm:E52_Time-Span]
      - p: crm:P82_at_some_time_within
        o: $(date_naissance)
        datatype: xsd:gYear
        condition:
          function: equal
          parameters:
            - [str1, $(date_naissance_dtype)]
            - [str2, 'gYear']
      - p: crm:P82_at_some_time_within
        o: $(date_naissance)
        datatype: xsd:date
        condition:
          function: equal
          parameters:
            - [str1, $(date_naissance_dtype)]
            - [str2, 'date']
  TimeSpanDeathMapping:
    sources: prelib_personne
    predicateobjects:
      - [a, crm:E52_Time-Span]
      - p: crm:P82_at_some_time_within
        o: $(date_deces)
        datatype: xsd:gYear
        condition:
          function: equal
          parameters:
            - [str1, $(date_deces_dtype)]
            - [str2, 'gYear']
      - p: crm:P82_at_some_time_within
        o: $(date_deces)
        datatype: xsd:date
        condition:
          function: equal
          parameters:
            - [str1, $(date_deces_dtype)]
            - [str2, 'date']
  CollectifMapping:
    sources: prelib_collectif
    subjects: prelib:collectif/$(id)
    predicateobjects:
      - [a, crm:E74_Group]
      - [rdfs:label, $(nom), fr~lang]
  EditionMapping:
    sources: prelib_edition
    subjects: prelib:edition/$(id)
    predicateobjects:
      - [a, frbroo:F3_Manifestation_Product_Type]
      - [rdfs:label, $(titre)]
      - [crm:P102_has_title, $(titre)]
      - [owl:sameAs, http://www.wikidata.org/entity/$(wikidata)~iri]
      - [owl:sameAs, http://www.sudoc.org/$(sudoc)~iri]
      - [owl:sameAs, http://data.bnf.fr/ark:/$(ark_bnf)~iri]
      - [crm-sup:P20_same_as_URI, http://www.wikidata.org/entity/$(wikidata)~iri]
      - [crm-sup:P20_same_as_URI, http://www.sudoc.org/$(sudoc)~iri]
      - [crm-sup:P20_same_as_URI, http://data.bnf.fr/ark:/$(ark_bnf)~iri]
      - [bibo:isbn10, $(isbn_10)]
      - [bibo:isbn13, $(isbn_13)]
  VilleMapping:
    sources: prelib_ville
    subjects: prelib:ville/$(id)
    predicateobjects:
      - [a, sdh:C13_Geographical_Place]
      - [rdfs:label, $(nom), fr~lang]
      - [owl:sameAs, wd:$(wikidata)~iri]
      - [crm-sup:P20_same_as_URI, wd:$(wikidata)~iri]
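The prelib_personne and prelib_ecritoeuvre queries precompute several columns in SQL (date_naissance / date_naissance_dtype, FRBNF, annee), and TimeSpanBirthMapping / TimeSpanDeathMapping then use two conditional predicate-object pairs to pick xsd:gYear or xsd:date, because yarrrml-parser does not build the datatype from a reference. To check which literals to expect for a given row, here is a minimal Java sketch that mirrors those SQL expressions. It is not part of the mapping or of rmlmapper. The class and method names are hypothetical, and it assumes the day/month/year columns are nullable integers and that ark_bnf holds values such as 12148/cb16623239r (as in the data.bnf.fr example in the comments).

```java
// Hypothetical Java mirror of the derived columns computed by the prelib_personne
// and prelib_ecritoeuvre queries, plus the xsd datatype chosen by the
// TimeSpanBirthMapping / TimeSpanDeathMapping conditions. Names are illustrative.
public class PrelibDerivedColumns {

    // CASE WHEN day, month and year are all present THEN yyyy-mm-dd (LPAD with '0')
    //      WHEN only the year is present THEN the year, unpadded
    //      ELSE NULL
    static String date(Integer year, Integer month, Integer day) {
        if (year != null && month != null && day != null) {
            return String.format("%04d-%02d-%02d", year, month, day);
        }
        // Like the SQL, the year-only branch is not padded, so a year below 1000
        // would not be a valid xsd:gYear lexical form.
        return year != null ? year.toString() : null;
    }

    // Same branches, returning the marker stored in the *_dtype column.
    static String dtype(Integer year, Integer month, Integer day) {
        if (year != null && month != null && day != null) {
            return "date";
        }
        return year != null ? "gYear" : null;
    }

    // The two conditional predicate-object pairs keep the value only when the
    // *_dtype column equals 'gYear' or 'date' (assuming plain string equality),
    // so at most one typed literal is produced per row.
    static String typedLiteral(String value, String dtype) {
        if (value == null || dtype == null) {
            return null; // neither condition matches: no crm:P82 literal
        }
        return "\"" + value + "\"^^xsd:" + dtype;
    }

    // SUBSTRING(ark_bnf, 9, CHAR_LENGTH(ark_bnf) - 9): MySQL positions are 1-based,
    // so this drops the first 8 characters and the last one.
    static String frbnf(String arkBnf) {
        if (arkBnf == null) {
            return null;
        }
        if (arkBnf.length() <= 9) {
            return ""; // MySQL returns '' when the computed length is not positive
        }
        return arkBnf.substring(8, arkBnf.length() - 1);
    }

    // CASE WHEN date_ecriture REGEXP '^\\d+$' THEN date_ecriture END
    // (MySQL string-literal escaping turns '\\d' into the regex \d).
    static String annee(String dateEcriture) {
        return dateEcriture != null && dateEcriture.matches("\\d+") ? dateEcriture : null;
    }

    public static void main(String[] args) {
        // Sample values, not taken from the PRELIB database.
        System.out.println(typedLiteral(date(1900, 1, 2), dtype(1900, 1, 2)));       // "1900-01-02"^^xsd:date
        System.out.println(typedLiteral(date(1900, null, null), dtype(1900, null, null))); // "1900"^^xsd:gYear
        System.out.println(frbnf("12148/cb16623239r"));                              // 16623239
        System.out.println(annee("1850") + " " + annee("vers 1850"));                // 1850 null
    }
}
```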
Thank you,
bjdmeest commented
Dear @JBPressac, there's a small regression in v6 that causes problems with command-line arguments given as relative paths without a leading ./, so e.g. -m mapping.rml.ttl could cause a problem, whilst -m ./mapping.rml.ttl should work. There's a fix on the way, but in the meantime I hope this helps.
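As a stopgap until the fix is released, every path argument can be given an explicit directory. The sketch below is a hypothetical helper (not part of rmlmapper) that rewrites bare relative file names to ./name before the jar is launched. Only -m appears in this thread, so treating -o as the output-path flag is an assumption to check against the CLI's help output.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for the rmlmapper 6.0.0 workaround: prefix bare relative
// file names with "./" for the flags that take a path.
public class PathArgs {

    // Flags assumed to take a file path; only -m is confirmed in the thread.
    static final Set<String> PATH_FLAGS = Set.of("-m", "-o");

    // Absolute paths and paths that already have a directory part are left alone.
    static String withDir(String path) {
        Path p = Path.of(path);
        if (p.isAbsolute() || p.getParent() != null) {
            return path;
        }
        return "./" + path;
    }

    // Returns a copy of the argument list with each path-flag value normalized.
    static List<String> fixArgs(List<String> args) {
        List<String> out = new ArrayList<>(args);
        for (int i = 0; i + 1 < out.size(); i++) {
            if (PATH_FLAGS.contains(out.get(i))) {
                out.set(i + 1, withDir(out.get(i + 1)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> cmd = List.of("java", "-jar", "rmlmapper-6.0.0-r363-all.jar",
                "-m", "mapping.rml.ttl", "-o", "output.nq");
        // prints: java -jar rmlmapper-6.0.0-r363-all.jar -m ./mapping.rml.ttl -o ./output.nq
        System.out.println(String.join(" ", fixArgs(cmd)));
    }
}
```

The helper deliberately touches every path flag rather than only -m: a single bare relative path anywhere on the command line appears to be enough to trigger the exception.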
bjdmeest commented
Hmm, have you tried that with all CLI path arguments, e.g., also the output path? If so, could you send us the CLI arguments you're using?
JBPressac commented
OK, you're right: I forgot to add the ./ to the output path, sorry.