G-Research/spark-dgraph-connector

Provide configuration of default language

Configuring a default language, or a sequence of languages, would let sources pick a single string value for multi-language predicates that have a @lang directive. Querying for <predicate>@lang1:lang2:. picks lang1 first, then lang2, then the untagged value, and finally any other language. This makes the connector prefer certain languages when reading from Dgraph. Predicate names should then not include any language tags. This would work with any source and mode.
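To make the fallback order concrete, here is a minimal sketch in plain Scala that mimics it on an in-memory map. It is not connector code: `LanguagePick` and `pick` are hypothetical names, the empty tag `""` stands in for the untagged value, and the "any other language" step picks the alphabetically smallest tag only to keep the sketch deterministic (Dgraph makes no such promise).

```scala
// Hypothetical illustration of the @lang1:lang2:. fallback order; not part of the connector.
object LanguagePick {
  /** Picks one value from `values` (language tag -> value, "" = untagged).
    *
    * Order: preferred languages in the given order, then the untagged value,
    * then any remaining language (here: smallest tag, an assumption made for determinism).
    */
  def pick(values: Map[String, String], preferred: Seq[String]): Option[String] =
    preferred.flatMap(lang => values.get(lang)).headOption
      .orElse(values.get(""))
      .orElse(values.toSeq.sortBy(_._1).headOption.map(_._2))
}
```

For example, with values tagged `en`, `de` and `zh`, `LanguagePick.pick(values, Seq("fr", "en"))` skips the missing `fr` and returns the `en` value.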

This should then allow:

spark.read.option("dgraph.language", "en").dgraph.triples(target).show(false)
subject predicate objectString objectType
1 title Star Wars: Episode IV - A New Hope string
3 title Star Wars: Episode V - The Empire Strikes Back string
6 title Star Wars: Episode VI - Return of the Jedi string

spark.read.option("dgraph.language", "zh").dgraph.triples(target).show(false)
subject predicate objectString objectType
1 title 星際大戰四部曲:曙光乍現 string
3 title 星際大戰五部曲:帝國大反擊 string
6 title 星際大戰六部曲:絕地大反攻 string

spark.read.option("dgraph.language", "de").dgraph.triples(target).show(false)
subject predicate objectString objectType
1 title Krieg der Sterne string
3 title Das Imperium schlägt zurück string
6 title Die Rückkehr der Jedi-Ritter string
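
One way the configured languages might be turned into the predicate that is sent to Dgraph is sketched below. This is only an assumption about a possible implementation: `QueryPredicate` and `withLanguages` are hypothetical names, the languages are passed as an already-parsed `Seq[String]` because the encoding of a sequence in the `dgraph.language` option is not decided here, and leaving the predicate unchanged when no language is configured is likewise assumed.

```scala
// Hypothetical helper; the connector does not necessarily build queries this way.
object QueryPredicate {
  /** Appends the language preference list to a predicate name.
    *
    * The trailing ":." asks Dgraph to fall back to the untagged value
    * and then to any other language.
    */
  def withLanguages(predicate: String, languages: Seq[String]): String =
    if (languages.isEmpty) predicate // assumption: no configured language leaves the query unchanged
    else s"$predicate@${languages.mkString(":")}:."
}
```

With this sketch, `QueryPredicate.withLanguages("title", Seq("en"))` yields `title@en:.`, while the resulting rows would still report the predicate as plain `title`, as in the examples above.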