difi/dcat-harvester

Conversion/import of data into Elasticsearch

Closed this issue · 9 comments

hoyum commented
Conversion/import of data into Elasticsearch

Will need to investigate first how to get the data from Fuseki into Elasticsearch, and then, depending on the kinds of queries we expect eventual users to run, I can define field types and so forth. What do Fuseki dumps look like?

Given that we will have an Atom/RSS feed, it's possible we could use it as the source for Elasticsearch. Looking a bit into this.

hoyum commented

It seems the Atom/RSS feed might be a bit too basic. It would be OK to get all of the relevant data from the datasets into Elasticsearch.

Håvard and I had a little discussion about this, and we like the idea of keeping the logic for retrieving, processing, and sending data to Elasticsearch within the harvest app. It would essentially be a Java process that executes SPARQL against Fuseki and then sends JSON documents to Elasticsearch.
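For illustration, a minimal sketch of such a process using only java.net, so it does not depend on any particular client library. The endpoint URLs, index, and document id are assumptions, and a real implementation would transform the SPARQL results into per-dataset documents rather than indexing the raw result JSON:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FusekiToElasticsearch {

    // Hypothetical endpoints -- adjust to the actual deployment.
    static final String FUSEKI = "http://localhost:3030/dcat/sparql";
    static final String ELASTIC = "http://localhost:9200/dcat/dataset/1";

    public static void main(String[] args) throws IOException {
        String query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10";

        // Ask Fuseki for the SPARQL results as JSON.
        URL url = new URL(FUSEKI + "?query=" + URLEncoder.encode(query, "UTF-8"));
        HttpURLConnection get = (HttpURLConnection) url.openConnection();
        get.setRequestProperty("Accept", "application/sparql-results+json");
        String json = readAll(get.getInputStream());

        // Send the JSON document to Elasticsearch.
        HttpURLConnection put = (HttpURLConnection) new URL(ELASTIC).openConnection();
        put.setRequestMethod("PUT");
        put.setDoOutput(true);
        put.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = put.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Elasticsearch responded: " + put.getResponseCode());
    }

    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```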

We can use a SPARQL CONSTRUCT query to generate JSON-LD data, then apply a custom JSON-LD frame to produce the equivalent in plain JSON.
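A sketch of the CONSTRUCT step, assuming Apache Jena 3; the endpoint and query are made up for illustration. The constructed graph is written as JSON-LD, and a custom frame could then be applied to that output (e.g. with the jsonld-java library) to flatten it into the JSON shape Elasticsearch expects:

```java
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.riot.RDFDataMgr;
import org.apache.jena.riot.RDFFormat;

public class ConstructToJsonLd {

    public static void main(String[] args) {
        // Hypothetical Fuseki endpoint and query -- for illustration only.
        String endpoint = "http://localhost:3030/dcat/sparql";
        Query query = QueryFactory.create(
                "PREFIX dcat: <http://www.w3.org/ns/dcat#> "
                + "CONSTRUCT { ?ds ?p ?o } "
                + "WHERE { ?ds a dcat:Dataset ; ?p ?o }");

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
            Model model = qe.execConstruct();
            // Serialize the constructed graph as JSON-LD; a custom frame
            // would then be applied to this output before indexing.
            RDFDataMgr.write(System.out, model, RDFFormat.JSONLD);
        }
    }
}
```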

Key tasks (a rough sketch of the index lifecycle calls follows the list):

  • Handle creation and update of the data source index (the data source metadata)
  • Handle deletion of the data source index
  • Handle update of the data source index contents, i.e. the documents
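These map onto plain calls against the Elasticsearch REST API. A minimal sketch, where the host and the "dataset" document type are assumptions:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class IndexLifecycle {

    static final String ELASTIC = "http://localhost:9200"; // assumed host

    /** Create the index for one data source. */
    static int createIndex(String index) throws IOException {
        return send("PUT", ELASTIC + "/" + index,
                "{\"settings\":{\"number_of_shards\":1}}");
    }

    /** Delete the index when the data source is removed. */
    static int deleteIndex(String index) throws IOException {
        return send("DELETE", ELASTIC + "/" + index, null);
    }

    /** Index (create or overwrite) a single document. */
    static int indexDocument(String index, String id, String json) throws IOException {
        return send("PUT", ELASTIC + "/" + index + "/dataset/" + id, json);
    }

    static int send(String method, String url, String body) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod(method);
        if (body != null) {
            con.setDoOutput(true);
            con.setRequestProperty("Content-Type", "application/json");
            try (OutputStream out = con.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
        }
        return con.getResponseCode();
    }
}
```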

Once this is handled, we can continue with #49.

Looking into the issues with connecting to Elasticsearch. We experience the same issues when trying to connect to Elasticsearch to create Kibana dashboards on the fly.

The connection to Elasticsearch now correctly remains open until indexing is complete, and indexing is carried out using the bulk API. Can probably close this issue.
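For reference, a minimal sketch of bulk indexing over a single open connection to the _bulk endpoint. The index and type names are made up, and header and action-metadata requirements vary between Elasticsearch versions:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class BulkIndexer {

    /** Send all documents in one bulk request over a single connection. */
    public static int bulkIndex(List<String> docs) throws IOException {
        // Index and type in the URL are assumptions; with them in the
        // URL, each action line only needs the document id.
        URL url = new URL("http://localhost:9200/dcat/dataset/_bulk");
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-ndjson");
        try (OutputStream out = con.getOutputStream()) {
            int id = 0;
            for (String doc : docs) {
                // The bulk body is newline-delimited JSON: an action line
                // followed by the document source on the next line.
                String action = "{\"index\":{\"_id\":\"" + (++id) + "\"}}\n";
                out.write(action.getBytes(StandardCharsets.UTF_8));
                out.write(doc.getBytes(StandardCharsets.UTF_8));
                out.write('\n');
            }
        }
        return con.getResponseCode(); // connection held open until here
    }
}
```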