
Proof of concept that uses the MarkLogic Data Hub to track asset usage statistics within ad campaigns.


MarkLogic Data Hub Proof-of-Concept

Problem: Data about creative assets, their related campaigns, and their analytics is spread across separate systems, resources, and institutional knowledge.

Solution

To address this problem, the MarkLogic Data Hub will be used to aggregate content from multiple systems. MarkLogic is an enterprise NoSQL database that lets users model their domains flexibly and efficiently. It supports multiple schemas at the same time, and an individual schema can be changed to fit your needs without rebuilding the entire database. The MarkLogic Data Hub Framework is a quick start application for aggregating and mapping data, and it will be used to create the data flows that address the problem.

The MarkLogic Document Store, Graph Store, and Search will be used to meet the success criteria. Documents will be modeled with the envelope pattern, which wraps the original content to help maintain data provenance. These documents will be enriched with semantic triples to create a relationship graph. Finally, a search application will be built to display the results.
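The envelope-plus-triples idea can be illustrated with a small, self-contained sketch that runs in plain Node.js rather than inside MarkLogic. It assumes a layout modeled loosely on the Data Hub envelope (headers, triples, instance, attachments) and on MarkLogic's JSON embedded-triple shape; the `BASE` namespace, the `hasAnalytics` predicate, the `creative-dam` source name, and the fields `assetId`, `name`, and `campaignId` are hypothetical and not taken from this PoC.

```javascript
// Hypothetical sketch: wrap a source record in an envelope modeled loosely on the
// Data Hub layout and link it to analytics records with embedded triples.
// BASE, HAS_ANALYTICS, and the record fields are illustrative assumptions.

const BASE = "http://example.org/adhub/"; // hypothetical IRI namespace
const HAS_ANALYTICS = `${BASE}hasAnalytics`; // hypothetical predicate

// Build an IRI such as http://example.org/adhub/asset/A-100
function iri(kind, id) {
  return `${BASE}${kind}/${encodeURIComponent(id)}`;
}

function buildAssetEnvelope(source, analyticsIds, sourceSystem) {
  const subject = iri("asset", source.assetId);
  return {
    envelope: {
      // Provenance: where and when the record was loaded.
      headers: { sourceSystem, ingestedAt: new Date().toISOString() },
      // One subject-predicate-object triple per related analytics record.
      triples: analyticsIds.map((id) => ({
        triple: { subject, predicate: HAS_ANALYTICS, object: iri("analytics", id) }
      })),
      // Harmonized view used for search.
      instance: {
        Asset: { assetId: source.assetId, name: source.name, campaignId: source.campaignId }
      },
      // The original record, kept unchanged.
      attachments: source
    }
  };
}

// Follow the hasAnalytics edges stored in an envelope's triples.
function relatedAnalytics(doc) {
  return doc.envelope.triples
    .filter((t) => t.triple.predicate === HAS_ANALYTICS)
    .map((t) => t.triple.object);
}

// Example usage
const doc = buildAssetEnvelope(
  { assetId: "A-100", name: "Spring banner", campaignId: "C-7" },
  ["fb-1", "tw-9"],
  "creative-dam"
);
console.log(relatedAnalytics(doc)); // the IRIs for analytics fb-1 and tw-9
```

Keeping the untouched source in `attachments` while search works against `instance` is what lets the original content serve as a provenance record after harmonization.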

Success Criteria

  • Index metadata regarding creative assets
  • Index performance analytics from social networks (e.g. Facebook, Twitter, Instagram)
  • Relate the assets to the analytics that are gathered from the various social networks
  • Denormalize asset and campaign data into a single aggregated document
  • Search asset metadata and display all aggregate counts for a given asset within its campaign
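The denormalization and aggregate-count criteria can be sketched in plain JavaScript. This is an illustration under assumptions: the per-network analytics records and their `impressions`, `clicks`, and `shares` fields are hypothetical, and in the PoC this work would presumably happen inside a Data Hub flow rather than in a standalone function.

```javascript
// Hypothetical sketch: denormalize an asset, its campaign, and its analytics
// into one aggregated document with per-network and overall counts.
// The metric names below are assumptions, not the PoC's actual schema.

const METRICS = ["impressions", "clicks", "shares"];

function emptyCounts() {
  const counts = {};
  for (const m of METRICS) counts[m] = 0;
  return counts;
}

function aggregateAsset(asset, campaign, analytics) {
  const totals = emptyCounts();
  const byNetwork = {};

  for (const record of analytics) {
    if (record.assetId !== asset.assetId) continue; // skip other assets
    if (!byNetwork[record.network]) byNetwork[record.network] = emptyCounts();
    const net = byNetwork[record.network];
    for (const m of METRICS) {
      const value = Number(record[m]) || 0; // missing or non-numeric counts as 0
      net[m] += value;
      totals[m] += value;
    }
  }

  return {
    assetId: asset.assetId,
    name: asset.name,
    campaign: { campaignId: campaign.campaignId, name: campaign.name },
    byNetwork,
    totals
  };
}

// Example usage
const doc = aggregateAsset(
  { assetId: "A-100", name: "Spring banner" },
  { campaignId: "C-7", name: "Spring launch" },
  [
    { assetId: "A-100", network: "facebook", impressions: 1200, clicks: 40, shares: 5 },
    { assetId: "A-100", network: "twitter", impressions: 800, clicks: 25, shares: 9 },
    { assetId: "A-100", network: "facebook", impressions: 300, clicks: 10, shares: 1 },
    { assetId: "B-200", network: "instagram", impressions: 999, clicks: 99, shares: 9 }
  ]
);
console.log(doc.totals); // { impressions: 2300, clicks: 75, shares: 15 }
```

Storing both per-network and overall totals in the same document lets a search result show every count for an asset without a join at query time.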

Prerequisites

Set-up

  • The application is not distributed with MarkLogic, MarkLogic Converters, or the MarkLogic Data Hub quick start. Download them and copy the files to their respective folders under marklogic and data-hub-quick-start.
  • Two environment variables must be set for MarkLogic to start correctly: ML_USER and ML_PASS. They configure the server's admin account, which is used for configuration deployment and for access to the Data Hub application.
  • Add three entries that point to localhost to your operating system's hosts file: datahub.local, grove.local, and marklogic.local. They are needed because the Docker containers communicate across a bridged network and use these host names, which are referenced by the connection properties in the Gradle properties file.
  • Within the Data Hub solution, create the Gradle properties file data-hub-config\gradle-local.properties. It should define two properties, mlUsername and mlPassword, whose values match your ML_USER and ML_PASS environment variables. Do not commit this file; it is intended for local development only.
  • To generate data for the application, use the ad-data-generator, which is preconfigured to write content to the sample-data directory. Run it with the Gradle command gradle bootRun.

Deployment

  • From the root folder, run docker-compose up to build all the images and deploy the containers.
  • Access http://marklogic.local:8001, http://datahub.local:8080, and http://grove.local:9003 to verify that all applications have started.
  • For the initial Data Hub deployment, run gradle mlDeploy from the data-hub-config application folder.
  • For the initial Search UI deployment, run gradle mlDeploy from the search-ui application folder.
  • To generate data in the sample-data directory, run gradle bootRun from the ad-data-generator folder.
  • To load the data, log in to the Data Hub, go to Flows, and run each Ad flow first, then the Asset flow.

Notes

  • The Docker configuration runs the MarkLogic Data Hub quick start in a container. The container mounts a shared volume from the project so that configurations can be exported, which may require granting the appropriate permissions in your Docker configuration.