iudx-data-ingestion-server

The Data Ingestion Server is the "Ingestion Firewall and Data Cleaning Middleware" of IUDX. It enables Providers and Delegates to publish data through the IUDX API, as per the data descriptor, using HTTP over TLS (HTTPS).

Features

  • Allows IUDX Data Providers and Delegates to publish data into the IUDX platform
  • Allows an IUDX admin to register and delete ingestion streams for one or more data resources using standard APIs
  • Integrated with the IUDX authorization server (token introspection) to allow data publication
  • Secure data publication over TLS
  • Scalable, service mesh architecture based implementation using open source components: Vert.x as the API framework and RabbitMQ as the data broker
  • Hazelcast and Zookeeper based cluster management and service discovery

API Docs

The API docs can be found [here] (link to be added).

Prerequisites

External dependencies installation

The data ingestion pipeline connects to the following external dependency:

  • RabbitMQ

Installation instructions for the above, along with the configuration needed to set the database URL, port and associated credentials, can be found in the appropriate sections here.

Get Started

Make configuration

Make a config file based on the template in ./configs/config-example.json

  • Generate a certificate using Let's Encrypt or another method
  • Make a Java Keystore file and mention its path and password in the appropriate sections of the config
  • Modify the database URL and associated credentials in the appropriate sections of the config
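
The keystore step above can be sketched as follows. This is a minimal sketch, not the project's own tooling: the function names, the alias iudx-di and the file names are assumptions made for illustration, and it assumes the certificate and private key are available as PEM files. With DRY_RUN=1 the commands are only printed.

```shell
# Sketch only: turn a PEM certificate and key into a Java Keystore (JKS).
# make_keystore, run and the alias "iudx-di" are hypothetical names.
# Passing passwords as arguments is visible in the process list; for real
# use prefer a prompt or a password file.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# make_keystore <cert.pem> <key.pem> <out.jks> <password>
make_keystore() {
  if [ "$#" -ne 4 ]; then
    echo "usage: make_keystore <cert.pem> <key.pem> <out.jks> <password>" >&2
    return 2
  fi
  cert=$1
  key=$2
  out=$3
  pass=$4
  p12="${out%.jks}.p12"   # intermediate PKCS#12 bundle next to the JKS

  # Step 1: bundle the certificate chain and private key into PKCS#12.
  run openssl pkcs12 -export -in "$cert" -inkey "$key" \
    -out "$p12" -name iudx-di -passout "pass:$pass" || return 1

  # Step 2: import the PKCS#12 bundle into a JKS keystore.
  run keytool -importkeystore \
    -srckeystore "$p12" -srcstoretype PKCS12 -srcstorepass "$pass" \
    -destkeystore "$out" -deststoretype JKS -deststorepass "$pass" \
    -noprompt
}
```

For example, DRY_RUN=1 make_keystore fullchain.pem privkey.pem keystore.jks changeit prints the two commands without running them. The resulting keystore path and password are what the config file expects in its keystore section.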

Docker based

  1. Install Docker and docker-compose
  2. Clone this repo
  3. Build the images: ./docker/build.sh
  4. Modify the docker-compose.yml file to map the config file you just created
  5. Start the server in production (prod) or development (dev) mode using docker-compose: docker-compose up prod

Maven based

  1. Install Java 11 and Maven
  2. Use the Maven exec plugin based starter to start the server: mvn clean compile exec:java@data-ingestion-server

JAR based

  1. Install Java 11 and Maven
  2. Set the environment variables
export DI_URL=https://<rs-domain-name>
export LOG_LEVEL=INFO
  3. Use Maven to package the application as a JAR: mvn clean package -Dmaven.test.skip=true
  4. Two JAR files are generated in the target/ directory:
    • iudx.data.ingestion.server-cluster-0.0.1-SNAPSHOT-fat.jar - clustered Vert.x, includes Micrometer metrics
    • iudx.data.ingestion.server-dev-0.0.1-SNAPSHOT-fat.jar - non-clustered Vert.x, does not include Micrometer metrics
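
Before running either JAR, it can help to confirm that the environment variables are set and that packaging produced both files. The check below is a minimal sketch; the function name preflight and its messages are assumptions, while the variable names and JAR file names are the ones listed above.

```shell
# Sketch only: preflight check for the JAR based setup.
# "preflight" is a hypothetical helper, not part of the repository.

# preflight [target-dir]
# Returns 0 when DI_URL and LOG_LEVEL are set and both fat JARs exist,
# 1 otherwise. Every problem found is reported on stderr.
preflight() {
  target=${1:-target}
  status=0

  [ -n "${DI_URL:-}" ] || { echo "DI_URL is not set" >&2; status=1; }
  [ -n "${LOG_LEVEL:-}" ] || { echo "LOG_LEVEL is not set" >&2; status=1; }

  for flavour in cluster dev; do
    jar="$target/iudx.data.ingestion.server-$flavour-0.0.1-SNAPSHOT-fat.jar"
    [ -f "$jar" ] || { echo "missing JAR: $jar" >&2; status=1; }
  done

  return "$status"
}
```

Run from the repository root after mvn clean package, preflight && echo "ready" prints ready only when nothing is missing.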

Running the clustered JAR

Note: The clustered JAR requires Zookeeper to be installed. Refer here to learn more about how to set up Zookeeper. Additionally, the zookeepers key in the config being used must be updated with the IP address/domain of the system running Zookeeper.

The JAR requires 3 runtime arguments when running:

  • --config/-c : path to the config file
  • --hostname/-i : the hostname for clustering
  • --modules/-m : comma separated list of module names to deploy

e.g.
$ java -jar ./fatjar.jar --host $(hostname) -c configs/config.json -m iudx.data.ingestion.server.ApiServerVerticle

Use the --help/-h argument for more information. You may additionally set a DI_JAVA_OPTS environment variable containing any Java options to pass to the application, e.g.
$ export DI_JAVA_OPTS="-Xmx4096m"
$ java $DI_JAVA_OPTS -jar target/iudx.data.ingestion.server-cluster-0.0.1-SNAPSHOT-fat.jar ...
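
The three arguments and the optional Java options can be combined in a small wrapper. This is a sketch under assumptions: the function name run_clustered is hypothetical, the short flags -c, -i and -m are taken from the argument list above, and DI_JAVA_OPTS is treated as a plain shell variable. With DRY_RUN=1 the command is printed instead of executed.

```shell
# Sketch only: launch the clustered JAR with its three runtime arguments.
# "run_clustered" is a hypothetical helper, not part of the repository.

# run_clustered <jar> <config> <hostname> <modules>
run_clustered() {
  if [ "$#" -ne 4 ]; then
    echo "usage: run_clustered <jar> <config> <hostname> <modules>" >&2
    return 2
  fi
  jar=$1
  config=$2
  host=$3
  modules=$4

  # DI_JAVA_OPTS is left unquoted on purpose so that several JVM options
  # are split into separate words; it expands to nothing when unset.
  set -- java ${DI_JAVA_OPTS:-} -jar "$jar" -c "$config" -i "$host" -m "$modules"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, DRY_RUN=1 run_clustered target/iudx.data.ingestion.server-cluster-0.0.1-SNAPSHOT-fat.jar configs/config.json "$(hostname)" iudx.data.ingestion.server.ApiServerVerticle shows the full command line before anything is started. Failing early on a missing argument avoids starting a JVM that would exit on its own argument check.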

Running the non-clustered JAR

The JAR requires 1 runtime argument when running:

  • --config/-c : path to the config file

e.g.
$ java -Dvertx.logger-delegate-factory-class-name=io.vertx.core.logging.Log4j2LogDelegateFactory -jar target/iudx.data.ingestion.server-dev-0.0.1-SNAPSHOT-fat.jar -c configs/config.json

Use the --help/-h argument for more information. You may additionally set a DI_JAVA_OPTS environment variable containing any Java options to pass to the application, e.g.
$ export DI_JAVA_OPTS="-Xmx1024m"
$ java $DI_JAVA_OPTS -jar target/iudx.data.ingestion.server-dev-0.0.1-SNAPSHOT-fat.jar ...

Testing

Unit tests

  1. Run the server through Docker, Maven or redeployer
  2. Run the unit tests and generate a surefire report: mvn clean test-compile surefire:test surefire-report:report
  3. Reports are stored in ./target/

Integration tests

Integration tests are run through Postman/Newman; the script can be found here.

  1. Install prerequisites
  2. Example Postman environment can be found here
  3. Run the server through Docker, Maven or redeployer
  4. Run the integration tests and generate the Newman report: newman run <postman-collection-path> -e <postman-environment> --insecure -r htmlextra --reporter-htmlextra-export .
  5. Reports are stored in ./target/

Contributing

We follow a Git merge based workflow.

  1. Fork this repo.
  2. Create a new feature branch in your fork. A branch covering multiple features must have a hyphen separated name, or refer to a milestone name as listed in GitHub -> Projects.
  3. Commit to your fork and raise a Pull Request with upstream.

License

MIT