Datatidy

Datatidy is a data wrangling application built on the Symfony 4 framework. It reads one or more data sources from public APIs, applies transformations, and delivers the result to a data store.

Installation

Build assets by running

FONTAWESOME_NPM_AUTH_TOKEN='your-fontawesome-token' \
  docker compose run --env FONTAWESOME_NPM_AUTH_TOKEN --env NPM_CONFIG_USERCONFIG=.npmrc.install node yarn install
docker compose run node yarn build

Create and edit .env.local as needed to override defaults in .env and install the code by running

composer install --no-dev --classmap-authoritative
bin/console cache:clear
bin/console doctrine:migrations:migrate --no-interaction
composer dump-env prod

Getting started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.

Prerequisites

Installing

See Development on Mac if you're developing on a Mac.

docker compose pull
docker compose up --detach

docker compose exec phpfpm composer install
docker compose exec phpfpm bin/console doctrine:migrations:migrate --no-interaction

# Note: We need a custom userconfig file and an environment variable to authenticate when installing Font Awesome Pro.
FONTAWESOME_NPM_AUTH_TOKEN='your-fontawesome-token' \
  docker compose run --env FONTAWESOME_NPM_AUTH_TOKEN --env NPM_CONFIG_USERCONFIG=.npmrc.install node yarn install
docker compose run node yarn dev

Use

docker compose run node yarn watch

to watch for changes (hit Ctrl+C to kill the process).

Create a user:

docker compose exec phpfpm bin/console fos:user:create

# Super admin user
docker compose exec phpfpm bin/console fos:user:create --super-admin

Open the site in your default browser:

open "http://$(docker compose port nginx 80)"

Jobs

Start the queue consumer:

docker compose exec phpfpm bin/console messenger:consume async

Produce some jobs:

docker compose exec phpfpm bin/console datatidy:data-flow:produce-jobs

Running the tests

docker compose exec -e APP_ENV=test phpfpm bin/console doctrine:migrations:migrate --no-interaction
docker compose exec phpfpm bin/phpunit

Add SYMFONY_DEPRECATIONS_HELPER=disabled to hide deprecation notices:

docker compose exec -e SYMFONY_DEPRECATIONS_HELPER=disabled phpfpm bin/phpunit

Note: a symlink with an absolute target is created when installing symfony/phpunit-bridge, but this causes trouble if you want to run tests outside docker. To make the link relative, run:

ln -sf ../../../../../vendor/symfony/phpunit-bridge bin/.phpunit/phpunit-7.5-0/vendor/symfony/
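To confirm that the link is now relative, a small check along these lines may help. This is a minimal sketch: the function name link_is_relative is hypothetical, and the link path in the usage note is inferred from the ln command above.

```shell
# Hypothetical helper: succeed when the given symlink has a relative target.
# Returns 0 for a relative target, 1 for an absolute target,
# and 2 when the path is not a symlink.
link_is_relative() {
  target=$(readlink "$1") || return 2
  case $target in
    /*) return 1 ;;  # absolute target, breaks outside the container
    *)  return 0 ;;  # relative target
  esac
}
```

For example, link_is_relative bin/.phpunit/phpunit-7.5-0/vendor/symfony/phpunit-bridge && echo "relative" should print "relative" after the ln command has been run.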

Alternatively, run the tests with the Symfony binary (note that this clears your database):

APP_ENV=test symfony console doctrine:migrations:migrate --no-interaction
APP_ENV=test symfony php bin/phpunit

See Data flow tests for details on how to test data flows.

UI tests

docker compose exec -e APP_ENV=test phpfpm bin/console doctrine:migrations:migrate --no-interaction
docker compose exec -e APP_ENV=test phpfpm bin/console hautelook:fixtures:load --purge-with-truncate --no-interaction
docker compose exec phpfpm vendor/bin/behat

Deployment

You will need an environment where the following is present:

  • PHP 7.3
  • Composer 1.9 or above.
  • MariaDB 10.3.17.
  • NGINX (Config example)
  • Redis 5 or above.
  • Yarn 1.17.3 or above.

Distribute the app to a place where NGINX can serve it from.

Create a .env.local file where you set the following variables:

APP_ENV=prod
APP_SECRET=some-very-secret-string-which-is-not-the-same-as-in-.env

SITE_URL=some-url.com
SITE_NAME=Name

DEFAULT_LOCALE=da

DATABASE_URL=mysql://user:pass@url:port/database
DATABASE_SERVER_VERSION='mariadb-10.3.17'

MAILER_URL=smtp://url:port
MAILER_FROM_EMAIL=info@example.com
MAILER_FROM_NAME=Info

MESSENGER_TRANSPORT_DSN=redis://url:port/messages
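Before deploying, it can be useful to check that none of the variables listed above has been forgotten. The following is a minimal sketch, not part of Datatidy: the function name missing_env_vars is hypothetical, and it only tests that a line starting with NAME= exists, not that the value is valid.

```shell
# Hypothetical helper: print each required variable that is not set in an env file.
# The variable names are taken from the .env.local example above.
missing_env_vars() {
  env_file=$1
  for name in APP_ENV APP_SECRET SITE_URL SITE_NAME DEFAULT_LOCALE \
              DATABASE_URL DATABASE_SERVER_VERSION MAILER_URL \
              MAILER_FROM_EMAIL MAILER_FROM_NAME MESSENGER_TRANSPORT_DSN; do
    # A variable counts as set when some line starts with "NAME=".
    grep -q "^${name}=" "$env_file" || printf '%s\n' "$name"
  done
}
```

Running missing_env_vars .env.local prints nothing when every variable is present.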

Note: If you run multiple Datatidy instances against the same Redis server, each instance must use its own stream name (messages in the MESSENGER_TRANSPORT_DSN example above), e.g.

# First Datatidy instance
MESSENGER_TRANSPORT_DSN=redis://url:port/datatidy-1-messages

and

# Another Datatidy instance
MESSENGER_TRANSPORT_DSN=redis://url:port/datatidy-2-messages
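When several instances live on the same host, a quick check for accidentally shared stream names might look like the sketch below. The function name duplicate_dsns is hypothetical, and the comparison is purely textual, so it assumes the DSN is written the same way (for example, unquoted) in every file.

```shell
# Hypothetical helper: print every MESSENGER_TRANSPORT_DSN value that appears
# in more than one of the given env files.
duplicate_dsns() {
  for env_file in "$@"; do
    # Take the last definition in each file, in case the variable is set twice.
    grep '^MESSENGER_TRANSPORT_DSN=' "$env_file" | tail -n 1
  done | cut -d= -f2- | sort | uniq -d
}
```

Called with the .env.local file of each instance as arguments, it prints nothing when all instances use distinct DSNs.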

Install the dependencies and build the assets:

# Install the dependencies
composer install --no-dev --classmap-authoritative
FONTAWESOME_NPM_AUTH_TOKEN='your-fontawesome-token' \
  docker run -v ${PWD}:/app -w /app -e FONTAWESOME_NPM_AUTH_TOKEN -e NPM_CONFIG_USERCONFIG=.npmrc.install \
  node:latest yarn install
docker run -v ${PWD}:/app -w /app node:latest yarn build

yarn install --production

# Build the assets
yarn build

# Create the database and run the migrations
php bin/console doctrine:database:create --no-interaction
php bin/console doctrine:migrations:migrate --no-interaction

Want more? See the official Symfony 4.3 documentation section about deployment.

Terms and conditions

Create the file misc/terms/content.html.twig with your terms and conditions.

Jobs

Consumer

For jobs to be processed, the queue consumer has to be running. You probably want a process monitor that keeps the consumer running at all times and restarts it if it stops. Supervisor is one option, with the following settings added:

[program:datatidy-consumer]
process_name=%(program_name)s_%(process_num)02d
command=/usr/bin/env php path/to/datatidy/bin/console messenger:consume async
autostart=true
autorestart=true
numprocs=1
redirect_stderr=true
stdout_logfile=path/to/output.file

Producer

The producer must run every minute to create jobs for the consumer to process. One option is a cron entry such as:

* * * * * /usr/bin/env php path/to/datatidy/bin/console datatidy:data-flow:produce-jobs > path/to/output.file
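Note that the > redirection in the cron line above truncates the output file on every run and does not capture stderr. If a running log is preferred, a small wrapper could append both streams with a timestamp. This is a sketch under assumptions: the function name run_logged and the timestamp format are not part of Datatidy.

```shell
# Hypothetical wrapper: run a command and append its stdout and stderr,
# preceded by a timestamped header line, to a log file.
# Usage: run_logged LOG_FILE COMMAND [ARGS...]
run_logged() {
  log_file=$1
  shift
  {
    printf '[%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*"
    "$@"
  } >> "$log_file" 2>&1
  # The function's exit status is that of the wrapped command.
}
```

For example, run_logged path/to/output.file /usr/bin/env php path/to/datatidy/bin/console datatidy:data-flow:produce-jobs keeps the output of every run in one file instead of only the latest.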

Handling long running jobs

A job may, for various reasons, run for a long time. Because a new job can only be created when a DataFlow has no other active jobs, such long-running jobs must be set to a non-active state. The datatidy:data-flow:timeout-jobs command does this and can, for example, be run from cron:

*/30 * * * * /usr/bin/env php /path/to/datatidy/bin/console datatidy:data-flow:timeout-jobs --timeout-threshold=30 > path/to/output.file

Documentation

Documentation is kept in the doc folder.

Contributing

Before opening a Pull Request, make sure that our coding standards are followed:

# PHP
# Check whether any violations are found:
docker compose exec phpfpm composer check-coding-standards
docker compose exec phpfpm vendor/bin/phan --allow-polyfill-parser

# You can see if the tools can fix them for you:
docker compose exec phpfpm composer apply-coding-standards

# Twig
# Only checks for violations.
docker compose exec phpfpm composer check-coding-standards/twigcs

# CSS, SCSS and JS
docker run -v ${PWD}:/app itkdev/yarn:latest check-coding-standards
docker run -v ${PWD}:/app itkdev/yarn:latest apply-coding-standards

Pull Request Process

  1. Update the README.md with details of changes that are relevant.
  2. You may merge the Pull Request in once you have the sign-off of one other developer, or if you do not have permission to do that, you may request the reviewer to merge it for you.

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

License

This project is licensed under the MIT License; see the LICENSE.md file for details.

Loading fixtures

docker compose exec phpfpm bin/console hautelook:fixtures:load --purge-with-truncate --no-interaction

Running a flow

The datatidy:data-flow:run console command can run a data flow by name or id:

docker compose exec phpfpm bin/console datatidy:data-flow:run --help

Development on Mac

To speed up development on a Mac, you can use the Symfony Local Web Server.

Install the symfony binary to get started.

Starting the show

docker compose up --detach
symfony composer install
symfony console doctrine:migrations:migrate --no-interaction
symfony console hautelook:fixtures:load --no-interaction
symfony local:server:start --daemon
symfony open:local

Running tests

symfony php bin/phpunit