This project processes JSON files containing medical data about patients and encounters, extracts relevant information, and stores it in a SQLite database. The script is implemented in Python and runs within a Docker environment, leveraging asynchronous operations for enhanced performance.
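For orientation, the core of such a pipeline might look like the minimal sketch below. This is an illustrative assumption, not the actual `main.py`: the table name, schema, and `id` field are placeholders, and file reads are offloaded with `asyncio.to_thread` so the event loop stays responsive.

```python
import asyncio
import json
import sqlite3
from pathlib import Path

# Illustrative schema -- the real main.py may use different tables and fields.
SCHEMA = "CREATE TABLE IF NOT EXISTS patients (id TEXT PRIMARY KEY, raw TEXT)"

async def load_file(path: Path) -> dict:
    # Offload the blocking file read to a worker thread.
    return json.loads(await asyncio.to_thread(path.read_text))

async def process_all(data_dir: Path, db_path: str) -> int:
    """Load every JSON file concurrently, then write rows to SQLite."""
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    records = await asyncio.gather(
        *(load_file(p) for p in sorted(data_dir.glob("*.json")))
    )
    for rec in records:
        conn.execute(
            "INSERT OR REPLACE INTO patients (id, raw) VALUES (?, ?)",
            (rec.get("id", "unknown"), json.dumps(rec)),
        )
    conn.commit()
    count = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
    conn.close()
    return count
```

SQLite writes stay synchronous here because the `sqlite3` module is blocking; only the file I/O is made concurrent.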
- Docker
- Docker Compose
- Git (optional, for cloning the repository)
- `emis_task/`: Main directory for the Python scripts and JSON data.
- `main.py`: Main script for processing JSON files.
- `tests/`: Contains pytest test files.
- `exa-data-eng-assessment/data/`: Directory holding JSON data files.
- `pyproject.toml` & `poetry.lock`: Python project and dependency management files.
- `Dockerfile`: Configuration for building the Docker image.
- `docker-compose.yml`: Configuration for orchestrating the Docker container.
- Clone the repository (optional if you have the files locally):

  ```shell
  git clone https://github.com/jayhere1/data_task.git
  cd data_task
  ```
- Build the Docker image. Navigate to the project directory and build:

  ```shell
  docker-compose build
  ```
To start the application using Docker Compose:

```shell
docker-compose up json_processor
```

This command starts the `json_processor` container, which processes the JSON files located in the `./emis_task/exa-data-eng-assessment/data/` directory and writes the output to the SQLite database.
To run the tests within the Docker environment:

```shell
docker-compose run --rm test_service
```

This command runs the test suite in the `tests/` directory, verifying that the processing logic works as expected.
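A test in `tests/` might resemble the sketch below. Note that `extract_patient_id` is a hypothetical helper used purely for illustration; the real suite tests whatever functions `main.py` actually exposes.

```python
import json

def extract_patient_id(resource: dict) -> str:
    # Hypothetical helper: pull the id field from a patient resource,
    # falling back to "unknown" when it is absent.
    return resource.get("id", "unknown")

def test_extract_patient_id():
    resource = json.loads('{"resourceType": "Patient", "id": "abc-123"}')
    assert extract_patient_id(resource) == "abc-123"

def test_extract_patient_id_missing():
    assert extract_patient_id({}) == "unknown"
```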
The SQLite database is stored in a volume that persists data even after the container is stopped. To access or manage the SQLite database:
- Use any SQLite client to connect to the database at `./sqlite_db/processed_data.db` (for example, the `alexcvzz.vscode-sqlite` extension in VS Code).
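If you prefer a script over a GUI client, the tables can be listed with Python's built-in `sqlite3` module. This is a generic sketch; the database path matches the one above, but the table names it prints depend on the actual schema.

```python
import sqlite3

def list_tables(db_path: str) -> list[str]:
    # Query sqlite_master for all user-defined tables in the database.
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()

if __name__ == "__main__":
    print(list_tables("./sqlite_db/processed_data.db"))
```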
To view the logs generated by the Docker container, execute:
```shell
docker-compose logs
```
To stop and remove the containers, networks, and volumes created by `docker-compose up`, use:

```shell
docker-compose down -v
```
- Adjust the volume mounts in `docker-compose.yml` if you need different directories for the JSON data and the database.
- Ensure the Docker environment is appropriately configured for your system in terms of memory and disk space, especially if running for extended periods.