Bulk Action Manager is a NestJS-based application designed to handle large-scale bulk operations efficiently. It allows users to perform bulk updates on different entities like contacts, companies, leads, opportunities, and tasks. The system utilizes advanced batch processing, logging, and full-text search capabilities to manage and monitor bulk actions effectively.
The application is structured around NestJS modules, leveraging TypeORM for database interactions and RabbitMQ for handling asynchronous batch processing. The architecture is designed for scalability and robustness, with a focus on efficient handling of bulk operations and real-time monitoring.
- Bulk Actions Module: Manages bulk update operations.
- Worker Module: Handles the processing of bulk actions in batches.
- Logging Module: Captures detailed logs and statistics of bulk operations.
- Database: Uses PostgreSQL, leveraging full-text search capabilities for logs.
- Message Broker: RabbitMQ is used for queue management and ensuring reliable processing.
- Clone the Repository
git clone git@github.com:vithalreddy/bulk-action.git
- Install Dependencies
cd bulk-action
npm ci
- Tested with Node.js 21.
- Configure Environment
- Create a .env file based on .env.example.
- Docker Compose
docker compose up -d
- Docker Compose is used to set up PostgreSQL and RabbitMQ for simplicity.
- Database Setup
- The application uses TypeORM's synchronize option for schema creation, which is suitable for development environments. This option automatically creates your database schema on every application launch.
- A utility npm script is included to seed the database:
npm run seed-db
- Run the application
npm run start
- The Swagger API docs are now available at
http://localhost:3000/api/docs
The port depends on the value configured in the .env file.
Before starting the application, review .env.example; it lists the settings that can be tuned to match your available resources and requirements.
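As a rough illustration of resource-based tuning, the sketch below parses a few settings from an environment map and falls back to defaults. The variable names (`PORT`, `BATCH_SIZE`, `WORKER_CONCURRENCY`) and the default values are hypothetical; the keys the application actually reads are the ones listed in .env.example.

```typescript
// Hypothetical settings; the real keys are defined in .env.example.
interface AppConfig {
  port: number;
  batchSize: number;
  workerConcurrency: number;
}

type Env = Record<string, string | undefined>;

// Read a positive integer from the environment map, or use the fallback
// when the key is missing or empty.
function readInt(env: Env, key: string, fallback: number): number {
  const raw = env[key];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new Error(`${key} must be a positive integer, got "${raw}"`);
  }
  return value;
}

function loadConfig(env: Env): AppConfig {
  return {
    port: readInt(env, "PORT", 3000),
    batchSize: readInt(env, "BATCH_SIZE", 500),
    workerConcurrency: readInt(env, "WORKER_CONCURRENCY", 4),
  };
}

// In the application this map would typically be process.env.
const appConfig = loadConfig({ PORT: "3000", BATCH_SIZE: "1000" });
console.log(appConfig);
```

Failing fast on an invalid value keeps a misconfigured worker from starting with a silently wrong batch size.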
The API documentation is generated using Swagger and can be accessed at /api/docs when the application is running. This provides interactive documentation where you can test out different endpoints.
- GET /bulk-actions: List all bulk actions with pagination. Each entry includes the action's status, stats, and progress.
- POST /bulk-actions: Create a bulk action.
- GET /bulk-actions/{actionId}: Retrieve the status of a specific bulk action.
- GET /bulk-actions/{actionId}/stats: Get statistics for a specific bulk action; these are updated in near real time.
- GET /bulk-actions/{actionId}/logs: Fetch logs for a specific bulk action, with optional filtering.
- Filters: Defined as JSON rules (https://github.com/cachecontrol/json-rules-engine), filters determine which records are selected for updating. For example, a rule might target records where a field value is greater than a certain number.
- Update Process: Once the records are filtered, specified fields in these records are updated with new values. This could include changing status, modifying dates, or any other field present in the record.
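The filter-then-update flow described above can be sketched in a few lines. This is a simplified stand-in, not the json-rules-engine API: it only borrows a similar condition shape (`fact`, `operator`, `value` nested under `all` or `any`). The type names, the `applyBulkUpdate` helper, and the sample lead records are assumptions made for illustration.

```typescript
// Simplified rule shape, loosely modeled on JSON rule conditions.
// This is NOT the json-rules-engine API, only an illustrative evaluator.
type Operator = "equal" | "notEqual" | "greaterThan" | "lessThan";

interface Condition {
  fact: string;
  operator: Operator;
  value: string | number | boolean;
}

type Rule = { all: Rule[] } | { any: Rule[] } | Condition;

type EntityRecord = Record<string, string | number | boolean>;

// Evaluate a rule tree against a single record.
function matches(rule: Rule, record: EntityRecord): boolean {
  if ("all" in rule) return rule.all.every((r) => matches(r, record));
  if ("any" in rule) return rule.any.some((r) => matches(r, record));
  const actual = record[rule.fact];
  const expected = rule.value;
  switch (rule.operator) {
    case "equal":
      return actual === expected;
    case "notEqual":
      return actual !== expected;
    case "greaterThan":
      return typeof actual === "number" && typeof expected === "number" && actual > expected;
    case "lessThan":
      return typeof actual === "number" && typeof expected === "number" && actual < expected;
    default:
      return false;
  }
}

// Return a new list in which every matching record has `changes` applied.
function applyBulkUpdate(records: EntityRecord[], rule: Rule, changes: EntityRecord): EntityRecord[] {
  return records.map((record) => (matches(rule, record) ? { ...record, ...changes } : record));
}

const sampleLeads: EntityRecord[] = [
  { id: 1, score: 82, status: "new" },
  { id: 2, score: 40, status: "new" },
];
const scoreAbove50: Rule = { all: [{ fact: "score", operator: "greaterThan", value: 50 }] };
const updatedLeads = applyBulkUpdate(sampleLeads, scoreAbove50, { status: "qualified" });
console.log(updatedLeads);
```

Only the first lead has a score above 50, so only its status changes; the second record is returned untouched.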
Batch processing ensures efficiency and scalability in handling bulk updates:
- Batch Creation: The entire set of records is divided into smaller subsets called batches.
- Parallel Processing: Each batch is processed independently, allowing for concurrent updates. This reduces the processing time significantly compared to sequential updates.
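A minimal in-process sketch of the two steps above: splitting records into batches, then processing several batches concurrently. The application itself distributes batches through RabbitMQ; here a bounded pool of async runners stands in for the workers, and the function names, batch size, and handler are illustrative assumptions.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Process batches with at most `concurrency` of them in flight at once.
// `handler` is a placeholder for whatever updates a single batch.
async function processInBatches<T>(
  items: T[],
  size: number,
  concurrency: number,
  handler: (batch: T[], index: number) => Promise<void>
): Promise<void> {
  const batches = chunk(items, size);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed batch.
  const runner = async (): Promise<void> => {
    while (next < batches.length) {
      const index = next++;
      await handler(batches[index], index);
    }
  };
  const runners: Promise<void>[] = [];
  for (let i = 0; i < Math.min(concurrency, batches.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
}

const recordIds = [1, 2, 3, 4, 5, 6, 7];
processInBatches(recordIds, 3, 2, async (batch, index) => {
  console.log(`batch ${index}: ${batch.length} records`);
}).catch((err) => console.error(err));
```

Because each batch is independent, a failure can be retried or logged per batch without restarting the whole bulk action.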
Logs are systematically stored and retrieved using PostgreSQL's full-text search:
- Log Storage: Every significant action in the system, like the start/end of a batch or an update operation, is logged with details like timestamps and outcomes.
- Full-Text Search: PostgreSQL's full-text search capabilities enable querying logs based on text content. This is particularly useful for quickly finding specific logs among large volumes of log data.
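As an illustration of the log search described above, the sketch below builds a parameterized PostgreSQL query that uses the built-in `to_tsvector` and `plainto_tsquery` functions. The table and column names (`bulk_action_logs`, `action_id`, `message`, `created_at`) are hypothetical and would need to match the real log entity.

```typescript
// Hypothetical schema: table `bulk_action_logs` with columns
// `id`, `action_id`, `message`, and `created_at`.
interface SqlQuery {
  text: string;
  values: Array<string | number>;
}

// Build a full-text search over the logs of one bulk action.
// User input is passed as positional parameters, never concatenated into SQL.
function buildLogSearchQuery(actionId: string, search: string, limit = 50): SqlQuery {
  return {
    text: [
      "SELECT id, message, created_at",
      "FROM bulk_action_logs",
      "WHERE action_id = $1",
      "  AND to_tsvector('english', message) @@ plainto_tsquery('english', $2)",
      "ORDER BY created_at DESC",
      "LIMIT $3",
    ].join("\n"),
    values: [actionId, search, limit],
  };
}

const logQuery = buildLogSearchQuery("42", "batch failed");
console.log(logQuery.text);
console.log(logQuery.values);
```

On large log tables, a GIN index on the same `to_tsvector('english', message)` expression lets PostgreSQL answer this query without scanning every row.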
The system is designed for horizontal scalability:
- Load Distribution: By using RabbitMQ and batching, the workload is distributed across multiple instances or nodes.
- Scalable Architecture: The architecture supports adding more worker nodes to handle increased load, ensuring the system scales efficiently as the demand grows.
- The system is designed for batch processing; real-time updates may have a slight delay.
- Full-text search capabilities depend on the PostgreSQL configuration.
- Implement Elasticsearch for advanced, scalable search capabilities and log analytics.
- Introduce Redis-based rate limiting to manage API usage and ensure system stability.
- Utilize Server-Sent Events (SSE) for real-time progress updates on bulk operations.
These enhancements focus on improving search functionality, maintaining system integrity, and enhancing user interactivity.
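To give a feel for the SSE proposal, the sketch below serializes one progress update in the Server-Sent Events wire format: an `event:` line, a `data:` line, and a blank line that ends the message. The payload fields and the `progress` event name are assumptions, not part of the current API.

```typescript
// Hypothetical progress payload; field names are illustrative.
interface BulkProgress {
  actionId: string;
  processed: number;
  total: number;
}

// Serialize one SSE message. JSON.stringify without indentation emits a
// single line, so one `data:` field is enough.
function formatSseMessage(eventName: string, payload: BulkProgress): string {
  return `event: ${eventName}\ndata: ${JSON.stringify(payload)}\n\n`;
}

const sseFrame = formatSseMessage("progress", {
  actionId: "42",
  processed: 500,
  total: 2000,
});
console.log(sseFrame);
```

A server would write such frames to a response sent with the `Content-Type: text/event-stream` header, and a browser client could consume them with `EventSource`.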
- Method Development: Implement a mechanism to detect and omit duplicate entities, primarily based on the 'email' field, to prevent redundant processing.
- Logging Skipped Entries: Ensure that all entities skipped due to duplication are recorded in the logs with a clear indication of being omitted.
- Add fields to your BulkAction entity to store the scheduled start time of the action.
- Implement a scheduling mechanism using cron jobs or a similar scheduling library that can trigger the execution of a bulk action at the specified time.
- Store the scheduled time in the database and use a background process to check for actions that need to be started. This approach was discussed in interview round 1.
These features focus on enhancing data integrity and user experience by handling data efficiently and giving users more control over when bulk actions are executed.
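The two proposals above, duplicate handling and scheduled execution, could be sketched as follows. Everything here is an assumption about a feature that is not yet built: the `Contact` and `ScheduledAction` shapes, the `scheduledAt` field, and the rule of keeping the first record seen for each email.

```typescript
// Illustrative shapes; the real entities may differ.
interface Contact {
  id: number;
  email: string;
}

interface SkipLog {
  entityId: number;
  reason: string;
}

// Keep the first entity per normalized email and record the rest as skipped,
// so every omitted entity leaves a log entry.
function dedupeByEmail(contacts: Contact[]): { unique: Contact[]; skipped: SkipLog[] } {
  const seen = new Set<string>();
  const unique: Contact[] = [];
  const skipped: SkipLog[] = [];
  for (const contact of contacts) {
    const key = contact.email.trim().toLowerCase();
    if (seen.has(key)) {
      skipped.push({ entityId: contact.id, reason: `duplicate email ${key}` });
    } else {
      seen.add(key);
      unique.push(contact);
    }
  }
  return { unique, skipped };
}

interface ScheduledAction {
  id: number;
  status: "scheduled" | "running" | "completed";
  scheduledAt: Date; // hypothetical field on the BulkAction entity
}

// Return the actions whose start time has passed. A background job
// (for example a cron tick) could call this and enqueue the results.
function findDueActions(actions: ScheduledAction[], now: Date): ScheduledAction[] {
  return actions.filter(
    (action) => action.status === "scheduled" && action.scheduledAt.getTime() <= now.getTime()
  );
}

const dedupeResult = dedupeByEmail([
  { id: 1, email: "ana@example.com" },
  { id: 2, email: "Ana@Example.com " },
  { id: 3, email: "bob@example.com" },
]);
console.log(dedupeResult.skipped);
```

Normalizing the email (trimming and lowercasing) before comparison is a design choice; a stricter or looser notion of "duplicate" would only change how `key` is computed.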