Visual Language Processing (VLP) is at the forefront of generative AI, driving advances in multimodal learning that encompasses language intelligence as well as vision understanding and processing. By combining a Large Language Model (LLM) with Contrastive Language-Image Pre-Training (CLIP), trained on large quantities of multimodal data, Visual Language Models (VLMs) are particularly adept at tasks such as image captioning, object detection and segmentation, and visual question answering. Their use cases span many domains, from media and entertainment to medical diagnostics and quality assurance in manufacturing.
Dialogue Guided Intelligent Document Processing (DGIDP) is an approach to extracting and processing information from documents that leverages natural language understanding and conversational AI. Users interact with the IDP system through human-like conversation, asking questions and receiving relevant information in real time. The system is designed to understand context, process unstructured data, and respond to user queries effectively and efficiently.
The demo supports multilingual text and voice input: text and voice chat accept all major languages. The document upload feature, however, only accepts documents written in English, German, French, Spanish, Italian, or Portuguese. Uploaded documents may be multi-page and must be in PDF, PNG, JPG, or TIFF format.
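As a minimal sketch of the upload constraints above, the following Python helper checks a file name against the supported formats before upload. The function name `is_supported_upload` and the exact extension set (including the `.jpeg` and `.tif` spellings) are assumptions for illustration and are not taken from the demo code.

```python
from pathlib import Path

# Extensions for the formats listed above; ".jpeg" and ".tif" are assumed aliases.
SUPPORTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def is_supported_upload(filename: str) -> bool:
    """Return True if the file extension matches a supported document format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["invoice.PDF", "scan.tiff", "notes.docx"]:
        print(name, is_supported_upload(name))
```

This only inspects the file name. It does not verify the file contents or the language the document is written in.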
To use a SageMaker JumpStart foundation model for text generation, use the notebook to deploy an endpoint and test it.
Using third-party APIs such as the OpenAI APIs and SERP APIs may share your private information with the third-party API providers. Be cautious with your sensitive information.
This code has been tested on an EC2 instance running the al2023-ami-2023.0.20230503.0-kernel-6.1-x86_64 AMI. You will need to configure your security group to allow inbound traffic on port 7862.
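To confirm that the port is reachable, a small check along these lines can be run from your own machine. This is a sketch using only the Python standard library; the helper name `is_port_open` is hypothetical and the default timeout is an arbitrary choice.

```python
import socket


def is_port_open(host: str, port: int = 7862, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `is_port_open("your-server-ip")` should return True only when the security group allows inbound traffic on port 7862 and the demo is already running and listening on the server. A False result does not by itself tell you which of the two is missing.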
On your server, locate the env_var_conf file, save your tokens in it, and execute the following commands.
source ./env_var_conf
python demo.py
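If the demo fails to start, one possible cause is that a token was not set. The sketch below lists any missing variables. The names in `REQUIRED_VARS` are hypothetical placeholders, since the actual variable names are whatever env_var_conf defines, and the check assumes the file exports them so that child processes can see them.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Hypothetical variable names; use whichever names env_var_conf actually exports.
REQUIRED_VARS = ("OPENAI_API_KEY", "SERPAPI_API_KEY")


def missing_env_vars(
    names: Iterable[str], environ: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing tokens:", ", ".join(missing))
    else:
        print("All tokens are set.")
```

Run it in the same shell session in which you sourced env_var_conf, so that it sees the same environment as demo.py.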
Go to https://your-server-ip:7862 and choose either dgidp or babyagi to test dialogue IDP. First upload a document and click the Transcribe button, then type your question into the Textbox and click the Text Chat button.