This project is a Streamlit web application that uses Google's Generative AI (Gemini) to extract information from invoice images based on user-provided prompts. It demonstrates how generative models can be used to understand and process invoice data.
- Image Upload: Users can upload invoice images in JPG/JPEG or PNG format.
- Prompt Input: Users can enter a text prompt to guide the extraction.
- Generative AI Integration: Uses Google's Generative AI to analyze the image and generate a response.
- Multi-Language Support: Can respond in multiple languages, depending on the input prompt.
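The upload step listed above can be sketched as follows. This is a minimal sketch, not the project's actual `app.py`: the helper name `to_image_part` is hypothetical, and the `{"mime_type": ..., "data": ...}` dictionary is assumed here because, to our knowledge, it is one of the inline-image forms the `google-generativeai` Python SDK accepts as a content part. The helper checks the file extension against the three listed formats and maps it to a MIME type.

```python
from pathlib import Path

# Accepted extensions mapped to MIME types.
# JPG and JPEG are two extensions for the same image format.
SUPPORTED_TYPES = {
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
}


def to_image_part(filename: str, data: bytes) -> dict:
    """Wrap raw upload bytes as an inline image part (hypothetical helper)."""
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    if not data:
        raise ValueError("Uploaded file is empty")
    return {"mime_type": SUPPORTED_TYPES[suffix], "data": data}


if __name__ == "__main__":
    part = to_image_part("Invoice.PNG", b"\x89PNG...")
    print(part["mime_type"])  # image/png
```

In a Streamlit app, the arguments would typically come from the object returned by `st.file_uploader`, for example `to_image_part(uploaded.name, uploaded.getvalue())`.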
To run this project locally, follow these steps:
- Clone the repository:
  git clone https://github.com/your_username/multilanguage-invoice-extractor.git
  cd multilanguage-invoice-extractor
- Set up a Python environment (Anaconda is recommended):
  conda create --name invoice python=3.8
  conda activate invoice
- Install the dependencies:
  pip install -r requirements.txt
- Set up your Google API key:
  - Obtain a Google API key with access to Google's Generative AI (Gemini) models.
  - Create a .env file in the project root directory.
  - Add your API key to the .env file:
    GOOGLE_API_KEY="your_api_key_here"
- Run the Streamlit application:
  streamlit run app.py
- Open the application in your browser at http://localhost:8501.
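The project presumably loads the key from .env with a library such as python-dotenv (an assumption; check requirements.txt). The stdlib-only sketch below shows roughly what that loading step does for a simple `GOOGLE_API_KEY="..."` line like the one in the setup steps: parse the file, strip the surrounding quotes, and expose the value through `os.environ` without overriding variables that are already set. The function name `load_env_file` is hypothetical, and it handles only simple `KEY=value` lines, not the full dotenv syntax.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=value lines from a .env file into os.environ.

    Hypothetical helper: supports comments, blank lines, and values wrapped in
    single or double quotes, but not multi-line values or variable expansion.
    Variables already present in the environment are left untouched.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip()
        # Remove one pair of matching surrounding quotes, e.g. "your_api_key_here".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


if __name__ == "__main__":
    load_env_file()
    if os.getenv("GOOGLE_API_KEY"):
        print("GOOGLE_API_KEY loaded")
    else:
        print("GOOGLE_API_KEY is not set; check your .env file.")
```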
To use the application:
- Upload an invoice image using the file uploader.
- Enter a prompt describing the information you want from the invoice.
- Click the "Tell me about the invoice" button to generate the response.
- The response, containing the extracted information, is displayed below the image.
Here is an example prompt you can use:
You are an expert in understanding invoices.
You are provided with an image of an invoice and a prompt.
Extract the relevant information from the invoice image and generate a response that is relevant to the given prompt.
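A minimal sketch of how the instruction text above might be combined with the uploaded image and the user's own prompt into one multimodal request. The ordering (instruction, image, user prompt) and the names `build_contents` and `describe_invoice` are assumptions for illustration. The model call is passed in as a function so the sketch runs without network access or an API key; with the `google-generativeai` SDK it would typically wrap something like `genai.GenerativeModel(<model name>).generate_content(contents).text`.

```python
from typing import Callable

# The example prompt from this README, used here as a fixed instruction.
INSTRUCTION = (
    "You are an expert in understanding invoices. "
    "You are provided with an image of an invoice and a prompt. "
    "Extract the relevant information from the invoice image and generate "
    "a response that is relevant to the given prompt."
)


def build_contents(image_part: dict, user_prompt: str) -> list:
    """Order the request as: instruction, invoice image, user prompt (assumed order)."""
    if not user_prompt.strip():
        raise ValueError("Please enter a prompt")
    return [INSTRUCTION, image_part, user_prompt.strip()]


def describe_invoice(
    image_part: dict,
    user_prompt: str,
    generate: Callable[[list], str],
) -> str:
    """Send the assembled contents to `generate` and return its text answer.

    `generate` stands in for the real model call (hypothetical wrapper).
    """
    return generate(build_contents(image_part, user_prompt))


if __name__ == "__main__":
    def fake_generate(contents: list) -> str:
        # Offline stand-in for the model: reports how many parts it received.
        return f"received {len(contents)} parts"

    image = {"mime_type": "image/png", "data": b"..."}
    print(describe_invoice(image, "What is the total amount due?", fake_generate))
    # received 3 parts
```

Because the user's prompt is forwarded unchanged, a request such as "Reply in German: what is the invoice date?" is presumably how the multi-language responses described earlier are obtained.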
Contributions are welcome! If you find any issues or have suggestions for improvements, please feel free to open an issue or create a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.