Image-Chat, developed for the Smart India Hackathon 2024 under Bharat Electronics Limited (BEL), is a Conversational Image Recognition Chatbot. The chatbot recognizes objects in images uploaded by users and generates relevant responses to user queries about the images. This project integrates Natural Language Processing (NLP) and Computer Vision to deliver a seamless conversational experience that combines image recognition with language understanding.
- Problem Statement ID: 1604
- Title: Conversational Image Recognition Chatbot
- Category: Software
- Theme: Smart Automation
- Organization: Bharat Electronics Limited (BEL)
Image-Chat is a deep learning-based chatbot that recognizes objects in images and provides accurate, context-aware responses to user queries. It merges image recognition with NLP to facilitate meaningful conversations based on the visual content provided by users.
- Ensemble Models: For enhanced image recognition accuracy.
- Advanced NLP: Ensures contextually relevant dialogue generation.
- Human-in-the-loop Feedback: For continuous model learning and improvement.
- Explainability Techniques: Includes Grad-CAM to enhance transparency in model predictions.
- Programming Languages: Python
- Frameworks: TensorFlow, PyTorch, Hugging Face Transformers, Streamlit, LangChain
- Libraries: OpenCV, NumPy, Pandas, Optuna for hyperparameter tuning
- Tools: Grad-CAM for model explainability, Mixup and CutMix for data augmentation
The user interface for Image-Chat has been implemented using Streamlit, allowing users to upload images and interact with the chatbot.
- Data Collection & Preprocessing: Image and text data are collected and preprocessed for model training.
- Model Development: Ensemble models are trained for image recognition, and NLP models are developed for generating responses.
- Integration: Models are integrated using LangChain for a seamless interaction.
- Deployment: The application is deployed on a web platform using Streamlit.
- Technical Feasibility: The project leverages pre-trained models and transfer learning, ensuring feasibility with available resources.
- Challenges & Risks: Ensuring accurate and contextually relevant responses; potential degradation in model performance with complex images.
- Mitigation Strategies: Continuous fine-tuning with human-in-the-loop feedback and data augmentation to improve model robustness.
- Target Audience: Tech companies, educational platforms, customer service industries.
- Social Impact: Enhances user experience by integrating conversational AI with visual understanding.
- Economic Impact: Reduces the need for manual image analysis in various applications.
- Datasets: COCO, ImageNet, Pascal
- Tools: TensorFlow, PyTorch, Hugging Face Transformers
- Papers: [Relevant academic papers on image recognition and NLP]