Image-Chat

Overview

Image-Chat, developed for the Smart India Hackathon 2024 under Bharat Electronics Limited (BEL), is a Conversational Image Recognition Chatbot. The chatbot recognizes objects in images uploaded by users and generates relevant responses to user queries about the images. This project integrates Natural Language Processing (NLP) and Computer Vision to deliver a seamless conversational experience that combines image recognition with language understanding.

Problem Statement

Problem Statement ID: 1604
Title: Conversational Image Recognition Chatbot
Category: Software
Theme: Smart Automation
Organization: Bharat Electronics Limited (BEL)

Proposed Solution

Image-Chat is a deep learning-based chatbot that recognizes objects in images and provides accurate, context-aware responses to user queries. It merges image recognition with NLP to facilitate meaningful conversations based on the visual content provided by users.

Key Features

Ensemble Models: For enhanced image recognition accuracy.
Advanced NLP: Ensures contextually relevant dialogue generation.
Human-in-the-loop Feedback: For continuous model learning and improvement.
Explainability Techniques: Includes Grad-CAM to enhance transparency in model predictions.

Technologies Used

Programming Languages: Python
Frameworks: TensorFlow, PyTorch, Hugging Face Transformers, Streamlit, LangChain
Libraries: OpenCV, NumPy, Pandas, Optuna for hyperparameter tuning
Tools: Grad-CAM for model explainability, Mixup and CutMix for data augmentation

Implementation

UI Development

The user interface for Image-Chat has been implemented using Streamlit, allowing users to upload images and interact with the chatbot.

Methodology

Data Collection & Preprocessing: Image and text data are collected and preprocessed for model training.
Model Development: Ensemble models are trained for image recognition, and NLP models are developed for generating responses.
Integration: Models are integrated using LangChain for a seamless interaction.
Deployment: The application is deployed on a web platform using Streamlit.

Feasibility and Viability

Technical Feasibility: The project leverages pre-trained models and transfer learning, ensuring feasibility with available resources.
Challenges & Risks: Ensuring accurate and contextually relevant responses; potential degradation in model performance with complex images.
Mitigation Strategies: Continuous fine-tuning with human-in-the-loop feedback and data augmentation to improve model robustness.

Impact and Benefits

Target Audience: Tech companies, educational platforms, customer service industries.
Social Impact: Enhances user experience by integrating conversational AI with visual understanding.
Economic Impact: Reduces the need for manual image analysis in various applications.

Research and References

Datasets: COCO, ImageNet, Pascal
Tools: TensorFlow, PyTorch, Hugging Face Transformers
Papers: [Relevant academic papers on image recognition and NLP]

dhanushperumalla/Image-Chat-SIH-HACKATHON-