This project implements a generative AI pipeline that extracts and rates features from images. It combines Florence-2, a vision-language model, with a fine-tuned version of Mistral-v3, a large language model.
- Utilizes Florence-2 to generate detailed image descriptions
- Employs a custom fine-tuned Mistral-v3 model for feature extraction and scoring
- Outputs results in JSON format for easy integration with other applications
- Model: Florence-2
- Input: Raw image data
- Output: Comprehensive text description of the image
- Model: Fine-tuned Mistral-v3
- Input: Text description from Florence-2
- Output: JSON object containing extracted features and their scores
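The data flow between the two stages can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `describe` and `extract` are hypothetical callables standing in for Florence-2 and the fine-tuned Mistral-v3 model, and the flat "feature name to numeric score" JSON shape is an assumption, since the real schema is defined in the notebook.

```python
import json
from typing import Callable, Dict


def parse_feature_json(raw_text: str) -> Dict[str, float]:
    """Return feature scores from the first JSON object found in raw_text.

    Assumption: the model emits a flat object such as {"feature": score}.
    Entries whose value is not a number are dropped.
    """
    decoder = json.JSONDecoder()
    start = raw_text.find("{")
    while start != -1:
        try:
            obj, _ = decoder.raw_decode(raw_text, start)
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = raw_text.find("{", start + 1)
            continue
        # bool is excluded explicitly because it is a subclass of int.
        return {
            str(name): float(score)
            for name, score in obj.items()
            if isinstance(score, (int, float)) and not isinstance(score, bool)
        }
    raise ValueError("no JSON object found in model output")


def run_pipeline(
    image: bytes,
    describe: Callable[[bytes], str],
    extract: Callable[[str], str],
) -> Dict[str, float]:
    """Chain the two stages: image -> description -> JSON feature scores."""
    description = describe(image)      # stage 1 (Florence-2 in this project)
    raw_output = extract(description)  # stage 2 (fine-tuned Mistral-v3)
    return parse_feature_json(raw_output)


if __name__ == "__main__":
    # Stub models so the sketch runs without downloading any weights.
    fake_describe = lambda image: "A bright living room with a large sofa."
    fake_extract = lambda text: 'Result: {"brightness": 8, "spaciousness": 6.5}'
    print(run_pipeline(b"", fake_describe, fake_extract))
```

Passing the models in as callables keeps the sketch independent of any particular loading code. The parser tolerates extra text around the JSON object, which language models often produce.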
To run this project, follow these steps:
- Download the dataset
- Run the image scraping script:
python scrapeImages.py
This will scrape images from Airbnb based on the dataset.
- Open and run the GenAI_Approach.ipynb Jupyter notebook. Follow the instructions within the notebook to process the images and extract features.
Important note: Ensure you use the same file and directory names as specified in the scripts, or modify the paths in the code to match your directory structure.
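One way to catch path mismatches before running anything is a small pre-flight check like the sketch below. The `missing_paths` helper is hypothetical and not part of the repository. Only `scrapeImages.py` and `GenAI_Approach.ipynb` are names taken from this README; the dataset file and image directory names are defined in the scripts and would need to be added to the list.

```python
from pathlib import Path
from typing import Iterable, List


def missing_paths(project_root: str, expected: Iterable[str]) -> List[str]:
    """Return the entries of `expected` that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Extend this list with the dataset and image directory names
    # that the scripts actually reference.
    expected = ["scrapeImages.py", "GenAI_Approach.ipynb"]
    for rel in missing_paths(".", expected):
        print(f"missing: {rel}")
```

Run it from the project root. Any path it prints either needs to be created or renamed, or the corresponding path in the code needs to be updated.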