This repository contains the data-collection code required to train our embedding-based image search engine.

A CI/CD pipeline was implemented while developing this project:
- On push, check out the code and build a Docker image on the GitHub Actions runner.
- Push the image to ECR with the `production` tag.
- Once the push completes, pull and run the image on the EC2 instance.
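The steps above boil down to tagging one image and moving it through ECR. A minimal sketch of those commands follows; the registry URI, repository name, and port are example values, and the docker/aws commands are assumptions about the workflow, not copied from it:

```shell
# Example values only -- the real ones come from the environment variables below
AWS_ECR_LOGIN_URI="123456789012.dkr.ecr.us-east-1.amazonaws.com"
ECR_REPOSITORY_NAME="data-collection"

# The pipeline tags every image with the `production` tag
IMAGE_URI="${AWS_ECR_LOGIN_URI}/${ECR_REPOSITORY_NAME}:production"
echo "$IMAGE_URI"

# On the GitHub Actions runner (sketch, not the actual workflow file):
#   docker build -t "$IMAGE_URI" .
#   aws ecr get-login-password --region "$AWS_REGION" \
#     | docker login --username AWS --password-stdin "$AWS_ECR_LOGIN_URI"
#   docker push "$IMAGE_URI"
# On the EC2 instance:
#   docker pull "$IMAGE_URI"
#   docker run -d -p 8080:8080 "$IMAGE_URI"   # port assumed
```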
The service exposes the following routes:
- `/fetch`: Returns the labels currently present in the database. Call it first, as it also refreshes the in-memory label cache.
- `/single_upload`: Uploads a single image to the S3 bucket.
- `/bulk_upload`: Uploads multiple images to the S3 bucket.
- `/add_label`: Adds a new label to the S3 bucket.
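Example calls to each route are sketched below as a dry run: the `run` helper only prints each command, so nothing is sent anywhere; replace its body with `"$@"` to execute against a live server. The host, port, HTTP methods, and multipart field names (`label`, `image`, `images`) are assumptions, not taken from the API code:

```shell
BASE_URL="http://localhost:8080"   # host and port assumed
run() { echo "+ $*"; }             # dry-run helper: prints instead of executing

run curl "$BASE_URL/fetch"                                   # refresh cache + list labels
run curl -X POST "$BASE_URL/add_label" -F "label=motorbikes" # register a label first
run curl -X POST "$BASE_URL/single_upload" -F "label=motorbikes" -F "image=@bike.jpg"
run curl -X POST "$BASE_URL/bulk_upload" -F "label=motorbikes" -F "images=@b1.jpg" -F "images=@b2.jpg"
```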
- S3 bucket
- MongoDB database
- Elastic Container Registry (ECR)
- Elastic Compute Cloud (EC2)
- Download the dataset: [Caltech-101](https://www.kaggle.com/datasets/imbikramsaha/caltech-101)
- Place `archive.zip` in the `data` folder.
- Run `s3_setup.py`.
- Run `mongo_setup.py`.
export ATLAS_CLUSTER_USERNAME=<username>
export ATLAS_CLUSTER_PASSWORD=<password>
export AWS_ACCESS_KEY_ID=<AWS_ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<AWS_SECRET_ACCESS_KEY>
export AWS_REGION=<region>
export AWS_BUCKET_NAME=<AWS_BUCKET_NAME>
export AWS_ECR_LOGIN_URI=<AWS_ECR_LOGIN_URI>
export ECR_REPOSITORY_NAME=<name>
export DATABASE_NAME=<name>
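A missing export is an easy way to break the setup scripts, so a quick sanity check can help. This is an optional snippet (not part of the repository) that verifies every variable from the list above is set, using bash indirect expansion:

```shell
# Print the names of any required environment variables that are not set.
check_env() {
  local missing=""
  for var in ATLAS_CLUSTER_USERNAME ATLAS_CLUSTER_PASSWORD AWS_ACCESS_KEY_ID \
             AWS_SECRET_ACCESS_KEY AWS_REGION AWS_BUCKET_NAME \
             AWS_ECR_LOGIN_URI ECR_REPOSITORY_NAME DATABASE_NAME; do
    [ -n "${!var:-}" ] || missing="$missing $var"   # ${!var} reads the variable named by $var
  done
  echo "${missing# }"
}

missing="$(check_env)"
if [ -n "$missing" ]; then
  echo "missing: $missing"
else
  echo "all environment variables set"
fi
```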
- Clone the code
- Create a virtual environment
conda create -n new_env python=3.9 -y
- Activate the environment
conda activate new_env
- Run the setup scripts
python s3_setup.py
python mongo_setup.py
- Start the application
python app.py