Welcome to our hackathon project! This repository contains the server-side implementation of an API hosted in a Kaggle environment using Flask and Ngrok. The API serves two models: a fine-tuned BERT and a 4-bit quantized FinLLaMA, both trained on a synthetically generated dataset.
## Table of Contents

- Introduction
- Features
- Technologies Used
- Models
- Setup and Installation
- API Endpoints
- Usage
- Contributions
- License
## Introduction

This project demonstrates how cutting-edge machine learning models can be deployed on lightweight servers for real-time inference. It showcases BERT (Bidirectional Encoder Representations from Transformers) for natural language understanding tasks and FinLLaMA, a fine-tuned LLaMA model, optimized with 4-bit quantization for reduced computational overhead.
The API was developed as part of a hackathon challenge and is hosted using Ngrok, allowing easy public access to the Flask-based server.
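The hosting setup described above can be sketched as a minimal Flask app exposed through a pyngrok tunnel. The `/predict` route and its response shape are illustrative assumptions, not the project's actual endpoints:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical endpoint: in the real project, the posted text would be
    # passed to the fine-tuned model; here we echo it with a placeholder label.
    text = request.get_json(force=True).get("text", "")
    return jsonify({"input": text, "label": "placeholder"})

if __name__ == "__main__":
    # Open a public tunnel to the local server (assumes pyngrok is installed
    # and an ngrok auth token is configured in the Kaggle environment).
    from pyngrok import ngrok
    public_url = ngrok.connect(5000)
    print(f"Public URL: {public_url}")
    app.run(port=5000)
```

Running this inside a Kaggle notebook prints a public Ngrok URL through which the Flask endpoints can be reached from anywhere.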
## Features

- Fine-tuned BERT for text classification and natural language processing tasks.
- Quantized 4-bit FinLLaMA for efficient inference on a small compute footprint.
- Lightweight and portable deployment using Flask.
- Public API hosting enabled with Ngrok.
- Fully functional demonstration on Kaggle.
## Technologies Used

- Python: Core programming language.
- Flask: Lightweight web framework for API creation.
- Ngrok: Tunnel service to expose the server to a public endpoint.
- Kaggle: Environment for model deployment.
- Transformers Library: For pre-trained and fine-tuned models.
- Quantization Techniques: For optimizing model inference.
## Models

### Fine-Tuned BERT

- Base Model: bert-base-uncased
- Task: Text classification
- Dataset: Generated synthetic dataset tailored for the hackathon task.
- Performance: Achieved high accuracy on validation datasets.
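Loading a BERT text-classification model with the Transformers library can be sketched as follows. The base `bert-base-uncased` checkpoint stands in for the project's fine-tuned weights, which are not published, so its classification head is untrained here:

```python
from transformers import pipeline

# "bert-base-uncased" is a stand-in for the hackathon's fine-tuned checkpoint;
# substitute the actual model path to get meaningful predictions.
classifier = pipeline("text-classification", model="bert-base-uncased")

result = classifier("The quarterly report exceeded expectations.")
print(result)
```

The pipeline returns a list of dictionaries with `label` and `score` keys, which the Flask server can serialize directly into a JSON response.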
### FinLLaMA (4-Bit Quantized)

- Base Model: LLaMA
- Optimization: Quantized to 4-bit for efficiency.
- Task: Multi-label text processing and summarization.
- Dataset: Augmented data specifically designed for fine-tuning.