Pinned Repositories
Build-Docker-for-LlamaIndex-Agentic-RAG-System
Docker implementation of a LlamaIndex Agentic RAG system. Developing a RAG system requires multiple components, such as an LLM, a vector database, and a UI. In this work we containerize the entire system.
Embedding-Quantization
To make an LLM application faster we need a faster retrieval system, and that is where embedding quantization comes in. Embedding quantization is a great technique for cutting vector-database costs and significantly speeding up retrieval while preserving retrieval performance.
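As a rough illustration of the idea (a minimal sketch in plain NumPy, not the repo's actual code; the corpus here is random stand-in embeddings), binary quantization keeps only the sign of each dimension and ranks candidates by Hamming distance:

```python
import numpy as np

# Sketch of binary embedding quantization: keep only the sign of each
# dimension, pack 8 dimensions per byte (32x smaller than float32),
# and rank by Hamming distance.
rng = np.random.default_rng(0)
corpus = rng.normal(size=(10_000, 384)).astype(np.float32)  # stand-in embeddings
query = rng.normal(size=(384,)).astype(np.float32)

def binarize(x: np.ndarray) -> np.ndarray:
    """Quantize float embeddings to packed 1-bit codes."""
    return np.packbits(x > 0, axis=-1)

codes = binarize(corpus)   # shape (10000, 48), dtype uint8
qcode = binarize(query)    # shape (48,), dtype uint8

# Hamming distance = popcount of XOR; smaller distance = more similar.
dists = np.unpackbits(codes ^ qcode, axis=-1).sum(axis=-1)
print("nearest ids:", np.argsort(dists)[:5])
```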
Fine-tuning-BART
Fine-tuning is a cost-efficient way of preparing a model for specialized tasks: it reduces both the required training time and the amount of training data. With open-source pre-trained models available, we do not need to train from scratch every time we build a model.
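A minimal sketch of a single fine-tuning step with Hugging Face Transformers; the article/summary pair and hyperparameters are toy placeholders, and the repo itself may use a `Trainer` with a real dataset:

```python
import torch
from transformers import BartForConditionalGeneration, BartTokenizer

# One supervised fine-tuning step for BART on a toy summarization pair.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

inputs = tokenizer("A long article about transformers ...", return_tensors="pt")
labels = tokenizer("A short summary.", return_tensors="pt").input_ids

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(**inputs, labels=labels).loss  # seq2seq cross-entropy loss
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.3f}")
```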
Intro-to-RAG-with-CODEGEMMA-7B
An LLM is a very powerful tool, but it often does more than required (hallucinates) and tends to generate output in whatever pattern it finds best. RAG lets us harness the power of an LLM in a controlled manner. In this work we implement a simple RAG system with CodeGemma and an in-memory vector database.
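The retrieval half of such a system can be sketched in a few lines; the documents, query, and embedding model below are illustrative assumptions, and the assembled prompt would then be passed to CodeGemma for generation:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Sketch of the RAG retrieval step with an in-memory "vector DB"
# (just a NumPy array). Documents and model name are placeholders.
docs = [
    "Python lists support append, pop, and slicing.",
    "Docker images are built from a Dockerfile.",
    "LoRA adds low-rank adapter matrices to frozen weights.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

query = "How do I append to a list in Python?"
q_vec = embedder.encode(query, normalize_embeddings=True)
best = int(np.argmax(doc_vecs @ q_vec))  # cosine similarity via dot product

# The retrieved context constrains what the LLM can say.
prompt = f"Answer using only this context:\n{docs[best]}\n\nQuestion: {query}"
print(prompt)
```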
Llama-2-7B-Chat-PEFT
PEFT is a wonderful tool that enables training a very large model in a low-resource environment. Together, quantization and PEFT will enable widespread adoption of LLMs.
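A minimal sketch of the combination, assuming access to the gated `meta-llama/Llama-2-7b-chat-hf` checkpoint and the `bitsandbytes` library; the LoRA hyperparameters are illustrative, not the repo's exact settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load Llama-2-7B-Chat in 4-bit and attach LoRA adapters, so only a
# small fraction of the parameters needs training.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf", quantization_config=bnb
)
lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% trainable
```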
LlamaIndex-Agent
A RAG system is just the beginning of harnessing the power of an LLM; the next step is building an intelligent agent. In agentic RAG, the agent uses the available tools, strategies, and the LLM to generate responses in a specialized way. Unlike a simple RAG pipeline, an agent can dynamically choose between tools, routing strategies, and so on.
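A minimal tool-using agent can be sketched with llama-index 0.10-style imports; the `multiply` tool and OpenAI backend are assumptions for illustration (the repo may wire up different tools and LLMs):

```python
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from llama_index.llms.openai import OpenAI  # any LlamaIndex LLM works here

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

# The agent decides on its own whether (and how) to call the tool.
tool = FunctionTool.from_defaults(fn=multiply)
agent = ReActAgent.from_tools([tool], llm=OpenAI(model="gpt-3.5-turbo"), verbose=True)
print(agent.chat("What is 2.5 times 8?"))
```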
LlamaIndex-Agent-with-Reasoning-Loop
Simple agents are good for one-shot retrieval. For more complex tasks we need a multi-step reasoning loop, in which the agent breaks a complex task into subtasks and solves them step by step while maintaining conversational memory.
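The shape of such a loop, as a conceptual sketch only: `call_llm` is a hypothetical stand-in for whatever chat backend the agent wraps, not a real API:

```python
# Conceptual sketch of a multi-step reasoning loop with memory.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("hypothetical stand-in; plug in an LLM client")

def reasoning_loop(task: str, max_steps: int = 5) -> str:
    memory: list[str] = []  # conversational memory shared across steps
    plan = call_llm(f"Break this task into subtasks, one per line:\n{task}")
    for subtask in plan.splitlines()[:max_steps]:
        context = "\n".join(memory)
        answer = call_llm(f"Context so far:\n{context}\n\nSolve: {subtask}")
        memory.append(f"{subtask} -> {answer}")  # later steps see earlier results
    return call_llm("Combine these results into a final answer:\n" + "\n".join(memory))
```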
Meta-Llama3-8B-Chat-Instruct-LoRA
PEFT (LoRA) with Meta-Llama3-8B-Chat-Instruct
Phi3-No-GPU-No-Worry
GPU constrained? No more. Microsoft released Phi-3, designed specifically for memory- and compute-constrained environments. The model supports the ONNX CPU runtime, which offers impressive inference speed even on a mobile CPU.
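Generation on CPU can be sketched with the `onnxruntime-genai` package; the model path is an assumption (a locally downloaded Phi-3 ONNX export), and the exact API surface has shifted between package releases:

```python
import onnxruntime_genai as og

# CPU-only generation with a Phi-3 ONNX export (path is an assumption).
model = og.Model("phi-3-mini-4k-instruct-onnx/cpu")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=200)
generator = og.Generator(model, params)
# Note: older package versions set input ids on `params` instead.
generator.append_tokens(tokenizer.encode("<|user|>Hello!<|end|><|assistant|>"))

while not generator.is_done():
    generator.generate_next_token()  # runs entirely on the CPU
print(tokenizer.decode(generator.get_sequence(0)))
```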
Vector_Database
Implementing a vector database on the CoNaLa dataset to retrieve code snippets relevant to user queries. This is a very simple simulation of a vector database.
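The core mechanics fit in a tiny class; this is a conceptual sketch with random stand-in embeddings rather than real CoNaLa snippets and a real embedding model:

```python
import numpy as np

class TinyVectorDB:
    """Toy in-memory vector database: store vectors, query by cosine similarity."""

    def __init__(self) -> None:
        self.vectors: list[np.ndarray] = []
        self.payloads: list[str] = []

    def add(self, vector: np.ndarray, payload: str) -> None:
        self.vectors.append(vector / np.linalg.norm(vector))
        self.payloads.append(payload)

    def query(self, vector: np.ndarray, k: int = 3) -> list[str]:
        q = vector / np.linalg.norm(vector)
        scores = np.stack(self.vectors) @ q  # cosine similarity on unit vectors
        return [self.payloads[i] for i in np.argsort(scores)[::-1][:k]]

# Usage with random stand-ins (real code would embed CoNaLa snippets):
db = TinyVectorDB()
rng = np.random.default_rng(0)
for snippet in ["x.sort()", "open('f').read()", "json.loads(s)"]:
    db.add(rng.normal(size=64), snippet)
print(db.query(rng.normal(size=64), k=2))
```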
swastikmaiti's Repositories
swastikmaiti/LlamaIndex-Agent
swastikmaiti/Embedding-Quantization
swastikmaiti/Llama-2-7B-Chat-PEFT
swastikmaiti/Intro-to-RAG-with-CODEGEMMA-7B
swastikmaiti/Phi3-No-GPU-No-Worry
swastikmaiti/Build-Docker-for-LlamaIndex-Agentic-RAG-System
swastikmaiti/Fine-tuning-BART
swastikmaiti/LlamaIndex-Agent-with-Reasoning-Loop
swastikmaiti/Meta-Llama3-8B-Chat-Instruct-LoRA
swastikmaiti/Vector_Database
swastikmaiti/ThesisWork
This repository contains code for the thesis work: Encoder Training for Neural Machine Translation in Resource-Constrained Settings.