text-chunking

There are 18 repositories under the text-chunking topic.

isaacus-dev/semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
Language: Python
lazyFrogLOL/llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.
Language: Python
jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
Language: JavaScript
drittich/SemanticSlicer
🧠✂️ SemanticSlicer — A smart text chunker for LLM-ready documents.
Language: C#
GregorBiswanger/SemanticChunker.NET
Embedding-driven, context-aware text chunking for Semantic Kernel and RAG workflows in .NET
Language: C#
ChenTaHung/HTML-Text-Parser
This project extracts text from documents and prepares it for processing by Large Language Models (LLMs). It stores and uses text style information to identify and segment content based on likely headers and titles.
Language: HTML
smart-models/Sentences-Chunker
Cutting-edge tool designed to intelligently segment text documents into optimally sized chunks
Language: Python
philnash/chunkers
An exploration of text splitting and chunking in JavaScript
Language: TypeScript
betcorg/llm-text-splitter
A lightweight TypeScript text splitter for RAG applications
Language: TypeScript
Besthope-Official/predoc
Document preprocessing service for RAG (Retrieval-Augmented Generation)
Language: Python
ushakiranmai/text_summarization
This text summarization tool uses machine learning models to create concise, meaningful summaries of lengthy texts. Built with Hugging Face Transformers and Gradio, it handles various input lengths and is suited to summarizing articles, reports, and more.
Language: Python
adityapathak-cubastion/cubastion-hr-chatbot
Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team. This saves time, allowing a Cubastion employee to resolve a query without combing through the actual documents. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.
Language: Python
adityapathakk/cubastion-hr-chatbot
Cubastion's HR chatbot answers queries based on the latest HR documents published by Cubastion's HR team. This saves time, allowing a Cubastion employee to resolve a query without combing through the actual documents. Developed with Python, sentence-transformers, Pinecone, llama3.2, and Streamlit.
Language: Python
andrewschenck/ragl
Vector Storage and Retrieval for RAG
Language: Python
DavidShableski/llm-pdf-analyzer
Self-hosted RAG application for PDF question-answering using LangChain, ChromaDB, and Ollama. Features Flask web interface, vector embeddings, automated chunking, and local LLM inference. Includes CI/CD pipeline with automated testing.
Language: Python
mohsinraza2999/Legal-Advisor-using-gpt-neo-1.3B
This project aims to build an AI-powered Legal Advisor that leverages natural language processing and vector search technology to provide users with legal guidance based on authoritative legal texts.
Language: Jupyter Notebook
samay-jain/Retrieval-Augmented-Generation-RAG-simple-program
A lightweight, modular Retrieval-Augmented Generation (RAG) system built with Streamlit, FAISS, and LLMs like OpenAI and Ollama. Upload documents, embed them, and ask intelligent questions with real-time context-aware responses.
Language: Python
Vivet-Software/Vivet.AI
A service-oriented .NET library for AI with interchangeable orchestrations and vector stores.
