chunking
There are 289 repositories under the chunking topic.
jiesutd/NCRFpp
NCRF++, a Neural Sequence Labeling Toolkit. Easy to apply to any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
systemd/casync
Content-Addressable Data Synchronization Tool
smooks/smooks
An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration
mirth/chonky
Fully neural approach for text chunking
isaacus-dev/semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
folbricht/desync
Alternative casync implementation
microsoft/rag-experiment-accelerator
The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate experiments and evaluations using Azure Cognitive Search and the RAG pattern.
lazyFrogLOL/llmdocparser
A package for parsing PDFs and analyzing their content using LLMs.
26hzhang/neural_sequence_labeling
A TensorFlow implementation of a Neural Sequence Labeling model that can tackle sequence labeling tasks such as POS Tagging, Chunking, NER, and Punctuation Restoration.
zeroentropy-ai/zchunk
A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.
jparkerweb/semantic-chunking
🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows
swarmauri/swarmauri-sdk
A modular multimodal framework for AI applications
jordicenzano/go-ts-segmenter
Live TS segmenter and HLS manifest creation in Go
safakatakancelik/TalkWithYourFiles
An LLM GUI application that lets you interact with your files, offering dynamic parameters that can modify response behavior at runtime.
neondatabase-labs/pgrag
Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines
xtabbas/The-Ultimate-Boilerplate
webpack 2, react hotloader 3, react router v4, code splitting and more
esastack/esa-restclient
An asynchronous event-driven HTTP client based on netty.
Sammyjo20/laravel-chunkable-jobs
📑 Split Laravel jobs into multiple separate job chunks
Koziev/GrammarEngine
Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)
ronomon/deduplication
Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in JavaScript. Ships with extensive tests, a fuzz test and a benchmark.
drmingler/smart-llm-loader
smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.
bnosac/crfsuite
Labelling Sequential Data in Natural Language Processing with R - using CRFsuite
iscc/fastcdc-py
FastCDC implementation in Python https://pypi.org/project/fastcdc/
ALucek/chunking-strategies
An Overview of the Latest Document Chunking Research
DanEngelbrecht/longtail
Incremental asset delivery library
howardyclo/grammar-pattern
Extract and align grammar patterns from English sentences.
DS4SD/quackling
Build document-native LLM applications
dcarpintero/llamaindexchat
LLM chatbot with Retrieval-Augmented Generation using LlamaIndex. It demonstrates how to implement chunking, indexing, and source citation.
zoner72/Datavizion-RAG
Retrieval-augmented generation (RAG) for remote & local LLM use
carlosplanchon/betterhtmlchunking
BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM-based processing.
duriantaco/pykomodo
A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.
DocumentAtom/DocumentAtom
DocumentAtom provides a light, fast library for breaking input documents into constituent parts (atoms), useful for text processing, analysis, and artificial intelligence.
DanEngelbrecht/golongtail
Command line front end for longtail synchronization tool
speedyk-005/chunklet-py
Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.
Alkl58/NotEnoughAV1Encodes-Qt
Linux GUI for AV1 Encoders
BenVlodgi/UE-DynamicOctree
Unreal Engine plugin providing an easy-to-use Octree.