chunking

There are 289 repositories under the chunking topic.

  • jiesutd/NCRFpp

    NCRF++, a Neural Sequence Labeling Toolkit. Easy to use for any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.

    Language: Python 1.9k58172444
  • systemd/casync

    Content-Addressable Data Synchronization Tool

    Language: C 1.5k81109119
  • smooks/smooks

    An extensible Java framework for building event-driven applications that break up XML and non-XML data into chunks for data integration

    Language: Java 4124289360
  • mirth/chonky

    Fully neural approach for text chunking

    Language: Python 370
  • isaacus-dev/semchunk

    A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

    Language: Python 36831219
  • folbricht/desync

    Alternative casync implementation

    Language: Go 3581512647
  • microsoft/rag-experiment-accelerator

    The RAG Experiment Accelerator is a versatile tool designed to expedite and facilitate experiments and evaluations using Azure Cognitive Search and the RAG pattern.

    Language: Python 2692624695
  • lazyFrogLOL/llmdocparser

    A package for parsing PDFs and analyzing their content using LLMs.

    Language: Python 267338
  • 26hzhang/neural_sequence_labeling

    A TensorFlow implementation of a Neural Sequence Labeling model that can tackle sequence labeling tasks such as POS Tagging, Chunking, NER, and Punctuation Restoration.

    Language: Python 23371546
  • zeroentropy-ai/zchunk

    A new chunking strategy developed by ZeroEntropy for general semantic chunking using Llama-70B.

    Language: Python 2111211
  • jparkerweb/semantic-chunking

    🍱 semantic-chunking ⇢ semantically create chunks from large documents for passing to LLM workflows

    Language: JavaScript 11121011
  • swarmauri/swarmauri-sdk

    A modular multimodal framework for AI applications

    Language: Python 98654144
  • jordicenzano/go-ts-segmenter

    Live TS segmenter and HLS manifest creation in Go

    Language: Go 948313
  • safakatakancelik/TalkWithYourFiles

    An LLM GUI application that lets you interact with your files and offers dynamic parameters for modifying response behavior at runtime.

    Language: Python 944112
  • neondatabase-labs/pgrag

    Postgres extensions to support end-to-end Retrieval-Augmented Generation (RAG) pipelines

    Language: Rust 85113
  • xtabbas/The-Ultimate-Boilerplate

    webpack 2, react hotloader 3, react router v4, code splitting and more

    Language: JavaScript 85508
  • esastack/esa-restclient

    An asynchronous event-driven HTTP client based on netty.

    Language: Java 8423423
  • Sammyjo20/laravel-chunkable-jobs

    📑 Split Laravel jobs into multiple separate job chunks

    Language: PHP 84324
  • Koziev/GrammarEngine

    Grammatical Dictionary of the Russian Language (+ English, Japanese, etc.)

    Language: C++ 7591821
  • ronomon/deduplication

    Fast multi-threaded content-dependent chunking deduplication for Buffers in C++ with a reference implementation in Javascript. Ships with extensive tests, a fuzz test and a benchmark.

    Language: JavaScript 75459
  • drmingler/smart-llm-loader

    smart-llm-loader is a lightweight yet powerful Python package that transforms any document into LLM-ready chunks. Spend less time on preprocessing headaches and more time building what matters. From RAG systems to chatbots to document Q&A, SmartLLMLoader handles the heavy lifting so you can focus on creating exceptional AI applications.

    Language: Python 71102
  • bnosac/crfsuite

    Labelling Sequential Data in Natural Language Processing with R - using CRFsuite

    Language: C 6472011
  • iscc/fastcdc-py

    FastCDC implementation in Python https://pypi.org/project/fastcdc/

    Language: Python 6131517
  • ALucek/chunking-strategies

    An Overview of the Latest Document Chunking Research

    Language: Jupyter Notebook 58207
  • DanEngelbrecht/longtail

    Incremental asset delivery library

    Language: C 576179
  • howardyclo/grammar-pattern

    Extract and align grammar patterns from English sentences.

    Language: Python 55529
  • DS4SD/quackling

    Build document-native LLM applications

    Language: Python 54312
  • dcarpintero/llamaindexchat

    LLM Chatbot with Retrieval Augmented Generation using LlamaIndex. It demonstrates how to implement chunking, indexing, and source citation.

    Language: Python 45216
  • zoner72/Datavizion-RAG

    Retrieval-augmented generation (RAG) for remote & local LLM use

    Language: Python 45
  • carlosplanchon/betterhtmlchunking

    BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM based processing.

    Language: Python 44315
  • duriantaco/pykomodo

    A Python-based parallel file chunking system designed for processing large codebases into LLM-friendly chunks.

    Language: Python 4410
  • DocumentAtom/DocumentAtom

    DocumentAtom provides a light, fast library for breaking input documents into constituent parts (atoms), useful for text processing, analysis, and artificial intelligence.

    Language: C# 38205
  • DanEngelbrecht/golongtail

    Command line front end for longtail synchronization tool

    Language: Go 3544410
  • speedyk-005/chunklet-py

    Chunk smarter, not harder — built for LLMs, RAG pipelines, and beyond.

    Language: Python 34
  • Alkl58/NotEnoughAV1Encodes-Qt

    Linux GUI for AV1 Encoders

    Language: Python 30213
  • BenVlodgi/UE-DynamicOctree

    Unreal Engine plugin providing an easy-to-use Octree.

    Language: C++ 30205