unstructured-data
There are 182 repositories under the unstructured-data topic.
iterative/dvc
🦉 Data Versioning and ML Experiments
voxel51/fiftyone
Refine high-quality datasets and visual AI models
Zipstack/unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
neo4j-labs/llm-graph-builder
Neo4j graph construction from unstructured data using LLMs
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
instill-ai/instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
milvus-io/bootcamp
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
nomic-ai/nomic
Interact, analyze and structure massive text, image, embedding, audio and video datasets
dingodb/dingo
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
tstanislawek/awesome-document-understanding
A curated list of resources on the topic of Document Understanding (DU)
lotus-data/lotus
Use LOTUS to process all of your datasets with LLMs and embeddings. Enjoy up to 1000x speedups with fast, accurate query processing that's as simple as writing Pandas code
yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Renumics/spotlight
Interactively explore unstructured datasets from your dataframe.
amphi-ai/amphi-etl
Visual Data Preparation and Transformation. Low-Code Python-based ETL.
databricks/lilac
Curate better data for LLMs
JSv4/OpenContracts
Enterprise-grade and API-first LLM workspace for unstructured documents, including data extraction, redaction, rights management, prompt playground, and more!
nuclia/nucliadb
NucliaDB, The AI Search database for RAG
EulerSearch/embedding_studio
Embedding Studio is a framework that lets you transform your vector database into a feature-rich search engine.
harishdeivanayagam/rowfill
Open-source unstructured data (PDFs, images, audio files) processing platform built for knowledge workers
graphlit/graphlit-mcp-server
Model Context Protocol (MCP) Server for Graphlit Platform
garyelephant/pygrok
Python implementation of jordansissel's grok regular expression library
fzliu/radient
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
automorphic-ai/trex
Enforce structured output from LLMs 100% of the time
RelevanceAI/relevanceai
Home of the AI workforce - Multi-agent system, AI agents & tools
velocitybolt/open-extract
Structured Data Extractor for AI Agents. Search your documents or the web for specific data and get it back in JSON or Markdown in a single tool call.
DerwenAI/strwythura
Construct knowledge graphs from unstructured data sources, use graph algorithms for enhanced GraphRAG with a DSPy-based chat bot locally, and curate semantics for optimizing AI app outcomes within a specific domain.
wangxb96/RAG-QA-Generator
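RAG-QA-Generator is an automated knowledge base construction and management tool for retrieval-augmented generation (RAG) systems. It reads document data, uses large language models to generate high-quality question-answer (QA) pairs, and inserts them into a database, automating the construction and management of a RAG system's knowledge base.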
mitdbg/palimpzest
A System for Optimized Semantic Computation
CambioML/any-parser
Accurate, private and configurable document retrieval LLM
jostmey/dkm
Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features
BartJongejan/Bracmat
Programming language for symbolic computation with unusual combination of pattern matching features: Tree patterns, associative patterns and expressions embedded in patterns.
IBM/pixiedust-facebook-analysis
A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio
ScrapeGraphAI/Scrapontologies
Python library for extracting entities, relationships, and schemas from documents
instill-ai/console
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
adansons/base
Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance.
instill-ai/pipeline-backend
⇋ A REST/gRPC server for Instill VDP API service