unstructured-data
There are 156 repositories under the unstructured-data topic.
iterative/dvc
🦉 Data Versioning and ML Experiments
voxel51/fiftyone
Refine high-quality datasets and visual AI models
Zipstack/unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
neo4j-labs/llm-graph-builder
Neo4j graph construction from unstructured data using LLMs
instill-ai/instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
milvus-io/bootcamp
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
dingodb/dingo
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
nomic-ai/nomic
Interact, analyze and structure massive text, image, embedding, audio and video datasets
tstanislawek/awesome-document-understanding
A curated list of resources on the topic of Document Understanding (DU)
Renumics/spotlight
Interactively explore unstructured datasets from your dataframe.
databricks/lilac
Curate better data for LLMs
amphi-ai/amphi-etl
Visual Data Transformation and Data Preparation. Low-Code Python-based ETL.
nuclia/nucliadb
NucliaDB, The AI Search database for RAG
yobix-ai/extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
EulerSearch/embedding_studio
Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.
garyelephant/pygrok
Python implementation of jordansissel's grok regular expression library
fzliu/radient
Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more.
automorphic-ai/trex
Enforce structured output from LLMs 100% of the time
RelevanceAI/relevanceai
Home of the AI workforce - Multi-agent system, AI agents & tools
marly-ai/marly
Context-aware structured outputs. Search your documents or the web for specific data and get it back in JSON or Markdown.
CambioML/any-parser
Accurate, private and configurable document retrieval LLM
DerwenAI/strwythura
How to construct knowledge graphs from unstructured data sources
jostmey/dkm
Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features
wangxb96/RAG-QA-Generator
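RAG-QA-Generator is an automated knowledge base construction and management tool for retrieval-augmented generation (RAG) systems. It reads document data, uses large language models to generate high-quality question-answer (QA) pairs, and inserts them into a database, automating the construction and management of a RAG system's knowledge base.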
BartJongejan/Bracmat
Programming language for symbolic computation with unusual combination of pattern matching features: Tree patterns, associative patterns and expressions embedded in patterns.
IBM/pixiedust-facebook-analysis
A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio
instill-ai/console
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
ScrapeGraphAI/Scrapontologies
Python library for extracting entities, relationships and schemas from documents
adansons/base
Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance.
chaitjo/knowledge-graphs
Building Knowledge Graphs from Unstructured Text
instill-ai/pipeline-backend
⇋ A REST/gRPC server for Instill VDP API service
instill-ai/cli
⌨️ Instill CLI for 🔮 Instill Core: https://github.com/instill-ai/instill-core
instill-ai/deprecated-model
⚗️ Instill Model contains components for AI model orchestration
Zipstack/unstract-adapters
Unstract's interface to LLMs, Embeddings and VectorDBs.
osllmai/inDox
Indox is an advanced search and retrieval technique that efficiently extracts data from diverse document types, including PDFs and HTML, using online or offline large language models such as OpenAI, Hugging Face, etc.