unstructured-data
There are 129 repositories under the unstructured-data topic.
voxel51/fiftyone
The open-source tool for building high-quality datasets and computer vision models
towhee-io/towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
instill-ai/instill-core
🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications
milvus-io/bootcamp
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
tstanislawek/awesome-document-understanding
A curated list of resources for the Document Understanding (DU) topic
nomic-ai/nomic
Interact with, analyze and structure massive text, image, embedding, audio and video datasets
Renumics/spotlight
Interactively explore unstructured datasets from your dataframe.
lilacai/lilac
Curate better data for LLMs
dingodb/dingo
A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.
nuclia/nucliadb
NucliaDB, The AI Search database for RAG
EulerSearch/embedding_studio
Embedding Studio is a framework that allows you to transform your vector database into a feature-rich search engine.
garyelephant/pygrok
Python implementation of jordansissel's grok regular expression library
automorphic-ai/trex
Enforce structured output from LLMs 100% of the time
Zipstack/unstract
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
RelevanceAI/relevanceai
Home of the AI workforce - Multi-agent system, AI agents & tools
jostmey/dkm
Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features
BartJongejan/Bracmat
Programming language for symbolic computation with unusual combination of pattern matching features: Tree patterns, associative patterns and expressions embedded in patterns.
IBM/pixiedust-facebook-analysis
A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio
adansons/base
Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance.
instill-ai/console
📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core
chaitjo/knowledge-graphs
Building Knowledge Graphs from Unstructured Text
instill-ai/cli
⌨️ Instill CLI for 🔮 Instill Core: https://github.com/instill-ai/instill-core
amphi-ai/amphi-etl
Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.
instill-ai/deprecated-model
⚗️ Instill Model contains components for AI model orchestration
instill-ai/pipeline-backend
⇋ A REST/gRPC server for Instill VDP API service
TuanaCelik/unstructuredio-haystack
💙 Unstructured Data Connectors for Haystack 2.0
instill-ai/model-backend
⇋ A REST/gRPC server for Instill Model API service
jokruger/rl3examples
RL3 examples repository (information extraction, NER, NLP, web & text mining, etc).
IBM/generate-insights-from-data-formats-with-watson
How do we process data in different formats, such as docx and pdf, and generate insights to be linked with structured data in a database? This pattern helps establish relations between structured & unstructured data to generate recommendations using Watson NLU & Watson Studio.
instill-ai/deprecated-core
🔮 Instill Core contains components for supporting Instill VDP and Instill Model
nicbet/infozilla
The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.
floriancochard/extract-data-from-paper
Extract tabular information from scanned documents (PDF to CSV)
mkearney/wibble
Web Data Frames
SachinKalsi/html_tag_annotator
A machine learning tool to create a training dataset quickly & easily by using a smart Chrome extension
aclai-lab/SoleData.jl
Manage unstructured and multimodal datasets!
Zipstack/unstract-adapters
Unstract's interface to LLMs, Embeddings and VectorDBs.