unstructured-data

There are 129 repositories under the unstructured-data topic.

  • voxel51/fiftyone

    The open-source tool for building high-quality datasets and computer vision models

    Language:Python6.9k531.5k511
  • towhee-io/towhee

    Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

    Language:Python3k29654240
  • instill-ai/instill-core

    🔮 Instill Core is a full-stack AI infrastructure tool for data, model and pipeline orchestration, designed to streamline every aspect of building versatile AI-first applications

    Language:Makefile1.9k2949282
  • milvus-io/bootcamp

    Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.

    Language:HTML1.7k32260540
  • tstanislawek/awesome-document-understanding

    A curated list of resources on the topic of Document Understanding (DU)

  • nomic-ai/nomic

    Interact, analyze and structure massive text, image, embedding, audio and video datasets

    Language:Python1k2456144
  • Renumics/spotlight

    Interactively explore unstructured datasets from your dataframe.

    Language:TypeScript1k188682
  • lilacai/lilac

    Curate better data for LLMs

    Language:Python8691329179
  • dingodb/dingo

    A multi-modal vector database that supports upserts and vector queries using unified SQL (MySQL-Compatible) on structured and unstructured data, while meeting the requirements of high concurrency and ultra-low latency.

    Language:Java6696674159
  • nuclia/nucliadb

    NucliaDB, The AI Search database for RAG

    Language:Python59019946
  • EulerSearch/embedding_studio

    Embedding Studio is a framework that allows you to transform your Vector Database into a feature-rich Search Engine.

    Language:Python370665
  • garyelephant/pygrok

    Python implementation of jordansissel's grok regular expression library

    Language:Python275163276
  • automorphic-ai/trex

    Enforce structured output from LLMs 100% of the time

    Language:Python239309
  • Zipstack/unstract

    No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

    Language:Python1823318
  • RelevanceAI/relevanceai

    Home of the AI workforce - Multi-agent system, AI agents & tools

    Language:Python10510819
  • jostmey/dkm

    Dynamic Kernel Matching (DKM) for Classifying Data with Non-conforming Features

    Language:HTML95506
  • BartJongejan/Bracmat

    Programming language for symbolic computation with an unusual combination of pattern matching features: tree patterns, associative patterns, and expressions embedded in patterns.

    Language:C476125
  • IBM/pixiedust-facebook-analysis

    A Jupyter notebook that uses the Watson Visual Recognition and Natural Language Understanding services to enrich Facebook Analytics and uses Cognos Dashboard Embedded to explore and visualize the results in Watson Studio

    Language:Jupyter Notebook43172264
  • adansons/base

    Adansons Base is a data programming tool for error-analysis of training results. It organizes metadata of unstructured data and creates and organizes datasets. It makes dataset creation more effective and helps to find low-quality data by using the training results and improves AI performance.

    Language:Jupyter Notebook282523
  • instill-ai/console

    📺 Instill Console for 🔮 Instill Core: https://github.com/instill-ai/instill-core

    Language:TypeScript281109
  • chaitjo/knowledge-graphs

    Building Knowledge Graphs from Unstructured Text

    Language:Jupyter Notebook21217
  • instill-ai/cli

    ⌨️ Instill CLI for 🔮 Instill Core: https://github.com/instill-ai/instill-core

    Language:Go211203
  • amphi-ai/amphi-etl

    Low-code ETL for structured and unstructured data. Generates Python code you can deploy anywhere.

    Language:TypeScript20
  • instill-ai/deprecated-model

    ⚗️ Instill Model contains components for AI model orchestration

    Language:Makefile20704
  • instill-ai/pipeline-backend

    ⇋ A REST/gRPC server for Instill VDP API service

    Language:Go151208
  • TuanaCelik/unstructuredio-haystack

    💙 Unstructured Data Connectors for Haystack 2.0

    Language:Python15101
  • instill-ai/model-backend

    ⇋ A REST/gRPC server for Instill Model API service

    Language:Go141308
  • jokruger/rl3examples

    RL3 examples repository (information extraction, NER, NLP, web & text mining, etc).

    Language:Python14401
  • IBM/generate-insights-from-data-formats-with-watson

    How do we process data in different formats, such as docx and pdf, and generate insights that can be linked with structured data in a database? This pattern helps establish relations between structured & unstructured data to generate recommendations using Watson NLU & Watson Studio.

    Language:Jupyter Notebook1312016
  • instill-ai/deprecated-core

    🔮 Instill Core contains components for supporting Instill VDP and Instill Model

    Language:Makefile13504
  • nicbet/infozilla

    The infoZilla unstructured software engineering data mining tool. It can find and extract source code regions, patches, stack traces, enumerations and itemizations from discussion threads.

    Language:Java13202
  • floriancochard/extract-data-from-paper

    Extract tabular information from scanned documents (PDF to CSV)

  • mkearney/wibble

    Web Data Frames

    Language:R1260
  • SachinKalsi/html_tag_annotator

    A Machine Learning tool for creating training datasets quickly and easily using a smart Chrome extension

    Language:JavaScript12302
  • aclai-lab/SoleData.jl

    Manage unstructured and multimodal datasets!

    Language:Julia11450
  • Zipstack/unstract-adapters

    Unstract's interface to LLMs, Embeddings and VectorDBs.

    Language:Python9101