information-retrieval

There are 2,840 repositories under the information-retrieval topic.

  • JaidedAI/EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    Language:Python27.9k3181k3.5k
  • deepset-ai/haystack

    AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

    Language:Python22.5k1584.1k2.4k
  • arc53/DocsGPT

    Private AI platform for agents, assistants and enterprise search. Built-in Agent Builder, Deep research, Document analysis, Multi-model support, and API connectivity for agents.

    Language:TypeScript17.1k1005271.8k
  • piskvorky/gensim

    Topic Modelling for Humans

    Language:Python16.2k4261.9k4.4k
  • weaviate/weaviate

    Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

    Language:Go14.6k1342.6k1.1k
  • onyx-dot-app/onyx

    Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.

    Language:Python13.5k1068631.9k
  • Unstructured-IO/unstructured

    Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.

    Language:HTML12.7k681.2k1k
  • neuml/txtai

    💡 All-in-one open-source AI framework for semantic search, LLM orchestration and language model workflows

    Language:Python11.6k111916740
  • FlagOpen/FlagEmbedding

    Retrieval and Retrieval-augmented LLMs

    Language:Python10.5k531.2k782
  • marqo-ai/marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

    Language:Python4.9k37246214
  • apache/lucene-solr

    Apache Lucene and Solr open-source search software

  • KittyKatt/screenFetch

    Fetches system/theme information in terminal for Linux desktop screenshots.

    Language:Shell4k94383454
  • SylphAI-Inc/AdalFlow

    AdalFlow: The library to build & auto-optimize LLM applications.

    Language:Python3.7k2677341
  • langroid/langroid

    Harness LLMs with Multi-Agent Programming

    Language:Python3.7k28247349
  • catalyst-team/catalyst

    Accelerated deep learning R&D

    Language:Python3.4k44355393
  • apache/lucene

    Apache Lucene open-source search software

    Language:Java3.1k8911.6k1.3k
  • embeddings-benchmark/mteb

    MTEB: Massive Text Embedding Benchmark

    Language:Python2.8k211.2k466
  • tensorflow/ranking

    Learning to Rank in TensorFlow

    Language:Python2.8k93319480
  • ashvardanian/StringZilla

    Up to 100x faster strings for C, C++, CUDA, Python, Rust, Swift, JS, & Go, leveraging NEON, AVX2, AVX-512, SVE, GPGPU, & SWAR to accelerate search, hashing, sorting, edit distances, sketches, and memory ops 🦖

    Language:C2.7k259489
  • naiveHobo/InvoiceNet

    Deep neural network to extract intelligent information from invoice documents.

    Language:Python2.6k76105411
  • rajkumardusad/IP-Tracer

    Track any IP address with IP-Tracer. IP-Tracer is developed for Linux and Termux. You can retrieve information about any IP address using IP-Tracer.

    Language:PHP2.6k16087472
  • illuin-tech/colpali

    The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol.

    Language:Python2.2k18104198
  • xlang-ai/instructor-embedding

    [ACL 2023] One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    Language:Python2k18113155
  • beir-cellar/beir

    A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.

    Language:Python2k20144218
  • castorini/pyserini

    Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations.

    Language:Python1.9k18590439
  • youngfish42/Awesome-FL

    Comprehensive and timely academic information on federated learning (papers, frameworks, datasets, tutorials, workshops)

    Language:Python1.8k446203
  • shaoxiongji/knowledge-graphs

    A collection of research on knowledge graphs

    Language:JavaScript1.8k616298
  • IntelLabs/fastRAG

    Efficient Retrieval Augmentation and Generation Framework

    Language:Python1.7k1535155
  • th3unkn0n/TeleGram-Scraper

    Telegram group scraper tool. Fetches all information about group members.

    Language:Python1.6k1080732
  • boudinfl/pke

    Python Keyphrase Extraction module

    Language:Python1.6k30147294
  • ashvardanian/SimSIMD

    Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, & SVE2 📐

    Language:C1.5k2010587
  • apache/solr

    Apache Solr open-source search software

    Language:Java1.5k540706
  • th3unkn0n/osi.ig

    Information gathering for Instagram.

    Language:Python1.4k5983230
  • superlinked/superlinked

    Superlinked is a Python framework for AI Engineers building high-performance search & recommendation applications that combine structured and unstructured data.

    Language:Jupyter Notebook1.3k2628101
  • xhluca/bm25s

    Fast lexical search implementing BM25 in Python using NumPy, Numba and SciPy

    Language:Python1.3k54777
  • dorianbrown/rank_bm25

    A Collection of BM25 Algorithms in Python

    Language:Python1.2k83398