information-retrieval

There are 2,841 repositories under the information-retrieval topic.

  • scilla

    Information Gathering tool - DNS / Subdomains / Ports / Directories enumeration

    Language: Go · Stars: 1.1k
  • anserini

    Anserini is a Lucene toolkit for reproducible information retrieval research

    Language: Java · Stars: 1.1k
  • GNN-Recommender-Systems

    An index of recommendation algorithms that are based on Graph Neural Networks. (TORS)

  • awesome-ai-web-search

    List of software that allows searching the web with the assistance of AI: https://hf.co/spaces/felladrin/awesome-ai-web-search

    Language: HTML · Stars: 1k
  • pisa

    PISA: Performant Indexes and Search for Academia

    Language: C++ · Stars: 1k
  • allRank

    allRank is a framework for training learning-to-rank neural models based on PyTorch.

    Language: Python · Stars: 959
  • raft

    RAFT contains fundamental, widely used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks that make it easier to write high-performance applications.

    Language: Cuda · Stars: 933
  • notes

    Learn about Machine Learning and Artificial Intelligence

  • splade

    SPLADE: sparse neural search (SIGIR21, SIGIR22)

    Language: Python · Stars: 904
  • sgpt

    SGPT: GPT Sentence Embeddings for Semantic Search

    Language: Jupyter Notebook · Stars: 873
  • RocketQA

    🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

    Language: Python · Stars: 781
  • awesome-neural-models-for-semantic-match

    A curated list of papers dedicated to neural text (semantic) matching.

    Language: HTML · Stars: 781
  • awesome-persian-nlp-ir

    Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources

  • RAG-FiT

    Framework for enhancing LLMs for RAG tasks using fine-tuning.

    Language: Python · Stars: 748
  • toolfront

    Data retrieval for AI agents

    Language: Python · Stars: 720
  • talisman

    Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.

    Language: JavaScript · Stars: 717
  • EmbedAnything

    Highly Performant, Modular and Production-ready Inference, Ingestion and Indexing built in Rust 🦀

    Language: Rust · Stars: 715
  • tevatron

    Tevatron - Unified Document Retrieval Toolkit across Scale, Language, and Modality. Demo in SIGIR 2023, SIGIR 2025.

    Language: Python · Stars: 694
  • teaching

    Open-Source Information Retrieval Courses @ TU Wien

    Language: Python · Stars: 682
  • gritlm

    Generative Representational Instruction Tuning

    Language: Jupyter Notebook · Stars: 672
  • s3

    [EMNLP'25] s3 - ⚡ Efficient & Effective Search Agent Training via RL for RAG (Verifier-Powered RLVR for Search)

    Language: Python · Stars: 671
  • awesome-pretrained-models-for-information-retrieval

    A curated list of awesome papers related to pre-trained models for information retrieval (a.k.a., pretraining for IR).

  • RankGPT

    Is ChatGPT Good at Search? LLMs as Re-Ranking Agent [EMNLP 2023 Outstanding Paper Award]

    Language: Python · Stars: 636
  • DeepRetrieval

    [COLM'25] DeepRetrieval - 🔥 Training Search Agent with Retrieval Outcomes via Reinforcement Learning

    Language: Python · Stars: 635
  • cdQA

    ⛔ [NOT MAINTAINED] An End-To-End Closed Domain Question Answering System.

    Language: Python · Stars: 616
  • DensePhrases

    [ACL 2021] Learning Dense Representations of Phrases at Scale; EMNLP'2021: Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624

    Language: Python · Stars: 606
  • ranx

    ⚡️ A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍

    Language: Python · Stars: 591
  • xyne

    AI-first Search & Answer Engine for work. Open-source alternative to Glean.

    Language: TypeScript · Stars: 587
  • pylate

    Late Interaction Models Training & Retrieval

    Language: Python · Stars: 583
  • resin

    Vector space index based search engine that's available as an HTTP service or as an embedded library.

    Language: C# · Stars: 570
  • sycamore

    🍁 Sycamore is an LLM-powered search and analytics platform for unstructured data.

    Language: Python · Stars: 561
  • NLP-Projects

    word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (e.g., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, information extraction (e.g., entity, relation and event extraction), knowledge graph, text generation, network embedding

    Language: OpenEdge ABL · Stars: 558
  • AnglE

    Train and Infer Powerful Sentence Embeddings with AnglE | 🔥 SOTA on STS and MTEB Leaderboard

    Language: Python · Stars: 555
  • Automated-Fact-Checking-Resources

    Links to conference/journal publications in automated fact-checking (resources for the TACL22/EMNLP23 paper).

  • Deep-Semantic-Similarity-Model

    My Keras implementation of the Deep Semantic Similarity Model (DSSM)/Convolutional Latent Semantic Model (CLSM) described here: http://research.microsoft.com/pubs/226585/cikm2014_cdssm_final.pdf.

    Language: Python · Stars: 522