ai4science

There are 150 repositories under the ai4science topic.

  • PaddlePaddle/PaddleOCR

    Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

    Language:Python63.1k49510.2k9.3k
  • opendatalab/MinerU

    Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.

    Language:Python48.3k1991.9k4k
  • microsoft/Graphormer

    Graphormer is a general-purpose deep learning backbone for molecular modeling.

    Language:Python2.4k28159366
  • bytedance/Protenix

    A trainable PyTorch reproduction of AlphaFold 3.

    Language:Python1.4k25167177
  • coderonion/awesome-llm-and-aigc

    🚀🚀🚀 A collection of awesome public projects about Large Language Models (LLM), Vision Language Models (VLM), Vision Language Action (VLA), and AI Generated Content (AIGC), plus the related datasets and applications.

  • terrastackai/terratorch

    A Python toolkit for fine-tuning Geospatial Foundation Models (GFMs).

    Language:Python65624271109
  • yuzhimanhua/Awesome-Scientific-Language-Models

    A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery (EMNLP'24)

  • JuDFTteam/best-of-atomistic-machine-learning

    🏆 A ranked list of awesome atomistic machine learning projects ⚛️🧬💎.

  • open-sciencelab/GraphGen

    GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation

    Language:Python48672338
  • microsoft/mattersim

    MatterSim: A deep learning atomistic model across elements, temperatures and pressures.

    Language:Jupyter Notebook476123366
  • PaddlePaddle/PaddleScience

    PaddleScience is an SDK and library for developing AI-driven scientific computing applications based on PaddlePaddle.

    Language:Python4171994234
  • LucaOne/LucaOne

    The resources of LucaOne, including: the model code, training scripts, embedding inference code, and trained checkpoints.

    Language:Python3084834
  • ltjed/freephdlabor

    freephdlabor: customizing personalized multiagent systems that research 24/7 on your own scientific problem

    Language:Python286
  • shengchaochen82/Awesome-Foundation-Models-for-Weather-and-Climate

    A comprehensive survey of foundation models for weather and climate data understanding.

  • ai-boost/awesome-ai-for-science

    A curated list of awesome AI tools, libraries, papers, datasets, and frameworks that accelerate scientific discovery — from physics and chemistry to biology, materials, and beyond.

    23019
  • Future-House/aviary

    A language agent gym with challenging scientific tasks

    Language:Python21081625
  • davendw49/k2

    Code and datasets for the paper "K2: A Foundation Language Model for Geoscience Knowledge Understanding and Utilization" in WSDM-2024

    Language:Python20661418
  • deep-symbolic-mathematics/LLM-SR

    [ICLR 2025 Oral] This is the official repo for the paper "LLM-SR" on Scientific Equation Discovery and Symbolic Regression with Large Language Models

    Language:Python1836339
  • ChemFoundationModels/ChemLLMBench

    What can Large Language Models do in chemistry? A comprehensive benchmark on eight tasks

    Language:Jupyter Notebook161487
  • IntelliGen-AI/IntelliFold

    IntelliFold: A Controllable Foundation Model for General and Specialized Biomolecular Structure Prediction.

    Language:Python1532314
  • chao1224/Geom3D

    Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023

    Language:Python1282314
  • patrick-tssn/Awesome-Colorful-LLM

    Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics, Fundamental Sciences such as Mathematics, and Ominous.

  • Liu-Hy/GenoMAS

    A minimalist multi-agent framework for robust automation of scientific analysis workflows, such as gene expression analysis.

    Language:Python1212216
  • AlexDuvalinho/geometric-gnns

    List of Geometric GNNs for 3D atomic systems

  • OSU-NLP-Group/ScienceAgentBench

    [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

    Language:Python1064615
  • chatsci/Aeiva

    A general AI agent framework that can be adapted to various tasks and environments.

    Language:Python102338
  • bytedance/PXDesignBench

    A Unified Evaluation Suite for Protein Design

    Language:Python998
  • OSU-NLP-Group/LLM4Chem

    Official code repo for the paper "LlaSMol: Advancing Large Language Models for Chemistry with a Large-Scale, Comprehensive, High-Quality Instruction Tuning Dataset"

    Language:Python997917
  • chao1224/ProteinDT

    A Text-guided Protein Design Framework, Nat Mach Intell 2025 (https://www.nature.com/articles/s42256-025-01011-z)

    Language:Python95458
  • chiang-yuan/llamp

    A web app and Python API for a multi-modal RAG framework to ground LLMs on high-fidelity materials informatics. An agentic materials scientist powered by @materialsproject, @langchain-ai, and @openai

    Language:Jupyter Notebook8712413
  • deep-symbolic-mathematics/llm-srbench

    [ICML2025 Oral] LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models

    Language:Python816
  • ai4ce/GARF

    [ICCV2025] GARF: Learning Generalizable 3D Reassembly for Real-World Fractures

    Language:Python8062110
  • AngxiaoYue/ReQFlow

    [ICML 2025] 🧬 ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation

    Language:Python79235
  • chenggroup/ai2-kit

    A toolkit featuring artificial intelligence × ab initio for computational chemistry research.

    Language:Python793316
  • deep-symbolic-mathematics/TPSR

    [NeurIPS 2023] This is the official code for the paper "TPSR: Transformer-based Planning for Symbolic Regression"

    Language:Python785515
  • OSU-NLP-Group/awesome-agents4science

    A curated list of papers on LLMs and agents for scientific research and development