superdma's Stars
Ckend/scihub-cn
A Sci-Hub paper downloader that works in mainland China's network environment
ConardLi/easy-dataset
A powerful tool for creating fine-tuning datasets for LLM
LearningCircuit/local-deep-research
Local Deep Research is an AI-powered assistant that transforms complex questions into comprehensive, cited reports by conducting iterative analysis using any LLM across diverse knowledge sources including academic databases, scientific repositories, web content, and private document collections.
dromara/RuoYi-Vue-Plus
Based on RuoYi-Vue, integrating Lombok + Mybatis-Plus + Undertow + knife4j + Hutool + Feign; rewrites all of the original business logic and syncs regularly with RuoYi-Vue
iriscxy/chemmatch
datawhalechina/self-llm
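"A Guide to Consuming Open-Source LLMs": tutorials tailored for ** beginners on quickly fine-tuning (full-parameter/LoRA) and deploying Chinese and international open-source large language models (LLMs) and multimodal large language models (MLLMs) in a Linux environment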
sailfish009/OpenSourceMolecularModeling.github.io
Catalog of Open Source Molecular Modeling Projects
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
deepdoctection/deepdoctection
A Repo For Document AI
modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
CambridgeMolecularEngineering/chemdataextractor2
ChemDataExtractor Version 2.0
dmw51/reactiondataextractor2
This repo contains ReactionDataExtractor v.2, a software toolkit for extracting information from chemical reaction schemes
rxn4chemistry/paragraph2actions
Extraction of action sequences from experimental procedures
jiangfeng1124/ChemRxnExtractor
Toolkit for Chemical Reaction Extraction from Scientific Literature (JCIM 2021)
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
run-llama/llama_cloud_services
Knowledge Agents and Management in the Cloud
CambioML/uniflow-llm-based-pdf-extraction-text-cleaning-data-clustering
LLM-based text extraction from unstructured data such as PDF, Word, and HTML files. Transform and cluster the text into your desired format. Less information loss, more interpretation, and faster R&D!
UpstageAI/dataverse
The Universe of Data. All about data, data science, and data engineering
FlagOpen/FlagData
CrazyBoyM/llama3-Chinese-chat
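Repository of Chinese post-trained versions of Llama3 and Llama3.1 (interesting weights from fine-tuned and heavily modified variants, plus tutorial videos and documentation on training, inference, evaluation, and deployment)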
kjappelbaum/awesome-chemistry-datasets
Overview of datasets for ML in chemistry
odysie/thermoelectricsdb
allenai/s2orc-doc2json
Parsers for scientific papers (PDF2JSON, TEX2JSON, JATS2JSON)
allenai/s2orc
S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/
titipata/scipdf_parser
Python PDF parser for scientific publications: content and figures
pdfminer/pdfminer.six
Community maintained fork of pdfminer - we fathom PDF
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
jzhang38/TinyLlama
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
binary-husky/gpt_academic
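A practical interactive interface for LLMs such as GPT and GLM, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; translation and summarization of PDF/LaTeX papers; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.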