Title: A Maintained Collection of GitHub Projects

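- Programming Languages
  - Python
    - Basics
    - Open-Source Projects
      - Crawlers
      - Asynchronous
      - String Search
      - Utility Libraries
      - Log
      - GUI
      - Databases
      - Miscellaneous
  - C++
    - Documentation
    - Development Tools
    - Debug
    - Logging
    - Servers
      - C++
      - C
      - media server
    - Concurrency
    - Encoding
    - Testing
    - Databases
    - Mini Games
    - Serialization
    - Algorithms
    - Languages
    - Engines
    - Utilities
  - Julia
  - Java
  - GO
  - RUST
  - Data Structures and Algorithms
  - Computer Science Fundamentals
  - Build-It-Yourself Projects
  - IoT
  - Other Languages
- Deep Learning
- Machine Learning
- Competition Solutions
  - Competition Information
- Open-Source Tools
- Datasets
  - Toolkits
  - NLP
  - Annotation Tools
  - Books
- Blogs + Interview Experiences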

1 Programming Languages

Python

Basics

one-python-craftsman: Python
PySnooper: Never use print for debugging again; a debugging tool
The Flask Mega-Tutorial: Flask tutorial
Microsoft tutorial: Developing with Python on Windows
realpython -- python-guide
pipreqs: tool for generating a project's dependency list
starlette: The little ASGI framework that shines which is ideal for building async web services in Python.
fastapi
python-small-examples
pytudes：Python programs to practice or demonstrate skills.
python-patterns：A collection of design patterns/idioms in Python
Python parallel programming
[Mixed Python/C++ programming, Microsoft Docs]
scalene：Scalene: a high-performance, high-precision CPU and memory profiler for Python
pyinstrument: 🚴 Call stack profiler for Python. Shows you why your code is slow!
devguide: The Python developer's guide
python-cheatsheet
python-systemd-tutorial: A tutorial for writing a systemd service in Python

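Open-Source Projects

Crawlers

A collection of crawlers (written in Python unless noted otherwise)
lazynlp: Library to scrape and clean web pages to create massive datasets.
WeiboSpider: This is a sina weibo spider built by scrapy
weibospider: A distributed crawler for weibo, building with celery and requests.
webspider: uses MySQL as the database and relies mainly on celery and requests; implements scheduled tasks, retry on error, logging, and automatic Cookie rotation
scrapyscript: Run a Scrapy spider programmatically from a script or a Celery task
pspider: a simple distributed crawler framework
taoyoulue_spider: a distributed crawler built on MongoDB storage, Redis caching, and celery
DeadPool: a crawler application built around celery; crawl tasks can be added flexibly and several sites can be crawled at the same time
python-bloomfilter: Scalable Bloom Filter implemented in Python
learn_python3_spider: a Python crawler tutorial series that goes from zero to one, including browser packet capture and mobile app packet capture
pyspider: A Powerful Spider(Web Crawler) System in Python.
examples-of-web-crawlers: some fun, beginner-friendly Python crawler examples, mainly crawling Taobao, Tmall, WeChat, Douban, QQ, and other sites
PythonCrawler: a collection of crawler projects written in Python
MechanicalSoup: A Python library for automating interaction with websites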
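
A Bloom filter (see python-bloomfilter in the list above) is commonly used in crawlers to skip URLs that have already been visited. The following is a minimal fixed-size sketch using only the standard library. It is not the python-bloomfilter API; the class name, the default sizes, and the example URL are illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (illustrative sketch, not a library API)."""

    def __init__(self, size_bits=1 << 16, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True means "possibly seen"; False means "definitely not seen".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


seen = BloomFilter()
seen.add("https://example.com/page/1")
print("https://example.com/page/1" in seen)  # added items are always reported as present
```

A Bloom filter can report false positives but never false negatives, so a crawler using it may occasionally skip an unvisited URL but will not revisit one it has recorded.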

Asynchronous

celery: Distributed Task Queue (development branch) [master-worker pattern] [lightweight]

String Search

pyahocorasick: pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search, meaning that you can find multiple key strings occurrences at once in some input text.
- Aho-Corasick (AC) automaton: link link
ahocorasick-python: an optimized Python implementation of the Aho-Corasick automaton
ahocorapy: Pure python Aho-Corasick library.
acora: Fast multi-keyword search engine for text strings.
datrie：Fast, efficiently stored Trie for Python. Uses libdatrie. pypi
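
The libraries above are built around multi-pattern matching, most of them on the Aho-Corasick automaton. As a rough illustration of the idea, here is a minimal pure-Python sketch for exact matching only. It does not reproduce the API of any library listed here; the function names and the sample patterns are assumptions.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output lists."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # patterns that end at each state
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Breadth-first pass: a child's failure link extends its parent's.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_index, pattern) for every match, in one pass over text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


automaton = build_automaton(["he", "she", "his", "hers"])
print(search("ushers", automaton))  # [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns are loaded, which is the main reason to prefer this over running a separate search per keyword.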

Utility Libraries

rich：Rich is a Python library for rich text and beautiful formatting in the terminal.
apscheduler：Task scheduling library for Python
jupytext：Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
nbdev：Create delightful python projects using Jupyter Notebooks
python-wechaty-getting-started: Python Wechaty Starter Project Template that Works Out-of-the-Box. Wechaty is an RPA SDK for WeChat individual accounts that can help you create a chatbot in 9 lines of Python.
python-wechaty
sqlfluff: A SQL linter and auto-formatter
pyenv：Python version management
py-spy: real-time performance profiler for Python programs
rocketry：Modern scheduling library for Python
reloadium：Advanced Hot Reloading & Profiling for Python
schedule：Python job scheduling for humans.
pdf2docx: converting pdf to docx.
Supervisor: a process management tool for Linux/Unix systems, commonly used in production environments
bandit：Bandit is a tool designed to find common security issues in Python code.
Pillow: Python Imaging Library.
memray：memory profiler for Python

Log

loguru: a simpler logging tool for Python

GUI

DearPyGui: A fast and powerful Graphical User Interface Toolkit
Tkinter-Designer： Create Beautiful Tkinter GUIs by Drag and Drop

Databases

learndb-py：Learn database internals by implementing it from scratch.
tinydb：a lightweight document oriented database.
dataset：Easy-to-use data handling for SQL data.
KeyDB：A Multithreaded Fork of Redis
miniob：MiniOB is a compact database that assists developers in understanding the fundamental workings of a database.
dragonfly：A modern replacement for Redis and Memcached

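Miscellaneous

python-goose: Goose, an article extractor
python-gems: a collection of interesting Python code snippets
listen1: Listen 1 lets you listen to online music from multiple sites in a single web page, and supports many platforms
beijing_bus: real-time Beijing bus information; shows how long until a queried bus reaches a given stop
tushare: TuShare is a free, open-source Python package of financial data interfaces (TuShare documentation)
pyecharts: a library for generating Echarts charts, built with Echarts + Python
FeelUOwn: an open-source music player written in Python for Linux/macOS
superset: a real enterprise-grade project; a data exploration platform developed in Python by Airbnb
Spug: an open-source operations management system built with Python + Flask + Vue + Element components
textract: text extraction tool
manim: Animation engine for explanatory math videos
gopup: data interfaces: Baidu, Google, Toutiao, and Weibo indices, macroeconomic data, interest rate data, currency exchange rates, "Qianlima" and unicorn companies, Xinwen Lianbo transcripts, box office data, university lists, epidemic data...
bigdata_analyse: short data analysis demos
PyWebIO: Write interactive web app in script way. Generates front-end code from Python
moviepy: simple video editing with Python
playwright-python: Python version of the Playwright testing and automation library; drives a browser from Python
Handright: simulates handwritten Chinese characters
WantWords: given an input description, outputs words with that meaning
docker-py: create Docker containers with Python
pikepdf: edit PDFs
kopf: handle complex k8s operations, such as those needing conditional logic and event triggers, easily in Python.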
matrix-webcam：Take your video conference from within the matrix.
latexify_py：A library to generate LaTeX expression from Python code.
Games：Create interesting games by pure python.
FileCodeBox: 匿名口令分享文本，文件，像拿快递一样取文件
Django-Styleguide: Django styleguide used in HackSoft projects
bar_chart_race: Create animated bar chart races in Python with matplotlib

C++

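Documentation

CPlusPlusThings: all things C++ (C++那些事)
CppCoreGuidelines: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
EffectiveModernCppChinese: Chinese translation of Effective Modern C++
Cpp_Primer_Practice: C++ Primer notes and exercise answers
TCP-IP-NetworkNote: study notes on TCP/IP Network Programming (by Yoon Sung-woo, Korea)
rCore-Tutorial-Book-v3: rCore, from Tsinghua's operating systems course
tensorflow-internals: TensorFlow kernel and implementation mechanism.
cpp-game-engine-book: a tutorial on writing a game engine from scratch
patterns-of-distributed-systems: Chinese edition of Patterns of Distributed Systems
PPHC: the open-source book 《高并发的哲学原理》 (Philosophical Principles of High Concurrency), CC BY-NC-ND
modern-cpp-features: A cheatsheet of modern C++ language and library features.
awesome-c-cn: Chinese edition of the awesome C resource list, covering build systems, compilers, databases, cryptography, beginner to advanced tutorials/guides, books, and libraries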

Development Tools

openFrameworks: openFrameworks is a community-developed cross platform toolkit for creative coding in C++.
plywood: A multimedia development kit for C++
folly: An open-source C++ library developed and used at Facebook.
mlibc: a portable C standard library
util-linux
MyTinySTL: STL
WindowsAppSDK: Windows App SDK empowers all Windows Desktop apps with modern Windows UI, APIs, and platform features, including back-compat support, shipped via NuGet.
range-v3: Range library for C++14/17/20, basis for C++20's std::ranges
vcpkg: Microsoft's open-source C/C++ package manager
libqalculate: a multi-purpose calculator desktop application, library, and CLI program written in C++
indicators: a progress bar library written in C++
Crow: A Fast and Easy to use microframework for the web.
poco：cross-platform C++ libraries for building network- and internet-based applications that run on desktop, server, mobile, IoT, and embedded systems. (Websocket)

Debug

rr: Record and Replay Debug Framework
backward-cpp: A beautiful stack trace pretty printer for C++. Makes error output easier to read
Sourcetrail: free and open-source interactive source explorer; visualizes code structure

Logging

spdlog: C++ logging library

Servers

oatpp: Light and powerful C++ web framework for highly scalable and resource-efficient web application.
nginx: static file server
fasthttp (Go)：Fast HTTP package for Go.
Redis
f-stack: F-Stack is an user space network development kit with high performance based on DPDK, FreeBSD TCP/IP stack and coroutine API.
wrk: Modern HTTP benchmarking tool
sylar-from-scratch: rewriting the sylar high-performance distributed C++ server framework from scratch
sylar: high-performance distributed C++ server framework, webserver, websocket server

C++

TinyWebServer: lightweight C++ web server for Linux
TinyWeb: web server in C++11.
30dayMakeCppServer: build your own C++ server
drogon：Drogon: A C++14/17 based HTTP web application framework running on Linux/macOS/Unix/Windows
cpp-httplib: A C++ header-only HTTP/HTTPS server and client library
yalantinglibs: A collection of C++20 libraries, include async_simple, coro_rpc and struct_pack
dragonfly: A modern replacement for Redis and Memcached
brpc: Baidu's open-source RPC framework
srpc: high-performance RPC framework based on C++ Workflow
Server: build a Async Server.
WebServer：A C++ High Performance Web Server
WebServer: a multithreaded Reactor server implemented with epoll and a thread pool
TKeed：High Performance HTTP WebServer
ananas：A C++11 RPC framework based on future and protobuf, with utility: timer,ssl,future/promise,log,coroutine,etc
da4qi4：web server
tinyrpc: c++ async rpc framework. 14w+qps.

C

mongoose: Embedded Web Server
Tinyhttpd： http server
C-Web-Server: A simple webserver written in C
socket-server: socket server examples.
freeradius-server: A multi-protocol policy server.
civetweb: Embedded C/C++ web server
pure-ftpd: Pure FTP server
tiny-web-server: tiny web server in C
wsServer: a tiny WebSocket server library written in C
chat_room: multi-user chat tool

media server

nginx-rtmp-module: NGINX-based Media Streaming Server
media-server: RTSP/RTP/RTMP/FLV/HLS/MPEG-TS/MPEG-PS/MPEG-DASH/MP4/fMP4/MKV/WebM
ZLMediaKit: a high-performance, production-grade streaming media service framework based on C++11

websocket

libwebsockets: canonical libwebsockets.org networking library
uWebSockets: Simple, secure & standards compliant web server for the most demanding of applications
websocketpp: C++ websocket client/server library

Game Server Framework

atsf4g-co: service framework for game server
YTSvrLib: a simple but powerful cross-platform (Win/Linux) game server framework
server：game server
skynet：A lightweight online game framework

CGI

fastcgi-async-or-coroutine: mcgi is an asynchronous fastcgi using the Muduo network library. cocgi is a coroutine fastcgi using the Tencent Libco library.

Concurrency

async_simple: lightweight C++ asynchronous framework
workflow: C++ Parallel Computing and Asynchronous Networking Engine
libgo：Go-style concurrency in C++11
coost：A tiny boost library in C++11.
libaco：A blazing fast and lightweight C asymmetric coroutine library
quantum: Powerful multi-threaded coroutine dispatcher and parallel execution engine

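For basic learning:

luce: C++20 coroutine networking library based on epoll, with convenient await syntax
co_context: C++ coroutine framework
workspace: a lightweight asynchronous execution framework based on C++11; supports general asynchronous concurrent task execution, priority task scheduling, an adaptive dynamic thread pool, an efficient static thread pool, exception handling, and more.
handy: a simple, easy-to-use C++11 network server framework; supports ten million concurrent connections on a single machine
coroutine: An asymmetric coroutine library for C.
TSF4G: server development tools (blog, doc)
coroutine: C++ 20 Coroutines in Action.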

QT

FileCentipede：Cross-platform internet upload/download manager for HTTP(S), FTP(S), SSH, magnet-link, BitTorrent, m3u8, ed2k, and online videos.

Quantitative Trading

wondertrader: one-stop framework for quantitative research, development, and trading (documentation)

Encoding

utf8.h: single header utf8 string functions for C and C++
utf8proc: a clean C library for processing UTF-8 Unicode data
utf8: A simple utf8 decode/encode lib in c
gbk-utf8: GBK to UTF-8 and vice versa.
conversion_gbk_utf8: a collection of C++11 routines for converting between GBK and UTF-8
qqee_clib: cross-platform C base library that works with any ANSI C compiler

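Testing

Catch2: A modern, C++-native, header-only, test framework for unit-tests
doctest: The fastest feature-rich C++11/14/17/20/23 single-header testing framework

Databases

rocksdb: high-performance key-value storage engine written in C++, based on LevelDB
KeyDB: A Multithreaded Fork of Redis
redis-plus-plus: Redis client written in C++
ormpp: modern C++ ORM, C++17, support mysql, postgresql, sqlite
bustub: database management system, 15445.courses.cs.cmu.edu/

Mini Games

sudoku: cross-platform Sudoku game implemented in C++
Snake: Snake game with AI
FSHistory: Play and Enjoy the History of Microsoft Flight Simulator

Serialization

cjson
json: JSON library for C++
fast-cpp-csv-parser

Algorithms

simhash: computes the simhash value of Chinese documents. simhash is the algorithm Google uses for text deduplication (see the write-up on simhash principles and implementation)
ltp: the Language Technology Platform (LTP) is a complete Chinese language processing system developed over ten years by the HIT Research Center for Social Computing and Information Retrieval
TinyML: a compact C++ machine learning library
flashlight: A C++ standalone library for machine learning
uthash: C macros for hash tables and more
wwsearch: Tencent's full-text search engine
KuiperInfer: a high-performance deep learning inference library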
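
To give a feel for what the simhash entry above computes, here is a minimal SimHash sketch in Python. It is not the code from that repository: it assumes whitespace-separated tokens with equal weights, whereas a real pipeline for Chinese text would use a word segmenter and term weights. The function names and sample sentences are illustrative.

```python
import hashlib

FINGERPRINT_BITS = 64


def simhash(tokens):
    """Compute a 64-bit SimHash fingerprint from an iterable of tokens."""
    weights = [0] * FINGERPRINT_BITS
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the token
        for i in range(FINGERPRINT_BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox jumped over the lazy dog".split()
print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

Documents that share most of their tokens tend to produce fingerprints with a small Hamming distance, so deduplication reduces to comparing fingerprints against a distance threshold.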

Languages

taichi: Taichi is a high-performance programming language for computer graphics applications
craftinginterpreters: Crafting Interpreters

Engines

godot: Godot Engine – Multi-platform 2D and 3D game engine

Utilities

aseprite: Animated sprite editor & pixel art tool
krita: open-source painting software
vnote: Markdown editor
keepassx： a cross platform port of the windows application “Keepass Password Safe”.

Julia

julia_notebooks：Julia Jupyter/Colab Notebooks
julia：The Julia Language: A fresh approach to technical computing.

Java

SpringAll: a step-by-step guide to learning Spring Boot, Spring Boot & Shiro, Spring Batch, Spring Cloud, Spring Cloud Alibaba, Spring Security & Spring Security OAuth2

GO

7days-golang: 7 days golang apps from scratch
GoSpark
golearn：Machine Learning for Go

RUST

This Week in Rust (this-week-in-rust.org)

Operating Systems

tock: A secure embedded operating system for microcontrollers
blog_os: Writing an OS in Rust

System Tool Development

ripgrep: recursively searches directories for a regex pattern while respecting your gitignore

Asynchronous Programming

mio: Metal I/O library for Rust.

Mathematical Computing

nalgebra: Linear algebra library for Rust.

Applications

ncspot: Cross-platform ncurses Spotify client written in Rust, inspired by ncmpc and the likes.
mdBook: Create book from markdown files.
zino: Next-generation framework for composable applications in Rust.

Game Engines

bevy: A refreshingly simple data-driven game engine built in Rust.
hematite: A simple Minecraft written in Rust with the Piston game engine
Fyrox: 3D and 2D game engine written in Rust

Books/Blogs

command-line-rust: A Project-Based Primer for Writing Rust CLIs
rust-blog：Educational blog posts for Rust beginners

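Data Structures and Algorithms

50 must-know code implementations of data structures and algorithms
Python algorithm implementations (lightweight); LeetCode solution walkthroughs presented as animations (lightweight)
All Algorithms implemented in Python
algorithms: Minimal examples of data structures and algorithms in Python
Algorithms implemented in C++: For education
awesome-algorithms: a curated list of resources for learning and/or practicing algorithms
fucking-algorithm: solving algorithm problems is all about patterns, by labuladong
leetcode_company_wise_questions: problem lists grouped by company
ands: Algorithms and data structures for educational, demonstrational and experimental purposes.
Algorithms_in_C
hello-algo: an introductory book on data structures and algorithms with animated illustrations, runnable code, and Q&A. Provides implementations in Java, C++, Python, Go, JS, TS, and C#.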

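Computer Science Fundamentals

Crash Course Computer Science
TeachYourselfCS-CN
computer-science: 🎓 Path to a free self-taught education in Computer Science!
University-Courses-China: course materials from several universities; some of them include exercise answers, which is a nice touch
REKCARC-TSC-UHT: course guide for Tsinghua University's Department of Computer Science
zju-icicles: Zhejiang University course guide sharing project (web page)
CS-Xmind-Note: mind maps and notes for the core computer science courses (408 exam): Computer Organization (5th edition, Wang Aiying), Data Structures (Wangdao), Computer Networks (7th edition, Xie Xiren), Operating Systems (4th edition, Tang Xiaodan)
awesome-courses: List of awesome university courses for learning Computer Science!
Computer-Networking-A-Top-Down-Approach-NOTES: translations of and solutions to the programming assignments and Wireshark labs for Computer Networking: A Top-Down Approach (6th edition)
dragon-book-exercise-answers: exercise answers for Compilers: Principles, Techniques, and Tools (the Purple Dragon Book), 2nd edition
90DaysOfDevOps: documenting repository for learning the world of DevOps.
Developer-Books: a curated collection of programming and development books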

CSAPP

Computer Systems: A Programmer's Perspective: videos & lectures

labs： https://github.com/Exely/CSAPP-Labs

System Design

system-design-primer: Learn how to design large-scale systems.
Distributed system resources: a list of distributed systems resources
Database system resources: a list of database systems resources
system-design: Learn how to design systems at scale and prepare for system design interviews

Architecture Skill Tree

architect-awesome

GameDevMind

Build-It-Yourself Projects

danistefanovic / build-your-own-x：Build your own (insert technology here)
bregman-arie / devops-exercises：Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization
PlayWithCompiler：A GeekTime course about constructing a compiler
15-minute-apps：15 minute (small) desktop apps built with PyQt
wyoos：Source codes for the "Write your own Operating System" video-series on YouTube
SerenityOS: a retro-style OS; a fairly large project

IoT

IoT-For-Beginners: 24 Lessons
TencentOS-tiny: operating system for IoT devices
micropython: Python that runs on microcontrollers
rpi4-osdev：Tutorial: Writing a "bare metal" operating system for Raspberry Pi 4
Awesome IoT
Awesome-Embedded

Other Languages

purr-data: Pure Data (aka Pd) is a visual programming language.
astro: Build fast websites, faster. 🚀🧑‍🚀✨. Quickly write simple web pages
Open Source Guides (开源软件指南): a guide to contributing to open source on GitHub

2 Deep Learning

Colab Tools

colabcode: Run VSCode (codeserver) on Google Colab or Kaggle Notebooks. Access it over https! Use version 2020.5.86806 of the Python extension; the latest version cannot debug or select an interpreter.

Loss Functions

self-adj-dice：Implementation of Self-adjusting Dice Loss from "Dice Loss for Data-imbalanced NLP Tasks" paper
pytorch-loss：label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax ...
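
As a small illustration of one of the losses named above, here is a per-example focal loss in plain Python, using the standard form FL = -alpha * (1 - p)^gamma * log(p), where p is the predicted probability of the true class. This is a sketch for intuition only and is not code from pytorch-loss; the function name and default values are assumptions.

```python
import math


def focal_loss(prob_true_class, gamma=2.0, alpha=1.0):
    """Focal loss for one example, given the predicted probability of its true class."""
    # Clamp to avoid log(0) for degenerate predictions.
    p = min(max(prob_true_class, 1e-12), 1.0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)


# Well-classified examples (p close to 1) are down-weighted much more
# strongly than hard examples (p close to 0).
for p in (0.1, 0.5, 0.9):
    print(p, focal_loss(p), focal_loss(p, gamma=0.0))
```

With gamma set to 0 the expression reduces to ordinary cross-entropy, which makes it easy to compare the two on the same predictions.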

AI Systems

DeepLearningSystem: Deep Learning System core principles introduction.

Resources

annotated_deep_learning_paper_implementations ：Implementations/tutorials of deep learning papers with side-by-side notes; including transformers (original, xl, switch, feedback), optimizers(adam, radam, adabelief), gans(dcgan, cyclegan, stylegan2), reinforcement learning (ppo, dqn), capsnet, sketch-rnn, etc.
awesome-deep-trading
ML-Papers-Explained：Explanation to key concepts in ML
Understanding Deep Learning

NLP

nlp-recipes：Natural Language Processing Best Practices & Examples
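checklist: evaluates NLP/NLG tasks with multiple methods, overturning conclusions that several SOTA models reached under traditional evaluation metrics.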
NLP-pretrained-model：A collection of Natural language processing pre-trained models.
EasyTransfer：EasyTransfer is designed to make the development of transfer learning in NLP applications easier.
google-research / language：Shared repository for open-sourced projects from the Google AI Language team.
HuggingFace Bert Model Download Site
WuDao Chinese language model resources

1 Transformer

delight：DeLighT: Very Deep and Light-Weight Transformers. DeFINE (ICLR'20) and DeLighT (preprint).
Transformer-Clinic：Understanding the Difficulty of Training Transformers
rezero：This repository contains the ReZero-Transformer implementation from the paper. It matches Pytorch's Transformer and can be easily used as a drop-in replacement. ReZero is All You Need: Fast Convergence at Large Depth; ArXiv, March 2020.
collaborative-attention：Code for Multi-Head Attention: Collaborate Instead of Concatenate
sentence_transformer_zh
Transformer-Transducer: A streamable speech recognition model.
Informer2020: the original PyTorch implementation of Informer from the paper Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting.
linformer-pytorch: implementation of Linformer for Pytorch.
awesome-fast-attention: list of efficient attention modules
Synthesizer-Rethinking-Self-Attention-Transformer-Models + Synthesizer: Implementing SYNTHESIZER
aft-pytorch：Unofficial PyTorch implementation of the Attention Free Transformer's AFT-Full layer by Apple Inc
x-transformers：A simple but complete full-attention transformer with a set of promising experimental features from various papers
Transformers-Tutorials：demos made with the Transformers library by HuggingFace.

2 Pre-trained Models

awesome-pretrained-chinese-nlp-models: a collection of download links for high-quality Chinese pre-trained models
awesome-bert: papers and GitHub projects related to BERT and XLNet
BERT-Tickets: a further exploration of BERT
Official TensorFlow code and pre-trained models for BERT
UER-py: Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
huggingface transformers: Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.
ERNIE: An Implementation of ERNIE For Language Understanding (including Pre-training models and Fine-tuning tools)
GPT-2 in five lines of code: Easy-to-use Wrapper for GPT-2
google-research / text-to-text-transfer-transformer
bert-as-service: Mapping a variable-length sentence to a fixed-length vector using BERT model
Chinese-BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT (the Chinese BERT-wwm model series)
Chinese-ELECTRA: Pre-trained Chinese ELECTRA
bert-for-tf2：A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.
bert4keras：keras implement of transformers for humans
ERNIE-Pytorch: This project is to convert ERNIE to huggingface's format.
unilm：UniLM - Unified Language Model Pre-training / Pre-trained models for natural language understanding (NLU) and generation (NLG) tasks
XLNet

marge-pytorch：Implementation of Marge, Pre-training via Paraphrasing, in Pytorch
BERT-CCPoem：BERT-CCPoem is an BERT-based pre-trained model particularly for Chinese classical poetry
microsoft/Unicoder : Unicoder model for understanding and generation. This repo provides the code for reproducing the experiments in XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation (leaderboard).
CPM-Generate：Chinese Pre-Trained Language Models (CPM-LM) Version-I
CPM-LM-TF2：TensorFlow 2.x CPM-Generate
BERT-whitening: simple vector whitening can rival BERT-flow
OptiPrompt: NAACL'2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall
R-Drop: R-drop: Regularized Dropout for Neural Networks. A loss design
WuDao Chinese language model resources

3 Other Language Models

T5: https://github.com/google-research/text-to-text-transfer-transformer (Chinese blog post)
google-research https://github.com/google-research/google-research
ELECTRA: surpasses BERT; described as the best NLP pre-trained model of 2019
ZEN Chinese pre-trained language model: https://github.com/sinovation/ZEN
albert: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
GPT2-Chinese
gpt2-ml: GPT2 for Multiple Languages, including pretrained models; a 1.5-billion-parameter Chinese pre-trained model
Decoders-Chinese-TF2.0: GPT2 training script for Chinese in Tensorflow 2.0
MetaAdapter: Specific Layers in Multilingual Language Models
BERT-flow: TensorFlow implementation of On the Sentence Embeddings from Pre-trained Language Models (EMNLP 2020)
gpt-2: Code for the paper "Language Models are Unsupervised Multitask Learners"
NeZha_Chinese_PyTorch: PyTorch version of NEZHA, compatible with transformers
luke：LUKE -- Language Understanding with Knowledge-based Embeddings

4 CRF、LAN

label-attention inference paper
pytorch-struct: A library of tested, GPU implementations of core structured prediction algorithms for deep learning applications. [probabilistic graphical models]

5 CLUE Project Collection (NLP)

CLUE：Organization of Language Understanding Evaluation benchmark for Chinese

6 BERT Applications

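rasa_chatbot_cn: a dialogue system built on the latest version of rasa
Bert-Chinese-Text-Classification-Pytorch: Chinese text classification with BERT and ERNIE
BERT-train2deploy: BERT models from training to deployment
rasa-tutorial: Rasa Chinese demo and guide
rasa-ui: Rasa UI is a frontend for the Rasa Framework
text_matching: TensorFlow versions of common text matching models; the dataset is QA_corpus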
sentence-transformers：Sentence Embeddings with BERT & XLNet, https://arxiv.org/abs/1908.10084
labse：Language-agnostic BERT Sentence Embedding (LaBSE)
BERTopic：BERTopic is a topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
Top2Vec：Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.
EssayKiller_V2: a first-generation creative-writing AI based on open-source GPT-2 | extensible and evolvable
Guyu：pre-training and fine-tuning framework for text generation
LM-BFF：ACL'2021: LM-BFF: Better Few-shot Fine-tuning of Language Models
AliceMind：pre-trained encoder-decoder models and its related optimization techniques developed by Alibaba's MinD

7 NER

spert：PyTorch code for SpERT: "Span-based Entity and Relation Transformer". For a description of the model and experiments, see our paper: https://arxiv.org/abs/1909.07755 (accepted at ECAI 2020).
mrc-for-flat-nested-ner：The code for "A Unified MRC Framework for Named Entity Recognition"
AutoNER: Learning Named Entity Tagger from Domain-Specific Dictionary. Trained with distant supervision, making use of unlabeled data.

Inference:

LightNER: inference w. models pre-trained / trained w. any following tools, efficiently.

Training:

LD-Net: train NER models w. efficient contextualized representations.

VanillaNER: train vanilla NER models w. pre-trained embedding.

Distant Training:

AutoNER: train NER models w.o. line-by-line annotations and get competitive performance.

8 ELMo And Others

bilm-tf: Tensorflow implementation, Allen AI
Pre-trained models
ELMoForManyLangs: Chinese model
sentence embeddings: InferSent

9 Similarity Matching

cail2019: second-place solution for similar case matching at CAIL 2019 (法研杯), with dataset and documentation
StarSpace: Learning embeddings for classification, retrieval and ranking.
DSSM
Chinese-sentence-similarity-task: a roundup of Chinese question sentence similarity competitions and their solutions
Question-Answering-Albert-Electra: Question Answering using Albert and Electra
simbert: a bert for retrieval and generation
haystack: Transformers at scale for question answering & search
epidemic-sentence-pair: first-place online solution for the Tianchi epidemic similar sentence pair classification competition
deep_text_matching: implementations of several deep text matching (text similarity) models for Keras: cdssm, arc-ii, match_pyramid, mvlstm, esim, drcn, bimpm, bert, albert, roberta
attention-feature-distillation：Official implementation for (Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching, AAAI-2021)

10 Text Classification

text_classification: all kinds of text classification models and more with deep learning
Chinese-Text-Classification-Pytorch: Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer; based on PyTorch and ready to use out of the box
TextClassificationBenchmark: A Benchmark of Text Classification in PyTorch
Bert-Chinese-Text-Classification-Pytorch: Chinese text classification with BERT and ERNIE
multi-class-text-classification-cnn: Classify Kaggle Consumer Finance Complaints into 11 classes. Build the model with CNN (Convolutional Neural Network) and Word Embeddings on Tensorflow.
deep-text-classifier-mtl: tensorflow script for multi-task learning implementation of Kim's paper: Convolutional Neural Networks for Sentence Classification.
multi_task-nlp-bert: NLP multi-task learning, which includes single-sentence classification, pairwise text similarity, pairwise text classification, and relevance ranking.
TextFooler: A Model for Natural Language Attack on Text Classification and Inference [adversarial attack]
lightning-text-classification: Minimalist implementation of a BERT Sentence Classifier with PyTorch Lightning, Transformers and PyTorch-NLP.
Sequence Projection Models >> [PRADO]: A family of models that project sequences to fixed-size features. The idea is to build embedding-free models that minimize the model size. Instead of using an embedding table to look up embeddings, sequence projection models compute them on the fly.
pytorch-sentiment-analysis：Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
BertGCN
detext：DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
LTP：Learned Token Pruning for Transformers
regularized-embeddings：code for the “Text classification with word embedding regularization and soft similarity measure” (Novotný et al., 2020) paper

11 Aspect Based Sentiment Analysis

ABSA-PyTorch
BERT-for-RRC-ABSA：code for our NAACL 2019 paper: "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis"
Aspect-Based-Sentiment-Analysis：Aspect-Based-Sentiment-Analysis: Transformer & Explainable ML (TensorFlow)
torchMoji：A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion
PyABSA：Open Framework for Aspect-based Sentiment Analysis based on state-of-the-art Models [NEW]

12 Text Summarization

textsum：Sequence-to-Sequence with Attention Model for Text Summarization.
sumy：Module for automatic summarization of text documents and HTML pages.
pointer_summarizer：pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"
pointer-generator：Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
transformer-pointer-generator： Transformer and Pointer-generator
BertSum：Code for paper Fine-tune BERT for Extractive Summarization
hiersumm：Code for paper Hierarchical Transformers for Multi-Document Summarization in ACL2019
rouge
pegasus：Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised objective Gap Sentences Generation (GSG) to train a transformer encoder-decoder model.
awesome-text-summarization：The guide to tackle with the Text Summarization
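SPACES: end-to-end long-text summarization model (judicial summarization track, CAIL 2020)
GPT2-NewsTitle: Chinese GPT2 news headline generation project.
Texygen: A text generation benchmarking platform
summarize-from-feedback: reinforcement-learning-based SOTA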

13 seq2seq

nlg-eval: evaluation metrics
OpenSeq2Seq：Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Magenta：involves developing new deep learning and reinforcement learning algorithms for generating songs, images, drawings, and other materials.
deepnmt：This PyTorch package implements Very Deep Transformers for Neural Machine Translation
EmbeddinglessNMT：The implementation of "Neural Machine Translation without Embeddings"
seq2seq-couplet：Play couplet with seq2seq model.
TransCoder：Pytorch original implementation of TransCoder in Unsupervised Translation of Programming Languages
Sentence-VAE：PyTorch Re-Implementation of "Generating Sentences from a Continuous Space" by Bowman et al 2015 https://arxiv.org/abs/1511.06349
Deep Generative Models for Natural Language Processing Papers

14 QA

qa_match：A simple effective ToolKit for short text matching
QA-Survey: a survey of question answering systems.
QueryGeneration：Conversational Standard Meta Language
acl2020-openqa-tutorial：ACL2020 Tutorial: Open-Domain Question Answering
ccf_2020_qa_match: ccf 2020 qa match competition top1
AnyQ (ANswer Your Questions): the open-source project mainly comprises an FAQ-oriented question answering framework and the text semantic matching tool SimNet

15 ModelZoo

Gathers machine learning and Tensorflow deep learning models for NLP problems
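MatchZoo: a general-purpose text matching toolkit; deep text matching models
awesome-sentence-embedding
Awesome-Chinese-NLP: resources on Chinese natural language processing
awesome-nlp: research progress, guides, toolkits, and more
Chinese NLP (中文自然语言处理): SOTA baselines for various tasks
funNLP: a collection of related resources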

16 Open-Source Packages

nematus: Open-Source Neural Machine Translation in Tensorflow
Gluon-NLP
SnowNLP: Python library for processing Chinese text
gensim: Topic Modelling in Python
PyText: A natural language modeling framework based on PyTorch; a deep-learning based NLP modeling framework built on PyTorch.
allennlp: An open-source NLP research library, built on PyTorch
jieba (结巴) Chinese word segmentation
lda2vec: Tools for interpreting natural language
fastNLP (Fudan): A Modularized and Extensible NLP Framework. Currently still in incubation.
fast text: representation and classification.
autotuning for fastText
HanLP
ltp: LTP 4.0, much easier to install and use than 3.x
pkuseg multi-domain Chinese word segmentation toolkit: pkuseg is simple to use, supports domain-specific segmentation, and effectively improves segmentation accuracy
An open-source neural machine translation toolkit (Tsinghua)
Neural Modules: a toolkit for conversational AI
stanza: Official Stanford NLP Python Library for Many Human Languages (Doc)
StarSpace: Learning embeddings for classification, retrieval and ranking.
BigARTM: topic model
tkitMarker_bert: extracts entities and descriptions with fine-tuned BERT
fastHan: a Chinese natural language processing tool built on fastNLP and PyTorch, as convenient to call as spaCy.
Jiagu: deep learning NLP tool covering knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, and text clustering
KILT: A Benchmark for Knowledge Intensive Language Tasks.
AutoPhrase: Automated Phrase Mining from Massive Text Corpora
Kashgari-doc-zh Kashgari: Kashgari is a minimal yet powerful NLP framework for learning, researching, and deploying text classification and labeling models
Senta: Baidu's open-source Sentiment Analysis System.
DDParser: Baidu's open-source dependency parsing system
FastBERT
OpenMatch：An Open-Source Package for Information Retrieval.
robustness-gym：Evaluation Toolkit for NLP
elasticsearch-py：Official Python low-level client for Elasticsearch
py-googletrans：Free and Unlimited Google translate API for Python.
UDA_pytorch：UDA(Unsupervised Data Augmentation) implemented by pytorch
PaddleNLP：An NLP library with Awesome pre-trained Transformer models
mars：a tensor-based unified framework for large-scale data computation which scales Numpy, pandas, Scikit-learn and Python functions.
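knlp: similar to snownlp and textblob and easy to call; provides training and inference scripts for basic algorithms, evaluation methods and evaluation datasets for various NLP tasks, and deep learning support; Chinese-oriented and deliberately basic, which makes it suitable for customization.
skweak: A software toolkit for weak supervision applied to NLP tasks
pytorch-metric-learning: The easiest way to use deep metric learning in your application. Ready-to-use contrastive losses such as NTXent loss (InfoNCE) and SupContrast loss.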
dice_loss_for_NLP：ACL2020 paper Dice Loss for Data-imbalanced NLP Tasks
TextBlob：Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.

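17 Other Models

show-attend-and-tell
Retrieval-Based Conversational Model in Tensorflow
sru: Training RNNs as Fast as CNNs
Self_Explaining_Structures_Improve_NLP_Models: an attempt at crossing BERT output features; keep the dimensions of the custom layers from getting too large. Results depend on hyperparameter tuning and improve only slightly, if at all.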

18 Relation Extraction

Snowball：Snowball: Extracting Relations from Large Plain-Text Collections
OpenNRE： relation extraction models.
MRC4ERE_plus：Implementation for Paper "Asking Effective and Diverse Questions: A Machine Reading Comprehension based Framework for Joint Entity-Relation Extraction"
AlpacaTag：AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo)
Distant-Supervised-Chinese-Relation-Extraction: Chinese relation extraction based on distant supervision
Entity-Relation-Extraction：Entity and Relation Extraction Based on TensorFlow and BERT.
Information-Extraction-Chinese：Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
pytorch-relation-extraction：distant supervised relation extraction models: PCNN MIL (Zeng 2015), PCNN+ATT(Lin 2016).
USC-DS-RelationExtraction：Distantly Supervised Relation Extraction
open-entity-relation-extraction：Knowledge triples extraction and knowledge base construction based on dependency syntax for open domain text.
BERT-Relation-Extraction：PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper
PersonRelationKnowledgeGraph：Person relation extraction using bootstrapping, plus applications such as knowledge-graph-based question answering
OpenKE：An Open-Source Package for Knowledge Embedding (KE)
deepke：A PyTorch-based deep learning toolkit for Chinese relation extraction
DeepIE：DeepIE: Deep Learning for Information Extraction
CasRel：A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
CasRel-pytorch-reimplement
two-are-better-than-one：Code associated with the paper “Two are Better Than One: Joint Entity and Relation Extraction with Table-Sequence Encoders”, at EMNLP 2020
TPlinker-joint-extraction：TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking
PURE：NAACL'2021: A Frustratingly Easy Approach for Entity and Relation Extraction
pke：Python Keyphrase Extraction module

19 Distillation

TextBrewer：A PyTorch-based knowledge distillation toolkit for natural language processing
KD_Lib：A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
RepDistiller：[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
KD_SRRL：Paper. Knowledge distillation via softmax regression representation learning
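The toolkits above package many distillation variants. For orientation, here is a minimal sketch of one common formulation, the temperature-softened soft-target loss, written with the standard library only. It is not the API of any listed library; the function names and the default temperature are assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumes both logit lists have the same length and moderate magnitudes.
    """
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(teacher_probs, student_probs)
        if p > 0
    )
    return kl * temperature ** 2


# A student that matches the teacher incurs no loss; a mismatched one does.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

In practice this term is usually combined with the ordinary cross-entropy loss on the gold labels; the weighting between the two is a tuning choice.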

20 Dialogue

DeepPavlov：An open source library for deep learning end-to-end dialog systems and chatbots.
ConvLab-2：ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems
rasa-chatbot：Sample chatbot with rasa stack
nezha_gpt_dialog
DialoGPT：Large-scale pretraining for dialogue
CDial-GPT：A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
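mirai：High-performance bot framework for Tencent QQ
unit-dmkit: DMKit is the open-source dialogue management module of UNIT. It connects seamlessly to UNIT's understanding capabilities, gives developers multi-state management of complex dialogue flows, and can integrate external knowledge bases at low cost to quickly enrich response content
chat：A chatbot based on natural language understanding and machine learning that supports concurrent multi-user access and custom multi-turn dialogue. For readers interested in knowledge graphs and KBQA who want to build their own knowledge graph from scratch
SMP2018：SMP2018 Evaluation of Chinese Human-Computer Dialogue Technology (ECDT)
GPT2-chitchat：GPT2 for Chinese chitchat (implements the MMI of DialoGPT)
Baidu dialogue system
Rasa：Open-source machine learning framework for automating text- and voice-based conversations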
ParlAI：sharing, training and evaluating dialogue models across many tasks.
Task-Oriented-Dialogue-Research-Progress-Survey
MetaDialog：Platform for few-shot natural language processing: Text Classification, Sequence Labeling.
ChatGLM-6B：An open-source bilingual dialogue language model

21 Intent detection and slot filling

FewShotMultiLabel：AAAI2021 paper: Few-Shot Learning for Multi-label Intent Detection.
FewShotTagging：ACL2020 paper: Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network

22 Coreference resolution

hobbs：Implementation of Hobbs' algorithm for coreference resolution in python

23 Topic models

microsoft / LightLDA

24 Automata

automata：A Python library for simulating finite automata, pushdown automata, and Turing machines
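To make "simulating finite automata" concrete, here is a minimal standard-library sketch of running a deterministic finite automaton over an input string. It does not use the automata library's API; the function name, the dictionary-based transition table, and the example machine are all assumptions for illustration.

```python
def run_dfa(transitions, start, accepting, text):
    """Return True if the DFA accepts `text`.

    transitions -- dict mapping (state, symbol) to the next state
    start       -- initial state
    accepting   -- set of accepting states
    A missing transition is treated as rejection.
    """
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting


# Hypothetical example machine: accepts binary strings containing an even
# number of '1' characters.
EVEN_ONES = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}

accepts_1001 = run_dfa(EVEN_ONES, "even", {"even"}, "1001")
accepts_111 = run_dfa(EVEN_ONES, "even", {"even"}, "111")
```

Pushdown automata and Turing machines extend the same loop with a stack or a tape, which is where a dedicated library becomes worthwhile.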

25 Reading comprehension

SDNet
SogouMRCToolkit： fast and efficient development of modern machine comprehension models, including both published models and original prototypes

26 Data augmentation

Cutoff：Cutoff data augmentation approach for NLP

27 Prompt

PromptPapers：Must-read papers on prompt-based tuning for pre-trained language models.
autoprompt：AutoPrompt: Automatic Prompt Construction for Masked Language Models.
P-tuning：Fine-tuning code repository for the CPM (Chinese PL) model; supports multi-node, multi-GPU training and testing for fine-tuning.
P-tuning-v2：ACL 2022 paper "P-tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks"
OpenPrompt：An Open-Source Framework for Prompt-Learning
iPrompt：Controllable Generation from Pre-trained Language Models via Inverse Prompting

28 Adaptor

adapter-transformers：Huggingface Transformers + Adapters = ❤️

29 Book and Course

Natural Language Processing with PyTorch
Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"
MTBook：Machine Translation: Statistical Modeling and Deep Learning Methods (《机器翻译：统计建模与深度学习方法》)

30 Visualization

Text Visualization Browser

31 Latest research progress

track the progress in Natural Language Processing (NLP)
Repository to show how NLP can tackle real problems.
Leaderboards-for-Multi-Turn-Response-Selection：provide the reader with a quick overview of benchmark datasets and the state-of-the-art studies on this task, which serves as a stepping stone for further research.
awesome-papers：Papers & presentation materials from Hugging Face's internal science day

ChatGPT

Generative AI

generative-ai-for-beginners：Get Started Building with Generative AI

Similarity

milvus：An open source vector similarity search engine -- Linux
faiss：A library for efficient similarity search and clustering of dense vectors. -- Linux
annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
Scann：ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. lecture
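The libraries above speed up nearest-neighbor search over dense vectors, typically by approximating it. As a reference point, here is a minimal sketch of the exact brute-force search they are designed to replace, using cosine similarity and the standard library only. The function names are assumptions for this example and do not correspond to any listed library's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=1):
    """Indices of the k vectors most similar to `query`, best first.

    Scans every vector, so the cost grows linearly with the collection size.
    Ties are broken by the lower index.
    """
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:k]]


corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
nearest_two = top_k([2.0, 0.1], corpus, k=2)
```

The linear scan is exact but slow at scale; index structures such as trees, quantization, or graph-based methods trade a little recall for much lower query latency.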

Speech

ASRT_SpeechRecognition：A Deep-Learning-Based Chinese Speech Recognition System
speechT: An opensource speech-to-text software written in tensorflow
TensorflowTTS：TensorflowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2
DeepSpeechRecognition：A Chinese Deep Speech Recognition System, including a deep-learning-based acoustic model and a deep-learning-based language model
audio-pretrained-model：A collection of Audio and Speech pre-trained models.
pyannote-audio：Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Kersa-Speaker-Recognition：Speaker (voiceprint) recognition model implemented with Keras
Transformer-TTS：Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
kaldi：kaldi-asr/kaldi is the official location of the Kaldi project.
espnet：End-to-End Speech Processing Toolkit
MockingBird: AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content
Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
whisper：Speech Recognition via Large-Scale Weak Supervision
bark: Text-Prompted Generative Audio Model

GAN

GAN-ZOO：A list of all named GANs!
The classical paper list with code about generative adversarial nets
Curated list of awesome GAN applications and demo
StarGAN
iGAN: Interactive Image Generation via Generative Adversarial Networks
CycleGAN and pix2pix in PyTorch image-to-image translation
U-GAT-IT：Turns selfies into anime-style characters with closely preserved expressions, and works in reverse as well. https://github.com/znxlwm/UGATIT-pytorch https://github.com/taki0112/UGATIT
SeqGAN： https://github.com/suragnair/seqGAN https://github.com/ChenChengKuan/SeqGAN_tensorflow
deepgenerativemodels / notes
sngan_projection：GANs with spectral normalization and projection discriminator
gan：Tooling for GANs in TensorFlow
AnimeGAN： AnimeGAN for fast photo animation !
BigGAN
first-order-model：Image animation
stargan-v2：StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
UGATIT-pytorch：Style transfer. Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
【book】O'Reilly book 'Generative Deep Learning'
SeqGAN seqGAN-Simplified
super-resolution: Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

CV

CV-pretrained-model：A collection of computer vision pre-trained models.
Tencent YouTu open-source projects
Dlib：making real world machine learning and data analysis applications in C++
mediapipe: Cross-platform, customizable ML solutions for live and streaming media. Google.
Convolution arithmetic：A visual explanation of convolution arithmetic
cnn-explainer：Visualizes how a CNN is trained and learns
pytorch-image-models：PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more. 【NEW】

Image recognition and classification

DBoW2：Enhanced hierarchical bag-of-word library for C++
face_classification: Real-time face detection and emotion/gender classification
Face recognition：The world's simplest facial recognition api for Python and the command line
insightface: State-of-the-art 2D and 3D Face Analysis Project.
TF_FLAME：Example Tensorflow code for the FLAME face model
DeepFaceLab_Colab：https://www.deepfaker.xyz -- NOTE：With colab you can use tesla P100 for free. Of course there are some restrictions
EasyOCR：Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
libfacedetection：Face detection in images. The face detection speed can reach 1000FPS.
PyMatting：A Python Library for Alpha Matting. Foreground cutout.
rembg：Rembg is a tool to remove image backgrounds. Foreground cutout.
TransFG：A Transformer Architecture for Fine-grained Recognition
bottleneck-transformer-pytorch：SotA visual recognition model with convolution + attention that outperforms EfficientNet and DeiT in terms of performance-computes trade-off, in Pytorch
deep-learning-for-image-processing：deep learning for image processing including classification and object-detection etc.
DEKR：This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)
mmpose：OpenMMLab Pose Estimation Toolbox and Benchmark.
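VNN：A tool for various image effect conversions; a high-performance, lightweight neural network deployment framework.
chineseocr_lite：Ultra-lightweight Chinese OCR; supports vertical text recognition and ncnn, mnn, and tnn inference.
HyperLPR: High-performance Chinese license plate recognition based on deep learning
fawkes: Recognizing synthetic images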
mae： Masked Autoencoders Are Scalable Vision Learners
ConvNeXt ：A ConvNet for the 2020s. CVPR 2022.
ResNeSt: significantly boosts the performance of downstream models such as Mask R-CNN, Cascade R-CNN and DeepLabV3.
Res2Net: Multi-scale Backbone Architecture
CSWin-Transformer：Vision Transformer Backbone with Cross-Shaped, CVPR 2022
deepface：A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python

opencv

learnopencv: Learn OpenCV : C++ and Python Examples

openvino

openvino_tensorflow: OpenVINO™ integration with TensorFlow. Short introduction

Object detection

Recent Advances in Deep Learning for Object Detection
pytorch-YOLOv4：PyTorch ,ONNX and TensorRT implementation of YOLOv4
tensorflow-yolov4-tflite：YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny Implemented in Tensorflow 2.0, Android. Convert YOLO v4 .weights tensorflow, tensorrt and tflite
yolov5：YOLOv5 in PyTorch > ONNX > CoreML > iOS
deep_learning_object_detection：A paper list of object detection using deep learning.
deepdetect：Deep Learning API and Server in C++11 support for Caffe, Caffe2, PyTorch,TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
mmdetection：OpenMMLab Detection Toolbox and Benchmark
FaceBoxes.PyTorch：A PyTorch Implementation of FaceBoxes
openpose：OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
Surface-Defect-Detection：open source dataset and important critical papers in the field of surface defect research
ViT-pytorch: Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
efficientnet-pytorch: PyTorch implementation of "EfficientNet", ICML 2019
efficientnet: Implementation of EfficientNet model. Keras and TensorFlow Keras.
FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
image-segmentation-keras：Implementation of Segnet, FCN, UNet , PSPNet and other models in Keras
YOLOF：You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2
Ultra-Fast-Lane-Detection：Ultra Fast Structure-aware Deep Lane Detection (ECCV 2020)
LSPS：Source code for "3D Hand Pose Estimation using Simulation and Partial-Supervision with a Shared Latent Space"
nanodet: ⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥
U-2-Net：paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
segment-anything: running inference with the SegmentAnything Model (SAM)
AnimatedDrawings

Medical

InnerEye-DeepLearning: Medical Imaging Deep Learning library to train and deploy models on Azure Machine Learning and Azure Stack. intro

Multimodal

microsoft/psi：an open, extensible framework for development and research of multimodal, integrative-AI systems. 【C#】
ClipBERT：an efficient framework for end-to-end learning for image-text and video-text tasks.
stable-diffusion：A latent text-to-image diffusion model.
motion-diffusion-model："Human Motion Diffusion Model"
Chinese-CLIP: Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
stable-diffusion-webui：web UI

Image super-resolution / style transfer / old photo restoration

pulse：Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
FastPhotoStyle：Style transfer applied per segmented region; nice output
DeOldify：A Deep Learning based project for colorizing and restoring old images
Image restoration
Code and data for paper "Deep Photo Style Transfer"
TensorFlow CNN for fast style transfer
ALAE: Adversarial Latent Autoencoders
deep-daze：Simple command line tool for text to image generation using OpenAI's CLIP and Siren
AnimeGANv2：The improved version of AnimeGAN. Landscape photos/videos to anime
Real-ESRGAN：Practical Algorithms for General Image/Video Restoration.
avatarify-python：Real-time style transfer from a webcam
tiler：👷 Build images with images. Not deep learning.
pixray：Neural image generation. Generates pixel-style images.
text2art：AI-powered Text-to-Art Generator - Text2Art.com
latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
dalle-mini: Generate images from a text prompt
paper2gui: An easy, convenient way to use cutting-edge AI techniques
style2paints：sketch + style = paints 🎨

Data augmentation

fast-autoaugment：Official Implementation of 'Fast AutoAugment' in PyTorch.
zao-：Source code for AI face swapping
AutoAugment：Unofficial implementation of the ImageNet, CIFAR 10 and SVHN Augmentation Policies learned by AutoAugment using pillow
albumentations: Fast image augmentation library and easy to use wrapper around other libraries
imgaug：Image augmentation for machine learning experiments.
AugLy：A data augmentations library for audio, image, text, and video.
ttach：Test Time Augmentation with PyTorch

Reinforcement Learning

Algorithms, lecture notes, and exercises：Implementation of Reinforcement Learning Algorithms
RLexample：basic examples of playing with RL
DeepRL-Tutorials：Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
ierg5350-assignment：assignments of our reinforcement learning (RL) course.
TD3：Author's PyTorch implementation of TD3 for OpenAI gym tasks. Nice code style and quality👍
baby-a3c: A high-performance Atari A3C agent in 180 lines of PyTorch
pytorch-a2c-ppo-acktr-gail: PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
NeuronDance / DeepRL: Deep Reinforcement Learning Lab
awesome-monte-carlo-tree-search-papers
advanced-deep-learning-and-reinforcement-learning-deepmind：UCL & DeepMind | YouTube videos 👉 https://www.youtube.com/playlist?list…
AlphaZero_Gomoku：An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
reinforcement-learning-an-introduction：Companion code repository for Reinforcement Learning: An Introduction
PARL：A high-performance distributed training framework for Reinforcement Learning
trfl：TensorFlow Reinforcement Learning
tianshou：An elegant PyTorch deep reinforcement learning library.

Knowledge graph

Agricultural knowledge graph (AgriKG)
KGQA-Based-On-medicine
KEQA_WSDM19
transE
KB2E：thunlp
tianchi_nl2sql : 3rd-place solution and code from the finals of the first Chinese NL2SQL Challenge
CCKS 2019 Chinese knowledge graph question answering dataset
knowledge-graph： a QA Demo based on KG! use scrapy and jena.
ONEPIECE-KG：A knowledge graph project for ONEPIECE
K-BERT：Source code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph"
scikit-kge：Python library to compute knowledge graph embeddings
Financial-Knowledge-Graphs：Workflow for building a small financial knowledge graph
KG-demo-for-movie：Builds a movie knowledge graph from scratch and develops a simple KBQA program on top of it
pykg2vec：Python library for knowledge graph embedding and representation learning.
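text_to_knowledge: Text to Knowledge (解语) is the first knowledge base (an encyclopedic knowledge tree) and knowledge annotation framework to cover all Chinese word classes. It offers a word-class system that can describe all Chinese vocabulary, a Chinese knowledge annotation toolkit, and pre-trained language models better suited to Chinese mining tasks. A PaddleNLP subproject; not open source.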
graph-data-science：Neo4j Graph Data Science library of graph algorithms.

Deep Bayesian / probabilistic

Pyro：Deep universal probabilistic programming with Python and PyTorch https://github.com/pyro-ppl/pyro
Python library for probabilistic modeling, inference, and criticism
A Library for Bayesian Deep Learning, Generative Models, Based on Tensorflow

Capsule Net

Autonomous driving

Robotics

AtsushiSakai / PythonRobotics

Contrastive Learning

PyContrast：PyTorch implementation of Contrastive Learning methods; List of awesome-contrastive-learning papers
SimCSE：SimCSE: Simple Contrastive Learning of Sentence Embeddings
ConSERT: ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
pytorch-metric-learning: The easiest way to use deep metric learning in your application. Provides ready-to-use contrastive losses such as NTXent loss (InfoNCE) and SupContrast loss.
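The entries above all build on an InfoNCE-style objective. As a rough guide to what that loss computes, here is a minimal standard-library sketch for a single anchor with one positive and a list of negatives. It is not the pytorch-metric-learning API; the function names and the default temperature are assumptions, and all vectors are assumed to be non-zero.

```python
import math


def _cosine(a, b):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax of the positive's similarity.

    The positive sits at index 0 of the logits; the negatives follow.
    """
    logits = [_cosine(anchor, positive) / temperature]
    logits += [_cosine(anchor, n) / temperature for n in negatives]
    peak = max(logits)  # log-sum-exp with the max subtracted for stability
    log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum_exp - logits[0]


# Low loss when the positive is close and the negative is far, high otherwise.
good_pair = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
bad_pair = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Batched implementations usually treat the other examples in the batch as negatives, so the batch size effectively sets the number of negatives per anchor.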

Adversarial Attack

adversarial-robustness-toolbox
foolbox：fool neural networks
cleverhans：constructing attacks, building defenses, and benchmarking both
FreeLB：Adversarial Training for Natural Language Understanding
DefenseByAttack: Code for paper "One Man's Trash is Another Man's Treasure"

Multi-Task Learning

fudan_mtl_reviews：TensorFlow implementation of the paper Adversarial Multi-task Learning for Text Classification
https://decanlp.com/： The Natural Language Decathlon (decaNLP) is a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks. decaNLP

Federated learning

FedML-AI/FedML：(PyTorch > 1.0) A Research-Oriented Federated Learning Library. Supporting distributed computing, mobile/IoT on-device training, and standalone simulation. Intro

Graph networks

dgl：Python package built to ease deep learning on graph, on top of existing DL frameworks.
AutoGL：An autoML framework & toolkit for machine learning on graphs.
graph_nets：Build Graph Nets in Tensorflow
A collection of important graph(Code)
Must-read papers on graph neural networks (GNN)
microsoft/tf-gnn-samples
spektral：Graph Neural Networks with Keras and Tensorflow 2.
littleballoffur：A NetworkX extension library for graph subsampling.
awesome-gcn：Resources for graph convolutional networks
pytorch_geometric：Geometric Deep Learning Extension Library for PyTorch https://pytorch-geometric.readthedocs…
PytorchGeometricTutorial：Pytorch Geometric Tutorials

Optimization algorithms

RAdam

Frameworks in practice

tuning_playbook： A playbook for systematically maximizing the performance of deep learning models.

DeepLearningExamples：Deep Learning Examples
jax：Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU
PyCandle：A numpy and cpu based neural network tool. For those who intend to learn more about the details of how a neural network works.
【**】einops：Deep learning operations reinvented (for pytorch, tensorflow, chainer, gluon and others). Struggling with the syntax for tensor dimension manipulation? Try this package, which reads like plain language.
tinygrad：You like pytorch? You like micrograd? You love tinygrad! ❤️
MatrixSlow：A simple deep learning framework in pure python for purpose of learning in DL
best-of-ml-python：🏆 A ranked list of awesome machine learning Python libraries.
pytorch-loss：label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax ...
pytorch-optimizer：torch-optimizer -- collection of optimizers for Pytorch
prefect: The easiest way to coordinate your dataflow
KuiperInfer: An implementation of an inference library; a DIY deep learning inference framework.

Tensorflow

Effective TensorFlow
Concise TF2：TensorFlow 2 tutorial in Chinese
eat_tensorflow2_in_30_days
deeplearning-models: various deep learning architectures, models, and tips
Code for the book TensorFlow实战 (TensorFlow in Practice)
Deep Learning with Python Keras
Hands-on Machine Learning with Scikit-Learn and TensorFlow
Neural Machine Translation (seq2seq) Tutorial
TFLearn: Deep learning library featuring a higher-level API for TensorFlow.
Tensor2Tensor：deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
Sonnet： is a library built on top of TensorFlow for building complex neural networks.
KDD2019 Deep Learning for NLP with Tensorflow hands-on
TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
简单粗暴 TensorFlow 2.0 (A Concise Handbook of TensorFlow 2.0)
TensorFlow for Deep Learning Research. Course slides
tensorpack：A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
larq：An Open-Source Library for Training Binarized Neural Networks
addons：Useful extra functionality for TensorFlow 2.x maintained by SIG-addons.
keras-cosine-annealing
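For readers unfamiliar with the schedule named in the keras-cosine-annealing entry, here is a minimal sketch of a single cosine-annealing cycle without restarts or warmup. It is a plain-Python illustration rather than the repository's callback; the function name and parameters are assumptions for this example.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate at `step` for one cosine cycle from lr_max down to lr_min.

    Steps outside [0, total_steps] are clamped to the nearest end of the cycle.
    """
    clamped = min(max(step, 0), total_steps)
    progress = clamped / total_steps
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Sample the schedule at the start, midpoint, and end of a 100-step cycle.
schedule = [cosine_annealing_lr(s, 100, lr_max=0.1, lr_min=0.001) for s in (0, 50, 100)]
```

The rate falls slowly at first, fastest around the midpoint, and slowly again near the end; variants with warm restarts repeat this cycle several times during training.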

C++

hello_tf_c_api：Neural Network TensorFlow C API
tensorflow_cc：Build and install TensorFlow C++ API library.
tiny-dnn：header only, dependency-free deep learning framework in C++14

Pytorch

Fairseq(-py) is a sequence modeling toolkit
A practical approach to machine learning pytorch
pix2pixHD
Hands-on Deep Architectures with PyTorch：slides
Dive into Deep Learning, PyTorch edition
A very simple framework for state-of-the-art Natural Language Processing (NLP)
pytorch-lightning：pytorch + TPU
Awesome-pytorch-list：A comprehensive list of pytorch related content on github,such as different models,implementations,helper libraries,tutorials etc.
DeepNLP-models-Pytorch：Pytorch implementations of various Deep NLP models in cs-224n (Stanford Univ)
pytorch-Deep-Learning: Deep Learning (with PyTorch)
serve：Model Serving on PyTorch
pytorch-seq2seq：Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
pycandle：PyCandle is a lightweight library for pytorch that makes running experiments easy, structured, repeatable and avoids boilerplate code.
AI-Art：PyTorch implementation of Neural Style Transfer, Pix2Pix, CycleGAN, and Deep Dream!
entmax：The entmax mapping and its loss, a family of sparse softmax alternatives.
Adabelief-Optimizer：NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
pytorch-optimizer：torch-optimizer -- collection of optimizers for Pytorch
examples: A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
computervision-recipes：Best Practices, code samples, and documentation for Computer Vision.
nlp-recipes：Natural Language Processing Best Practices & Examples
pymde：Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network
mmf：MMF is a modular framework for vision and language multimodal research from Facebook AI Research.
pytorch-image-models: PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
pytorch-cosine-annealing-with-warmup
fastmoe：A fast Mixture of Experts（MoE） impl for PyTorch
annotated_deep_learning_paper_implementations
External-Attention-pytorch：🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.
torch-toolbox：ToolBox to make using Pytorch much easier. Mainly for CV.
tianshou：An elegant PyTorch deep reinforcement learning library.

MxNet

Dive into Deep Learning
autogluon：AutoGluon: AutoML for Text, Image, and Tabular Data

Spark

Ray

ray：A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. 【actor model】

Lasagne

Lasagne：Lasagne is a lightweight library to build and train neural networks in Theano.

Model optimization

NVIDIA TensorRT doc GitHub
hyperopt / hyperopt：Hyperparameter tuning
hyperparameter_hunter：Automatically save and learn from Experiment results, leading to long-term, persistent optimization that remembers all your tests.
microsoft / DeepSpeed：library that makes distributed training easy, efficient, and effective.
- TDS: A plug-in of Microsoft DeepSpeed to fix the bug of DeepSpeed pipeline
apex：A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
model-optimization：A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
keras-tuner：Hyperparameter tuning for humans
optuna：A hyperparameter optimization framework
lightseq：LightSeq: A High Performance Library for Sequence Processing and Generation
nni：An open source AutoML toolkit for automate machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
AutoGluon：AutoML for Text, Image, and Tabular Data.
BMInf：Efficient Inference for Big Models
InfMoE: Inference framework for MoE(Mixture of Experts) layers based on TensorRT with Python binding
fastmoe: A fast MoE impl for PyTorch

Model training and deployment

Machine learning system design

Deployment

BentoML：Model Serving Made Easy
Turi Create simplifies the development of custom machine learning models
cortexlabs / cortex：Model deployment 【Related project: cortex: A horizontally scalable, highly available, multi-tenant, long term Prometheus.】
bert-classification-tf-serving
Deep-Learning-in-Production：deploying deep learning-based models in production.
***ray：A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. Serve Video Documents

process to process: skips scheduling to improve performance
fiber：Distributed Computing for AI Made Simple
model_deployment：A collection of model deployment library and technique.
jina：An easier way to build neural search applications in the cloud
plaidml：PlaidML is a framework for making deep learning work everywhere.
streamlit：Streamlit — The fastest way to build data apps in Python
kubeflow: Machine Learning Toolkit for Kubernetes
waitress：Waitress - A WSGI server for Python 2 and 3
mlflow: Open source platform for the machine learning lifecycle
zenml：Bring Zen to your ML with reproducible pipelines
cs329s-ml-deployment-tutorial：Code and files to go along with CS329s machine learning model deployment tutorial.
nboost：deploying transformer models to improve the relevance of search results on different platforms (i.e. Elasticsearch)
object_detector_app：Real-Time Object Recognition App with Tensorflow and OpenCV
uwsgi-nginx-flask-docker：Docker image with uWSGI and Nginx for Flask applications in Python running in a single container. Optionally with Alpine Linux.
Modin: Speed up your Pandas workflows by changing a single line of code
nameko：Python framework for building microservices
metaflow：Build and manage real-life data science projects with ease.
NLPStreamlit：Simple deployment of web apps for Sentiment Analysis, Named Entity Recognition (NER), Text Summarization, and similar tasks.
TinyNeuralNetwork：compression framework.
nonebot2: Cross-platform asynchronous Python chatbot framework
gradio: Create UIs for your machine learning model in Python in 3 minutes.
cog：Containers for machine learning
mercury: Build Web Apps in Jupyter Notebook with Python only

Machine learning compilation

d2l-tvm：Dive into Deep Learning Compiler

CUDA

YOLO_TRT_SIM：Efficient deployment: TensorRT inference for YOLO X, V3, V4, V5, V6, V7, V8, and EdgeYOLO, with pre- and post-processing implemented as CUDA kernels. CPP/CUDA

Training

ray：A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library
ml-agents：Unity Machine Learning Agents Toolkit. Used for training game AI.
BytePS：A high performance and general PS framework for distributed training
Horovod：The goal of Horovod is to make distributed Deep Learning fast and easy to use
cml：Continuous Machine Learning | CI/CD for ML; organizes results into web reports for analysis
fairscale：PyTorch extensions for high performance and large scale training.
replicate：Version control for machine learning
orchest：Orchest is a tool for creating data science pipelines.
trains：Auto-Magical Experiment Manager & Version Control for AI
cleverhans：An adversarial example library for constructing attacks, building defenses, and benchmarking both
docker-python：Kaggle Python docker image
distribuuuu：The pure and clear PyTorch Distributed Training Framework.
ColossalAI：A Unified Deep Learning System for Large-Scale Parallel Training
maggot：A lightweight python library that helps to keep track of numerical experiments
rigl：End-to-end training of sparse deep neural networks with little-to-no performance loss. "Making All Tickets Winners"
skweak: A software toolkit for weak supervision applied to NLP tasks
AugLy：A data augmentations library for audio, image, text, and video.
pytorch-balanced-sampler: under/over sample according to a chosen parameter alpha, in order to create a balanced training distribution.
pytorch-balanced-batch

Transfer Learning

Hands-On Transfer Learning with Python

Multi-task

multi-task-learning-example
mt-dnn：Multi-Task Deep Neural Networks for Natural Language Understanding

Awesome

awesome: Comprehensive. Awesome lists about all kinds of interesting topics
Awesome-win: An awesome & curated list of best applications and tools for Windows.
awesome-tensorflow：TensorFlow - A curated list of dedicated resources
awesome-deep-learning：A curated list of awesome Deep Learning tutorials, projects and communities.
awesome-nlp：A curated list of resources dedicated to Natural Language Processing (NLP)
awesome-docker：A curated list of Docker resources and projects
Awesome-pytorch-list：A comprehensive list of pytorch related content on github,such as different models,implementations,helper libraries,tutorials etc.
Awesome-Chinese-NLP：Resources for Chinese natural language processing
open_model_zoo：Pre-trained Deep Learning models and demos (high quality and extremely fast)
https://modelzoo.co/
awesome-bots：The most awesome list about bots
Awesome-Chatbot：Awesome Chatbot Projects, Corpus, Papers, Tutorials.
awesome-mlops: A curated list of references for MLOps: tutorials, videos, and blogs on the machine learning development lifecycle
google-research / language：Shared repository for open-sourced projects from the Google AI Language team.
awesome-relation-extraction：A curated list of awesome resources dedicated to Relation Extraction
awesome-grounding：A curated list of research papers in grounding.
awesome-automl-papers：automated machine learning papers, articles, tutorials, slides and projects

Conference resources

KDD2019 Hands-on Tutorials

Project ideas

Machine Learning, NLP, Vision, Recommender Systems Project Ideas
Open-source collection of top data competition solutions
Voice Conversion with Non-Parallel Data
Speech-to-text with wave-net
A TensorFlow implementation of Baidu's DeepSpeech architecture
industry-machine-learning：A curated list of applied machine learning and data science notebooks and libraries across different industries.
news-search-engine：News search engine

3 Machine learning

awesome-mlops: A curated list of references for MLOps: tutorials, videos, and blogs on the machine learning development lifecycle
Elements-of-Mathematics：From basic arithmetic to machine learning
The-Art-of-Linear-Algebra：Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

Open-source tools

Algorithm packages

H2O documentation：H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment.
vowpal_wabbit：a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
XGBoost repository
LightGBM repository
cvxpy：A Python-embedded modeling language for convex optimization problems.
tpot：A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
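production-tools：A basic repository demonstrating how to set up tooling for data science projects; these tools help you write higher-quality code.
scikit-multiflow：A machine learning package for streaming data in Python. Trains on streaming data input.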
igel：a delightful machine learning tool that allows to train, test and use models without writing code.
creme：Online machine learning in Python
statsmodels：statistical modeling and econometrics in Python
xlearn：High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI
GPy：Gaussian processes framework in python
mlens：ML-Ensemble – high performance ensemble learning
Kats：[Time series] Analyze time series data; a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis
modin: Faster than pandas, with a largely identical API

Clustering

spherecluster：Clustering routines for the unit sphere blog
brown-clustering：Brown clustering in Python (word clustering).
lsystem_optimization：Socially isolating through obsessive micro-optimization.

Features

cleanlab：Find label errors in datasets, weak supervision, and learning with noisy labels.
boruta_py：Python implementations of the Boruta all-relevant feature selection method.
*dabl：Data Analysis Baseline Library (Document)
mglearn：A package of plotting functions
umap：Uniform Manifold Approximation and Projection, similar to t-SNE
geomstats：Computations and statistics on manifolds with geometric structures.
FEATHER
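Mito：Work with pandas DataFrames the way you would in Excel
Pandas Profiling：One-click analysis of small and medium-sized pandas DataFrames
Lux：Automatically recommends charts for exploratory data analysis.
pycaret：An open-source, low-code machine learning library in Python. Quickly builds baseline models and selects features.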

Scientific computing

scikit-geometry：Scientific Python Geometric Algorithms Library
f2py：Import Fortran code in Python
awkward-array：Manipulate arrays of complex data structures as easily as Numpy. Example
boost-histogram：Python bindings for the C++14 Boost::Histogram library
cusignal：cuSignal - RAPIDS Signal Processing Library
scorep_binding_python：Allows tracing of python code using Score-P
scikit-opt: Classical optimization algorithms, including Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution, and TSP

GPU acceleration

jax：Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more (XLA-based)
numba：NumPy-aware dynamic Python compiler using LLVM; more complex than jax

Algorithm implementations

Minimal and clean examples of machine learning algorithms implementations
ML-From-Scratch：Machine Learning From Scratch.
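Machine learning, in numpy (implemented entirely in NumPy)
Statistical Learning Methods
Statistical learning methods：An-Introduction-to-Statistical-Learning：official version
Probabilistic models (variational inference, GANs, MC, etc.)：Python library for probabilistic modeling, inference, and criticism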
PRML algorithms implemented in Python
finite-state toolkit--FST
Machine learning: a probabilistic perspective
TGBoost
Arbitrary order factorization machines：TensorFlow implementation of an arbitrary order Factorization Machine
SpectralNet：Deep network that performs spectral clustering [Clustering]
Deep universal probabilistic programming with Python and PyTorch
An-Introduction-to-Statistical-Learning：Exercises from the book "An Introduction to Statistical Learning" and their solutions, in Python.
hmmlearn：Hidden Markov Models in Python, with scikit-learn like API

Secure machine learning

OpenMined / PySyft：A library for encrypted, privacy preserving machine learning
FATE: An Industrial Level Federated Learning Framework

AutoML

AlphaPy：Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost
nni：An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
AutoGluon：AutoML for Text, Image, and Tabular Data.

Interpretable machine learning

interpretable-ml-book
shapash：🔅 Shapash makes Machine Learning models transparent and understandable by everyone

Practical resources / hyperparameter tuning tools

Machine Learning Cheatsheet
hyperparameter_hunter：Automatically save and learn from Experiment results, leading to long-term, persistent optimization that remembers all your tests.
hyperopt / hyperopt：Hyperparameter tuning

4 Competition solutions

TNSCUI2020-Seg-Rank1st：Image segmentation
CCKS2019_EventEntityExtraction_Rank5：SEBERTNets, an event entity extraction method for the financial domain
lic2019-dureader2.0-rank2：Rank2 solution (no-BERT) for 2019 Language and Intelligence Challenge - DuReader2.0 Machine Reading Comprehension.
Tencent2019_Finals_Rank1st：Complete code for the 2019 Tencent Advertising Algorithm Competition (champion)
Tencent2020_Rank1st：The code for 2020 Tencent College Algorithm Contest, and the online result ranks 1st.
riiid-acp-pub 3rd：Riiid! Answer Correctness Prediction 3rd place solution. Reproducing it requires a fairly powerful machine.
quest_qa_labeling1st：Google QUEST Q&A Labeling. Improving automated understanding of complex question answer content
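gaic_track3_pair_sim：Global Artificial Intelligence Technology Innovation Competition, short-text semantic matching: champion solution
ccks_baidu_entity_link：CCKS Baidu entity linking, first place
daguancup_-5th：5th "Daguan Cup", risk event label recognition based on large-scale pre-trained models; 4th on preliminary leaderboard A, 6th in the final ranking. Uses only a single NEZHA model.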

Competition information

MLCompetitionHub：Machine learning competition information

5 Open-source tools

ChatGPT: 🔮 ChatGPT Desktop Application

ChatGPT - Poe：Free online ChatGPT

Fonts

Visualization

Tool for visualizing attention in the Transformer model
tqdm
Visualizations for machine learning datasets
manim：Animation engine for explanatory math videos
diagrams：Diagram as Code for prototyping cloud system architectures; draw architecture diagrams with code
dl-visualization：This is a repository containing the source code for the animations to the series "Visualizing Deep Learning" on the YouTube channel vcubingx.
weibo-analysis-and-visualization

System tools

tldr：Simplified and community-driven man pages
ShellCheck, a static analysis tool for shell scripts
[Git] Git tips and tricks
pure-bash-bible：A guide to bash scripting
[C++] Windows Terminal
awesome window manager
memory-profiler：pip install memory-profiler
code-server：VS Code in the browser https://coder.com
gen_tags.vim：Enhanced ctags support
bat：bat supports syntax highlighting for a large number of programming and markup languages
records: SQL for Humans in Python. Database support includes RedShift, Postgres, MySQL, SQLite, Oracle, and MS-SQL (drivers not included).
miasm：Reverse engineering framework in Python
termpair: Connect to a server's terminal remotely from a browser.
asynctasks.vim：Modern Task System for Project Building, Testing and Deploying !!
TencentOS-tiny：Operating system for IoT end devices
nginx-tutorial：A minimalist Nginx tutorial
cockpit: a web-based graphical interface for servers.
changedetection.io: self-hosted free open source website change detection, monitor and notification service.
Ripes: A graphical processor simulator and assembly editor for the RISC-V ISA
gitui: Blazing 💥 fast terminal-ui for git
ecapture: capture SSL/TLS text content without CA cert using eBPF.
Bottles：Run Windows software and games on Linux
ansible： Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain.

Small projects

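An open-source natural interaction system based on offline wake-up, natural language understanding, and sentiment analysis
WeChat assistant：1. Sends customized messages to friends on a daily schedule
Latest research results, obtained by crawling arXiv papers and summarizing abstracts.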
nider：Python package to add text to images, textures and different backgrounds
docusaurus：Easy to maintain open source documentation websites. https://docusaurus.io
Synonyms：Chinese synonyms; a toolkit for chatbots and intelligent question answering
tushare: TuShare is a utility for crawling historical data of China stocks
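h5-Dooring：A simple, convenient, professional, and reliable H5/PC page-building solution with unlimited possibilities.
pytheory: Learn music theory
live2d-widget：Customizable Live2D mascot ("kanban girl") for front-end web pages
Screenshot-to-code：Generates static web page code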
latexify_py：Generates LaTeX math description from Python functions.
sherlock： Hunt down social media accounts by username across social networks
core：Open source home automation that puts local control and privacy first
wttr.in：⛅ The right way to check the weather in command line.
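typora-plugin-bilibili：Automatically uploads images pasted into Typora to Bilibili image hosting; can also be modified to use any other image hosting API.
DouZero_For_HappyDouDiZhu：A customized AI based on DouZero for playing Happy DouDiZhu
Unlock-netease-cloud-music：Unlocks grayed-out songs in the NetEase Cloud Music client
FileCentipede：Download tool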
organicmaps：🍃 Organic Maps is a free Android & iOS offline maps app for travelers, tourists, hikers, and cyclists.

Low-level compiler infrastructure

Parallel computing

dask / dask: Parallel computing with task scheduling

Testing tools

cypress-io / cypress：Fast, easy and reliable testing for anything that runs in a browser.
vscode-recipes：A collection of recipes for using VS Code with particular technologies.

6 Datasets

Toolkits

Chinese text processing

python-pinyin：Converts Chinese characters to pinyin (pypinyin)
OpenCC: Conversion between Traditional and Simplified Chinese
zhon：Zhon is a Python library that provides constants commonly used in Chinese text processing.

NLP

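Chinese natural language processing datasets
100+ Chinese Word Vectors：Over a hundred pre-trained Chinese word vectors
glyph-aware-character-embedding：Vectors that distinguish different glyph styles of Latin letters
TX-WORD2VEC-SMALL：A reduced-size version of the Tencent word2vec model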
Fasttext
laserembeddings：LASER multilingual sentence embeddings as a pip package
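Chinese natural language processing corpora/datasets
nlp_chinese_corpus: Large Scale Chinese Corpus for NLP
WEIBO_USER_DATA：Data collected from 200,000 Sina Weibo users
Chinese NLP dataset search：https://www.cluebenchmarks.com/dataSet_search.html
toutiao-multilevel-text-classfication-dataset：Toutiao Chinese news text (multi-level) classification dataset
chinese_chatbot_corpus：Public Chinese chat corpus
chinese-xinhua：Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and characters.
zhvoice：Chinese speech corpus with clearer, more natural audio; comprises 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.
CognitiveInference：A systematic project on cognitive reasoning, commonsense knowledge bases, commonsense reasoning, and commonsense reasoning evaluation
ChineseNlpCorpus-1：Collects, organizes, and publishes Chinese NLP corpora/datasets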
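modern-poetry：The most complete database of ** modern and contemporary poetry and foreign poetry
Poetry：A very comprehensive classical poetry dataset, with more than 850,000 classical poems from the pre-Qin period to modern times.
ChineseLyrics：Database of 100,000 Chinese song lyrics
poetry：Chinese ancient poetry project data
PersonGraphDataSet：Person graph dataset; a fact database of nearly 100,000 person-relationship graph entries
chinese-poetry：The most complete database of classical Chinese poetry: nearly 14,000 poets of the Tang and Song dynasties, close to 55,000 Tang poems plus 260,000 Song poems; 1,564 ci poets of the Song period and 21,050 ci.
tang_poetry：Database of the Complete Tang Poems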

Annotation tools

LabelMeAnnotationTool
Image Polygonal Annotation with Python
LabelImg is a graphical image annotation tool for labeling object bounding boxes in images
ChineseAnnotator：Annotation tool for Chinese natural language processing (NLP)
label-studio: a multi-type data labeling and annotation tool with standardized output format.
doccano：Open source annotation tool for machine learning practitioners

Books

singgel / Study-Floder
CS-Books：More than 1,000 classic computer science books

7 Blogs + interview experiences

frankmcsherry / blog
Bert-for-Chinese-NLP
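A small collection of WeChat official account articles
ML-NLP：Commonly tested knowledge points and code implementations for Machine Learning, Deep Learning, and NLP interviews
500 Questions on Deep Learning
Reflection_Summary：Fundamentals of algorithm theory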
machinelearning: blogs for machine learning
Pre-trained-Models：Blog summarizing NLP pre-trained models
NLP-Interview-Notes
Tech Interview Guide：Essential fundamentals for technical interviews
ML-Interview
C-background-development-interview-experience
LogicStack-LeetCode：LeetCode article series

Appendix: keeping a git clone in sync:

git checkout master   # ensure you are on the main "master" branch
# git stash             # NOTICE: shelves your uncommitted changes; restore them later with "git stash pop"
git pull              # pull the latest versions
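
The commands above can be wrapped into a small helper. This is a minimal sketch, not part of the original notes: the function name sync_clone is hypothetical, the default branch "master" simply mirrors the commands above, and using --ff-only plus an automatic "git stash pop" is an assumption about what a safe sync should look like.

```shell
# Hypothetical helper: sync the current clone without losing local edits.
# Usage: sync_clone [branch]   (run inside the clone; branch defaults to "master")
sync_clone() {
    branch="${1:-master}"
    git checkout "$branch" || return 1

    # Stash only when tracked files have uncommitted changes
    # (untracked files are left alone).
    stashed=0
    if ! git diff --quiet || ! git diff --cached --quiet; then
        git stash || return 1
        stashed=1
    fi

    # Fast-forward only: fail instead of creating a merge commit.
    git pull --ff-only || return 1

    # Put the shelved edits back on top of the updated branch.
    if [ "$stashed" -eq 1 ]; then
        git stash pop
    fi
}
```

If the restored edits touch the same lines that upstream changed, "git stash pop" reports a conflict and keeps the stash entry, so nothing is lost; the conflict then has to be resolved by hand.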

RacleRay/LearingBetterGitRepos