/LearingBetterGitRepos

A record of useful Git repos


Title: Maintaining a Collection of GitHub Projects


1 Programming Languages

Python

Basics

  1. one-python-craftsman:python

  2. PySnooper:Never use print for debugging again,debug tools

  3. The Flask Mega-Tutorial: Flask tutorial

  4. Microsoft tutorial: Developing with Python on Windows

  5. realpython -- python-guide

  6. pipreqs: tool for generating a project's dependency list

  7. starlette: The little ASGI framework that shines which is ideal for building async web services in Python.

  8. fastapi

  9. python-small-examples

  10. pytudes:Python programs to practice or demonstrate skills.

  11. python-patterns:A collection of design patterns/idioms in Python

  12. Python parallel programming

  13. [Mixed C++ and Python programming, Microsoft Docs]

  14. scalene:Scalene: a high-performance, high-precision CPU and memory profiler for Python

  15. pyinstrument: 🚴 Call stack profiler for Python. Shows you why your code is slow!

  16. devguide: The Python developer's guide

  17. python-cheatsheet

  18. python-systemd-tutorial: A tutorial for writing a systemd service in Python

Open-source projects

Web crawlers
  1. A collection of various crawlers (Python by default)

  2. lazynlp:Library to scrape and clean web pages to create massive datasets.

  3. WeiboSpider: This is a sina weibo spider built by scrapy

  4. weibospider:A distributed crawler for weibo, building with celery and requests.

  5. webspider: uses MySQL as the database and relies mainly on celery and requests; implements scheduled tasks, retry on error, logging, automatic cookie rotation, and more

  6. scrapyscript:Run a Scrapy spider programmatically from a script or a Celery task

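  7. pspider: a simple distributed crawler framework

  8. taoyoulue_spider: distributed crawler built with MongoDB storage, Redis caching, and celery.

  9. DeadPool: crawler application with celery as its core framework; crawl tasks can be added flexibly and crawlers for multiple sites can run at the same time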

  10. python-bloomfilter:Scalable Bloom Filter implemented in Python

  11. learn_python3_spider: Python crawler tutorial series that goes from zero to one, covering browser packet capture and mobile app packet capture

  12. pyspider:A Powerful Spider(Web Crawler) System in Python.

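  13. examples-of-web-crawlers: some very interesting, beginner-friendly Python crawler examples, mainly targeting Taobao, Tmall, WeChat, Douban, QQ, and similar sites

  14. PythonCrawler: collection of crawler projects written in Python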

  15. MechanicalSoup: A Python library for automating interaction with websites

Async
  1. celery: Distributed Task Queue (development branch) [master-worker model] [lightweight]
String search
  1. pyahocorasick: pyahocorasick is a fast and memory efficient library for exact or approximate multi-pattern string search, meaning that you can find multiple key string occurrences at once in some input text.
  2. ahocorasick-python: an optimized Python implementation of the Aho-Corasick automaton
  3. ahocorapy: Pure python Aho-Corasick library.
  4. acora: Fast multi-keyword search engine for text strings.
  5. datrie:Fast, efficiently stored Trie for Python. Uses libdatrie. pypi
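
To make the "multi-pattern search" idea behind these libraries concrete, here is a minimal pure-Python sketch of the Aho-Corasick automaton. It is an illustration only, not the API of pyahocorasick or any other library listed here; the function names `build_automaton` and `find_all` are hypothetical.

```python
from collections import deque


def build_automaton(patterns):
    """Build trie transitions, failure links, and output sets (illustrative sketch)."""
    goto = [{}]    # goto[state][char] -> next state
    fail = [0]     # failure link of each state
    out = [set()]  # patterns that end at each state

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pattern)

    # Phase 2: breadth-first traversal to compute failure links.
    # Depth-1 states keep the default failure link to the root (state 0).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence, in one pass over text."""
    goto, fail, out = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches


print(sorted(find_all("ushers", ["he", "she", "his", "hers"])))
# prints [(1, 'she'), (2, 'he'), (2, 'hers')]
```

The text is scanned once regardless of how many patterns there are, which is the property that makes this family of libraries useful for keyword filtering and dictionary matching.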
Utility libraries
  1. rich:Rich is a Python library for rich text and beautiful formatting in the terminal.
  2. apscheduler:Task scheduling library for Python
  3. jupytext:Jupyter Notebooks as Markdown Documents, Julia, Python or R scripts
  4. nbdev:Create delightful python projects using Jupyter Notebooks
  5. python-wechaty-getting-started: Python Wechaty Starter Project Template that Works Out-of-the-Box. Wechaty is an RPA SDK for Wechat Individual Account that can help you create a chatbot in 9 lines of Python.
  6. python-wechaty
  7. sqlfluff: A SQL linter and auto-formatter
  8. pyenv:Python version management
  9. py-spy: real-time profiling tool for Python programs
  10. rocketry:Modern scheduling library for Python
  11. reloadium:Advanced Hot Reloading & Profiling for Python
  12. schedule:Python job scheduling for humans.
  13. pdf2docx: converting pdf to docx.
  14. Supervisor: a process management tool for Linux/Unix systems that is commonly used in production
  15. bandit:Bandit is a tool designed to find common security issues in Python code.
  16. Pillow: Python Imaging Library.
  17. memray:memory profiler for Python
Log
  1. loguru: a simpler logging tool for Python
GUI
  1. DearPyGui: A fast and powerful Graphical User Interface Toolkit
  2. Tkinter-Designer: Create Beautiful Tkinter GUIs by Drag and Drop
Databases
  1. learndb-py:Learn database internals by implementing it from scratch.
  2. tinydb:a lightweight document oriented database.
  3. dataset:Easy-to-use data handling for SQL data.
  4. KeyDB:A Multithreaded Fork of Redis
  5. miniob:MiniOB is a compact database that assists developers in understanding the fundamental workings of a database.
  6. dragonfly:A modern replacement for Redis and Memcached
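Others
  1. python-goose: Goose, an article extractor
  2. python-gems: a collection of interesting Python code snippets
  3. listen1: Listen 1 lets you listen to online music from multiple sites in a single web page, with support for various platforms
  4. beijing_bus: real-time Beijing bus information; shows how long until the queried bus reaches a given stop
  5. tushare: TuShare is a free, open-source Python package for financial data; TuShare documentation
  6. pyecharts: pyecharts is a library built with Echarts + Python for generating Echarts charts
  7. FeelUOwn: FeelUOwn is an open-source music player written in Python for Linux/macOS
  8. superset: superset is a real-world, enterprise-grade data exploration project developed in Python by Airbnb
  9. Spug: Spug is an open-source operations management system built with Python + Flask + Vue + Element components
  10. textract: text extraction tool
  11. manim: Animation engine for explanatory math videos
  12. gopup: data interfaces for Baidu, Google, Toutiao, and Weibo indices, macroeconomic data, interest-rate data, currency exchange rates, high-potential ("qianlima") and unicorn companies, Xinwen Lianbo transcripts, box-office data, university lists, epidemic data...
  13. bigdata_analyse: short data analysis demos
  14. playwright-python: drive a browser from Python
  15. PyWebIO: Write interactive web app in script way. Generates front-end code with Python
  16. moviepy: simple video editing with Python
  17. playwright-python: Python version of the Playwright testing and automation library.
  18. Handright: simulates handwritten Chinese characters
  19. WantWords: given an input expression, outputs words with the same meaning
  20. docker-py: create Docker containers with Python
  21. pikepdf: edit PDFs
  22. kopf: easily handle complex k8s operations in Python, such as conditional logic and event triggers.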
  23. matrix-webcam:Take your video conference from within the matrix.
  24. latexify_py:A library to generate LaTeX expression from Python code.
  25. Games:Create interesting games by pure python.
  26. FileCodeBox: share text and files anonymously with a passcode; pick up files like collecting a parcel
  27. Django-Styleguide: Django styleguide used in HackSoft projects
  28. bar_chart_race: Create animated bar chart races in Python with matplotlib

C++

Documentation and references

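  1. CPlusPlusThings: C++那些事 ("Things About C++")
  2. CppCoreGuidelines: The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
  3. EffectiveModernCppChinese: Chinese translation of "Effective Modern C++"
  4. Cpp_Primer_Practice: C++ Primer notes and answers to the exercises
  5. TCP-IP-NetworkNote: study notes on "TCP/IP Network Programming" (by Korean author Yoon Seong-woo, 尹圣雨)
  6. rCore-Tutorial-Book-v3: rCore, from Tsinghua's operating systems course
  7. tensorflow-internals: TensorFlow kernel and implementation mechanism.
  8. cpp-game-engine-book: tutorial on writing a game engine from scratch
  9. patterns-of-distributed-systems: Chinese edition of "Patterns of Distributed Systems"
  10. PPHC: open-source book 《高并发的哲学原理》 ("The Philosophical Principles of High Concurrency") (CC BY-NC-ND)
  11. modern-cpp-features: A cheatsheet of modern C++ language and library features.
  12. awesome-c-cn: Chinese edition of the awesome C resource list, covering build systems, compilers, databases, cryptography, beginner-to-advanced tutorials/guides, books, and libraries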

Development tools

  1. openFrameworks:openFrameworks is a community-developed cross platform toolkit for creative coding in C++.
  2. plywood : A multimedia development kit for C++
  3. folly: An open-source C++ library developed and used at Facebook.
  4. mlibc: a portable C standard library
  5. util-linux
  6. MyTinySTL: stl
  7. WindowsAppSDK :Windows App SDK empowers all Windows Desktop apps with modern Windows UI, APIs, and platform features, including back-compat support, shipped via NuGet.
  8. range-v3:Range library for C++14/17/20, basis for C++20's std::ranges
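  9. vcpkg: Microsoft's open-source C/C++ package manager
  10. libqalculate: multi-purpose calculator desktop application, library, and CLI program written in C++
  11. indicators: progress bar library written in C++
  12. Crow: A Fast and Easy to use microframework for the web.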
  13. poco:cross-platform C++ libraries for building network- and internet-based applications that run on desktop, server, mobile, IoT, and embedded systems. (Websocket)

Debug

  1. rr: Record and Replay Debug Framework
  2. backward-cpp: A beautiful stack trace pretty printer for C++. Makes error output easier to read
  3. Sourcetrail: Sourcetrail - free and open-source interactive source explorer; code structure visualization

Logging

  1. spdlog: C++ logging library

Servers

  1. oatpp: Light and powerful C++ web framework for highly scalable and resource-efficient web application.
  2. nginx: static file server
  3. fasthttp (Go):Fast HTTP package for Go.
  4. Redis
  5. f-stack: F-Stack is a user space network development kit with high performance based on DPDK, FreeBSD TCP/IP stack and coroutine API.
  6. wrk: Modern HTTP benchmarking tool
  7. sylar-from-scratch: rewriting sylar, a high-performance distributed C++ server framework, from scratch
  8. sylar: high-performance distributed C++ server framework; webserver, websocket server
C++
  1. TinyWebServer: lightweight C++ web server for Linux
  2. TinyWeb: web server in C++11.
  3. 30dayMakeCppServer: build your own C++ server
  4. drogon:Drogon: A C++14/17 based HTTP web application framework running on Linux/macOS/Unix/Windows
  5. cpp-httplib: A C++ header-only HTTP/HTTPS server and client library
  6. yalantinglibs: A collection of C++20 libraries, include async_simple, coro_rpc and struct_pack
  7. dragonfly: A modern replacement for Redis and Memcached
  8. brpc: Baidu's open-source RPC framework
  9. srpc: high-performance RPC framework based on C++ Workflow
  10. Server: build a Async Server.
  11. WebServer:A C++ High Performance Web Server
  12. WebServer: multithreaded Reactor server implemented with epoll and a thread pool
  13. TKeed:High Performance HTTP WebServer
  14. cpp-httplib:A C++ header-only HTTP/HTTPS server and client library
  15. ananas:A C++11 RPC framework based on future and protobuf, with utility: timer,ssl,future/promise,log,coroutine,etc
  16. da4qi4:web server
  17. tinyrpc: c++ async rpc framework. 14w+qps.
C
  1. mongoose: Embedded Web Server
  2. Tinyhttpd: http server
  3. C-Web-Server: A simple webserver written in C
  4. socket-server: socket server examples.
  5. freeradius-server: A multi-protocol policy server.
  6. civetweb: Embedded C/C++ web server
  7. pure-ftpd: Pure FTP server
  8. tiny-web-server: tiny web server in C
  9. wsServer: a tiny WebSocket server library written in C
  10. chat_room: multi-user chat tool
media server
  1. nginx-rtmp-module: NGINX-based Media Streaming Server
  2. media-server: RTSP/RTP/RTMP/FLV/HLS/MPEG-TS/MPEG-PS/MPEG-DASH/MP4/fMP4/MKV/WebM
  3. ZLMediaKit: a high-performance, production-grade streaming media service framework based on C++11
websocket
  1. libwebsockets: canonical libwebsockets.org networking library
  2. uWebSockets: Simple, secure & standards compliant web server for the most demanding of applications
  3. websocketpp: C++ websocket client/server library
Game Server Framework
  1. atsf4g-co: service framework for game server
  2. YTSvrLib: a simple yet powerful cross-platform (Win/Linux) game server framework
  3. server:game server
  4. skynet:A lightweight online game framework
CGI
  1. fastcgi-async-or-coroutine: mcgi is an asynchronous fastcgi using Muduo Network Library. cocgi is a coroutine fastcgi using Tencent Libco Library.

Concurrency

  1. async_simple: lightweight C++ asynchronous framework
  2. workflow: C++ Parallel Computing and Asynchronous Networking Engine
  3. libgo:Go-style concurrency in C++11
  4. coost:A tiny boost library in C++11.
  5. libaco:A blazing fast and lightweight C asymmetric coroutine library
  6. quantum: Powerful multi-threaded coroutine dispatcher and parallel execution engine

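For basic learning:

  1. luce: C++20 coroutine networking library based on epoll, with convenient await syntax
  2. co_context: C++ coroutine framework
  3. workspace: lightweight asynchronous execution framework based on C++11, supporting general-purpose asynchronous concurrent task execution, priority task scheduling, an adaptive dynamic thread pool, an efficient static thread pool, exception handling, and more.
  4. handy: a simple, easy-to-use C++11 network library / supports ten million concurrent connections on a single machine / a simple C++11 network server framework
  5. coroutine: An asymmetric coroutine library for C.
  6. TSF4G: server development tools; blog, doc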
  7. coroutine:C++ 20 Coroutines in Action.

QT

  1. FileCentipede:Cross-platform internet upload/download manager for HTTP(S), FTP(S), SSH, magnet-link, BitTorrent, m3u8, ed2k, and online videos.

Quantitative trading

  1. wondertrader: one-stop framework for quantitative research, development, and trading; documentation

Encoding

  1. utf8.h:single header utf8 string functions for C and C++
  2. utf8proc:a clean C library for processing UTF-8 Unicode data
  3. utf8:A simple utf8 decode/encode lib in c
  4. gbk-utf8:GBK to UTF-8,vice versa.
  5. conversion_gbk_utf8: collection of C++11 conversions between GBK and UTF-8 encodings
  6. qqee_clib: cross-platform basic C library, compatible with any ANSI C compiler

Testing

  1. Catch2:A modern, C++-native, header-only, test framework for unit-tests
  2. doctest:The fastest feature-rich C++11/14/17/20/23 single-header testing framework

Databases

  1. rocksdb: high-performance key-value storage engine written in C++, based on LevelDB
  2. KeyDB:A Multithreaded Fork of Redis
  3. redis-plus-plus: Redis client written in C++
  4. ormpp: modern C++ ORM, C++17, support mysql, postgresql,sqlite
  5. bustub: database management system, 15445.courses.cs.cmu.edu/

Small games

  1. sudoku: cross-platform Sudoku game implemented in C++
  2. Snake: Snake game, AI version
  3. FSHistory: Play and Enjoy the History of Microsoft Flight Simulator

Serialization

  1. cjson
  2. json: JSON library for C++
  3. fast-cpp-csv-parser

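Algorithms

  1. simhash: computes the simhash value of Chinese documents. simhash is an algorithm used by Google for text deduplication (see "simhash algorithm: principles and implementation")
  2. ltp: the Language Technology Platform (LTP) is a complete Chinese language processing system developed over ten years by the Research Center for Social Computing and Information Retrieval at Harbin Institute of Technology
  3. TinyML: a compact C++ machine learning library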
  4. flashlight: A C++ standalone library for machine learning
  5. uthash:C macros for hash tables and more
  6. wwsearch: Tencent's full-text search engine
  7. KuiperInfer: a high-performance deep learning inference library
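
As a rough illustration of the simhash idea mentioned in this list, the sketch below computes a fingerprint from a list of tokens and compares two fingerprints by Hamming distance. It is a minimal sketch under stated assumptions, not the code of the simhash repo above: tokenization (for Chinese text, word segmentation) is left to the caller, every token has weight 1, and the function names and the choice of BLAKE2b as the per-token hash are hypothetical.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a `bits`-bit fingerprint; near-duplicate token lists get similar fingerprints.

    Assumes `bits` is a multiple of 8 and at most 512 (BLAKE2b digest limit).
    """
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each token votes +1 for every set bit and -1 for every clear bit.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1

    # A fingerprint bit is 1 where the positive votes won.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox jumped over the lazy dog".split()
print(hamming_distance(simhash(doc_a), simhash(doc_b)))
```

For deduplication, two documents are typically treated as near-duplicates when the distance falls below a small threshold; the right threshold depends on the corpus and is not specified by the source.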

Languages

  1. taichi: Taichi is a high-performance programming language for computer graphics applications
  2. craftinginterpreters:Crafting Interpreters

Engines

  1. godot:Godot Engine – Multi-platform 2D and 3D game engine

Utilities

  1. aseprite:Animated sprite editor & pixel art tool
  2. krita: open-source painting software
  3. vnote: Markdown editor
  4. keepassx: a cross platform port of the windows application “Keepass Password Safe”.

Julia

  1. julia_notebooks:Julia Jupyter/Colab Notebooks
  2. julia:The Julia Language: A fresh approach to technical computing.

Java

  1. SpringAll: step-by-step learning of Spring Boot, Spring Boot & Shiro, Spring Batch, Spring Cloud, Spring Cloud Alibaba, Spring Security & Spring Security OAuth2

GO

  1. 7days-golang: 7 days golang apps from scratch
  2. GoSpark
  3. golearn:Machine Learning for Go

RUST

This Week in Rust (this-week-in-rust.org)

Operating systems

  1. tock:A secure embedded operating system for microcontrollers
  2. blog_os:Writing an OS in Rust

System tool development

  1. ripgrep:recursively searches directories for a regex pattern while respecting your gitignore

Asynchronous programming

  1. mio:Metal I/O library for Rust.

Mathematical computing

  1. nalgebra:Linear algebra library for Rust.

Applications

  1. ncspot:Cross-platform ncurses Spotify client written in Rust, inspired by ncmpc and the likes.
  2. mdBook:Create book from markdown files.
  3. zino:Next-generation framework for composable applications in Rust.

Game engines

  1. bevy:A refreshingly simple data-driven game engine built in Rust.
  2. hematite:A simple Minecraft written in Rust with the Piston game engine
  3. Fyrox:3D and 2D game engine written in Rust

Books/Blogs

  1. command-line-rust: A Project-Based Primer for Writing Rust CLIs
  2. rust-blog:Educational blog posts for Rust beginners

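Data structures and algorithms

  1. 50 must-know code implementations of data structures and algorithms
  2. Algorithm implementations in Python (lightweight); animated walkthroughs of LeetCode solution approaches (lightweight)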
  3. All Algorithms implemented in Python
  4. algorithms:Minimal examples of data structures and algorithms in Python
  5. Algorithms implemented in C++:For education
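  6. awesome-algorithms: curated list of resources for learning and/or practicing algorithms
  7. fucking-algorithm: cracking algorithm problems comes down to patterns; by labuladong
  8. leetcode_company_wise_questions: problem lists organized by company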
  9. ands:Algorithms and data structures for educational, demonstrational and experimental purposes.
  10. Algorithms_in_C
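  11. hello-algo: an introductory data structures and algorithms book with animated illustrations, runnable code, and Q&A. Provides code implementations in Java, C++, Python, Go, JS, TS, and C#.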

Computer science fundamentals

  1. Crash Course Computer Science (计算机速成课)
  2. TeachYourselfCS-CN
  3. computer-science:🎓 Path to a free self-taught education in Computer Science!
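  4. University-Courses-China: course materials collected from several universities; some include exercise answers, which is nice
  5. REKCARC-TSC-UHT: course guide for the Department of Computer Science at Tsinghua University
  6. zju-icicles: Zhejiang University course guide sharing project; web page
  7. CS-Xmind-Note: mind maps and notes for core computer science courses (408 exam): Computer Organization (5th ed., Wang Aiying), Data Structures (Wangdao), Computer Networks (7th ed., Xie Xiren), Operating Systems (4th ed., Tang Xiaodan)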
  8. awesome-courses:List of awesome university courses for learning Computer Science!
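  9. Computer-Networking-A-Top-Down-Approach-NOTES: translations of and solutions to the programming assignments and Wireshark lab documents for "Computer Networking: A Top-Down Approach" (6th edition)
  10. dragon-book-exercise-answers: answers to the exercises in the compilers textbook known as the Purple Dragon Book, 2nd edition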
  11. 90DaysOfDevOps: documenting repository for learning the world of DevOps.
  12. Developer-Books: curated collection of programming and development books

CSAPP

Computer Systems: A Programmer's Perspective (深入理解计算机系统): videos & lectures

  1. labs: https://github.com/Exely/CSAPP-Labs

System design

Architecture skill tree

architect-awesome

GameDevMind


Build-it-yourself projects

  1. danistefanovic / build-your-own-x:Build your own (insert technology here)
  2. bregman-arie / devops-exercises:Linux, Jenkins, AWS, SRE, Prometheus, Docker, Python, Ansible, Git, Kubernetes, Terraform, OpenStack, SQL, NoSQL, Azure, GCP, DNS, Elastic, Network, Virtualization
  3. PlayWithCompiler:A GeekTime course about constructing a compiler
  4. 15-minute-apps:15 minute (small) desktop apps built with PyQt
  5. wyoos:Source codes for the "Write your own Operating System" video-series on YouTube
  6. SerenityOS: retro-style OS; a fairly large project

IoT

  1. IoT-For-Beginners: 24 Lessons
  2. TencentOS-tiny: operating system for IoT endpoint devices
  3. micropython: Python that runs on microcontrollers
  4. rpi4-osdev:Tutorial: Writing a "bare metal" operating system for Raspberry Pi 4
  5. Awesome IoT
  6. Awesome-Embedded

Other languages

  1. purr-data:Pure Data (aka Pd) is a visual programming language.
  2. astro: Build fast websites, faster. 🚀🧑‍🚀✨. Quickly build simple web pages
  3. Open Source Guides (开源软件指南): GitHub's guide to contributing to open source

2 Deep learning

Colab power tools

  1. colabcode: Run VSCode (codeserver) on Google Colab or Kaggle Notebooks. Use the https protocol when accessing it! Use version 2020.5.86806 of the Python extension; the latest version cannot debug or select an interpreter.

Loss functions

  1. self-adj-dice:Implementation of Self-adjusting Dice Loss from "Dice Loss for Data-imbalanced NLP Tasks" paper
  2. pytorch-loss:label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax ...

AI systems

  1. DeepLearningSystem:Deep Learning System core principles introduction.

Resources

NLP

1 Transformer
  1. delight:DeLighT: Very Deep and Light-Weight Transformers. DeFINE (ICLR'20) and DeLighT (preprint).
  2. Transformer-Clinic:Understanding the Difficulty of Training Transformers
  3. rezero:This repository contains the ReZero-Transformer implementation from the paper. It matches Pytorch's Transformer and can be easily used as a drop-in replacement. ReZero is All You Need: Fast Convergence at Large Depth; ArXiv, March 2020.
  4. collaborative-attention:Code for Multi-Head Attention: Collaborate Instead of Concatenate
  5. sentence_transformer_zh
  6. Transformer-Transducer: A streamable speech recognition model.
  7. Informer2020: the original PyTorch implementation of Informer in the following paper: Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting.
  8. linformer-pytorch: implementation of Linformer for Pytorch.
  9. awesome-fast-attention: list of efficient attention modules
  10. Synthesizer-Rethinking-Self-Attention-Transformer-Models + Synthesizer: Implementing SYNTHESIZER
  11. aft-pytorch:Unofficial PyTorch implementation of the Attention Free Transformer's AFT-Full layer by Apple Inc
  12. x-transformers:A simple but complete full-attention transformer with a set of promising experimental features from various papers
  13. Transformers-Tutorials:demos made with the Transformers library by HuggingFace.
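2 Pre-trained models
  1. awesome-pretrained-chinese-nlp-models: collection of download links for high-quality Chinese pre-trained models

  2. awesome-bert: papers and GitHub projects related to BERT and XLNet

  3. BERT-Tickets: further exploration of BERT

  4. Official TensorFlow code and pre-trained models for BERT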

  5. UER-py:Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo

  6. huggingface transformers:Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

  7. ERNIE: An Implementation of ERNIE For Language Understanding (including Pre-training models and Fine-tuning tools)

  8. ERNIE-Pytorch

  9. Play with GPT-2 in five lines of code: Easy-to-use Wrapper for GPT-2

  10. google-research / text-to-text-transfer-transformer

  11. bert-as-service:Mapping a variable-length sentence to a fixed-length vector using BERT model

  12. Chinese-BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)

  13. Chinese-ELECTRA: Pre-trained Chinese ELECTRA (Chinese ELECTRA pre-trained models)

  14. bert-for-tf2:A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.

  15. bert4keras:keras implement of transformers for humans

  16. ERNIE-Pytorch: This project is to convert ERNIE to huggingface's format.

  17. unilm:UniLM - Unified Language Model Pre-training / Pre-trained models for natural language understanding (NLU) and generation (NLG) tasks

  18. XLnet

  1. marge-pytorch:Implementation of Marge, Pre-training via Paraphrasing, in Pytorch
  2. BERT-CCPoem: BERT-CCPoem is a BERT-based pre-trained model particularly for Chinese classical poetry
  3. microsoft/Unicoder : Unicoder model for understanding and generation. This repo provides the code for reproducing the experiments in XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation (leaderboard).
  4. CPM-Generate:Chinese Pre-Trained Language Models (CPM-LM) Version-I
  5. CPM-LM-TF2:TensorFlow 2.x CPM-Generate
  6. BERT-whitening: simple vector whitening can rival BERT-flow
  7. OptiPrompt: NAACL'2021: Factual Probing Is [MASK]: Learning vs. Learning to Recall
  8. R-Drop: R-drop: Regularized Dropout for Neural Networks. A loss design
  9. WuDao (悟道) Chinese language model resources
3 Other language models
  1. T5: https://github.com/google-research/text-to-text-transfer-transformer; Chinese blog post

  2. google-research https://github.com/google-research/google-research

  3. ELECTRA: surpasses BERT; the best NLP pre-trained model of 2019

  4. ZEN Chinese pre-trained language model: https://github.com/sinovation/ZEN

  5. albert:ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

  6. GPT2-Chinese

  7. gpt2-ml: GPT2 for Multiple Languages, including pretrained models. GPT2 multilingual support, with a 1.5-billion-parameter Chinese pre-trained model

  8. Decoders-Chinese-TF2.0:GPT2 training script for Chinese in Tensorflow 2.0

  9. MetaAdapter:Specific Layers in Multilingual Language Models

  10. BERT-flow:TensorFlow implementation of On the Sentence Embeddings from Pre-trained Language Models (EMNLP 2020)

  11. gpt-2:Code for the paper "Language Models are Unsupervised Multitask Learners"

  12. NeZha_Chinese_PyTorch: PyTorch version of NEZHA, compatible with transformers

  13. luke:LUKE -- Language Understanding with Knowledge-based Embeddings

4 CRF、LAN
  1. label-attention inference paper
  2. pytorch-struct: A library of tested, GPU implementations of core structured prediction algorithms for deep learning applications. [probabilistic graphical models]
5 CLUE project collection (NLP)

CLUE:Organization of Language Understanding Evaluation benchmark for Chinese

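  1. CLUEPretrainedModels: collection of high-quality Chinese pre-trained models: state-of-the-art large models, fastest small models, and specialized similarity models
  2. CLUECorpus2020: Large-scale Pre-training Corpus for Chinese; 100 GB of Chinese pre-training corpus
  3. CLUEDatasetSearch: search all Chinese NLP datasets
  4. CLUE: benchmark evaluation results for Chinese tasks
  5. CLGE: benchmark evaluation results for Chinese generation tasks
  6. ELECTRA: Chinese pre-trained ELECTRA model, based on adversarial learning
  7. CLUENER2020: fine-grained Chinese named entity recognition
  8. CLUEmotionAnalysis2020: fine-grained sentiment analysis dataset
  9. DistilBert: distilled BERT model pre-trained on massive Chinese data
  10. MobileQA: offline reading comprehension application; QA for mobile, Android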
6 BERT Applications
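  1. rasa_chatbot_cn: dialogue system built on the latest version of rasa
  2. Bert-Chinese-Text-Classification-Pytorch: Chinese text classification with BERT and ERNIE
  3. BERT-train2deploy: BERT model from training to deployment
  4. rasa-tutorial: Rasa Chinese demo and guide
  5. rasa-ui: Rasa UI is a frontend for the Rasa Framework
  6. text_matching: TensorFlow versions of common text matching models; the dataset is QA_corpus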
  7. sentence-transformers:Sentence Embeddings with BERT & XLNet, https://arxiv.org/abs/1908.10084
  8. labse:Language-agnostic BERT Sentence Embedding (LaBSE)
  9. BERTopic:BERTopic is a topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters allowing for easily interpretable topics whilst keeping important words in the topic descriptions.
  10. Top2Vec:Top2Vec is an algorithm for topic modeling and semantic search. It automatically detects topics present in text and generates jointly embedded topic, document and word vectors.
  11. EssayKiller_V2: first-generation creative AI based on open-source GPT2.0 | extensible and evolvable
  12. Guyu:pre-training and fine-tuning framework for text generation
  13. LM-BFF:ACL'2021: LM-BFF: Better Few-shot Fine-tuning of Language Models
  14. AliceMind:pre-trained encoder-decoder models and its related optimization techniques developed by Alibaba's MinD
7 NER
  1. spert:PyTorch code for SpERT: "Span-based Entity and Relation Transformer". For a description of the model and experiments, see our paper: https://arxiv.org/abs/1909.07755 (accepted at ECAI 2020).
  2. mrc-for-flat-nested-ner:The code for "A Unified MRC Framework for Named Entity Recognition"
  3. AutoNER: Learning Named Entity Tagger from Domain-Specific Dictionary. Trained with distant supervision, using unlabeled data.
  • Inference:
    • LightNER: inference w. models pre-trained / trained w. any following tools, efficiently.
  • Training:
    • LD-Net: train NER models w. efficient contextualized representations.
    • VanillaNER: train vanilla NER models w. pre-trained embedding.
  • Distant Training:
    • AutoNER: train NER models w.o. line-by-line annotations and get competitive performance.
8 ELMo And Others
  1. bilm-tf:Tensorflow implementation,allen ai
  2. Pre-trained models
  3. ELMoForManyLangs: Chinese model
  4. sentence embeddings: InferSent
9 Similarity matching
  1. cail2019: second-place solution for similar case matching in CAIL 2019 (法研杯), with dataset and documentation
  2. StarSpace:Learning embeddings for classification, retrieval and ranking.
  3. DSSM
  4. Chinese-sentence-similarity-task: summary of competitions and solutions for Chinese question sentence similarity
  5. Question-Answering-Albert-Electra : Question Answering using Albert and Electra
  6. simbert:a bert for retrieval and generation
  7. haystack: Transformers at scale for question answering & search
  8. epidemic-sentence-pair: first-place (online leaderboard) solution to the Tianchi epidemic similar sentence pair matching competition
  9. deep_text_matching: implementation of several deep text matching (text similarity) models for Keras: cdssm, arc-ii, match_pyramid, mvlstm, esim, drcn, bimpm, bert, albert, roberta
  10. attention-feature-distillation:Official implementation for (Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching, AAAI-2021)
10 Text classification
  1. text_classification:all kinds of text classification models and more with deep learning
  2. Chinese-Text-Classification-Pytorch: Chinese text classification with TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer; based on PyTorch, works out of the box
  3. TextClassificationBenchmark:A Benchmark of Text Classification in PyTorch
  4. Bert-Chinese-Text-Classification-Pytorch: Chinese text classification with BERT and ERNIE
  5. multi-class-text-classification-cnn:Classify Kaggle Consumer Finance Complaints into 11 classes. Build the model with CNN (Convolutional Neural Network) and Word Embeddings on Tensorflow.
  6. deep-text-classifier-mtl:tensorflow script for multi-task learning implementation of Kim's paper : Convolutional Neural Networks for Sentence Classification.
  7. multi_task-nlp-bert: NLP multi-task learning, which includes single-sentence classification, pairwise text similarity, pairwise text classification, and relevance ranking.
  8. TextFooler: A Model for Natural Language Attack on Text Classification and Inference [adversarial attack]
  9. lightning-text-classification:Minimalist implementation of a BERT Sentence Classifier with PyTorch Lightning, Transformers and PyTorch-NLP.
  10. Sequence Projection Models >> [PRADO]:A family of models that projects sequence to fixed sized features. The idea behind is to build embedding-free models that minimize the model size. Instead of using embedding table to lookup embeddings, sequence projection models computes them on the fly.
  11. pytorch-sentiment-analysis:Tutorials on getting started with PyTorch and TorchText for sentiment analysis.
  12. BertGCN
  13. detext:DeText: A Deep Neural Text Understanding Framework for Ranking and Classification Tasks
  14. LTP:Learned Token Pruning for Transformers
  15. regularized-embeddings:code for the “Text classification with word embedding regularization and soft similarity measure” (Novotný et al., 2020) paper
11 Aspect Based Sentiment Analysis
  1. ABSA-PyTorch
  2. BERT-for-RRC-ABSA:code for our NAACL 2019 paper: "BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis"
  3. Aspect-Based-Sentiment-Analysis:Aspect-Based-Sentiment-Analysis: Transformer & Explainable ML (TensorFlow)
  4. torchMoji:A pyTorch implementation of the DeepMoji model: state-of-the-art deep learning model for analyzing sentiment, emotion
  5. PyABSA:Open Framework for Aspect-based Sentiment Analysis based on state-of-the-art Models [NEW]
12 Text summarization
  1. textsum:Sequence-to-Sequence with Attention Model for Text Summarization.
  2. sumy:Module for automatic summarization of text documents and HTML pages.
  3. pointer_summarizer:pytorch implementation of "Get To The Point: Summarization with Pointer-Generator Networks"
  4. pointer-generator:Code for the ACL 2017 paper "Get To The Point: Summarization with Pointer-Generator Networks"
  5. transformer-pointer-generator: Transformer and Pointer-generator
  6. BertSum:Code for paper Fine-tune BERT for Extractive Summarization
  7. hiersumm:Code for paper Hierarchical Transformers for Multi-Document Summarization in ACL2019
  8. rouge
  9. pegasus:Pre-training with Extracted Gap-sentences for Abstractive SUmmarization Sequence-to-sequence models, or PEGASUS, uses self-supervised objective Gap Sentences Generation (GSG) to train a transformer encoder-decoder model.
  10. awesome-text-summarization:The guide to tackle with the Text Summarization
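  11. SPACES: end-to-end long-text summarization model (CAIL 2020 judicial summarization track)
  12. GPT2-NewsTitle: Chinese GPT2 news headline generation project.
  13. Texygen: A text generation benchmarking platform
  14. summarize-from-feedback: reinforcement-learning-based SOTA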
13 seq2seq
  1. nlg-eval: evaluation metrics
  2. OpenSeq2Seq:Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
  3. Magenta:involves developing new deep learning and reinforcement learning algorithms for generating songs, images, drawings, and other materials.
  4. deepnmt:This PyTorch package implements Very Deep Transformers for Neural Machine Translation
  5. EmbeddinglessNMT:The implementation of "Neural Machine Translation without Embeddings"
  6. seq2seq-couplet:Play couplet with seq2seq model.
  7. TransCoder:Pytorch original implementation of TransCoder in Unsupervised Translation of Programming Languages
  8. Sentence-VAE:PyTorch Re-Implementation of "Generating Sentences from a Continuous Space" by Bowman et al 2015 https://arxiv.org/abs/1511.06349
  9. Deep Generative Models for Natural Language Processing Papers
14 QA
  1. qa_match:A simple effective ToolKit for short text matching
  2. QA-Survey: a survey of question answering systems.
  3. QueryGeneration:Conversational Standard Meta Language
  4. acl2020-openqa-tutorial:ACL2020 Tutorial: Open-Domain Question Answering
  5. ccf_2020_qa_match: ccf 2020 qa match competition top1
  6. AnyQ (ANswer Your Questions): open-source project that mainly includes a question answering framework for FAQ collections and the text semantic matching tool SimNet
15 ModelZoo
  1. Gathers machine learning and Tensorflow deep learning models for NLP problems
  2. MatchZoo, a general-purpose text matching toolkit: deep text matching models
  3. awesome-sentence-embedding
  4. Awesome-Chinese-NLP:中文自然语言处理相关资料
  5. awesome-nlp:研究进展、guides、工具包等
  6. 中文自然语言处理 Chinese NLP:各种任务sota baseline
  7. funNLP:相关资源合集
16 开源包
  1. nematus:Open-Source Neural Machine Translation in Tensorflow
  2. Gluon-NLP
  3. SnowNLP:Python library for processing Chinese text
  4. gensim :Topic Modelling in Python
  5. PyText:A natural language modeling framework based on PyTorch,is a deep-learning based NLP modeling framework built on PyTorch.
  6. allennlp:An open-source NLP research library, built on PyTorch
  7. jieba (结巴) Chinese word segmentation
  8. lda2vec: Tools for interpreting natural language
  9. Fudan fastNLP: A Modularized and Extensible NLP Framework. Currently still in incubation.
  10. fast text: representation and classification.
  11. autotuning for fastText
  12. HanLP
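  13. ltp: ltp 4.0, much easier to install and use than the 3.x versions
  14. pkuseg multi-domain Chinese word segmentation toolkit: pkuseg is simple and easy to use, supports domain-specific segmentation, and effectively improves segmentation accuracy
  15. An open-source neural machine translation toolkit (Tsinghua)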
  16. Neural Modules: a toolkit for conversational AI
  17. stanza : Official Stanford NLP Python Library for Many Human Languages Doc
  18. StarSpace:Learning embeddings for classification, retrieval and ranking.
  19. BigARTM:topic model
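  20. tkitMarker_bert: extracts entities and descriptions by fine-tuning BERT
  21. fastHan: fastHan is a Chinese NLP tool built on fastNLP and PyTorch, as convenient to call as spaCy.
  22. Jiagu: Jiagu deep learning NLP toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keywords, text summarization, text clustering
  23. KILT: A Benchmark for Knowledge Intensive Language Tasks.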
  24. AutoPhrase:Automated Phrase Mining from Massive Text Corpora
  25. Kashgari-doc-zh Kashgari: Kashgari is a minimal yet powerful NLP framework for learning, research, and production deployment of text classification and labeling
  26. Senta:Baidu's open-source Sentiment Analysis System.
  27. DDParser: Baidu's open-source dependency parsing system
  28. FastBERT
  29. OpenMatch:An Open-Source Package for Information Retrieval.
  30. robustness-gym:Evaluation Toolkit for NLP
  31. elasticsearch-py:Official Python low-level client for Elasticsearch
  32. py-googletrans:Free and Unlimited Google translate API for Python.
  33. UDA_pytorch:UDA(Unsupervised Data Augmentation) implemented by pytorch
  34. PaddleNLP:An NLP library with Awesome pre-trained Transformer models
  35. mars:a tensor-based unified framework for large-scale data computation which scales Numpy, pandas, Scikit-learn and Python functions.
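  36. knlp:similar to snownlp and textblob and easy to call. Provides training and inference scripts for basic algorithms, evaluation methods and datasets for various NLP tasks, and deep learning support. Aimed at Chinese, with deliberately basic functionality, so it suits further customization.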
  37. skweak: A software toolkit for weak supervision applied to NLP tasks
  38. pytorch-metric-learning: The easiest way to use deep metric learning in your application. Provides ready-to-use contrastive learning losses such as NTXENT loss (InfoNCE) and SupContrast loss.
  39. dice_loss_for_NLP:ACL2020 paper Dice Loss for Data-imbalanced NLP Tasks
  40. TextBlob:Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
17 Other Models
  1. show-attend-and-tell
  2. Retrieval-Based Conversational Model in Tensorflow
  3. sru:Training RNNs as Fast as CNNs
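  4. Self_Explaining_Structures_Improve_NLP_Models: an attempt at crossing BERT output features; keep the custom layer dimensions small. Depends heavily on hyperparameter tuning, and the gains are slight at best.
18 Relation Extraction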
  1. Snowball:Snowball: Extracting Relations from Large Plain-Text Collections
  2. OpenNRE: relation extraction models.
  3. MRC4ERE_plus:Implementation for Paper "Asking Effective and Diverse Questions: A Machine Reading Comprehension based Framework for Joint Entity-Relation Extraction"
  4. AlpacaTag:AlpacaTag: An Active Learning-based Crowd Annotation Framework for Sequence Tagging (ACL 2019 Demo)
  5. Distant-Supervised-Chinese-Relation-Extraction:distantly supervised Chinese relation extraction
  6. Entity-Relation-Extraction:Entity and Relation Extraction Based on TensorFlow and BERT.
  7. Information-Extraction-Chinese:Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
  8. pytorch-relation-extraction:distant supervised relation extraction models: PCNN MIL (Zeng 2015), PCNN+ATT(Lin 2016).
  9. USC-DS-RelationExtraction:Distantly Supervised Relation Extraction
  10. open-entity-relation-extraction:Knowledge triples extraction and knowledge base construction based on dependency syntax for open domain text.
  11. BERT-Relation-Extraction:PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper
  12. PersonRelationKnowledgeGraph:person relation extraction via bootstrapping, plus applications such as knowledge-graph-based question answering
  13. OpenKE:An Open-Source Package for Knowledge Embedding (KE)
  14. deepke:a PyTorch-based deep learning toolkit for Chinese relation extraction
  15. DeepIE:DeepIE: Deep Learning for Information Extraction
  16. CasRel:A Novel Cascade Binary Tagging Framework for Relational Triple Extraction
  17. CasRel-pytorch-reimplement
  18. two-are-better-than-one:Code associated with the paper “Two are Better Than One: Joint Entity and Relation Extraction with Table-Sequence Encoders”, at EMNLP 2020
  19. TPlinker-joint-extraction:TPLinker: Single-stage Joint Extraction of Entities and Relations Through Token Pair Linking
  20. PURE:NAACL'2021: A Frustratingly Easy Approach for Entity and Relation Extraction
  21. pke:Python Keyphrase Extraction module
19 Distillation
  1. TextBrewer:A PyTorch-based knowledge distillation toolkit for natural language processing
  2. KD_Lib:A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
  3. RepDistiller:[ICLR 2020] Contrastive Representation Distillation (CRD), and benchmark of recent knowledge distillation methods
  4. KD_SRRL:Paper. Knowledge distillation via softmax regression representation learning
20 Dialogue
  1. DeepPavlov:An open source library for deep learning end-to-end dialog systems and chatbots.
  2. ConvLab-2:ConvLab-2: An Open-Source Toolkit for Building, Evaluating, and Diagnosing Dialogue Systems
  3. rasa-chatbot:Sample chatbot with rasa stack
  4. nezha_gpt_dialog
  5. DialoGPT:Large-scale pretraining for dialogue
  6. CDial-GPT:A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models
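  7. mirai:High-performance bot framework for Tencent QQ
  8. unit-dmkit: DMKit is the open-source dialogue management module of UNIT. It connects seamlessly to UNIT's understanding capabilities, gives developers multi-state management of complex dialogue flows, and can connect to external knowledge bases at low cost to quickly enrich responses
  9. chat:a chatbot based on natural language understanding and machine learning, supporting concurrent multi-user access and custom multi-turn dialogue. For readers interested in knowledge graphs and KBQA who want to build their own knowledge graph from scratch
  10. SMP2018:SMP2018 Evaluation of Chinese Human-Computer Dialogue Technology (ECDT)
  11. GPT2-chitchat:GPT2 for Chinese chitchat (implements DialoGPT's MMI)
  12. Baidu dialogue system
  13. Rasa:open-source machine learning framework for automating text- and voice-based conversations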
  14. ParlAI:sharing, training and evaluating dialogue models across many tasks.
  15. Task-Oriented-Dialogue-Research-Progress-Survey
  16. MetaDialog:Platform for few-shot natural language processing: Text Classification, Sequence Labeling.
  17. ChatGLM-6B:an open-source bilingual dialogue language model
21 Intent Detection and Slot Filling
  1. FewShotMultiLabel:AAAI2021 paper: Few-Shot Learning for Multi-label Intent Detection.
  2. FewShotTagging:ACL2020 paper: Few-shot Slot Tagging with Collapsed Dependency Transfer and Label-enhanced Task-adaptive Projection Network
22 Coreference Resolution
  1. hobbs:Implementation of Hobbs' algorithm for coreference resolution in python
23 Topic Models
  1. microsoft / LightLDA
24 Automata
  1. automata:A Python library for simulating finite automata, pushdown automata, and Turing machines
25 Reading Comprehension
  1. SDNet
  2. SogouMRCToolkit: fast and efficient development of modern machine comprehension models, including both published models and original prototypes
26 Data Augmentation
  1. Cutoff:Cutoff data augmentation approach for NLP
27 Prompt
  1. PromptPapers:Must-read papers on prompt-based tuning for pre-trained language models.
  2. autoprompt:AutoPrompt: Automatic Prompt Construction for Masked Language Models.
  3. P-tuning:fine-tuning code repository for the CPM (Chinese PL) model; can be used for multi-node, multi-GPU fine-tuning and testing.
  4. P-tuning-v2:ACL 2022 paper "P-tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Across Scales and Tasks"
  5. OpenPrompt:An Open-Source Framework for Prompt-Learning
  6. iPrompt:Controllable Generation from Pre-trained Language Models via Inverse Prompting
28 Adapter
  1. adapter-transformers:Huggingface Transformers + Adapters = ❤️
29 Book and Course
  1. Natural Language Processing with PyTorch
  2. Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"
  3. MTBook:the book "Machine Translation: Statistical Modeling and Deep Learning Methods" (《机器翻译:统计建模与深度学习方法》)
30 Visualization
  1. Text Visualization Browser
31 Latest Research Progress
  1. track the progress in Natural Language Processing (NLP)
  2. Repository to show how NLP can tackle real problems.
  3. Leaderboards-for-Multi-Turn-Response-Selection:provide the reader with a quick overview of benchmark datasets and the state-of-the-art studies on this task, which serves as a stepping stone for further research.
  4. awesome-papers:Papers & presentation materials from Hugging Face's internal science day

ChatGPT

  1. awesome-chatgpt-prompts
  2. ChatGPT Prompt Generator

Generative AI

  1. generative-ai-for-beginners:Get Started Building with Generative AI

Recommender Systems

  1. faiss:A library for efficient similarity search and clustering of dense vectors. -- Linux
  2. StarSpace:Learning embeddings for classification, retrieval and ranking.
  3. pytorch-fm:Factorization Machine models in PyTorch
  4. tensorflow/recommenders: TensorFlow Recommenders is a library for building recommender system models using TensorFlow.
  5. NEWS-RECOMMENDATION:a simple demo
  6. GrowNet:Gradient Boosting Neural Networks: GrowNet
  7. saleor: A modular, high performance, headless e-commerce platform built with Python, GraphQL, Django, and React. E-commerce platform tool.

Similarity Search

  1. milvus:An open source vector similarity search engine -- Linux
  2. faiss:A library for efficient similarity search and clustering of dense vectors. -- Linux
  3. annoy: Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk
  4. Scann:ScaNN (Scalable Nearest Neighbors) is a method for efficient vector similarity search at scale. lecture

Speech

  1. ASRT_SpeechRecognition:A Deep-Learning-Based Chinese Speech Recognition System
  2. speechT: An opensource speech-to-text software written in tensorflow
  3. TensorflowTTS:TensorflowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2
  4. DeepSpeechRecognition:A Chinese Deep Speech Recognition System, including a deep-learning-based acoustic model and a deep-learning-based language model
  5. audio-pretrained-model:A collection of Audio and Speech pre-trained models.
  6. pyannote-audio:Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
  7. Kersa-Speaker-Recognition:a speaker (voiceprint) recognition model implemented with Keras
  8. Transformer-TTS:Implementation of "FastSpeech: Fast, Robust and Controllable Text to Speech"
  9. kaldi:kaldi-asr/kaldi is the official location of the Kaldi project.
  10. espnet:End-to-End Speech Processing Toolkit
  11. MockingBird: AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content
  12. Real-Time-Voice-Cloning: Clone a voice in 5 seconds to generate arbitrary speech in real-time
  13. whisper:Speech Recognition via Large-Scale Weak Supervision
  14. bark: Text-Prompted Generative Audio Model

GAN

  1. GAN-ZOO:A list of all named GANs!
  2. The classical paper list with code about generative adversarial nets
  3. Curated list of awesome GAN applications and demo
  4. StarGAN
  5. iGAN: Interactive Image Generation via Generative Adversarial Networks
  6. CycleGAN and pix2pix in PyTorch image-to-image translation
  7. U-GAT-IT: turns selfies into anime-style characters with facial expressions closely preserved; the reverse direction also works. https://github.com/znxlwm/UGATIT-pytorch https://github.com/taki0112/UGATIT
  8. SeqGAN: https://github.com/suragnair/seqGAN https://github.com/ChenChengKuan/SeqGAN_tensorflow
  9. deepgenerativemodels / notes
  10. sngan_projection:GANs with spectral normalization and projection discriminator
  11. gan:Tooling for GANs in TensorFlow
  12. AnimeGAN: AnimeGAN for fast photo animation !
  13. BigGAN
  14. first-order-model:image animation
  15. stargan-v2:StarGAN v2 - Official PyTorch Implementation (CVPR 2020)
  16. UGATIT-pytorch:style transfer. Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation
  17. 【book】O'Reilly book 'Generative Deep Learning'
  18. SeqGAN seqGAN-Simplified
  19. super-resolution: Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

CV

  1. CV-pretrained-model:A collection of computer vision pre-trained models.
  2. Tencent YouTu open-source projects
  3. Dlib:making real world machine learning and data analysis applications in C++
  4. mediapipe: Cross-platform, customizable ML solutions for live and streaming media. Google.
  5. Convolution arithmetic: a visual explanation of convolution arithmetic
  6. cnn-explainer:visualizes how a CNN is trained and learns
  7. pytorch-image-models:PyTorch image models, scripts, pretrained weights -- ResNet, ResNeXT, EfficientNet, EfficientNetV2, NFNet, Vision Transformer, MixNet, MobileNet-V3/V2, RegNet, DPN, CSPNet, and more. 【NEW】
Image Recognition and Classification
  1. DBoW2:Enhanced hierarchical bag-of-word library for C++
  2. face_classification: Real-time face detection and emotion/gender classification
  3. Face recognition:The world's simplest facial recognition api for Python and the command line
  4. insightface: State-of-the-art 2D and 3D Face Analysis Project.
  5. TF_FLAME:Example Tensorflow code for the FLAME face model
  6. DeepFaceLab_Colab: https://www.deepfaker.xyz -- NOTE: With Colab you can use a Tesla P100 for free. Of course, there are some restrictions.
  7. EasyOCR:Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
  8. libfacedetection: face detection in images. The face detection speed can reach 1000FPS.
  9. PyMatting: A Python Library for Alpha Matting. Image matting.
  10. rembg:Rembg is a tool to remove images background. Image matting.
  11. TransFG:A Transformer Architecture for Fine-grained Recognition
  12. bottleneck-transformer-pytorch:SotA visual recognition model with convolution + attention that outperforms EfficientNet and DeiT in terms of performance-computes trade-off, in Pytorch
  13. deep-learning-for-image-processing:deep learning for image processing including classification and object-detection etc.
  14. DEKR:This is an official implementation of our CVPR 2021 paper "Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression" (https://arxiv.org/abs/2104.02300)
  15. mmpose:OpenMMLab Pose Estimation Toolbox and Benchmark.
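  16. VNN:a tool for various image effect conversions; a high-performance, lightweight neural network deployment framework.
  17. chineseocr_lite:ultra-lightweight Chinese OCR that supports vertical text recognition and inference with ncnn, mnn, and tnn.
  18. HyperLPR: high-performance Chinese license plate recognition based on deep learning
  19. fawkes: identifies synthesized images
  20. mae:Masked Autoencoders Are Scalable Vision Learners
  21. ConvNeXt:A ConvNet for the 2020s. CVPR 2022.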
  22. ResNeSt: significantly boosts the performance of downstream models such as Mask R-CNN, Cascade R-CNN and DeepLabV3.
  23. Res2Net: Multi-scale Backbone Architecture
  24. CSWin-Transformer:Vision Transformer Backbone with Cross-Shaped Windows, CVPR 2022
  25. deepface:A Lightweight Face Recognition and Facial Attribute Analysis (Age, Gender, Emotion and Race) Library for Python
opencv
  1. learnopencv: Learn OpenCV : C++ and Python Examples
openvino
  1. openvino_tensorflow: OpenVINO™ integration with TensorFlow. Brief introduction
Object Detection
  1. Recent Advances in Deep Learning for Object Detection
  2. pytorch-YOLOv4:PyTorch ,ONNX and TensorRT implementation of YOLOv4
  3. tensorflow-yolov4-tflite:YOLOv4, YOLOv4-tiny, YOLOv3, YOLOv3-tiny implemented in TensorFlow 2.0, Android. Converts YOLO v4 .weights to TensorFlow, TensorRT and TFLite
  4. yolov5:YOLOv5 in PyTorch > ONNX > CoreML > iOS
  5. deep_learning_object_detection:A paper list of object detection using deep learning.
  6. deepdetect:Deep Learning API and Server in C++11 support for Caffe, Caffe2, PyTorch,TensorRT, Dlib, NCNN, Tensorflow, XGBoost and TSNE
  7. mmdetection:OpenMMLab Detection Toolbox and Benchmark
  8. FaceBoxes.PyTorch:A PyTorch Implementation of FaceBoxes
  9. openpose:OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
  10. Surface-Defect-Detection:open source dataset and important critical papers in the field of surface defect research
  11. ViT-pytorch: Pytorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
  12. efficientnet-pytorch: PyTorch implementation of "EfficientNet", ICML 2019
  13. efficientnet: Implementation of EfficientNet model. Keras and TensorFlow Keras.
  14. FCOS: Fully Convolutional One-Stage Object Detection (ICCV'19)
  15. image-segmentation-keras:Implementation of Segnet, FCN, UNet , PSPNet and other models in Keras
  16. YOLOF:You Only Look One-level Feature (YOLOF), CVPR2021, Detectron2
  17. Ultra-Fast-Lane-Detection:Ultra Fast Structure-aware Deep Lane Detection (ECCV 2020)
  18. LSPS:Source code for "3D Hand Pose Estimation using Simulation and Partial-Supervision with a Shared Latent Space"
  19. nanodet: ⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥
  20. U-2-Net:paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."
  21. segment-anything: running inference with the SegmentAnything Model (SAM)
  22. AnimatedDrawings
Medical
  1. InnerEye-DeepLearning: Medical Imaging Deep Learning library to train and deploy models on Azure Machine Learning and Azure Stack. intro
Multimodal
  1. microsoft/psi:an open, extensible framework for development and research of multimodal, integrative-AI systems. 【C#】
  2. ClipBERT:an efficient framework for end-to-end learning for image-text and video-text tasks.
  3. stable-diffusion:A latent text-to-image diffusion model.
  4. motion-diffusion-model:"Human Motion Diffusion Model"
  5. Chinese-CLIP: Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
  6. stable-diffusion-webui:web UI
Image Super-Resolution / Style Transfer / Old Photo Restoration
  1. pulse:Self-Supervised Photo Upsampling via Latent Space Exploration of Generative Models
  2. FastPhotoStyle:style transfer applied separately to different segmented regions. Nice output
  3. DeOldify:A Deep Learning based project for colorizing and restoring old images
  4. Image restoration
  5. Code and data for paper "Deep Photo Style Transfer"
  6. TensorFlow CNN for fast style transfer
  7. ALAE: Adversarial Latent Autoencoders
  8. deep-daze:Simple command line tool for text to image generation using OpenAI's CLIP and Siren
  9. AnimeGANv2:The improved version of AnimeGAN. Landscape photos/videos to anime
  10. Real-ESRGAN:Practical Algorithms for General Image/Video Restoration.
  11. avatarify-python:real-time style transfer for webcam video
  12. tiler: 👷 Build images with images. Not deep learning.
  13. pixray:neural image generation. Generates pixel-style images.
  14. text2art:AI-powered Text-to-Art Generator - Text2Art.com
  15. latent-diffusion: High-Resolution Image Synthesis with Latent Diffusion Models
  16. dalle-mini: Generate images from a text prompt
  17. paper2gui: a simple, convenient way to use cutting-edge AI techniques
  18. style2paints:sketch + style = paints 🎨
Data Augmentation
  1. fast-autoaugment:Official Implementation of 'Fast AutoAugment' in PyTorch.
  2. zao-:source code for AI face swapping
  3. AutoAugment:Unofficial implementation of the ImageNet, CIFAR 10 and SVHN Augmentation Policies learned by AutoAugment using pillow
  4. albumentations: Fast image augmentation library and easy to use wrapper around other libraries
  5. imgaug:Image augmentation for machine learning experiments.
  6. AugLy:A data augmentations library for audio, image, text, and video.
  7. ttach:Test Time Augmentation with PyTorch

Reinforcement Learning

  1. Algorithms, lecture notes, and exercises:Implementation of Reinforcement Learning Algorithms
  2. RLexample:basic examples of playing with RL
  3. DeepRL-Tutorials:Contains high quality implementations of Deep Reinforcement Learning algorithms written in PyTorch
  4. ierg5350-assignment:assignments of our reinforcement learning (RL) course.
  5. TD3:Author's PyTorch implementation of TD3 for OpenAI gym tasks. Nice code style and quality👍
  6. baby-a3c: A high-performance Atari A3C agent in 180 lines of PyTorch
  7. pytorch-a2c-ppo-acktr-gail: PyTorch implementation of Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR) and Generative Adversarial Imitation Learning (GAIL).
  8. NeuronDance / DeepRL: Deep Reinforcement Learning Lab
  9. awesome-monte-carlo-tree-search-papers
  10. advanced-deep-learning-and-reinforcement-learning-deepmind:UCL & DeepMind | YouTube videos 👉 https://www.youtube.com/playlist?list…
  11. AlphaZero_Gomoku:An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
  12. reinforcement-learning-an-introduction:companion code for the book "Reinforcement Learning: An Introduction"
  13. PARL:A high-performance distributed training framework for Reinforcement Learning
  14. trfl:TensorFlow Reinforcement Learning
  15. tianshou:An elegant PyTorch deep reinforcement learning library.

Knowledge Graph

  1. Agricultural Knowledge Graph (AgriKG)
  2. KGQA-Based-On-medicine
  3. KEQA_WSDM19
  4. transE
  5. KB2E:thunlp
  6. tianchi_nl2sql: 3rd-place solution and code from the finals of the first Chinese NL2SQL Challenge
  7. CCKS 2019 Chinese knowledge graph question answering dataset
  8. knowledge-graph: a QA Demo based on KG! use scrapy and jena.
  9. ONEPIECE-KG: a knowledge graph project for ONEPIECE
  10. K-BERT:Source code and datasets for "K-BERT: Enabling Language Representation with Knowledge Graph"
  11. scikit-kge:Python library to compute knowledge graph embeddings
  12. Financial-Knowledge-Graphs:workflow for building a small financial knowledge graph
  13. KG-demo-for-movie:builds a movie knowledge graph from scratch and develops a simple KBQA program on top of it
  14. pykg2vec:Python library for knowledge graph embedding and representation learning.
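  15. text_to_knowledge: Jieyu (Text to Knowledge) is the first knowledge base (an encyclopedic knowledge tree) and knowledge annotation framework to cover all Chinese word classes. It offers a word-class system that can describe all Chinese vocabulary, a Chinese knowledge annotation toolkit, and pre-trained language models better suited to Chinese mining tasks. A PaddleNLP subproject; not open source.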
  16. graph-data-science:Neo4j Graph Data Science library of graph algorithms.

Deep Bayesian / Probabilistic

  1. pyro:Deep universal probabilistic programming with Python and PyTorch (https://github.com/pyro-ppl/pyro)
  2. Python library for probabilistic modeling, inference, and criticism
  3. A Library for Bayesian Deep Learning, Generative Models, Based on Tensorflow

Capsule Net

  1. CapsNet-Tensorflow
  2. CapsNet-resource

Autonomous Driving

  1. Udacity Self-Driving Car Engineer Nanodegree projects.
  2. MIT Deep Self Driving

Robotics

  1. AtsushiSakai / PythonRobotics

Contrastive Learning

  1. PyContrast:PyTorch implementation of Contrastive Learning methods; List of awesome-contrastive-learning papers
  2. SimCSE:SimCSE: Simple Contrastive Learning of Sentence Embeddings
  3. ConSERT: ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer
  4. pytorch-metric-learning: The easiest way to use deep metric learning in your application. Provides ready-to-use contrastive learning losses such as NTXENT loss (InfoNCE) and SupContrast loss.

Adversarial Attack

  1. adversarial-robustness-toolbox
  2. foolbox:fool neural networks
  3. cleverhans:constructing attacks, building defenses, and benchmarking both
  4. FreeLB:Adversarial Training for Natural Language Understanding
  5. DefenseByAttack: Code for paper "One Man's Trash is Another Man's Treasure"

Multi-Task Learning

  1. fudan_mtl_reviews:TensorFlow implementation of the paper Adversarial Multi-task Learning for Text Classification
  2. https://decanlp.com/: The Natural Language Decathlon (decaNLP) is a new benchmark for studying general NLP models that can perform a variety of complex, natural language tasks. decaNLP

Federated Learning

  1. FedML-AI/FedML:(PyTorch > 1.0) A Research-Oriented Federated Learning Library. Supporting distributed computing, mobile/IoT on-device training, and standalone simulation. Intro

Graph Networks

  1. dgl:Python package built to ease deep learning on graph, on top of existing DL frameworks.
  2. AutoGL:An autoML framework & toolkit for machine learning on graphs.
  3. graph_nets:Build Graph Nets in Tensorflow
  4. A collection of important graph(Code)
  5. Must-read papers on graph neural networks (GNN)
  6. microsoft/tf-gnn-samples
  7. spektral:Graph Neural Networks with Keras and Tensorflow 2.
  8. littleballoffur:A NetworkX extension library for graph subsampling.
  9. awesome-gcn: resources for graph convolutional networks
  10. pytorch_geometric:Geometric Deep Learning Extension Library for PyTorch https://pytorch-geometric.readthedocs…
  11. PytorchGeometricTutorial:Pytorch Geometric Tutorials

Optimization Algorithms

  1. RAdam

Frameworks in Practice

tuning_playbook: A playbook for systematically maximizing the performance of deep learning models.

  1. DeepLearningExamples:Deep Learning Examples
  2. jax:Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU
  3. PyCandle:A numpy and cpu based neural network tool. For those who intend to learn more about the details of how a neural network works.
  4. 【**】einops:Deep learning operations reinvented (for pytorch, tensorflow, chainer, gluon and others). Struggling with the syntax for reshaping tensor dimensions? Try this human-readable package.
  5. tinygrad:You like pytorch? You like micrograd? You love tinygrad! ❤️
  6. MatrixSlow:A simple deep learning framework in pure python for purpose of learning in DL
  7. best-of-ml-python:🏆 A ranked list of awesome machine learning Python libraries.
  8. pytorch-loss:label-smooth, amsoftmax, focal-loss, triplet-loss, lovasz-softmax ...
  9. pytorch-optimizer:torch-optimizer -- collection of optimizers for Pytorch
  10. prefect: The easiest way to coordinate your dataflow
  11. KuiperInfer: an implementation of an inference library. A DIY deep learning inference framework.

Tensorflow

  1. Effective TensorFlow
  2. Concise TF2: a TensorFlow 2 tutorial in Chinese
  3. eat_tensorflow2_in_30_days
  4. deeplearning-models: various deep learning architectures, models, and tips
  5. Code for the book 《TensorFlow实战》 (TensorFlow in Practice)
  6. Deep Learning with Python Keras
  7. Hands-on Machine Learning with Scikit-Learn and TensorFlow
  8. Neural Machine Translation (seq2seq) Tutorial
  9. TFLearn: Deep learning library featuring a higher-level API for TensorFlow.
  10. Tensor2Tensor:deep learning models and datasets designed to make deep learning more accessible and accelerate ML research
  11. Sonnet: is a library built on top of TensorFlow for building complex neural networks.
  12. KDD2019 Deep Learning for NLP with Tensorflow hands-on
  13. TensorFlow Tutorial and Examples for Beginners (support TF v1 & v2)
  14. Simple and Straightforward TensorFlow 2.0 (简单粗暴 TensorFlow 2.0)
  15. TensorFlow for Deep Learning Research. Course materials.
  16. tensorpack:A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
  17. larq:An Open-Source Library for Training Binarized Neural Networks
  18. addons:Useful extra functionality for TensorFlow 2.x maintained by SIG-addons.
  19. keras-cosine-annealing
C++
  1. hello_tf_c_api:Neural Network TensorFlow C API
  2. tensorflow_cc:Build and install TensorFlow C++ API library.
  3. tiny-dnn:header only, dependency-free deep learning framework in C++14

Pytorch

  1. Fairseq(-py) is a sequence modeling toolkit
  2. A practical approach to machine learning pytorch
  3. pix2pixHD
  4. Hands-on Deep Architectures PyTorch:slides
  5. Dive into Deep Learning (动手学深度学习), PyTorch edition
  6. A very simple framework for state-of-the-art Natural Language Processing (NLP)
  7. pytorch-lightning:pytorch + TPU
  8. Awesome-pytorch-list:A comprehensive list of pytorch related content on github,such as different models,implementations,helper libraries,tutorials etc.
  9. DeepNLP-models-Pytorch:Pytorch implementations of various Deep NLP models in cs-224n (Stanford Univ)
  10. pytorch-Deep-Learning: Deep Learning (with PyTorch)
  11. serve:Model Serving on PyTorch
  12. pytorch-seq2seq:Tutorials on implementing a few sequence-to-sequence (seq2seq) models with PyTorch and TorchText.
  13. pycandle:PyCandle is a lightweight library for pytorch that makes running experiments easy, structured, repeatable and avoids boilerplate code.
  14. AI-Art:PyTorch implementation of Neural Style Transfer, Pix2Pix, CycleGAN, and Deep Dream!
  15. entmax:The entmax mapping and its loss, a family of sparse softmax alternatives.
  16. Adabelief-Optimizer:NeurIPS 2020 Spotlight "AdaBelief Optimizer: Adapting stepsizes by the belief in observed gradients"
  17. pytorch-optimizer:torch-optimizer -- collection of optimizers for Pytorch
  18. examples: A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
  19. computervision-recipes:Best Practices, code samples, and documentation for Computer Vision.
  20. nlp-recipes:Natural Language Processing Best Practices & Examples
  21. pymde:Python library for computing vector embeddings for finite sets of items, such as images, biological cells, nodes in a network
  22. mmf:MMF is a modular framework for vision and language multimodal research from Facebook AI Research.
  23. examples: A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
  24. pytorch-image-models: PyTorch image models, scripts, pretrained weights -- (SE)ResNet/ResNeXT, DPN, EfficientNet, MixNet, MobileNet-V3/V2, MNASNet, Single-Path NAS, FBNet, and more
  25. pytorch-cosine-annealing-with-warmup
  26. fastmoe:A fast Mixture of Experts(MoE) impl for PyTorch
  27. annotated_deep_learning_paper_implementations
  28. External-Attention-pytorch:🍀 Pytorch implementation of various Attention Mechanisms, MLP, Re-parameter, Convolution, which is helpful to further understand papers.
  29. torch-toolbox:ToolBox to make using Pytorch much easier. Mainly for CV.
  30. tianshou:An elegant PyTorch deep reinforcement learning library.

MxNet

  1. Dive into Deep Learning (动手学深度学习)
  2. autogluon:AutoGluon: AutoML for Text, Image, and Tabular Data

Spark

  1. sparkflow:Easy to use library to bring Tensorflow on Apache Spark
  2. spark-nlp:State of the Art Natural Language Processing
  3. spark-deep-learning:Deep Learning Pipelines for Apache Spark
  4. Data-Science-with-Spark

Ray

  1. ray:A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. 【actor model】

Lasagne

  1. Lasagne:Lasagne is a lightweight library to build and train neural networks in Theano.

MindSpore

  1. mindspore: https://www.mindspore.cn/docs/zh-CN/master/architecture.html

MegEngine

  1. MegEngine: https://megengine.org.cn/quick-start/

Basics

  1. fastAI assignments

Model Optimization

  1. NVIDIA TensorRT doc GitHub
  2. hyperopt / hyperopt:hyperparameter tuning
  3. hyperparameter_hunter:Automatically save and learn from Experiment results, leading to long-term, persistent optimization that remembers all your tests.
  4. microsoft / DeepSpeed:library that makes distributed training easy, efficient, and effective.
    • TDS: A plug-in of Microsoft DeepSpeed to fix the bug of DeepSpeed pipeline
  5. apex:A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
  6. model-optimization:A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
  7. keras-tuner:Hyperparameter tuning for humans
  8. optuna:A hyperparameter optimization framework
  9. lightseq:LightSeq: A High Performance Library for Sequence Processing and Generation
  10. nni:An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
  11. AutoGluon:AutoML for Text, Image, and Tabular Data.
  12. BMInf:Efficient Inference for Big Models
  13. InfMoE: Inference framework for MoE(Mixture of Experts) layers based on TensorRT with Python binding
  14. fastmoe: A fast MoE impl for PyTorch

Model Training and Deployment

Machine Learning System Design

Deployment

  1. BentoML:Model Serving Made Easy

  2. Turi Create simplifies the development of custom machine learning models

  3. cortexlabs / cortex:model deployment 【related project: cortex: A horizontally scalable, highly available, multi-tenant, long term Prometheus.】

  4. bert-classification-tf-serving

  5. Deep-Learning-in-Production:deploying deep learning-based models in production.

  6. ***ray:A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library. Serve Video Documents

    process to process: skips scheduling to improve performance

  7. fiber:Distributed Computing for AI Made Simple

  8. model_deployment:A collection of model deployment library and technique.

  9. jina:An easier way to build neural search applications in the cloud

  10. plaidml:PlaidML is a framework for making deep learning work everywhere.

  11. streamlit:Streamlit — The fastest way to build data apps in Python

  12. kubeflow: Machine Learning Toolkit for Kubernetes

  13. waitress:Waitress - A WSGI server for Python 2 and 3

  14. mlflow: Open source platform for the machine learning lifecycle

  15. zenml:Bring Zen to your ML with reproducible pipelines

  16. cs329s-ml-deployment-tutorial:Code and files to go along with CS329s machine learning model deployment tutorial.

  17. nboost:deploying transformer models to improve the relevance of search results on different platforms (i.e. Elasticsearch)

  18. object_detector_app:Real-Time Object Recognition App with Tensorflow and OpenCV

  19. uwsgi-nginx-flask-docker:Docker image with uWSGI and Nginx for Flask applications in Python running in a single container. Optionally with Alpine Linux.

  20. Modin: Speed up your Pandas workflows by changing a single line of code

  21. nameko:Python framework for building microservices

  22. metaflow:Build and manage real-life data science projects with ease.

  23. NLPStreamlit:easily deploy web apps for Sentiment Analysis, Named Entity Recognition (NER), Text Summarization, and similar tasks.

  24. TinyNeuralNetwork:compression framework.

  25. nonebot2: cross-platform asynchronous Python chatbot framework

  26. gradio: Build demo interfaces for your models in Python. Create UIs for your machine learning model in Python in 3 minutes.

  27. cog:Containers for machine learning

  28. mercury: Build Web Apps in Jupyter Notebook with Python only

Machine Learning Compilation

  1. d2l-tvm:Dive into Deep Learning Compiler

CUDA

  1. YOLO_TRT_SIM:efficient deployment: TensorRT inference for YOLO X, V3, V4, V5, V6, V7, V8, and EdgeYOLO, with pre- and post-processing both implemented as CUDA kernels. CPP/CUDA

Training

  1. ray:A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library
  2. ml-agents:Unity Machine Learning Agents Toolkit. Trains game AI.
  3. BytePS:A high performance and general PS framework for distributed training
  4. Horovod:The goal of Horovod is to make distributed Deep Learning fast and easy to use
  5. cml: Continuous Machine Learning | CI/CD for ML; organizes results into web pages for analysis
  6. fairscale:PyTorch extensions for high performance and large scale training.
  7. replicate:Version control for machine learning
  8. orchest:Orchest is a tool for creating data science pipelines.
  9. trains:Auto-Magical Experiment Manager & Version Control for AI
  10. cleverhans:An adversarial example library for constructing attacks, building defenses, and benchmarking both
  11. docker-python:Kaggle Python docker image
  12. distribuuuu:The pure and clear PyTorch Distributed Training Framework.
  13. ColossalAI:A Unified Deep Learning System for Large-Scale Parallel Training
  14. maggot:A lightweight python library that helps to keep track of numerical experiments
  15. rigl:End-to-end training of sparse deep neural networks with little-to-no performance loss. "Making All Tickets Winners"
  16. skweak: A software toolkit for weak supervision applied to NLP tasks
  17. AugLy:A data augmentations library for audio, image, text, and video.
  18. pytorch-balanced-sampler: under/over sample according to a chosen parameter alpha, in order to create a balanced training distribution.
  19. pytorch-balanced-batch

Transfer Learning

  1. Hands-On Transfer Learning with Python

Multi-Task Learning

  1. multi-task-learning-example
  2. mt-dnn:Multi-Task Deep Neural Networks for Natural Language Understanding

Awesome

  1. awesome: Comprehensive; awesome lists about all kinds of interesting topics
  2. Awesome-win: An awesome & curated list of best applications and tools for Windows.
  3. awesome-tensorflow:TensorFlow - A curated list of dedicated resources
  4. awesome-deep-learning:A curated list of awesome Deep Learning tutorials, projects and communities.
  5. awesome-nlp:A curated list of resources dedicated to Natural Language Processing (NLP)
  6. awesome-docker:A curated list of Docker resources and projects
  7. Awesome-pytorch-list: A comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
  8. Awesome-Chinese-NLP: Resources related to Chinese natural language processing
  9. open_model_zoo:Pre-trained Deep Learning models and demos (high quality and extremely fast)
  10. https://modelzoo.co/
  11. awesome-bots:The most awesome list about bots
  12. Awesome-Chatbot: Awesome chatbot projects, corpora, papers, and tutorials, including Chinese chatbots.
  13. awesome-mlops: A curated list of references for MLOps, including tutorials, videos, and blogs on the machine learning development lifecycle
  14. google-research / language:Shared repository for open-sourced projects from the Google AI Language team.
  15. awesome-relation-extraction:A curated list of awesome resources dedicated to Relation Extraction
  16. awesome-grounding:A curated list of research papers in grounding.
  17. awesome-automl-papers:automated machine learning papers, articles, tutorials, slides and projects

Conference Resources

  1. KDD2019 Hands-on Tutorials

Project Ideas

  1. Machine Learning, NLP, Vision, Recommender Systems Project Ideas
  2. Collection of open-source top solutions from data competitions
  3. Voice Conversion with Non-Parallel Data
  4. Speech-to-text with wave-net
  5. A TensorFlow implementation of Baidu's DeepSpeech architecture
  6. industry-machine-learning:A curated list of applied machine learning and data science notebooks and libraries across different industries.
  7. news-search-engine: News search engine

3 Machine Learning

  1. awesome-mlops: A curated list of references for MLOps, including tutorials, videos, and blogs on the machine learning development lifecycle

  2. Elements-of-Mathematics: From basic arithmetic to machine learning

  3. The-Art-of-Linear-Algebra:Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

Open-Source Tools

Algorithm Packages

  1. H2O documentation:H2O is an open source, in-memory, distributed, fast, and scalable machine learning and predictive analytics platform that allows you to build machine learning models on big data and provides easy productionalization of those models in an enterprise environment.
  2. vowpal_wabbit:a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning.
  3. XGBoost repository
  4. LightGBM repository
  5. cvxpy:A Python-embedded modeling language for convex optimization problems.
  6. tpot:A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
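  7. production-tools: A basic repository demonstrating how to set up tooling for data science projects; these tools help you write higher-quality code.
  8. scikit-multiflow: A machine learning package for streaming data in Python. Trains on streaming data input.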
  9. igel:a delightful machine learning tool that allows to train, test and use models without writing code.
  10. creme:Online machine learning in Python
  11. statsmodels:statistical modeling and econometrics in Python
  12. xlearn:High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI
  13. GPy:Gaussian processes framework in python
  14. mlens:ML-Ensemble – high performance ensemble learning
  15. Kats: [Time series] A lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis
  16. modin: Faster than pandas, with an almost identical interface

Clustering

  1. spherecluster:Clustering routines for the unit sphere blog
  2. brown-clustering: Brown clustering in Python (word clustering).
  3. lsystem_optimization:Socially isolating through obsessive micro-optimization.

Features

  1. cleanlab:Find label errors in datasets, weak supervision, and learning with noisy labels.
  2. boruta_py:Python implementations of the Boruta all-relevant feature selection method.
  3. *dabl:Data Analysis Baseline Library !Document
  4. mglearn: Package of plotting functions
  5. umap: Uniform Manifold Approximation and Projection, similar to t-SNE
  6. geomstats: Computations and statistics on manifolds with geometric structures.
  7. FEATHER
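  8. Mito: Work with pandas DataFrames as you would in Excel
  9. Pandas Profiling: One-click analysis of small and medium-sized pandas DataFrames
  10. Lux: Automatically recommends chart choices for exploratory data analysis.
  11. pycaret: An open-source, low-code machine learning library in Python. Quickly builds baseline models and selects features.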

Scientific Computing

  1. scikit-geometry:Scientific Python Geometric Algorithms Library
  2. f2py: Import Fortran code in Python
  3. awkward-array:Manipulate arrays of complex data structures as easily as Numpy. Example
  4. boost-histogram:Python bindings for the C++14 Boost::Histogram library
  5. cusignal:cuSignal - RAPIDS Signal Processing Library
  6. scorep_binding_python:Allows tracing of python code using Score-P
  7. scikit-opt: Classical optimization algorithms, including Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm, Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution, and TSP

GPU Acceleration

  1. jax:Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more(XLA based)
  2. numba: More complex than jax; NumPy-aware dynamic Python compiler using LLVM

Algorithm Implementations

  1. Minimal and clean examples of machine learning algorithms implementations
  2. ML-From-Scratch:Machine Learning From Scratch.
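  3. Machine learning, in numpy (implemented entirely in NumPy)
  4. Statistical Learning Methods (统计学习方法)
  5. Statistical learning methods: An-Introduction-to-Statistical-Learning (official version)
  6. Probabilistic models (variational inference, GANs, MC, etc.): Python library for probabilistic modeling, inference, and criticism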
  7. PRML algorithms implemented in Python
  8. finite-state toolkit--FST
  9. Machine learning: a probabilistic perspective
  10. TGBoost
  11. Arbitrary order factorization machines:TensorFlow implementation of an arbitrary order Factorization Machine
  12. SpectralNet: Deep network that performs spectral clustering [Clustering]
  13. Deep universal probabilistic programming with Python and PyTorch
  14. An-Introduction-to-Statistical-Learning:This repository contains the exercises and its solution contained in the book "An Introduction to Statistical Learning" in python.
  15. hmmlearn:Hidden Markov Models in Python, with scikit-learn like API

Secure Machine Learning

  1. OpenMined / PySyft:A library for encrypted, privacy preserving machine learning
  2. FATE: An Industrial Level Federated Learning Framework

AutoML

  1. AlphaPy:Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost
  2. nni: An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
  3. AutoGluon:AutoML for Text, Image, and Tabular Data.

Interpretable Machine Learning

  1. interpretable-ml-book
  2. shapash:🔅 Shapash makes Machine Learning models transparent and understandable by everyone

Practical Resources / Hyperparameter Tuning Tools

  1. Machine Learning Cheatsheet
  2. hyperparameter_hunter:Automatically save and learn from Experiment results, leading to long-term, persistent optimization that remembers all your tests.
  3. hyperopt / hyperopt: Hyperparameter tuning

4 Competition Solutions

  1. TNSCUI2020-Seg-Rank1st: Image segmentation
  2. CCKS2019_EventEntityExtraction_Rank5: SEBERTNets: an event entity extraction method for the financial domain
  3. lic2019-dureader2.0-rank2:Rank2 solution (no-BERT) for 2019 Language and Intelligence Challenge - DuReader2.0 Machine Reading Comprehension.
  4. Tencent2019_Finals_Rank1st: Complete code for the 2019 Tencent Advertising Algorithm Competition (champion)
  5. Tencent2020_Rank1st:The code for 2020 Tencent College Algorithm Contest, and the online result ranks 1st.
  6. riiid-acp-pub 3rd: Riiid! Answer Correctness Prediction 3rd place solution. Reproducing it requires a fairly powerful machine.
  7. quest_qa_labeling1st:Google QUEST Q&A Labeling. Improving automated understanding of complex question answer content
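  8. gaic_track3_pair_sim: Global AI Technology Innovation Competition, short-text semantic matching track: champion solution
  9. ccks_baidu_entity_link: CCKS Baidu entity linking, first place
  10. daguancup_-5th: The 5th "Daguan Cup" competition on risk event label recognition based on large-scale pre-trained models; ranked 4th on preliminary leaderboard A and 6th in the final ranking. Uses only a single NEZHA model.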

Competition Information

  1. MLCompetitionHub: Machine learning competition information

5 Open-Source Tools

ChatGPT: 🔮 ChatGPT Desktop Application

ChatGPT - Poe: Free online ChatGPT

Fonts

  1. Nerd Fonts Website nerd-fonts github
  2. powerline/fonts

Visualization

  1. Tool for visualizing attention in the Transformer model
  2. tqdm
  3. Visualizations for machine learning datasets
  4. manim:Animation engine for explanatory math videos
  5. diagrams: Diagram as Code for prototyping cloud system architectures (draw architecture diagrams in code)
  6. dl-visualization:This is a repository containing the source code for the animations to the series "Visualizing Deep Learning" on the YouTube channel vcubingx.
  7. weibo-analysis-and-visualization

System Tools

  1. tldr:Simplified and community-driven man pages
  2. ShellCheck, a static analysis tool for shell scripts
  3. [Git] Git tips and tricks
  4. pure-bash-bible: Guide to writing bash scripts
  5. [C++] Windows Terminal
  6. awesome window manager
  7. memory-profiler: pip install memory-profiler
  8. code-server:VS Code in the browser https://coder.com
  9. gen_tags.vim: ctags enhancement
  10. bat: bat supports syntax highlighting for a large number of programming and markup languages
  11. records: SQL for Humans in Python. Database support includes RedShift, Postgres, MySQL, SQLite, Oracle, and MS-SQL (drivers not included).
  12. miasm:Reverse engineering framework in Python
  13. termpair: Connect to a server terminal remotely from the browser.
  14. asynctasks.vim:Modern Task System for Project Building, Testing and Deploying !!
  15. TencentOS-tiny: Operating system for IoT end devices
  16. nginx-tutorial: A minimalist Nginx tutorial
  17. cockpit: a web-based graphical interface for servers.
  18. changedetection.io: self-hosted free open source website change detection, monitor and notification service.
  19. Ripes: A graphical processor simulator and assembly editor for the RISC-V ISA
  20. gitui: Blazing 💥 fast terminal-ui for git
  21. ecapture: capture SSL/TLS text content without CA cert using eBPF.
  22. Bottles:Run Windows software and games on Linux
  23. ansible: Ansible is a radically simple IT automation platform that makes your applications and systems easier to deploy and maintain.

Small Projects

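  1. An open-source natural interaction system based on offline wake-up, natural language understanding, and sentiment analysis
  2. WeChat assistant: 1. Sends customized messages to friends on a daily schedule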
  3. latest research results by crawling arxiv papers and summarizing abstracts.
  4. nider:Python package to add text to images, textures and different backgrounds
  5. docusaurus:Easy to maintain open source documentation websites. https://docusaurus.io
  6. Synonyms: Chinese synonyms; a toolkit for chatbots and intelligent question answering
  7. tushare: TuShare is a utility for crawling historical data of China stocks
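  8. h5-Dooring: A simple, convenient, professional, and reliable solution for building H5/PC pages, with unlimited possibilities.
  9. pytheory: Learn music theory
  10. live2d-widget: Customizable Live2D mascot widget for front-end pages
  11. Screenshot-to-code: Static web page code generation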
  12. latexify_py:Generates LaTeX math description from Python functions.
  13. sherlock: Hunt down social media accounts by username across social networks
  14. core:Open source home automation that puts local control and privacy first
  15. wttr.in:⛅ The right way to check the weather in command line.
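  16. typora-plugin-bilibili: Automatically uploads images pasted into Typora to Bilibili image hosting; can also be modified to use any other image-hosting API.
  17. DouZero_For_HappyDouDiZhu: A customized DouZero-based AI for playing Happy DouDiZhu (欢乐斗地主) in practice
  18. Unlock-netease-cloud-music: Unlocks greyed-out songs in the NetEase Cloud Music client
  19. FileCentipede: Download tool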
  20. organicmaps:🍃 Organic Maps is a free Android & iOS offline maps app for travelers, tourists, hikers, and cyclists.

Low-Level Compiler Infrastructure

  1. MLIR:"Multi-Level Intermediate Representation" Compiler Infrastructure
  2. The LLVM Project is a collection of modular and reusable compiler and toolchain technologies

Parallel Computing

  1. dask / dask: Parallel computing with task scheduling

Testing Tools

  1. cypress-io / cypress:Fast, easy and reliable testing for anything that runs in a browser.
  2. vscode-recipes:A collection of recipes for using VS Code with particular technologies.

6 Datasets

Toolkits

Chinese Text Processing

  1. python-pinyin: Converts Chinese characters to pinyin (pypinyin)
  2. OpenCC: Conversion between Traditional and Simplified Chinese
  3. zhon:Zhon is a Python library that provides constants commonly used in Chinese text processing.

NLP

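  1. Chinese natural language processing datasets
  2. 100+ Chinese Word Vectors: more than a hundred kinds of pre-trained Chinese word vectors
  3. glyph-aware-character-embedding: Vectors that distinguish different styles of Western letters
  4. TX-WORD2VEC-SMALL: Reduced-size version of the Tencent word2vec model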
  5. Fasttext
  6. laserembeddings:LASER multilingual sentence embeddings as a pip package
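  7. Chinese natural language processing corpora/datasets
  8. nlp_chinese_corpus: Large Scale Chinese Corpus for NLP
  9. WEIBO_USER_DATA: Data collected from 200,000 Sina Weibo users
  10. Chinese NLP dataset search: https://www.cluebenchmarks.com/dataSet_search.html
  11. toutiao-multilevel-text-classfication-dataset: Toutiao Chinese news text (multi-level) classification dataset
  12. chinese_chatbot_corpus: Public Chinese chat corpus
  13. chinese-xinhua: Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), idioms, words, and characters.
  14. zhvoice: Chinese speech corpus with clearer, more natural speech; comprises 8 open-source datasets, 3,200 speakers, 900 hours of speech, and 13 million characters.
  15. CognitiveInference: A systematic project on cognitive reasoning, commonsense knowledge bases, commonsense reasoning, and the evaluation of commonsense reasoning
  16. ChineseNlpCorpus-1: Collects, organizes, and publishes Chinese natural language processing corpora/datasets
  17. modern-poetry: The most complete database of ** modern and contemporary poetry and foreign poetry
  18. Poetry: Very comprehensive classical poetry data, with more than 850,000 classical poems from the pre-Qin period to modern times.
  19. ChineseLyrics: Database of 100,000 Chinese song lyrics
  20. poetry: Chinese ancient poetry project data
  21. PersonGraphDataSet: Person graph dataset; a fact database of nearly 100,000 person-relationship graph entries
  22. chinese-poetry: The most complete database of classical Chinese poetry: nearly 14,000 poets from the Tang and Song dynasties, close to 55,000 Tang poems plus 260,000 Song poems, and 21,050 ci by 1,564 ci poets of the Song period.
  23. tang_poetry: Database of the Complete Tang Poems (全唐诗)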

Annotation Tools

  1. LabelMeAnnotationTool
  2. Image Polygonal Annotation with Python
  3. LabelImg is a graphical image annotation tool and label object bounding boxes in images
  4. ChineseAnnotator: Annotation tool for Chinese natural language processing (NLP)
  5. label-studio: a multi-type data labeling and annotation tool with standardized output format.
  6. doccano:Open source annotation tool for machine learning practitioners

Books

  1. singgel / Study-Floder
  2. CS-Books: More than 1,000 classic computer science books

7 Blogs + Interview Experiences

  1. frankmcsherry / blog
  2. Bert-for-Chinese-NLP
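  3. A small collection of WeChat official account articles
  4. ML-NLP: Frequently tested knowledge points and code implementations for Machine Learning, Deep Learning, and NLP interviews
  5. Deep Learning 500 Questions (深度学习500问)
  6. Reflection_Summary: Fundamentals of algorithm theory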
  7. machinelearning: blogs for machine learning
  8. Pre-trained-Models: Blog summarizing pre-trained NLP models
  9. NLP-Interview-Notes
  10. Tech Interview Guide: Essential fundamentals for technical interviews
  11. ML-Interview
  12. C-background-development-interview-experience
  13. LogicStack-LeetCode: LeetCode article series

Appendix: keeping cloned files in sync with git:

git checkout master   # ensure you are on the main "master" branch
# git stash             # shelve any local changes you have made (restore later with git stash pop), !!!NOTICE!!!
git pull              # pull the latest versions