nlp-machine-learning
There are 6858 repositories under the nlp-machine-learning topic.
deeppavlov/DeepPavlov
An open source library for deep learning end-to-end dialog systems and chatbots.
katanaml/sparrow
Structured data extraction and instruction calling with ML, LLM and Vision LLM
thunlp/OpenPrompt
An Open-Source Framework for Prompt-Learning.
esbatmop/MNBVC
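MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus, benchmarked against the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures and even "Martian script" (火星文) text. It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.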
cbamls/AI_Tutorial
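Curated learning materials for AI fields such as machine learning, NLP, image recognition, and deep learning, together with collected technical material on the architecture and algorithms of search, recommendation, and advertising systems. Includes a compilation of notes from leading algorithm experts.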
github/CodeSearchNet
Datasets, tools, and benchmarks for representation learning of code.
kk7nc/Text_Classification
Text Classification Algorithms: A Survey
changyeyu/LLM-RL-Visualized
🌟 100+ original LLM / RL principle diagrams 📚, presented by the author of 《大模型算法》 (Large Model Algorithms)! 💥 (100+ LLM/RL Algorithm Maps)
chrismattmann/tika-python
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
DengBoCong/nlp-paper
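Papers in the field of natural language processing (with reading notes), reproduced models, data processing, and more (code provided in both TensorFlow and PyTorch versions)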
pemistahl/lingua-go
The most accurate natural language detection library for Go, suitable for short text and mixed-language text
MilaNLProc/contextualized-topic-models
A Python package to run contextualized topic modeling. CTMs combine contextualized embeddings (e.g., BERT) with topic models to get coherent topics. Published at EACL and ACL 2021 (Bianchi et al.).
google-research/tapas
End-to-end neural table-text understanding models.
veekaybee/what_are_embeddings
A deep dive into embeddings starting from fundamentals
pemistahl/lingua-rs
The most accurate natural language detection library for Rust, suitable for short text and mixed-language text
ggeop/Python-ai-assistant
Python AI assistant 🧠
paschmann/rasa-ui
Rasa UI is a frontend for the Rasa Framework
NorskRegnesentral/skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
georgian-io/LLM-Finetuning-Toolkit
Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.
bin123apple/AutoCoder
We introduced a new model designed for the code generation task. Its test accuracy on the HumanEval base dataset surpasses that of GPT-4 Turbo (April 2024) and GPT-4o.
amanjeetsahu/Natural-Language-Processing-Specialization
This repo contains my coursework, assignments, and slides for the Natural Language Processing Specialization by deeplearning.ai on Coursera
aniketpotabatti/Data-Science-EBooks
Welcome to the Data Science EBooks repository! This collection offers a variety of high-quality ebooks on Data Science, Machine Learning, and AI. Perfect for both beginners and advanced learners, explore these resources to deepen your knowledge and skills.
pemistahl/lingua
The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
mila-iqia/babyai
BabyAI platform. A testbed for training agents to understand and execute language commands.
michaelthwan/searchGPT
Grounded search engine (i.e. with source reference) based on LLM / ChatGPT / OpenAI API. It supports web search, file content search etc.
howl-anderson/Chinese_models_for_SpaCy
Models for SpaCy that support Chinese
mims-harvard/PrimeKG
Precision Medicine Knowledge Graph (PrimeKG)
namuan/dr-doc-search
Converse with a book - Built with GPT-3
google-research-datasets/dstc8-schema-guided-dialogue
The Schema-Guided Dialogue Dataset
stepthom/text_mining_resources
Resources for learning about Text Mining and Natural Language Processing
hb20007/hands-on-nltk-tutorial
The hands-on NLTK tutorial for NLP in Python
lpty/nlp_base
Basic models for natural language processing
laugustyniak/awesome-sentiment-analysis
Repository with everything necessary for sentiment analysis and related areas
yinizhilian/ICLR2025-Papers-with-Code
A collection of ICLR papers and open-source projects from past years, covering ICLR 2021, ICLR 2022, ICLR 2023, ICLR 2024, and ICLR 2025.
patrickjohncyh/fashion-clip
FashionCLIP is a CLIP-like model fine-tuned for the fashion domain.
soulbliss/NLP-conference-compendium
Compendium of the resources available from top NLP conferences.