nlp-resources
There are 129 repositories under the nlp-resources topic.
juand-r/entity-recognition-datasets
A collection of corpora for named entity recognition (NER) and entity recognition tasks. These annotated datasets cover a variety of languages, domains and entity types.
HKUSTDial/NL2SQL_Handbook
[TKDE'25] A continuously updated handbook that helps readers track the latest Text-to-SQL techniques in the literature and provides practical guidance for researchers and practitioners. Official repo for "A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?"
neuralmind-ai/portuguese-bert
Portuguese pre-trained BERT models
hb20007/hands-on-nltk-tutorial
The hands-on NLTK tutorial for NLP in Python
gkiril/oie-resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
guhhhhaa/4675-scifi
Chinese NLP corpus of Chinese science fiction: about 4675 Chinese science fiction novels. Chinese science fiction NLP corpus, Chinese science fiction text corpus, Chinese science fiction text database, science fiction corpus.
ElizaLo/NLP-Natural-Language-Processing
Projects and useful articles / links
gutfeeling/beginner_nlp
A curated list of beginner resources in Natural Language Processing
Koziev/NLP_Datasets
My NLP datasets for the Russian language
microsoft/vert-papers
This repository contains code and datasets related to entity/knowledge papers from the VERT (Versatile Entity Recognition & disambiguation Toolkit) project, by the Knowledge Computing group at Microsoft Research Asia (MSRA).
irfnrdh/Awesome-Indonesia-NLP
NLP and language resources for Indonesian
WorksApplications/SudachiDict
A lexicon for Sudachi
oroszgy/awesome-hungarian-nlp
A curated list of NLP resources for Hungarian
iPieter/RobBERT
A Dutch RoBERTa-based language model
INK-USC/TriggerNER
TriggerNER: Learning with Entity Triggers as Explanations for Named Entity Recognition (ACL 2020)
gopala-kr/summary
Summaries of all the papers I read
guhhhhaa/wula-scifi
Chinese NLP corpus of Chinese science fiction: an archive of the Ark Plan of the Ula science fiction website. Chinese science fiction NLP corpus, Chinese science fiction text corpus, Chinese science fiction text database, science fiction corpus.
mikeroyal/NLP-Guide
Natural Language Processing (NLP). Covering topics such as Tokenization, Part Of Speech tagging (POS), Machine translation, Named Entity Recognition (NER), Classification, and Sentiment analysis.
CreativeCodingLab/TextAnnotationGraphs
A modular annotation system that supports complex, interactive annotation graphs embedded on top of sequences of text.
uma-pi1/minie
An open information extraction system that provides compact extractions
linuxscout/arabicnlptoolslist
An inventory of Arabic NLP tools
laymonage/kbbi-python
A Python module that fetches a page of a word/phrase from the Online Indonesian Dictionary (https://kbbi.kemdikbud.go.id).
EticaAI/linguistic-datasets-portuguese
Linguistic Datasets for Portuguese: a list of linguistic datasets for the Portuguese language with flexible licenses: databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies, and knowledge representation
anoopkunchukuttan/indic_nlp_resources
Resources to go with the Indic NLP Library
StatguyUser/TextFeatureSelection
Python library for feature selection on text features. It provides a filter method, a genetic algorithm, and TextFeatureSelectionEnsemble for improving text classification models.
AndyTheFactory/romanian-nlp-datasets
A list of Romanian NLP Datasets
uma-pi1/OPIEC
Reading the data from OPIEC - an Open Information Extraction corpus
MIT-LCP/bloatectomy
A Python package for removing duplicate text in clinical notes or other documents
atakansite/nlp-courses
Natural Language Processing Courses with Resources
Nativeatom/NaturalLanguageProcessing
Natural Language Processing
JudePark96/awesome-nlp-references
A curated list of resources on knowledge distillation, recommendation systems, and especially natural language processing (NLP).
Curated-Awesome-Lists/awesome-arabic-nlp
Dive into the world of Arabic NLP with this extensive collection of resources, tools, datasets, and best practices tailored for the Arabic language.
nguynking/CS224N
Assignment solutions for CS224N: Natural Language Processing with Deep Learning - Stanford / Winter 2023
erickrf/ppdb
Interface for reading the Paraphrase Database (PPDB)
strubell/preprocess-conll05
Scripts for preprocessing the CoNLL-2005 SRL dataset.