nlp-dataset
There are 16 repositories under the nlp-dataset topic.
amirshnll/Persian-Swear-Words
Persian Swear Dataset - you can use it in production to filter unwanted content. A dataset of inappropriate and bad Persian words for filtering text.
AndyTheFactory/romanian-nlp-datasets
A list of Romanian NLP Datasets
afrisenti-semeval/afrisent-semeval-2023
AfriSenti-SemEval Shared Task 12: Sentiment Analysis for African Languages: https://afrisenti-semeval.github.io/
K-RLange/SpeakGer
A metadata-enriched dataset of German parliamentary debates covering 74 years of plenary protocols.
machinelearningZH/zix_understandability-index
Measure how understandable a German text is.
amazon-science/webie
Dataset for web-scale information extraction.
semnan-university-ai/persian-slang
Persian Slang Words (dataset)
Koziev/Rifma
Dataset with annotation of Russian-language poems
semnan-university-ai/persian-sms-dataset
Persian SMS dataset
deepinstinct-algo/DeepURLBench
This repo is the dataset for the paper "A New Dataset and Methodology for Malicious URL Classification"
Dia-Bete/PersonaBasedCorpus
Repository for the LREC-COLING 2024 Paper: Persona-Based Corpus in the Diabetes Mellitus Domain – Applying a Human-Centered Approach to a Low-Resource Context
semnan-university-ai/persian-news-dataset
Persian News Dataset
Koziev/Translations
Parallel Literary Corpora: Fiction and Poetry Translations
readerbench/news-ro-offense
A novel Romanian-language dataset for offensive message detection, with comments from a local Romanian news website (stiri de cluj) manually annotated into five classes
readerbench/ro-offense
RO-Offense: A Novel Romanian Dataset for Offensive Language in Online Comments