Pinned Repositories
TopicTuner
HDBSCAN Tuning for BERTopic Models
ToxiCN
Code and resources for "Facilitating Fine-grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmark" (ACL 2023).
MediaCrawler
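Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, Weibo posts and comments, Baidu Tieba posts and comment replies, and Zhihu Q&A articles and comments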
TechGPT
TechGPT: Technology-Oriented Generative Pretrained Transformer
WeiboSpider
An actively maintained Sina Weibo data collection tool 🚀🚀🚀
hate-speech-dataset
Hate speech dataset from the Stormfront forum, manually labelled at the sentence level.
Child
ECHR-judgment-classifiers