Pinned Repositories
.github
We are building a large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
_rosetta
A large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
async-scraptt
A Python web scraper for extracting post content and comments from the PTT website.
blacklab-demo
A repo that demonstrates how to build a BlackLab corpus via Docker and Nginx.
ckip-2-tei
A Python package that asynchronously segments JSON data into TEI XML format.
corpus-frontend
A large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
mercury
scraptt
The most comprehensive PTT (踢踢踢) Crawler
sol
Taiwan Social Media Corpus's Repositories
Taiwan-Social-Media-Corpus/scraptt
The most comprehensive PTT (踢踢踢) Crawler
Taiwan-Social-Media-Corpus/async-scraptt
A Python web scraper for extracting post content and comments from the PTT website.
Taiwan-Social-Media-Corpus/.github
We are building a large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
Taiwan-Social-Media-Corpus/_rosetta
A large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
Taiwan-Social-Media-Corpus/blacklab-demo
A repo that demonstrates how to build a BlackLab corpus via Docker and Nginx.
Taiwan-Social-Media-Corpus/ckip-2-tei
A Python package that asynchronously segments JSON data into TEI XML format.
Taiwan-Social-Media-Corpus/corpus-frontend
A large-scale, diverse, and linguistically enriched social media corpus of Mandarin in Taiwan.
Taiwan-Social-Media-Corpus/mercury
Taiwan-Social-Media-Corpus/sol