Pinned Repositories
crawl-naver-news-and-comments
Crawling the most read news articles per day over the years (with comments)
crawl-reuters
A simple Scrapy script for crawling Reuters news articles (Python 3)
dots_public
PDTB-discourse-relation-classifier
sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
streamlit-tutorial
A simple tutorial script on Streamlit using the Iris Dataset
Visualizing-Cross-Lingual-Discourse-Relations
Code for the paper "Visualizing Cross-Lingual Discourse Relations in Multilingual TED Corpora" at CODI 2021 @ EMNLP 2021
wikiextractor
A tool for extracting plain text from Wikipedia dumps
zaemyung's Repositories
zaemyung/sentsplit
A flexible sentence segmentation library using a CRF model and regex rules
zaemyung/streamlit-tutorial
A simple tutorial script on Streamlit using the Iris Dataset
zaemyung/Visualizing-Cross-Lingual-Discourse-Relations
Code for the paper "Visualizing Cross-Lingual Discourse Relations in Multilingual TED Corpora" at CODI 2021 @ EMNLP 2021
zaemyung/crawl-naver-news-and-comments
Crawling the most read news articles per day over the years (with comments)
zaemyung/dots_public
zaemyung/PDTB-discourse-relation-classifier
zaemyung/bertviz
Tool for visualizing attention in the Transformer model (BERT, GPT-2, Albert, XLNet, RoBERTa, CTRL, etc.)
zaemyung/ContraPro
Contrastive evaluation of pronoun translation in neural machine translation
zaemyung/Cornell-Conversational-Analysis-Toolkit
ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
zaemyung/Creative-Commons-Markdown
Markdown-formatted Creative Commons licenses
zaemyung/disaster_tweets
zaemyung/Discourse-Phenomena-in-Document-level-Neural-Machine-Translation
Datasets for "A Test Suite for Evaluating Discourse Phenomena in Document-level Neural Machine Translation", accepted to the Proceedings of the Second International Workshop of Discourse Processing
zaemyung/DMRST_Parser
An implementation of the paper "DMRST: A Joint Framework for Document-Level Multilingual RST Discourse Segmentation and Parsing".
zaemyung/dockerfiles
zaemyung/good-translation-wrong-in-context
This is a repository with the data and code for the ACL 2019 paper "When a Good Translation is Wrong in Context: ..." and the EMNLP 2019 paper "Context-Aware Monolingual Repair for Neural Machine Translation"
zaemyung/google-research
Google Research
zaemyung/kmeans_pytorch
K-means using PyTorch
zaemyung/korean_wordlist
Korean word list
zaemyung/language-programmes
zaemyung/Large-contrastive-pronoun-testset-EN-FR
zaemyung/mtdlc
Library for parsing document-level corpora for machine translation
zaemyung/Pytorch-Sequence-Bucket-Iterator
A minimal sampler example for bucketing sequences of similar lengths in PyTorch, based on @TrentBrick's script: https://gist.github.com/TrentBrick/bac21af244e7c772dc8651ab9c58328c
zaemyung/Shallow-Discourse-Annotation-for-Chinese-TED-Talks
Datasets for "Shallow Discourse Annotation for Chinese TED Talks", accepted at LREC 2020
zaemyung/st-annotated-text
A simple component to display annotated text in Streamlit apps.
zaemyung/Ted-MDB-Annotations
zaemyung/transformer-lm
Transformer language model (GPT-2) with sentencepiece tokenizer
zaemyung/transformers
🤗 Transformers: State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0.
zaemyung/utils
Simple scripts that make life easier...
zaemyung/weightedWWL
Learning subtree pattern importance for WL-based graph kernels
zaemyung/zaemyung.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics