kietnv

Natural Language Processing, Computational Linguistics, Parsing, Sentiment Analysis, Machine Reading Comprehension, and Question Answering

University of Information Technology, Ho Chi Minh City

Pinned Repositories

COVIDROP
Vi-COVIDQA is a Vietnamese machine reading comprehension dataset based on numerical reasoning.
0 0 00
Datasets-for-Sentiment-Analysis
Benchmark datasets for sentiment analysis
2 0 00
MRC-tool
0 1 00
NLP-Vietnamese-progress
Repository to track the progress in Vietnamese Natural Language Processing, including the datasets and the current state-of-the-art for the most common Vietnamese NLP tasks.
2 0 01
UIT-ViSD4SA
ViSD4SA, a Vietnamese span detection dataset for aspect-based sentiment analysis
0 0 01
uit-vsfc
Vietnamese Students’ Feedback Corpus (UIT-VSFC) is a resource consisting of over 16,000 sentences that are human-annotated for two tasks: sentiment-based and topic-based classification.
0 1 00
ViCCGbank
0 1 00
VietnameseDatasets
We provide benchmark datasets for evaluating Vietnamese language processing models: UIT-ViQuAD, ViNewsQA, UIT-VSFC, UIT-ViIC, UIT-ViNames, UIT-VSMEC and ViMMRC.
18 2 01
ViHOS
Repository for the paper "ViHOS: Vietnamese Hate and Offensive Spans Detection" (EACL 2023)
Language: Jupyter Notebook
0 0 00
vireader
Machine reading comprehension has attracted significant interest in natural language understanding research, and large-scale datasets and neural network-based methods have been developed for the task. However, most resources and methods have been developed for two resource-rich languages, English and Chinese. This article proposes ViReader, a system for open-domain machine reading comprehension in Vietnamese that uses Wikipedia as its textual knowledge source: the answer to any question is a text span taken directly from Vietnamese Wikipedia. The system combines a sentence retriever, which uses information retrieval techniques to extract relevant sentences, with a transfer learning-based answer extractor trained to predict answers from Wikipedia texts. Experiments on multiple machine reading comprehension datasets in Vietnamese and other languages demonstrate that (1) ViReader is highly competitive with prevalent machine learning-based systems, and (2) combining the sentence retriever and the answer extractor through multi-task learning yields an end-to-end reading comprehension system. The sentence retriever selects the sentences most likely to contain the answer to the given question. The answer extractor then reads the document from which those sentences were retrieved, predicts the answer, and returns it to the user. ViReader achieves new state-of-the-art performance, with 70.83% EM (exact match) and 89.54% F1, outperforming the BERT-based system by 11.55% and 9.54%, respectively. It also obtains state-of-the-art performance on UIT-ViNewsQA (another Vietnamese dataset, consisting of online health-domain news) and BiPaR (a bilingual dataset of English and Chinese novel texts). Compared with the BERT-based system, it improves F1 by 7.65% for English and 6.13% for Chinese on the BiPaR dataset. Furthermore, we build a ViReader application programming interface that programmers can use in artificial intelligence applications.
Language: Jupyter Notebook
10 1 11
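
The vireader description reports results as EM (exact match) and F1. As a rough illustration of what these two span-level metrics usually measure, here is a minimal sketch in Python. It is not taken from the vireader repository: the function names and the normalization (lowercasing and whitespace tokenization) are assumptions made for this example, and the actual evaluation scripts may normalize answers differently.

```python
from collections import Counter


def tokens(text):
    # Assumed normalization for this sketch: lowercase, split on whitespace.
    return text.lower().split()


def exact_match(prediction, gold):
    # 1.0 when the normalized prediction equals the normalized gold answer.
    return float(tokens(prediction) == tokens(gold))


def f1_score(prediction, gold):
    # Token-level F1: harmonic mean of precision and recall over shared tokens.
    pred, ref = tokens(prediction), tokens(gold)
    if not pred or not ref:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For a predicted span "Ho Chi Minh" against a gold answer "Ho Chi Minh City", exact match is 0 while F1 gives partial credit (precision 1, recall 3/4). Dataset-level scores are typically the average of these per-question values.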
