Abstract: Students’ feedback is a vital resource for interdisciplinary research that combines two fields: sentiment analysis and education. The Vietnamese Students’ Feedback Corpus (UIT-VSFC) is a resource consisting of over 16,000 sentences that are human-annotated for two different tasks: sentiment-based and topic-based classification. To assess the quality of our corpus, we measure inter-annotator agreement and evaluate classification performance on the UIT-VSFC corpus. We obtained inter-annotator agreement of more than 91% for sentiments and 71% for topics. In addition, we built a baseline model with the Maximum Entropy classifier and achieved approximately 88% sentiment F1-score and over 84% topic F1-score. Our dataset is available here: http://nlp.uit.edu.vn/datasets/
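The two quality checks named in the abstract, inter-annotator agreement and F1-score, can be sketched in a few lines of Python. This is a minimal illustration only: the abstract does not say which agreement measure or which F1 averaging was used, so raw percent agreement, Cohen's kappa, and macro-averaged F1 are assumptions here, and the label lists are made-up toy data rather than corpus content.

```python
from collections import Counter


def percent_agreement(labels_a, labels_b):
    """Fraction of items on which two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    n = len(labels_a)
    observed = percent_agreement(labels_a, labels_b)
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Expected agreement if both annotators labelled independently at random
    # according to their own label frequencies.
    expected = sum(
        counts_a[label] * counts_b[label]
        for label in set(labels_a) | set(labels_b)
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def f1_per_class(gold, predicted):
    """F1-score for each label, treating that label as the positive class."""
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(gold, predicted):
    """Unweighted mean of the per-class F1-scores (an assumed averaging)."""
    scores = f1_per_class(gold, predicted)
    return sum(scores.values()) / len(scores)


# Toy labels, invented for illustration (not taken from UIT-VSFC).
annotator_a = ["positive", "negative", "neutral", "positive", "negative", "positive"]
annotator_b = ["positive", "negative", "positive", "positive", "negative", "neutral"]

agreement = percent_agreement(annotator_a, annotator_b)
kappa = cohen_kappa(annotator_a, annotator_b)
# The same helpers score a classifier if one list holds gold labels
# and the other holds model predictions.
baseline_f1 = macro_f1(annotator_a, annotator_b)
```

Percent agreement is the easiest figure to read but ignores agreement expected by chance, which is why kappa is shown alongside it. Figures computed this way are not guaranteed to match the numbers reported in the abstract.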
If you use this dataset, please cite this paper: Kiet Van Nguyen, Vu Duc Nguyen, Phu Xuan-Vinh Nguyen, Tham Thi-Hong Truong, Ngan Luu-Thuy Nguyen, UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis, 2018 10th International Conference on Knowledge and Systems Engineering (KSE 2018), November 1-3, 2018, Ho Chi Minh City, Vietnam.
[1] Thien Khai Tran. Phân tích cảm xúc trên cơ sở trị cảm xúc chuyển dịch theo ngữ cảnh cho tiếng Việt (Sentiment analysis based on emotion-value transfer for Vietnamese contexts). PhD Thesis.
[1] Nguyen, P.X., Hong, T.T., Van Nguyen, K. and Nguyen, N.L.T., 2018, November. Deep learning versus traditional classifiers on Vietnamese students’ feedback corpus. In 2018 5th NAFOSTED conference on information and computer science (NICS) (pp. 75-80). IEEE.
[2] Ho, V.A., Nguyen, D.H.C., Nguyen, D.H., Pham, L.T.V., Nguyen, D.V., Nguyen, K.V. and Nguyen, N.L.T., 2019, October. Emotion recognition for vietnamese social media text. In International Conference of the Pacific Association for Computational Linguistics (pp. 319-333). Springer, Singapore.
[3] Nguyen, V.D., Van Nguyen, K. and Nguyen, N.L.T., 2018, November. Variants of long short-term memory for sentiment analysis on vietnamese students’ feedback corpus. In 2018 10th international conference on knowledge and systems engineering (KSE) (pp. 306-311). IEEE.
[4] Kastrati, Z., Dalipi, F., Imran, A.S., Pireva Nuci, K. and Wani, M.A., 2021. Sentiment analysis of students’ feedback with NLP and deep learning: A systematic mapping study. Applied Sciences, 11(9), p.3986.
[5] Nguyen, L.T., Nguyen, K.V. and Nguyen, N.L.T., 2021, July. Constructive and toxic speech detection for open-domain social media comments in vietnamese. In International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems (pp. 572-583). Springer, Cham.
[6] Nguyen, M.H., Nguyen, T.M., Van Thin, D. and Nguyen, N.L.T., 2019, October. A corpus for aspect-based sentiment analysis in Vietnamese. In 2019 11th International Conference on Knowledge and Systems Engineering (KSE) (pp. 1-5). IEEE.
[7] Tran, T.K. and Phan, T.T., 2020. Capturing contextual factors in sentiment classification: an ensemble approach. IEEE Access, 8, pp.116856-116865.
[8] Singh, R. and Singh, R., 2021. Applications of sentiment analysis and machine learning techniques in disease outbreak prediction–A review. Materials Today: Proceedings.
[9] Phan, L.L., Pham, P.H., Nguyen, K.T.T., Nguyen, T.T., Huynh, S.K., Nguyen, L.T., Van Huynh, T. and Van Nguyen, K., 2021. Sa2sl: From aspect-based sentiment analysis to social listening system for business intelligence. arXiv preprint arXiv:2105.15079.
[10] Huynh, H.D., Do, H.T.T., Van Nguyen, K. and Nguyen, N.L.T., 2020. A simple and efficient ensemble classifier combining multiple neural network models on social media datasets in vietnamese. arXiv preprint arXiv:2009.13060.
[11] Luu, S.T., Van Nguyen, K. and Nguyen, N.L.T., 2020. Empirical study of text augmentation on social media text in vietnamese. arXiv preprint arXiv:2009.12319.
[12] Yang, R., 2021. Machine Learning and Deep Learning for Sentiment Analysis over Students' Reviews: An Overview Study.
[13] Nguyen, Q.H., Vu, L. and Nguyen, Q.U., 2020. A two-channel model for representation learning in Vietnamese sentiment classification problem. Journal of Computer Science and Cybernetics, 36(4), pp.305-323.
[14] Phat, H.N. and Anh, N.T.M., 2020. Vietnamese text classification algorithm using long short term memory and Word2Vec. Информатика и автоматизация, 19(6), pp.1255-1279.
[15] Le, L.S., Thin, D.V., Nguyen, N.L.T. and Trinh, S.Q., 2020, November. A Multi-filter BiLSTM-CNN Architecture for Vietnamese Sentiment Analysis. In International Conference on Computational Collective Intelligence (pp. 752-763). Springer, Cham.
[16] Sirajudeen, S., Duraisamy, B. and Ajantha Devi, V., 2021. Sentiment Analysis to Assess Students’ Perception on the Adoption of Online Learning During Pre-COVID-19 Pandemic Period. In Intelligent Computing and Innovation on Data Science (pp. 157-166). Springer, Singapore.
[17] Nguyen, N.T.H., Ha, P.P.D., Nguyen, L.T., Van Nguyen, K. and Nguyen, N.L.T., 2021. Vietnamese Complaint Detection on E-Commerce Websites. arXiv preprint arXiv:2104.11969.
[18] Yang, R. and Edalati, M., 2021. Using GAN-based models to sentimental analysis on imbalanced datasets in education domain. arXiv preprint arXiv:2108.12061.
[19] Nguyen, H.Q., Vu, L. and Nguyen, Q.U., 2020. Residual attention bi-directional long short-term memory for Vietnamese sentiment classification. Journal of Science and Technique-Section on Information and Communication Technology, 9(02).
[20] Nguyen, H.Q., Vu, L. and Nguyen, Q.U., 2021, August. A study of word presentation in vietnamese sentiment analysis. In 2021 International Conference on System Science and Engineering (ICSSE) (pp. 476-481). IEEE.
[21] Le, A.P., Pham, T.V., Le, T.V. and Huynh, D.V., 2021, December. Neural Transfer Learning For Vietnamese Sentiment Analysis Using Pre-trained Contextual Language Models. In 2021 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT) (pp. 1-5). IEEE.
[22] Xuân, Q.D.V., Laosen, K. and Laosen, N., 2021, November. An Evaluation of the UIT-VSFC Dataset Using Modern Machine Learning Techniques and Word Embeddings. In 2021 25th International Computer Science and Engineering Conference (ICSEC) (pp. 394-399). IEEE.
[23] Truong, T.L., Le, H.L. and Le-Dang, T.P., 2020, November. Sentiment Analysis Implementing BERT-based Pre-trained Language Model for Vietnamese. In 2020 7th NAFOSTED Conference on Information and Computer Science (NICS) (pp. 362-367). IEEE.
[24] Palit, S., Nur, S., Khatun, Z., Rahman, M. and Ahmed, M.T., 2021, May. Analysis of Online Education System of Bangladesh during COVID-19 Pandemic Based on NLP and Machine Learning: Problem and Prospect. In 2021 Emerging Trends in Industry 4.0 (ETI 4.0) (pp. 1-6). IEEE.
[25] Roy, P., 2021. Collation of Feasible Solutions for Domain Based Problems: An Analysis of Sentiments Based on Codeathon Activity. arXiv preprint arXiv:2108.10034.
[26] Pinargote-Ortega, M., Bowen-Mendoza, L., Meza, J. and Ventura, S., 2021. Peer assessment using soft computing techniques. Journal of Computing in Higher Education, 33(3), pp.684-726.
[27] Sáu, T.N.T., Sang, Đ.P. and Trang, P.T.T., 2021. Phân tích ý kiến theo khía cạnh trên bình luận phản hồi của sinh viên cho tiếng Việt. TNU Journal of Science and Technology, 226(18), pp.48-55.
[28] Cruz, E., González, M. and Rangel, J.C., 2022. Técnicas de machine learning aplicadas a la evaluación del rendimiento y a la predicción de la deserción de estudiantes universitarios, una revisión. Prisma Tecnológico, 13(1), pp.77-87.