csebuetnlp

Bangladesh University of Engineering and Technology, Bangladesh

Pinned Repositories

banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
Language: Python 245 8 833
BanglaNLG
This repository contains the official release of the model "BanglaT5" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla".
Language: Python 84 5 210
banglanmt
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Language: Python 149 9 1146
banglaparaphrase
This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.
Language: Python 16 1 01
BanglaSocialBias
This is the official repository containing all the code used to generate the results reported in the paper titled "Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias".
Language: Jupyter Notebook 6 1 02
CoDesc
A large dataset of 4.2M Java source code samples with parallel descriptions, collected from code search and code summarization studies.
Language: Python 53 2 09
CrossSum
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.
Language: Python 52 3 77
IllusionVQA
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models".
Language: Jupyter Notebook 19 2 01
normalizer
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Language: Python 35 4 17
xl-sum
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Language: Python 275 6 1542

csebuetnlp's Repositories

csebuetnlp/xl-sum
This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021.
Language: Python 275 6 1542
csebuetnlp/banglabert
This repository contains the official release of the model "BanglaBERT" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaBERT: Language Model Pretraining and Benchmarks for Low-Resource Language Understanding Evaluation in Bangla", accepted in Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: NAACL-2022.
Language: Python 245 8 833
csebuetnlp/banglanmt
This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), November 16 - November 20, 2020.
Language: Python 149 9 1146
csebuetnlp/BanglaNLG
This repository contains the official release of the model "BanglaT5" and associated downstream finetuning code and datasets introduced in the paper titled "BanglaNLG: Benchmarks and Resources for Evaluating Low-Resource Natural Language Generation in Bangla".
Language: Python 84 5 210
csebuetnlp/CoDesc
A large dataset of 4.2M Java source code samples with parallel descriptions, collected from code search and code summarization studies.
Language: Python 53 2 09
csebuetnlp/CrossSum
This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.
Language: Python 52 3 77
csebuetnlp/normalizer
This Python module is an easy-to-use port of the text normalization used in the paper "Not low-resource anymore: Aligner ensembling, batch filtering, and new datasets for Bengali-English machine translation". It is intended for normalizing and cleaning Bengali and English text.
Language: Python 35 4 17
csebuetnlp/IllusionVQA
This repository contains the data and code of the paper titled "IllusionVQA: A Challenging Optical Illusion Dataset for Vision Language Models".
Language: Jupyter Notebook 19 2 01
csebuetnlp/banglaparaphrase
This repository contains the code, data, and associated models of the paper titled "BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset", accepted in Proceedings of the Asia-Pacific Chapter of the Association for Computational Linguistics: AACL 2022.
Language: Python 16 1 01
csebuetnlp/BanglaSocialBias
This is the official repository containing all the code used to generate the results reported in the paper titled "Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias".
Language: Jupyter Notebook 6 1 02
csebuetnlp/BanglaEmotionBias
This is the official repository containing all the code used to generate the results reported in the paper titled "An Empirical Study of Gendered Stereotypes in Emotional Attributes for Bangla in Multilingual Large Language Models", accepted at the 5th Workshop on Gender Bias in Natural Language Processing (hosted at the ACL 2024 Conference).
Language: Jupyter Notebook 5 1 01
csebuetnlp/BanglaContextualBias
This is the official repository containing all the code used to generate the results reported in the paper titled "An Empirical Study on the Characteristics of Bias upon Context Length Variation for Bangla", accepted in Findings of the Association for Computational Linguistics: ACL 2024.
Language: Jupyter Notebook 3 1 02
csebuetnlp/csebuetnlp.github.io
Language: CSS 2 2 00
csebuetnlp/TransCoder
Public release of the TransCoder research project: https://arxiv.org/pdf/2006.03511.pdf
Language: Python 2 0 00
