Awesome Cultural NLP:

A curated list of awesome cultural NLP resources, inspired by awesome-computer-vision.

Table Of Contents

Survey
Dataset
Image Captioning
Models
- Vision and Language
Evaluation
- LLMs
- Text-to-image
- VLMs
Analysis
- Text-to-image
- LLMs
- VLMs
- Cross-cultural Variations
Alignment
- Models
- Data
Methodology
- Data
Applications
Contributing
Licenses

Survey

Title	Conference / Journal	Paper	Code	Remarks
Culturally Aware and Adapted NLP: A Taxonomy and a Survey of the State of the Art	Arxiv 2024	2406.03930
Towards Measuring and Modeling “Culture” in LLMs: A Survey	Arxiv 2024	2403.15412	Github	Cool paper!
Challenges and Strategies in Cross-Cultural NLP	ACL 2022	2203.10020

Dataset

Title	Conference / Journal	Paper	Code	Remarks
CultureBank: An Online Community-Driven Knowledge Base Towards Culturally Aware Language Technologies	Arxiv 2024	2404.15238
NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models	Arxiv 2024	2404.12464	Data	Data
An image speaks a thousand words, but can everyone listen? On image transcreation for cultural relevance	Arxiv 2024	2404.01247	Code and Data	Data + Application
No Culture Left Behind: Massively Multi-Cultural Knowledge Acquisition & LM Benchmarking on 1000+ Sub-Country Regions and 2000+ Ethnolinguistic Groups	Arxiv 2024	2402.09369v1	Data
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models	Arxiv 2024 (under review)	2404.16019	Repository	Code and Data
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis	NAACL 2024	2308.16705	Data+Code
CLIcK: A Benchmark Dataset of Cultural and Linguistic Intelligence	LREC-COLING 2024	https://arxiv.org/pdf/2403.06412	Data
Bridging Cultural Nuances in Dialogue Agents through Cultural Value Surveys	EACL Findings 2024	2401.10352	Dataset
Culturally Aware Natural Language Inference	EMNLP 2023 (Findings)	2023.findings-emnlp.509	Data
Global Voices, Local Biases: Socio-Cultural Prejudices across Languages	EMNLP 2023	2310.17586	Data	Data+Analysis
NORMSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly	EMNLP 2023	2210.08604	Code and Data	NormsKB
GeoDE: a Geographically Diverse Evaluation Dataset for Object Recognition	NeurIPS 2023	2301.02560	Code and Data
SeeGULL: A Stereotype Benchmark with Broad Geo-Cultural Coverage Leveraging Generative Models	ACL 2023	2305.11840	Code
FORK: A Bite-Sized Test Set for Probing Culinary Cultural Biases in Commonsense Reasoning Models	ACL Findings 2023	2023.findings-acl.631	Dataset
Multi-lingual and Multi-cultural Figurative Language Understanding	ACL Findings 2023	2305.16171	Code
EnCBP: A New Benchmark Dataset for Finer-Grained Cultural Background Prediction in English	ACL Findings 2022	2203.14498
Re-contextualizing Fairness in NLP: The Case of India	AACL 2022	2209.12226	Data	Data+Analysis
Visually Grounded Reasoning across Languages and Cultures	EMNLP 2021	2109.13238	Website	EMNLP 2021 Best Paper
Would you Rather? A New Benchmark for Learning Machine Alignment with Cultural Values and Social Preferences	ACL 2020	2020.acl-main.477

Image Captioning

Title	Conference / Journal	Paper	Code	Remarks
CIC: A framework for Culturally-aware Image Captioning	IJCAI 2024	2402.05374	Webpage

Models

Vision and Language

Title	Conference / Journal	Paper	Code	Remarks
GIVL: Improving Geographical Inclusivity of Vision-Language Models With Pre-Training Methods	CVPR 2023	2301.01893	Code (not released yet)

Evaluation

LLMs

Title	Conference / Journal	Paper	Code
Cultural Conditioning or Placebo? On the Effectiveness of Socio-Demographic Prompting	Arxiv 2024	2406.11661
Extrinsic Evaluation of Cultural Competence in Large Language Models	Arxiv 2024	2406.11565
CulturalTeaming: AI-Assisted Interactive Red-Teaming for Challenging LLMs’ (Lack of) Multicultural Knowledge	Arxiv 2024	2404.06664
Having Beer after Prayer? Measuring Cultural Bias in Large Language Models	ACL 2024	2305.14456	Code

Text-to-image

Title	Conference / Journal	Paper	Code
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention	Arxiv 2024	2407.00377v1
On the Cultural Gap in Text-to-Image Generation	Arxiv 2023	2307.02971	Code

VLMs

Title	Conference / Journal	Paper	Code	Remarks
From Local Concepts to Universals: Evaluating the Multicultural Understanding of Vision-Language Models	Arxiv 2024	2407.00263

Analysis

Text-to-image

Title	Conference / Journal	Paper	Code
ViSAGe: A Global-Scale Analysis of Visual Stereotypes in Text-to-Image Generation	ACL 2024	2401.06310
DIG In: Evaluating Disparities in Image Generations with Indicators for Geographic Diversity	ICLR 2024	2308.06198	Code
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis	JAIR 2023	2209.08891	Code
Navigating Cultural Chasms: Exploring and Unlocking the Cultural POV of Text-To-Image Models	Arxiv 2023	2310.01929	Code (not released yet)
Inspecting the Geographical Representativeness of Images from Text-to-Image Models	ICCV 2023	2305.11080
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale	FAccT 2023	2211.03759
Multilingual Conceptual Coverage in Text-to-Image Models	ACL 2023	2306.01735	Code

LLMs

Title	Conference / Journal	Paper	Code
Exploring Changes in Nation Perception with Nationality-Assigned Personas in LLMs	Arxiv 2024	2406.13993
CULTURE-GEN: Revealing Global Cultural Perception in Language Models through Natural Language Prompting	Arxiv 2024	2404.10199v1	Code
Knowledge of cultural moral norms in large language models	ACL 2023	2306.01857
Multilingual Language Models are not Multicultural: A Case Study in Emotion	WASSA: ACL 2023	2307.01370
Social Commonsense for Explanation and Cultural Bias Discovery	EACL 2023	2023.eacl-main.271.pdf
DOSA: A Dataset of Social Artifacts from Different Indian Geographical Subcultures	LREC-COLING 2024	2403.14651	Code

VLMs

Title	Conference / Journal	Paper
Multilingual Diversity Improves Vision-Language Representations	Arxiv 2024	2405.16915
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision–Language Models	Arxiv 2024	2405.13777
Computer Vision Datasets and Models Exhibit Cultural and Linguistic Diversity in Perception	Arxiv 2024	2310.14356
Exploring Visual Culture Awareness in GPT-4V: A Comprehensive Probing	Arxiv 2024	2402.06015
‘Person’ == Light-skinned, Western Man, and Sexualization of Women of Color: Stereotypes in Stable Diffusion	EMNLP 2023 Findings	2310.19981

Cross-cultural Variations

Title	Conference / Journal	Paper
Cross-Cultural Analysis of Human Values, Morals, and Biases in Folk Tales	EMNLP 2023	2023.emnlp-main.311
Social Commonsense for Explanation and Cultural Bias Discovery	EACL 2023	2023.eacl-main.271.pdf
Cross-cultural variation of speech-accompanying gesture: A review	Language and Cognitive Processes: Volume 24, Issue 2, 2009	10.1080/01690960802586188

Alignment

Models

Title	Conference / Journal	Paper	Remarks
Investigating Cultural Alignment of Large Language Models	Arxiv 2024	2402.13231
Unintended Impacts of LLM Alignment on Global Representation	Arxiv 2024	2402.15018
Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study	C3NLP: EACL 2023	2303.17466	Analysis
Probing Pre-Trained Language Models for Cross-Cultural Differences in Values	C3NLP: EACL 2023	2203.13722	Analysis

Data

Title	Conference / Journal	Paper	Code	Remarks
NLPositionality: Characterizing Design Biases of Datasets and Models	ACL 2023 (Outstanding Paper)	2023.acl-long.505.pdf	Website

Methodology

Data

Title	Conference / Journal	Paper	Code	Remarks
Cultural Concept Adaptation on Multimodal Reasoning	EMNLP 2023	EMNLP Main 18

Applications

Title	Conference / Journal	Paper	Code	Remarks
Cross-Cultural Similarity Features for Cross-Lingual Transfer Learning of Pragmatically Motivated Tasks	EACL 2021	2006.09336		Sentiment Analysis

Contributing

Please feel free to send me pull requests or email (khanuja.simran7@gmail.com) to add links.

Licenses

To the extent possible under law, Simran Khanuja has waived all copyright and related or neighboring rights to this work.
