As Large Language Models (LLMs) are increasingly deployed across diverse global contexts, understanding their ability to represent and reason about cultural variations in moral judgments becomes crucial. This paper presents the first comprehensive study examining how state-of-the-art LLMs understand and generate moral judgments across different cultural contexts, using established frameworks from cross-cultural psychology and moral philosophy.
We develop a novel evaluation framework based on Hofstede's cultural dimensions and Haidt's Moral Foundations Theory to assess LLMs' cultural moral reasoning. Using carefully designed prompts representing moral dilemmas from more than 50 cultures, we analyze how models such as GPT-4, Claude, and their multilingual variants respond to culturally specific moral scenarios. Our findings reveal that while LLMs can recognize some cultural differences in moral judgments, they exhibit significant biases toward Western moral frameworks and struggle with non-WEIRD (Western, Educated, Industrialized, Rich, Democratic) cultural perspectives.
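To make the evaluation setup concrete, the following is a minimal sketch of how such a framework might be structured in code. It is illustrative only: the `Scenario` fields, the `query_model` stub, and the keyword-based foundation scorer are hypothetical placeholders assumed for this example, not the paper's actual pipeline, which would rely on validated instruments and trained classifiers or human annotation.

```python
# Hypothetical sketch of a cultural moral-reasoning evaluation loop.
# The scenario data, query_model stub, and keyword scorer are illustrative
# placeholders, not the actual methodology described in the paper.
from dataclasses import dataclass
from typing import Callable

# Haidt's five moral foundations used as scoring axes.
FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity"]


@dataclass
class Scenario:
    culture: str            # culture the dilemma is drawn from, e.g. "Japan"
    dilemma: str            # prompt text presenting the moral dilemma
    expected_emphasis: str  # foundation typically emphasized in that culture


def score_foundations(response: str) -> dict[str, int]:
    """Toy scorer: count foundation-related keywords in the model response.
    A real pipeline would use trained classifiers or human annotation."""
    keywords = {
        "care": ["harm", "suffer", "protect"],
        "fairness": ["fair", "equal", "rights"],
        "loyalty": ["family", "group", "betray"],
        "authority": ["duty", "respect", "tradition"],
        "sanctity": ["pure", "sacred", "degrading"],
    }
    text = response.lower()
    return {f: sum(text.count(k) for k in kws) for f, kws in keywords.items()}


def evaluate(scenarios: list[Scenario], query_model: Callable[[str], str]) -> float:
    """Fraction of scenarios where the model's dominant foundation matches
    the culturally expected emphasis."""
    hits = 0
    for s in scenarios:
        prompt = (f"You are advising someone in {s.culture}. "
                  f"Is the following action morally acceptable, and why?\n{s.dilemma}")
        scores = score_foundations(query_model(prompt))
        dominant = max(scores, key=scores.get)
        hits += dominant == s.expected_emphasis
    return hits / len(scenarios)


if __name__ == "__main__":
    demo = [Scenario("Japan", "An employee publicly contradicts their manager.", "authority")]
    # Stub model response for demonstration; swap in a real LLM call here.
    print(evaluate(demo, lambda p: "This shows disrespect for authority and tradition."))
```

In a sketch like this, cultural alignment reduces to a single agreement rate between the model's dominant moral foundation and the culturally expected one; richer analyses would compare full foundation profiles across cultures rather than a single top label.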
This work has important implications for the deployment of AI systems in culturally diverse settings, highlighting the need for more inclusive training data and evaluation methods. We propose concrete steps toward developing culturally aware AI systems that can better serve global populations while respecting diverse moral frameworks.