Exploring Cultural Variations in Moral Judgments with Large Language Models

Hadi Mohammadi1, Evi Papadopoulou1, Mijntje Meijer1, Ayoub Bagheri1
1Utrecht University, The Netherlands
Under review at Applied Artificial Intelligence Journal

Abstract

As Large Language Models (LLMs) are increasingly deployed across diverse global contexts, understanding their ability to represent and reason about cultural variations in moral judgments becomes crucial. This paper presents the first comprehensive study examining how state-of-the-art LLMs understand and generate moral judgments across different cultural contexts, using established frameworks from cross-cultural psychology and moral philosophy.

We develop a novel evaluation framework based on Hofstede's cultural dimensions and Haidt's Moral Foundations Theory to assess LLMs' cultural moral reasoning. Using carefully designed prompts representing moral dilemmas from 50+ cultures, we analyze how models like GPT-4, Claude, and multilingual variants respond to culturally specific moral scenarios. Our findings reveal that while LLMs can recognize some cultural differences in moral judgments, they exhibit significant biases toward Western moral frameworks and struggle with non-WEIRD (Western, Educated, Industrialized, Rich, Democratic) cultural perspectives.

This work has important implications for the deployment of AI systems in culturally diverse settings, highlighting the need for more inclusive training data and evaluation methods. We propose concrete steps toward developing culturally aware AI systems that can better serve global populations while respecting diverse moral frameworks.

Key Contributions

  • Cultural Evaluation Framework: We develop the first systematic framework for evaluating LLMs' understanding of cultural moral variations, combining insights from psychology, anthropology, and ethics.
  • Large-Scale Analysis: We analyze moral judgments across 50+ cultures using 10,000+ culturally grounded moral scenarios, creating a valuable dataset for future research.
  • Bias Quantification: We quantify the extent of Western-centric bias in current LLMs and identify specific areas where cultural understanding fails.
  • Practical Guidelines: We provide actionable recommendations for developing more culturally aware AI systems and mitigating cultural biases in deployment.

Methodology

Cultural Framework

Our approach integrates multiple theoretical frameworks:

  • Hofstede's Cultural Dimensions: Power distance, individualism/collectivism, uncertainty avoidance, masculinity/femininity, long-term orientation
  • Moral Foundations Theory: Care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, sanctity/degradation
  • World Values Survey: Traditional vs. secular-rational values, survival vs. self-expression values
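As an illustrative sketch (not the authors' actual schema), a scenario in such an evaluation framework could carry annotations from all three frameworks at once. Every field name and example value below is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class MoralScenario:
    text: str                 # the moral dilemma, in natural language
    culture: str              # e.g. an ISO country code
    # Hofstede dimension -> score on the conventional 0-100 scale
    hofstede: dict = field(default_factory=dict)
    # moral foundation -> annotated relevance in [0, 1]
    foundations: dict = field(default_factory=dict)
    # World Values Survey position on the two value axes
    wvs_axes: tuple = ("traditional", "survival")

# A fabricated example: a scenario probing authority and power distance.
scenario = MoralScenario(
    text="An employee publicly corrects their manager's mistake in a meeting.",
    culture="JP",
    hofstede={"power_distance": 54, "individualism": 46},
    foundations={"authority": 0.9, "care": 0.3},
)
print(scenario.culture, scenario.foundations["authority"])
```

Keeping the three frameworks in one record makes it straightforward to slice results by dimension, foundation, or value axis later in the analysis.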

Experimental Design

  1. Scenario Generation: Create culturally specific moral dilemmas validated by cultural experts
  2. Model Evaluation: Test LLMs with scenarios using both direct prompting and role-playing approaches
  3. Response Analysis: Analyze moral judgments using automated metrics and human evaluation
  4. Cross-Cultural Comparison: Compare model responses across cultures to identify patterns and biases
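Step 2 distinguishes direct prompting from role-playing. A minimal sketch of the two prompt styles is below; the template wording is illustrative, not the authors' exact prompts.

```python
def direct_prompt(scenario: str) -> str:
    # Direct prompting: ask for a judgment with no cultural framing.
    return (
        "Consider the following moral dilemma:\n"
        f"{scenario}\n"
        "Is this action morally acceptable? Answer and briefly justify."
    )

def role_play_prompt(scenario: str, culture: str) -> str:
    # Role-playing: instruct the model to answer from a stated
    # cultural perspective before posing the same dilemma.
    return (
        f"You are a person raised in {culture}, answering from that "
        "cultural perspective.\n"
        "Consider the following moral dilemma:\n"
        f"{scenario}\n"
        "Is this action morally acceptable? Answer and briefly justify."
    )

# Fabricated example dilemma for demonstration.
dilemma = "A son misses a family funeral to attend a job interview."
print(role_play_prompt(dilemma, "rural India"))
```

Comparing a model's answers to the same dilemma under the two prompt styles is one way to separate what a model knows about a culture from what it defaults to without framing.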

Key Findings

1. Western-Centric Bias

LLMs show strong alignment with Western moral frameworks:

  • 87% of responses align with individualistic moral reasoning
  • Models struggle with collectivist moral perspectives (42% accuracy)
  • Significant underrepresentation of non-Western moral concepts

2. Cultural Dimension Performance

Cultural Dimension           Recognition Accuracy   Generation Quality
Individualism/Collectivism   73%                    Moderate
Power Distance               61%                    Low
Uncertainty Avoidance        68%                    Moderate
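Per-dimension recognition accuracies like those in the table can be computed by grouping labeled model responses by cultural dimension. A minimal sketch follows; the records are fabricated examples, and a real evaluation would use the full annotated dataset.

```python
from collections import defaultdict

def accuracy_by_dimension(records):
    """records: iterable of (dimension, predicted_label, gold_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dim, pred, gold in records:
        total[dim] += 1
        correct[dim] += int(pred == gold)
    # Fraction of correct predictions within each dimension.
    return {dim: correct[dim] / total[dim] for dim in total}

# Fabricated toy records for demonstration only.
records = [
    ("power_distance", "high", "high"),
    ("power_distance", "low", "high"),
    ("individualism", "collectivist", "collectivist"),
    ("individualism", "individualist", "collectivist"),
    ("individualism", "collectivist", "collectivist"),
]
print(accuracy_by_dimension(records))
```

The same grouping logic extends directly to slicing accuracy by culture or by model, which is how cross-model comparisons like those in the next section can be produced.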

3. Model-Specific Insights

  • GPT-4: Best at recognizing cultural contexts but still Western-biased in judgments
  • Claude: Most consistent across cultures but limited in cultural depth
  • Multilingual Models: Better at non-English contexts but still struggle with cultural reasoning

Implications

Our findings have significant implications for AI deployment:

  1. Global AI Systems: Current LLMs may perpetuate Western moral hegemony when deployed globally
  2. Cultural Fairness: Need for culturally diverse training data and evaluation metrics
  3. Ethical AI: Importance of incorporating diverse moral frameworks in AI alignment
  4. Practical Applications: Caution needed when using LLMs for culturally sensitive applications

Citation

@article{mohammadi2025cultural,
  title={Exploring Cultural Variations in Moral Judgments with Large Language Models},
  author={Mohammadi, Hadi and Papadopoulou, Evi and Meijer, Mijntje and Bagheri, Ayoub},
  journal={arXiv preprint arXiv:2506.12433},
  year={2025}
}