The rapid advancement of Natural Language Processing (NLP) models has led to their widespread adoption across various domains. However, the black-box nature of these models raises concerns about their transparency and trustworthiness, particularly in high-stakes applications. This survey provides a comprehensive examination of the current state of explainable NLP across various domains, analyzing over 200 recent papers to understand how explainability is being implemented and evaluated in practice.
We categorize existing approaches based on their explainability methods, target audiences, and application domains. Our analysis reveals significant gaps between theoretical explainability frameworks and their practical implementations, particularly in domain-specific contexts. We identify key challenges including the lack of standardized evaluation metrics, the trade-off between model performance and interpretability, and the difficulty in generating explanations that are both faithful to the model and understandable to end-users.
This survey contributes to the field by: (1) providing a structured taxonomy of explainable NLP methods, (2) analyzing domain-specific requirements and implementations, (3) identifying current limitations and future research directions, and (4) proposing guidelines for developing more effective explainable NLP systems. Our findings emphasize the need for collaborative efforts between NLP researchers and domain experts to create truly useful and trustworthy AI systems.
We develop a structured taxonomy categorizing explainable NLP methods based on their approach (post-hoc vs. intrinsic), granularity (token, sentence, document level), and target audience.
We analyze how explainability requirements and implementations vary across different domains including healthcare, finance, legal, and social media analysis.
We propose a unified framework for evaluating explainable NLP systems, considering both technical metrics (faithfulness, consistency) and human-centered metrics; a minimal sketch of one faithfulness-style metric follows this list.
Based on our analysis, we provide practical guidelines for researchers and practitioners developing explainable NLP systems in various domains.
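To make the technical metrics concrete, the sketch below illustrates one common way faithfulness is operationalized: comprehensiveness, i.e., the drop in the model's predicted probability when the most highly attributed tokens are removed. This is an illustrative assumption, not the survey's own implementation; the predictor and attribution scores shown here are toy placeholders.

    # Minimal sketch of a comprehensiveness-style faithfulness metric:
    # how much does the predicted probability drop when the k tokens with
    # the highest attribution scores are deleted? The predictor and the
    # attribution scores below are hypothetical placeholders.
    from typing import Callable, List, Sequence

    def comprehensiveness(
        predict_proba: Callable[[List[str]], float],  # P(label | tokens)
        tokens: List[str],
        attributions: Sequence[float],                # one score per token
        k: int = 3,
    ) -> float:
        """Drop in probability after deleting the top-k attributed tokens."""
        original = predict_proba(tokens)
        top = set(sorted(range(len(tokens)),
                         key=lambda i: attributions[i], reverse=True)[:k])
        reduced = [t for i, t in enumerate(tokens) if i not in top]
        return original - predict_proba(reduced)

    if __name__ == "__main__":
        # Toy keyword-based stand-in for a real sentiment classifier.
        def toy_sentiment(tokens: List[str]) -> float:
            return 0.9 if "excellent" in tokens else 0.4

        tokens = ["the", "plot", "was", "excellent"]
        scores = [0.05, 0.10, 0.05, 0.80]
        print(comprehensiveness(toy_sentiment, tokens, scores, k=1))  # prints ~0.5

A larger drop indicates that the tokens the explanation highlights are indeed the ones driving the prediction; human-centered metrics (e.g., user studies) would complement rather than replace such scores.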
Conducted a systematic search of major databases (ACL Anthology, IEEE Xplore, arXiv) for papers on explainable NLP published between 2020 and 2024.
Applied strict inclusion criteria focusing on papers with practical implementations and empirical evaluations of explainability methods.
Developed a multi-dimensional taxonomy to categorize papers by explainability approach, application domain, and evaluation methods (an illustrative encoding of these dimensions appears after this list).
Conducted in-depth analysis to identify patterns, gaps, and emerging trends in explainable NLP across different domains.
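As an illustration of how such a multi-dimensional taxonomy can be encoded for paper annotation and cross-domain analysis, the sketch below defines one possible record structure. The field names, category values, and the example record are assumptions made for illustration, not the survey's exact coding scheme.

    # One possible encoding of the taxonomy dimensions used to annotate
    # surveyed papers. Field names and category values are illustrative
    # assumptions, not the survey's exact coding scheme.
    from dataclasses import dataclass
    from enum import Enum
    from typing import List

    class Approach(Enum):
        POST_HOC = "post-hoc"
        INTRINSIC = "intrinsic"

    class Granularity(Enum):
        TOKEN = "token"
        SENTENCE = "sentence"
        DOCUMENT = "document"

    @dataclass
    class PaperRecord:
        title: str
        domain: str               # e.g. "healthcare", "finance", "legal", "social media"
        approach: Approach
        granularity: Granularity
        audience: str             # e.g. "ML researcher", "domain expert", "end-user"
        evaluation: List[str]     # e.g. ["faithfulness", "user study"]

    # Hypothetical example record (not an actual surveyed paper).
    record = PaperRecord(
        title="Rationale extraction for clinical notes",
        domain="healthcare",
        approach=Approach.POST_HOC,
        granularity=Granularity.TOKEN,
        audience="domain expert",
        evaluation=["faithfulness", "user study"],
    )

Encoding each paper as a structured record of this kind makes it straightforward to aggregate counts per dimension and compare how explainability is practiced across domains.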
1. Theory-Practice Gap: While numerous explainability methods exist, their practical implementation often falls short of theoretical promises, particularly in real-world applications.
2. Domain-Specific Requirements: Different domains have vastly different explainability needs; what works for sentiment analysis may not be suitable for medical diagnosis.
3. Evaluation Challenges: There is no consensus on how to evaluate explainability, leading to inconsistent and incomparable results across studies.
4. User-Centric Design Gap: Most explainability methods are designed by and for ML researchers, often failing to meet the needs of actual end-users.
This survey has significant implications for the future development of explainable NLP systems. By providing a comprehensive overview of the current landscape, it enables researchers to identify current limitations and build on existing methods when charting future research directions.
Our work serves as a foundation for future research in explainable NLP, promoting the development of more transparent, trustworthy, and user-friendly AI systems across various application domains.
@article{mohammadi2025explainability,
  title   = {Explainability in Practice: A Survey of Explainable NLP Across Various Domains},
  author  = {Mohammadi, Hadi and Bagheri, Ayoub and Giachanou, Anastasia and Oberski, Daniel L.},
  journal = {arXiv preprint arXiv:2502.00837},
  year    = {2025}
}