A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction

Hadi Mohammadi1, Anastasia Giachanou1, Ayoub Bagheri1
1Utrecht University, The Netherlands
Applied Sciences, Volume 14, Issue 19, Article 8620, 2024

Abstract

Sexism, discrimination on the basis of gender, is increasingly prevalent on social media platforms, where it often manifests as hate speech targeting individuals or groups because of their gender. While machine learning models can detect such content, their "black box" nature obscures their decision-making processes, making it difficult for users to understand why certain posts are flagged as sexist.

This paper addresses the critical need for transparency in automated sexism detection by proposing an explainable pipeline that combines accurate classification with interpretable explanations. We demonstrate that incorporating explainability techniques such as LIME and SHAP not only maintains high detection accuracy but also provides valuable insights into model behavior, revealing which words and phrases most strongly indicate sexist content.

Our comprehensive evaluation on the EXIST 2021 dataset shows that our transparent approach achieves an F1-score of 0.82 while providing clear, understandable explanations for each prediction. This dual focus on accuracy and interpretability makes our system particularly suitable for real-world deployment, where understanding the reasoning behind content moderation decisions is crucial for both platform operators and users.

Key Contributions

  • Transparent Pipeline: We develop a comprehensive pipeline that integrates multiple explainability techniques (LIME, SHAP, attention weights) with state-of-the-art classification models for sexism detection.
  • Multi-Model Evaluation: We evaluate various models including traditional ML (SVM, Random Forest) and deep learning approaches (BERT, RoBERTa) to identify the best balance between performance and explainability.
  • Linguistic Analysis: Through explainability techniques, we identify key linguistic patterns and markers that indicate sexist content, providing insights for both researchers and content moderators.
  • Practical Implementation: We provide a fully implemented system with code and guidelines for deployment, making our approach accessible to practitioners and researchers.

Methodology

Pipeline Architecture

Our transparent pipeline consists of four main components:

  1. Data Preprocessing: Text cleaning, normalization, and feature extraction tailored to social media content (a cleaning sketch follows this list).
  2. Model Training: Training multiple classifiers with different architectures to compare performance and explainability trade-offs.
  3. Explainability Generation: Applying LIME for local explanations and SHAP for global feature importance analysis.
  4. Explanation Visualization: Creating intuitive visualizations that highlight important words and their contribution to the prediction.
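
The snippet below is a minimal sketch of the kind of cleaning and normalization the preprocessing step applies to social media text. The specific rules (placeholder tokens for URLs and user mentions, hashtag handling) are illustrative assumptions rather than the paper's exact configuration.

import re

def preprocess_post(text: str) -> str:
    """Light normalization for social media posts before feature extraction."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " <url> ", text)  # replace links with a placeholder
    text = re.sub(r"@\w+", " <user> ", text)         # anonymize user mentions
    text = text.replace("#", "")                     # keep the word behind a hashtag
    text = re.sub(r"\s+", " ", text).strip()         # collapse repeated whitespace
    return text

print(preprocess_post("@someuser Women belong in the kitchen #justsaying http://t.co/xyz"))
# -> "<user> women belong in the kitchen justsaying <url>"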

Explainability Techniques

We employ three complementary explainability approaches:

  • LIME (Local Interpretable Model-agnostic Explanations): Provides instance-level explanations by approximating the model locally around each input (a usage sketch follows this list).
  • SHAP (SHapley Additive exPlanations): Offers both local and global explanations based on game theory principles.
  • Attention Visualization: For transformer-based models, we visualize attention weights to understand model focus.
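
As a concrete illustration of the LIME step, the sketch below attaches a LimeTextExplainer to a simple text classifier. The choice of model (TF-IDF features with a calibrated linear SVM) and the variable names texts and labels are assumptions made for the example, not the paper's exact configuration.

from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from lime.lime_text import LimeTextExplainer

# texts, labels: preprocessed posts and binary sexism labels (assumed to be loaded)
clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    CalibratedClassifierCV(LinearSVC()),  # calibration provides the predict_proba LIME needs
)
clf.fit(texts, labels)

explainer = LimeTextExplainer(class_names=["non-sexist", "sexist"])
explanation = explainer.explain_instance(texts[0], clf.predict_proba, num_features=10)
print(explanation.as_list())  # top words with their local contribution to the prediction

The per-word weights returned by as_list() are the kind of output the visualization component can render as highlighted tokens.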

Results

Our experiments on the EXIST 2021 dataset demonstrate that:

  • The RoBERTa-based model achieves the best performance with an F1-score of 0.82
  • Traditional ML models with SHAP explanations provide the most interpretable results while maintaining competitive accuracy (F1: 0.78)
  • Key indicators of sexist content include gender-specific slurs, stereotypical role assignments, and objectifying language
  • The explainability overhead is modest, adding roughly 2-3 seconds per prediction, which remains practical for near-real-time moderation

Our analysis reveals that combining multiple explainability techniques provides complementary insights, with LIME excelling at instance-level explanations and SHAP better suited to understanding overall model behavior.
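
To make that global view concrete, the sketch below computes exact SHAP values for a linear model over TF-IDF features and ranks features by their mean absolute contribution. The model choice (logistic regression), the feature limit, and the variable names texts and labels are illustrative assumptions rather than the configuration used in the experiments.

import shap
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# texts, labels: preprocessed posts and binary sexism labels (assumed to be loaded)
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=5000)
X = vectorizer.fit_transform(texts)
model = LogisticRegression(max_iter=1000).fit(X, labels)

# Exact SHAP values for a linear model over the TF-IDF features
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)

# Global importance: rank features by mean |SHAP value| across the corpus
shap.summary_plot(shap_values, feature_names=vectorizer.get_feature_names_out(),
                  plot_type="bar", max_display=20)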

Code and Resources

All code for this project is available on GitHub, including:

  • Complete implementation of the transparent pipeline
  • Pre-trained models for immediate use
  • Jupyter notebooks with examples and tutorials
  • Scripts for reproducing all experimental results

Visit our GitHub repository for more information.

Citation

@article{mohammadi2024transparent,
  title={A Transparent Pipeline for Identifying Sexism in Social Media: Combining Explainability with Model Prediction},
  author={Mohammadi, Hadi and Giachanou, Anastasia and Bagheri, Ayoub},
  journal={Applied Sciences},
  volume={14},
  number={19},
  pages={8620},
  year={2024},
  publisher={MDPI}
}