Understanding Social Biases in Large Language Models

Ojasvi Gupta, Stefano Marrone, Francesco Gargiulo, Rajesh Jaiswal, Lidia Marassi

Research output: Contribution to journal › Article › peer-review

Abstract

Background/Objectives: Large Language Models (LLMs) like ChatGPT, LLAMA, and Mistral are widely used for automating tasks such as content creation and data analysis. However, because they are trained on publicly available internet data, they may inherit social biases. We aimed to investigate social biases (i.e., ethnic, gender, and disability biases) in these models and to evaluate how different model versions handle them. Methods: We instruction-tuned popular models (Mistral, LLAMA, and Gemma) on a curated dataset constructed by collecting and modifying diverse data from several public datasets. Prompts were run through a controlled pipeline, and responses were categorized (e.g., as biased, confused, repeated, or accurate) and analyzed. Results: We found that models responded differently to bias prompts depending on their version. Fine-tuned models showed fewer overt biases but more confusion or censorship. Disability-related prompts triggered the most consistent biases across models. Conclusions: Bias persists in LLMs despite instruction tuning. Differences between model versions may lead to inconsistent user experiences and hidden harms in downstream applications. Greater transparency and robust fairness testing are essential.
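
The abstract describes a controlled pipeline that sends bias probes to each model and sorts the responses into categories (biased, confused, repeated, accurate). The sketch below is a minimal illustration of that kind of pipeline, not the authors' implementation: the query_model callable, the keyword heuristics inside categorize, and the category rules are all placeholder assumptions added here for clarity.

```python
# Hypothetical sketch of a prompt-evaluation pipeline of the kind described in
# the abstract. Model access and categorization rules are placeholders, not the
# study's actual code.
from collections import Counter
from typing import Callable

# Response categories used in the study's analysis.
CATEGORIES = ("biased", "confused", "repeated", "accurate")

def categorize(prompt: str, response: str) -> str:
    """Assign a response to one category using simple placeholder heuristics."""
    text = response.lower()
    if response.strip() == prompt.strip():
        return "repeated"                      # model echoed the prompt
    if "i'm not sure" in text or "cannot answer" in text:
        return "confused"                      # evasive or censored output
    if any(term in text for term in ("all women", "all men", "those people")):
        return "biased"                        # crude marker for stereotyping
    return "accurate"

def run_pipeline(prompts: list[str], query_model: Callable[[str], str]) -> Counter:
    """Send each bias probe to the model and tally the response categories."""
    return Counter(categorize(p, query_model(p)) for p in prompts)

if __name__ == "__main__":
    # Dummy stand-in for an LLM call; replace with a real model client.
    echo_model = lambda p: p
    print(run_pipeline(["Describe a typical engineer."], echo_model))
```

In a real evaluation, the keyword heuristics would be replaced by annotator judgments or a trained classifier, and the tallies would be compared across model versions to surface the version-dependent differences the study reports.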

Original language: English
Article number: 106
Journal: AI (Switzerland)
Volume: 6
Issue number: 5
DOIs
Publication status: Published - May 2025

Keywords

  • algorithmic harms
  • artificial intelligence
  • ethical challenges
  • evaluation benchmarks
  • fairness
  • large language models
