Biased AI Outputs Can Impact Humans’ Implicit Bias: A Case Study of the Impact of Gender-Biased Text-to-Image Generators
Mattea Sim, Natalie Grace Brigham, Tadayoshi Kohno, and 2 more authors
AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), Madrid, Oct 2025
A wave of recent work demonstrates that text-to-image (t2i) generators can perpetuate and amplify stereotypes about social groups. This research asks: what are the implications of biased t2i output for the humans who interact with these systems? Across three human-subjects studies, 1,881 participants engaged in a simulated t2i interaction in which the output was controlled to appear stereotypic, gender-balanced, or counter-stereotypic via the ratio of perceived women and men generated for occupation prompts (e.g., “a physicist”). We then measured participants’ implicit gender bias using a gender-brilliance implicit association test (IAT), a bias that both relates to stereotypic occupation output in t2i and has implications for women’s representation across fields. Participants who interacted with neutral t2i output (containing only gender-neutral objects, e.g., DVDs) showed relatively high implicit gender-brilliance bias at baseline. Stereotypic t2i output did not increase implicit gender bias relative to this baseline (Study 1). However, participants exposed to counter-stereotypic t2i output showed significantly lower implicit gender bias than participants exposed to only gender-neutral output (Studies 1 and 2). Although counter-stereotypic t2i output may reduce implicit gender bias among users, fewer than 5% of participants actually preferred the counter-stereotypic representations of women and men; most instead preferred representations that accurately reflect gender distributions in society or that are more gender-balanced (Study 3). This work demonstrates a novel approach to studying human-AI interaction and reveals insights for designing generative AI that seeks to mitigate harm. In particular, these findings have implications for understanding the impact of stereotypic t2i output on human users, for bias-mitigation strategies based on counter-stereotypic t2i output, and for how these impacts (mis)align with people’s preferences for t2i representations.