For the most up-to-date list of my publications, please visit my Google Scholar profile.
preprint
conference & journal articles
2025
-
Algorithmic Behaviors Across Regions: A Geolocation Audit of YouTube Search for COVID-19 Misinformation between the United States and South Africa
Hayoung Jung, Prerna Juneja, and Tanushree Mitra
In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2025
Despite being an integral tool for finding health-related information online, YouTube has faced criticism for disseminating COVID-19 misinformation globally to its users. Yet, prior audit studies have predominantly investigated YouTube within the Global North contexts, often overlooking the Global South. To address this gap, we conducted a comprehensive 10-day geolocation-based audit on YouTube to compare the prevalence of COVID-19 misinformation in search results between the United States (US) and South Africa (SA), the countries heavily affected by the pandemic in the Global North and the Global South, respectively. For each country, we selected 3 geolocations and placed sock-puppets, or bots emulating "real" users, that collected search results for 48 search queries sorted by 4 search filters for 10 days, yielding a dataset of 915K results. We found that 31.55% of the top-10 search results contained COVID-19 misinformation. Among the top-10 search results, bots in SA faced significantly more misinformative search results than their US counterparts. Overall, our study highlights the contrasting algorithmic behaviors of YouTube search between two countries, underscoring the need for the platform to regulate algorithmic behavior consistently across different regions of the globe.
-
Online Myths on Opioid Use Disorder: A Comparison of Reddit and Large Language Model
Shravika Mittal, Hayoung Jung, Mai ElSherief, Tanushree Mitra, and Munmun De Choudhury
In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), 2025
Online communities on Reddit are a popular choice among people with opioid use disorder (OUD) to seek information on drug use, withdrawal symptoms, and recovery. LLM-powered chatbots (e.g., ChatGPT) are widely being adopted as question-answer systems for health-related queries. However, such online health information seeking could potentially be hindered by myths and misinformation on OUD, misleading or causing genuine harm to people with OUD. In this work, we examine the prevalence of 5 OUD-related myths, on treatment models and patient characteristics, within human- (taken from Reddit) and LLM-generated responses to queries on OUD. We further explore the framing strategies used within responses (both human- and LLM-generated) promoting and countering the myths. We found that all 5 myths were more widespread within human-generated responses. In addition, myth-promoting responses adopted trustworthy and authoritative framings, compared to knowledge-imparting linguistic cues within those countering the myths. Our work offers recommendations to reduce online OUD misinformation.
2024
-
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions
Chan Young Park*, Shuyue Stella Li*, Hayoung Jung*, Svitlana Volkova, Tanu Mitra, David Jurgens, and Yulia Tsvetkov
In Findings of the Association for Computational Linguistics: EMNLP 2024, Nov 2024
This study introduces ValueScope, a framework leveraging language models to quantify social norms and values within online communities, grounded in social science perspectives on normative structures. We employ ValueScope to dissect and analyze linguistic and stylistic expressions across 13 Reddit communities categorized under gender, politics, science, and finance. Our analysis provides a quantitative foundation confirming that even closely related communities exhibit remarkably diverse norms. This diversity supports existing theories and adds a new dimension to understanding community interactions. ValueScope not only delineates differences in social norms but also effectively tracks their evolution and the influence of significant external events like the U.S. presidential elections and the emergence of new sub-communities. The framework thus highlights the pivotal role of social norms in shaping online interactions, presenting a substantial advance in both the theory and application of social norm studies in digital spaces.
-
“They are uncultured”: Unveiling Covert Harms and Social Threats in LLM Generated Conversations
Preetam Prabhu Srikar Dammu*, Hayoung Jung*, Anjali Singh, Monojit Choudhury, and Tanu Mitra
In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024 – Nominated for the Best Paper Award
Large language models (LLMs) have emerged as an integral part of modern societies, powering user-facing applications such as personal assistants and enterprise applications like recruitment tools. Despite their utility, research indicates that LLMs perpetuate systemic biases. Yet, prior works on LLM harms predominantly focus on Western concepts like race and gender, often overlooking cultural concepts from other parts of the world. Additionally, these studies typically investigate “harm” as a singular dimension, ignoring the various and subtle forms in which harms manifest. To address this gap, we introduce the Covert Harms and Social Threats (CHAST), a set of seven metrics grounded in social science literature. We utilize evaluation models aligned with human assessments to examine the presence of covert harms in LLM-generated conversations, particularly in the context of recruitment. Our experiments reveal that seven out of the eight LLMs included in this study generated conversations riddled with CHAST, characterized by malign views expressed in seemingly neutral language unlikely to be detected by existing methods. Notably, these LLMs manifested more extreme views and opinions when dealing with non-Western concepts like caste, compared to Western ones such as race.