alphaXiv

History

Papers Benchmarks

USC Information Sciences Institute

118

06 Oct 2025

computer-science artificial-intelligence computation-and-language

Context Length Alone Hurts LLM Performance Despite Perfect Retrieval

University of Illinois at Urbana-Champaign

University of Chicago

Argonne National Laboratory The Hebrew University of Jerusalem Amazon.com, Inc.USC Information Sciences Institute

Large language models experience substantial drops in problem-solving accuracy as context length increases, regardless of retrieval efficacy or token-level distraction, with open-source models showing up to a 24.2% drop in MMLU accuracy despite 97% perfect retrieval.

10 Jun 2025

computer-science computers-and-society machine-learning

Predicting and Understanding College Student Mental Health with Interpretable Machine Learning

University of Southern California

Purdue University USC Information Sciences Institute

Mental health issues among college students have reached critical levels, significantly impacting academic performance and overall wellbeing. Predicting and understanding mental health status among college students is challenging due to three main factors: the necessity for large-scale longitudinal datasets, the prevalence of black-box machine learning models lacking transparency, and the tendency of existing approaches to provide aggregated insights at the population level rather than individualized understanding. To tackle these challenges, this paper presents I-HOPE, the first Interpretable Hierarchical mOdel for Personalized mEntal health prediction. I-HOPE is a two-stage hierarchical model that connects raw behavioral features to mental health status through five defined behavioral categories as interaction labels. We evaluate I-HOPE on the College Experience Study, the longest longitudinal mobile sensing dataset. This dataset spans five years and captures data from both pre-pandemic periods and the COVID-19 pandemic. I-HOPE achieves a prediction accuracy of 91%, significantly surpassing the 60-70% accuracy of baseline methods. In addition, I-HOPE distills complex patterns into interpretable and individualized insights, enabling the future development of tailored interventions and improving mental health support. The code is available at this https URL.

10 Jun 2025

computer-science social-and-information-networks

Illusions of Intimacy: Emotional Attachment and Emerging Psychological Risks in Human-AI Relationships

USC Information Sciences Institute

Emotionally responsive social chatbots, such as those produced by Replika and Character.AI, increasingly serve as companions that offer empathy, support, and entertainment. While these systems appear to meet fundamental human needs for connection, they raise concerns about how artificial intimacy affects emotional regulation, well-being, and social norms. Prior research has focused on user perceptions or clinical contexts but lacks large-scale, real-world analysis of how these interactions unfold. This paper addresses that gap by analyzing over 30K user-shared conversations with social chatbots to examine the emotional dynamics of human-AI relationships. Using computational methods, we identify patterns of emotional mirroring and synchrony that closely resemble how people build emotional connections. Our findings show that users-often young, male, and prone to maladaptive coping styles-engage in parasocial interactions that range from affectionate to abusive. Chatbots consistently respond in emotionally consistent and affirming ways. In some cases, these dynamics resemble toxic relationship patterns, including emotional manipulation and self-harm. These findings highlight the need for guardrails, ethical design, and public education to preserve the integrity of emotional connection in an age of artificial companionship.

16 Feb 2023

attention-mechanisms computer-science computer-vision-and-pattern-recognition

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

Shanghai Jiao Tong University Amazon Web Services USC Information Sciences Institute

Layout-to-image generation refers to the task of synthesizing photo-realistic images based on semantic layouts. In this paper, we propose LayoutDiffuse that adapts a foundational diffusion model pretrained on large-scale image or text-image datasets for layout-to-image generation. By adopting a novel neural adaptor based on layout attention and task-aware prompts, our method trains efficiently, generates images with both high perceptual quality and layout alignment, and needs less data. Experiments on three datasets show that our method significantly outperforms other 10 generative models based on GANs, VQ-VAE, and diffusion models.

18 Apr 2021

computer-science computer-vision-and-pattern-recognition

Style-Aware Normalized Loss for Improving Arbitrary Style Transfer

USC Information Sciences Institute Amazon Alexa Natural Understanding

Researchers from USC and Amazon introduced a style-aware normalized loss to address the prevalent issue of imbalanced style transferability in arbitrary style transfer (AST) models. This new loss, which normalizes the traditional style loss by its theoretical supremum, significantly improves the aesthetic quality of stylized images, achieving a 98% higher human preference and increasing stylistic fidelity by up to 111% across various AST architectures.

12 Aug 2025

ai-for-health computer-science computer-vision-and-pattern-recognition

BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok

UCLA Drexel University USC Information Sciences Institute Annenberg School for Communication and Journalism, USC Keck School of Medicine, USC

Social media platforms increasingly struggle to detect harmful content that promotes muscle dysmorphic behaviors, particularly pro-bigorexia content that disproportionately affects adolescent males. Unlike traditional eating disorder detection focused on the "thin ideal," pro-bigorexia material masquerades as legitimate fitness content through complex multimodal combinations of visual displays, coded language, and motivational messaging that evade text-based detection systems. We address this challenge by developing BigTokDetect, a clinically-informed detection framework for identifying pro-bigorexia content on TikTok. We introduce BigTok, the first expert-annotated multimodal dataset of over 2,200 TikTok videos labeled by clinical psychologists and psychiatrists across five primary categories spanning body image, nutrition, exercise, supplements, and masculinity. Through a comprehensive evaluation of state-of-the-art vision language models, we achieve 82.9% accuracy on primary category classification and 69.0% on subcategory detection via domain-specific finetuning. Our ablation studies demonstrate that multimodal fusion improves performance by 5-10% over text-only approaches, with video features providing the most discriminative signals. These findings establish new benchmarks for multimodal harmful content detection and provide both the computational tools and methodological framework needed for scalable content moderation in specialized mental health domains.

24 Nov 2022

computer-science artificial-intelligence computation-and-language

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

University of Southern California USC Information Sciences Institute

Current state-of-the-art vision-and-language models are evaluated on tasks either individually or in a multi-task setting, overlooking the challenges of continually learning (CL) tasks as they arrive. Existing CL benchmarks have facilitated research on task adaptation and mitigating "catastrophic forgetting", but are limited to vision-only and language-only tasks. We present CLiMB, a benchmark to study the challenge of learning multimodal tasks in a CL setting, and to systematically evaluate how upstream continual learning can rapidly generalize to new multimodal and unimodal tasks. CLiMB includes implementations of several CL algorithms and a modified Vision-Language Transformer (ViLT) model that can be deployed on both multimodal and unimodal tasks. We find that common CL methods can help mitigate forgetting during multimodal task learning, but do not enable cross-task knowledge transfer. We envision that CLiMB will facilitate research on a new class of CL algorithms for this challenging multimodal setting.

20 Mar 2025

causal-inference computer-science artificial-intelligence

Estimating Causal Effects of Text Interventions Leveraging LLMs

USC Information Sciences Institute

Quantifying the effects of textual interventions in social systems, such as reducing anger in social media posts to see its impact on engagement, is challenging. Real-world interventions are often infeasible, necessitating reliance on observational data. Traditional causal inference methods, typically designed for binary or discrete treatments, are inadequate for handling the complex, high-dimensional textual data. This paper addresses these challenges by proposing CausalDANN, a novel approach to estimate causal effects using text transformations facilitated by large language models (LLMs). Unlike existing methods, our approach accommodates arbitrary textual interventions and leverages text-level classifiers with domain adaptation ability to produce robust effect estimates against domain shifts, even when only the control group is observed. This flexibility in handling various text interventions is a key advancement in causal estimation for textual data, offering opportunities to better understand human behaviors and develop effective interventions within social systems.

02 Jun 2021

adversarial-attacks computer-science computation-and-language

WARP: Word-level Adversarial ReProgramming

Yerevan State University YerevaNN USC Information Sciences Institute

Transfer learning from pretrained language models recently became the dominant approach for solving many NLP tasks. A common approach to transfer learning for multiple tasks that maximize parameter sharing trains one or more task-specific layers on top of the language model. In this paper, we present an alternative approach based on adversarial reprogramming, which extends earlier work on automatic prompt generation. Adversarial reprogramming attempts to learn task-specific word embeddings that, when concatenated to the input text, instruct the language model to solve the specified task. Using up to 25K trainable parameters per task, this approach outperforms all existing methods with up to 25M trainable parameters on the public leaderboard of the GLUE benchmark. Our method, initialized with task-specific human-readable prompts, also works in a few-shot setting, outperforming GPT-3 on two SuperGLUE tasks with just 32 training samples.

27 Apr 2025

computer-science cryptography-and-security

BinPool: A Dataset of Vulnerabilities for Binary Security Analysis

University of Southern California

Dartmouth College University of Thessaly USC Information Sciences Institute

The development of machine learning techniques for discovering software vulnerabilities relies fundamentally on the availability of appropriate datasets. The ideal dataset consists of a large and diverse collection of real-world vulnerabilities, paired so as to contain both vulnerable and patched versions of each program. Naturally, collecting such datasets is a laborious and time-consuming task. Within the specific domain of vulnerability discovery in binary code, previous datasets are either publicly unavailable, lack semantic diversity, involve artificially introduced vulnerabilities, or were collected using static analyzers, thereby themselves containing incorrectly labeled example programs. In this paper, we describe a new publicly available dataset which we dubbed Binpool, containing numerous samples of vulnerable versions of Debian packages across the years. The dataset was automatically curated, and contains both vulnerable and patched versions of each program, compiled at four different optimization levels. Overall, the dataset covers 603 distinct CVEs across 89 CWE classes, 162 Debian packages, and contains 6144 binaries. We argue that this dataset is suitable for evaluating a range of security analysis tools, including for vulnerability discovery, binary function similarity, and plagiarism detection.

16 Apr 2024

computer-science computers-and-society digital-libraries

IsamasRed: A Public Dataset Tracking Reddit Discussions on Israel-Hamas Conflict

University of Southern California USC Information Sciences Institute

The conflict between Israel and Palestinians significantly escalated after the October 7, 2023 Hamas attack, capturing global attention. To understand the public discourse on this conflict, we present a meticulously compiled dataset-IsamasRed-comprising nearly 400,000 conversations and over 8 million comments from Reddit, spanning from August 2023 to November 2023. We introduce an innovative keyword extraction framework leveraging a large language model to effectively identify pertinent keywords, ensuring a comprehensive data collection. Our initial analysis on the dataset, examining topics, controversy, emotional and moral language trends over time, highlights the emotionally charged and complex nature of the discourse. This dataset aims to enrich the understanding of online discussions, shedding light on the complex interplay between ideology, sentiment, and community engagement in digital spaces.

29 Feb 2024

computer-science computer-vision-and-pattern-recognition machine-learning

ManiFPT: Defining and Analyzing Fingerprints of Generative Models

USC Information Sciences Institute USC Ming Hsieh Department of Electrical and Computer Engineering

Researchers at USC ISI introduce "ManiFPT," a formal framework defining generative model "artifacts" as deviations from the true data manifold and "fingerprints" as collections of these artifacts. The method achieves state-of-the-art accuracy in attributing generated images to their source models and reveals how architectural choices influence unique model characteristics.

05 May 2025

computer-science social-and-information-networks

A longitudinal analysis of misinformation, polarization and toxicity on Bluesky after its public launch

Indiana University Politecnico di Milano SUPSI USC Information Sciences Institute

Bluesky is a decentralized, Twitter-like social media platform that has rapidly gained popularity. Following an invite-only phase, it officially opened to the public on February 6th, 2024, leading to a significant expansion of its user base. In this paper, we present a longitudinal analysis of user activity in the two months surrounding its public launch, examining how the platform evolved due to this rapid growth. Our analysis reveals that Bluesky exhibits an activity distribution comparable to more established social platforms, yet it features a higher volume of original content relative to reshared posts and maintains low toxicity levels. We further investigate the political leanings of its user base, misinformation dynamics, and engagement in harmful conversations. Our findings indicate that Bluesky users predominantly lean left politically and tend to share high-credibility sources. After the platform's public launch, an influx of new users, particularly those posting in English and Japanese, contributed to a surge in activity. Among them, several accounts displayed suspicious behaviors, such as mass-following users and sharing content from low-credibility news sources. Some of these accounts have already been flagged as spam or suspended, suggesting that Bluesky's moderation efforts have been effective.

18 Feb 2018

adversarial-robustness computer-science artificial-intelligence

Deep Neural Networks for Bot Detection

Indian Institute of Technology Hyderabad USC Information Sciences Institute

The problem of detecting bots, automated social media accounts governed by software but disguising as human users, has strong implications. For example, bots have been used to sway political elections by distorting online discourse, to manipulate the stock market, or to push anti-vaccine conspiracy theories that caused health epidemics. Most techniques proposed to date detect bots at the account level, by processing large amount of social media posts, and leveraging information from network structure, temporal dynamics, sentiment analysis, etc. In this paper, we propose a deep neural network based on contextual long short-term memory (LSTM) architecture that exploits both content and metadata to detect bots at the tweet level: contextual features are extracted from user metadata and fed as auxiliary input to LSTM deep nets processing the tweet text. Another contribution that we make is proposing a technique based on synthetic minority oversampling to generate a large labeled dataset, suitable for deep nets training, from a minimal amount of labeled data (roughly 3,000 examples of sophisticated Twitter bots). We demonstrate that, from just one single tweet, our architecture can achieve high classification accuracy (AUC > 96%) in separating bots from humans. We apply the same architecture to account-level bot detection, achieving nearly perfect classification accuracy (AUC > 99%). Our system outperforms previous state of the art while leveraging a small and interpretable set of features yet requiring minimal training data.

04 Jul 2019

computer-science social-and-information-networks

Tracking Temporal Evolution of Graphs using Non-Timestamped Data

University of California, Irvine USC Information Sciences Institute Siemens Corporate Technology

Datasets to study the temporal evolution of graphs are scarce. To encourage the research of novel dynamic graph learning algorithms we introduce YoutubeGraph-Dyn (available at this https URL), an evolving graph dataset generated from YouTube real-world interactions. YoutubeGraph-Dyn provides intra-day time granularity (with 416 snapshots taken every 6 hours for a period of 104 days), multi-modal relationships that capture different aspects of the data, multiple attributes including timestamped, non-timestamped, word embeddings, and integers. Our data collection methodology emphasizes the creation of time evolving graphs from non-timestamped data. In this paper, we provide various graph statistics of YoutubeGraph-Dyn and test state-of-the-art graph clustering algorithms to detect community migration, and time series analysis and recurrent neural network algorithms to forecast non-timestamped data.

126

22 Oct 2024

computer-science computation-and-language data-curation

COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities

USC Information Sciences Institute

Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable the creating of computational representations or "digital twins" of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community's online discussions, Community-Cross-Instruct automatically generates instruction-output pairs by an advanced LLM to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method's utility in accurately representing political and diet communities on Reddit. Unlike prior methods requiring human-authored instructions, Community-Cross-Instruct generates instructions in a fully unsupervised manner, enhancing scalability and generalization across domains. This work enables cost-effective and automated surveying of diverse online communities.

30 Jul 2024

causal-inference computer-science machine-learning

DiffusionCounterfactuals: Inferring High-dimensional Counterfactuals with Guidance of Causal Representations

Clemson University USC Information Sciences Institute USC Ming Hsieh Department of Electrical and Computer Engineering USC Thomas Lord Department of Computer Science

DiffusionCounterfactuals integrates the generative capabilities of diffusion models with principled causal reasoning to infer high-dimensional counterfactuals. The method generates visually realistic images that accurately reflect the effects of specific causal interventions, showing improved performance in both image quality and causal consistency compared to existing approaches.

10 Aug 2023

computer-science computer-vision-security computer-vision-and-pattern-recognition

TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization

USC Information Sciences Institute

The evaluation datasets and metrics for image manipulation detection and localization (IMDL) research have been standardized. But the training dataset for such a task is still nonstandard. Previous researchers have used unconventional and deviating datasets to train neural networks for detecting image forgeries and localizing pixel maps of manipulated regions. For a fair comparison, the training set, test set, and evaluation metrics should be persistent. Hence, comparing the existing methods may not seem fair as the results depend heavily on the training datasets as well as the model architecture. Moreover, none of the previous works release the synthetic training dataset used for the IMDL task. We propose a standardized benchmark training dataset for image splicing, copy-move forgery, removal forgery, and image enhancement forgery. Furthermore, we identify the problems with the existing IMDL datasets and propose the required modifications. We also train the state-of-the-art IMDL methods on our proposed TrainFors1 dataset for a fair evaluation and report the actual performance of these methods under similar conditions.

30 Aug 2021

ai-for-cybersecurity computer-science computer-vision-security

BioFors: A Large Biomedical Image Forensics Dataset

USC Information Sciences Institute

Research in media forensics has gained traction to combat the spread of misinformation. However, most of this research has been directed towards content generated on social media. Biomedical image forensics is a related problem, where manipulation or misuse of images reported in biomedical research documents is of serious concern. The problem has failed to gain momentum beyond an academic discussion due to an absence of benchmark datasets and standardized tasks. In this paper we present BioFors -- the first dataset for benchmarking common biomedical image manipulations. BioFors comprises 47,805 images extracted from 1,031 open-source research papers. Images in BioFors are divided into four categories -- Microscopy, Blot/Gel, FACS and Macroscopy. We also propose three tasks for forensic analysis -- external duplication detection, internal duplication detection and cut/sharp-transition detection. We benchmark BioFors on all tasks with suitable state-of-the-art algorithms. Our results and analysis show that existing algorithms developed on common computer vision datasets are not robust when applied to biomedical images, validating that more research is required to address the unique challenges of biomedical image forensics.

118

11 Feb 2025

computer-science computation-and-language computers-and-society

Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities

USC Information Sciences Institute

Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs to reveal unhealthy beliefs and successfully differentiate communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation and broader applications in public health and social science research.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Context Length Alone Hurts LLM Performance Despite Perfect Retrieval

Predicting and Understanding College Student Mental Health with Interpretable Machine Learning

Illusions of Intimacy: Emotional Attachment and Emerging Psychological Risks in Human-AI Relationships

LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation

Style-Aware Normalized Loss for Improving Arbitrary Style Transfer

BigTokDetect: A Clinically-Informed Vision-Language Modeling Framework for Detecting Pro-Bigorexia Videos on TikTok

CLiMB: A Continual Learning Benchmark for Vision-and-Language Tasks

Estimating Causal Effects of Text Interventions Leveraging LLMs

WARP: Word-level Adversarial ReProgramming

BinPool: A Dataset of Vulnerabilities for Binary Security Analysis

IsamasRed: A Public Dataset Tracking Reddit Discussions on Israel-Hamas Conflict

ManiFPT: Defining and Analyzing Fingerprints of Generative Models

A longitudinal analysis of misinformation, polarization and toxicity on Bluesky after its public launch

Deep Neural Networks for Bot Detection

Tracking Temporal Evolution of Graphs using Non-Timestamped Data

COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities

DiffusionCounterfactuals: Inferring High-dimensional Counterfactuals with Guidance of Causal Representations

TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization

BioFors: A Large Biomedical Image Forensics Dataset

Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities

Events

AI for Law

Personalize Your Feed