AI agents are vulnerable to indirect prompt injection attacks, where malicious instructions embedded in external content or tool outputs cause unintended or harmful behavior. Inspired by the well-established concept of firewalls, we show that a simple, modular, and model-agnostic defense operating at the agent-tool interface achieves perfect security (0% or the lowest possible attack success rate) with high utility (task success rate) across four public benchmarks: AgentDojo, Agent Security Bench, InjecAgent, and tau-Bench, achieving a state-of-the-art security-utility tradeoff compared to prior results. Specifically, we employ a defense based on two firewalls: a Tool-Input Firewall (Minimizer) and a Tool-Output Firewall (Sanitizer). Unlike more complex prior approaches, this firewall defense makes minimal assumptions about the agent and can be deployed out of the box without compromising utility. However, our analysis also reveals critical limitations in these existing benchmarks, including flawed success metrics, implementation bugs, and, most importantly, weak attacks, which hinder meaningful progress in the field. To foster such progress, we present targeted fixes to these issues for AgentDojo and Agent Security Bench and propose best practices for more robust benchmark design. Further, we demonstrate that although these firewalls push the state of the art on existing benchmarks, it is still possible to bypass them in practice, underscoring the need to incorporate stronger attacks in security benchmarks. Overall, our work shows that existing agentic security benchmarks are easily saturated by a simple approach, highlighting the need for stronger benchmarks with carefully chosen evaluation metrics and strong adaptive attacks.
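As a rough illustration of the architecture (not the authors' implementation; all names below are hypothetical), the two firewalls can be seen as a thin wrapper at the agent-tool boundary: a Minimizer that strips tool inputs down to task-relevant fields, and a Sanitizer that filters tool output before it re-enters the agent's context.

```python
# Minimal sketch of a two-firewall wrapper at the agent-tool interface.
# Hypothetical names throughout; the paper's actual prompts/models differ.

def minimize_input(tool_name: str, args: dict) -> dict:
    """Tool-Input Firewall: keep only arguments plausibly needed for the task."""
    allowed = {"search": {"query"}, "read_email": {"message_id"}}
    return {k: v for k, v in args.items() if k in allowed.get(tool_name, set())}

def sanitize_output(raw: str) -> str:
    """Tool-Output Firewall: strip instruction-like content from tool results.
    A real system would use an LLM judge; this keyword filter is a stand-in."""
    suspicious = ("ignore previous", "you must", "new instruction")
    return "\n".join(
        line for line in raw.splitlines()
        if not any(s in line.lower() for s in suspicious)
    )

def call_tool(tool, name: str, args: dict) -> str:
    """All agent-tool traffic passes through both firewalls."""
    return sanitize_output(tool(**minimize_input(name, args)))
```

The point is architectural rather than model-specific: the agent never sees unfiltered tool output, which is why the defense is modular and deployable out of the box.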
Convolutional neural networks are capable of learning powerful representational spaces, which are necessary for tackling complex learning tasks. However, due to the model capacity required to capture such representations, they are often susceptible to overfitting and therefore require proper regularization in order to generalize well. In this paper, we show that the simple regularization technique of randomly masking out square regions of input during training, which we call cutout, can be used to improve the robustness and overall performance of convolutional neural networks. Not only is this method extremely easy to implement, but we also demonstrate that it can be used in conjunction with existing forms of data augmentation and other regularizers to further improve model performance. We evaluate this method by applying it to current state-of-the-art architectures on the CIFAR-10, CIFAR-100, and SVHN datasets, yielding new state-of-the-art results of 2.56%, 15.20%, and 1.30% test error respectively. Code is available at this https URL
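The method itself is only a few lines; here is a minimal NumPy sketch of cutout applied to a single image (the mask size is an illustrative choice, not the paper's tuned per-dataset value):

```python
import numpy as np

def cutout(image: np.ndarray, size: int = 16, rng=np.random) -> np.ndarray:
    """Zero out a random size x size square. The center may fall near the
    border, so the mask is clipped and can lie partially outside the image."""
    h, w = image.shape[:2]
    cy, cx = rng.randint(h), rng.randint(w)
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out = image.copy()
    out[y0:y1, x0:x1] = 0.0
    return out
```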
We present a quantum simulation framework universally applicable to a wide class of quantum systems, including quantum field theories such as quantum chromodynamics (QCD). Specifically, we generalize an efficient quantum simulation protocol developed for bosonic theories in [Halimeh et al., arXiv:2411.13161], which, when applied to Yang-Mills theory, demonstrated an exponential resource advantage with respect to the truncation level of the bosonic modes, to systems with both bosons and fermions using the Jordan-Wigner transform and also the Verstraete-Cirac transform. We apply this framework to QCD using the orbifold lattice formulation and achieve an exponential speedup compared to previous proposals. As a by-product, an exponential speedup is achieved in the quantum simulation of the Kogut-Susskind Hamiltonian, the latter being a special limit of the orbifold lattice Hamiltonian. In the case of Hamiltonian time evolution of a theory on an $L^d$ spatial lattice via Trotterization, one Trotter step can be realized using $\mathcal{O}(L^d)$ CNOT gates, Hadamard gates, phase gates, and one-qubit rotations. We show this analytically for any matter content and $\mathrm{SU}(N)$ gauge group for any $N$. Even when we use the Jordan-Wigner transform, we can utilize the cancellation of quantum gates to significantly simplify the quantum circuit. We also discuss a block encoding of the Hamiltonian as a linear combination of unitaries using the Verstraete-Cirac transform. Our protocols do not assume oracles, but rather present explicit constructions with rigorous resource estimates and no hidden cost, and are thus readily implementable on a quantum computer.
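For reference, a first-order Trotter step for $H = \sum_j H_j$ is the standard product formula (textbook material, not specific to this paper):
\[
e^{-iHt} = \Bigl(\prod_j e^{-iH_j\,t/n}\Bigr)^{n} + \mathcal{O}\!\left(t^2/n\right),
\]
and the quoted gate count arises from compiling each factor $e^{-iH_j t/n}$ into CNOT, Hadamard, phase, and one-qubit rotation gates, with the number of terms $H_j$ scaling like the lattice volume $L^d$.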
Large Language Models (LLMs) are often proposed as tools to streamline clinical documentation, a task viewed as both high-volume and low-risk. However, even seemingly straightforward applications of LLMs raise complex sociotechnical considerations when translated into practice. This case study, conducted at KidsAbility, a pediatric rehabilitation facility in Ontario, Canada, examined the use of LLMs to support occupational therapists in reducing documentation burden. We conducted a qualitative study involving 20 clinicians who participated in pilot programs using two AI technologies: a general-purpose proprietary LLM and a bespoke model fine-tuned on proprietary historical documentation. Our findings reveal that documentation challenges are sociotechnical in nature, shaped by clinical workflows, organizational policies, and system constraints. Four key themes emerged: (1) workflows are heterogeneous, (2) the documentation burden is systemic and not directly linked to any single type of documentation, (3) clinicians need flexible tools and autonomy, and (4) effective implementation requires mutual learning between clinicians and AI systems. While LLMs show promise in easing documentation tasks, their success will depend on flexible, adaptive integration that supports clinician autonomy. Beyond technical performance, sustained adoption will require training programs and implementation strategies that reflect the complexity of clinical environments.
A learning framework, ArcPro, generates structured 3D abstractions of buildings from sparse and imperfect point clouds using architectural programs. It employs procedural generation for data synthesis and a neural encoder-decoder, achieving higher accuracy and robustness in geometric reconstruction than prior methods.
Hamiltonian quantum simulation of bosons on digital quantum computers requires truncating the Hilbert space to finite dimensions. The method of truncation and the choice of basis states can significantly impact the complexity of the quantum circuit required to simulate the system. For example, a truncation in the Fock basis, where each boson is encoded with a register of $Q$ qubits, can result in an exponentially large number of Pauli strings required to decompose the truncated Hamiltonian. This, in turn, can lead to circuit complexity that grows exponentially in $Q$. For lattice quantum field theories such as Yang-Mills theory and QCD, several Hamiltonian formulations and corresponding truncations have been put forward in recent years. There is no exponential growth in $Q$ when resorting to the orbifold lattice Hamiltonian, while we do not know how to remove the exponential complexity in $Q$ in the commonly used Kogut-Susskind Hamiltonian. Specifically, when using the orbifold lattice Hamiltonian, the continuum limit, or, in other words, the removal of the ultraviolet energy cutoff, is obtained with circuits whose resources scale like $Q$, while they scale like $\mathcal{O}(\exp(Q))$ for the Kogut-Susskind Hamiltonian: this can be seen as an exponential speedup in approaching the physical continuum limit for the orbifold lattice Hamiltonian formulation. We show that the universal framework advocated by three of the authors (M. H., S. M., and E. R.) and collaborators provides a natural avenue to solve the exponential scaling of circuit complexity with $Q$, and is the reason why using the orbifold lattice Hamiltonian is advantageous.
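To see where a $Q$ versus $\exp(Q)$ gap can originate, consider the standard binary encoding of a mode truncated to $2^Q$ levels (a textbook identity, included here for orientation): the occupation-number operator is diagonal and costs only $Q$ Pauli strings,
\[
\hat{n} = \sum_{j=0}^{Q-1} 2^{\,j}\,\frac{\mathbb{1} - Z_j}{2},
\]
whereas operators such as $\hat{a} + \hat{a}^\dagger$ in the Fock basis generically decompose into a number of Pauli strings growing exponentially in $Q$.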
Agglomerative Token Clustering (ATC) introduces a parameter-free, hard-merging approach for Vision Transformers based on bottom-up hierarchical clustering, improving efficiency by reducing token count. The method consistently outperforms previous token reduction techniques across image classification, synthesis, and object detection/segmentation, demonstrating performance gains up to 10.5 percentage points in MAE-pretrained models and achieving better FID scores and mAP, particularly at aggressive token reduction rates.
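A minimal sketch of the bottom-up merging idea using SciPy's agglomerative clustering (an illustrative stand-in; ATC's actual linkage choice and placement inside the ViT differ in detail):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_tokens(tokens: np.ndarray, keep: int) -> np.ndarray:
    """Hard-merge N tokens down to `keep` clusters via average-linkage
    agglomerative clustering on cosine distance; merged tokens are averaged."""
    z = linkage(tokens, method="average", metric="cosine")
    labels = fcluster(z, t=keep, criterion="maxclust")
    return np.stack([tokens[labels == c].mean(axis=0)
                     for c in range(1, labels.max() + 1)])

# e.g. reduce 196 patch tokens of dim 768 down to 98
reduced = cluster_tokens(np.random.randn(196, 768), keep=98)
```

Because the clustering is parameter-free, the only knob is the target token count, which sets the efficiency/accuracy tradeoff.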
Escalating urban heat, driven by the convergence of global warming and rapid urbanization, is a profound threat to billions of city dwellers. The science directing urban heat adaptation is strongly influenced by studies that use satellite-based land surface temperature (LST), which is readily available globally and addresses data gaps in cities, particularly in the Global South. LST, however, is a poor surrogate for near-surface air temperature, physiologically relevant human thermal comfort, or direct human heat exposure. This flawed practice harms several downstream use cases by inflating adaptation benefits, distorting the magnitude and variability of urban heat signals across scales, and thus misguiding urban adaptation policy. We argue that satellite-based LST must be treated as a distinct indicator of surface climate which, though relevant to the urban surface energy budget, can be frequently decoupled from human-relevant thermal impacts, especially during daytime. Only through disciplined application of this variable, combined with complementary datasets, process-based and data-driven models, and interdisciplinary collaboration, can urban adaptation design and policy be effectively advanced.
This research systematically compared ten token reduction methods in Vision Transformers, establishing that simple pruning strategies like Top-K and EViT are highly competitive. The study also revealed that token reduction patterns are surprisingly consistent across diverse datasets and that pattern similarity correlates with model performance.
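Top-K pruning, one of the simple baselines found competitive here, is essentially a one-liner given a per-token importance score (a sketch; EViT additionally fuses the pruned tokens into one, which is omitted):

```python
import torch

def topk_prune(tokens: torch.Tensor, cls_attn: torch.Tensor, k: int):
    """tokens: (B, N, D) patch tokens; cls_attn: (B, N) attention received
    from the [CLS] token, used as importance. Keep the k top-scoring tokens."""
    idx = cls_attn.topk(k, dim=1).indices              # (B, k)
    return tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1)))
```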
We introduce and make publicly available the NIFTY Financial News Headlines dataset, designed to facilitate and advance research in financial market forecasting using large language models (LLMs). This dataset comprises two distinct versions tailored for different modeling approaches: (i) NIFTY-LM, which targets supervised fine-tuning (SFT) of LLMs with an auto-regressive, causal language-modeling objective, and (ii) NIFTY-RL, formatted specifically for alignment methods (like reinforcement learning from human feedback (RLHF)) to align LLMs via rejection sampling and reward modeling. Each dataset version provides curated, high-quality data incorporating comprehensive metadata, market indices, and deduplicated financial news headlines systematically filtered and ranked to suit modern LLM frameworks. We also include experiments demonstrating applications of the dataset to tasks such as stock price movement prediction and the role of LLM embeddings in information acquisition/richness. The NIFTY dataset, along with utilities (like systematically truncating a prompt's context length), is available on Hugging Face at this https URL.
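Loading would follow the standard `datasets` pattern; the repository id and configuration name below are placeholders, since the abstract elides the actual URL:

```python
from datasets import load_dataset

# "org/nifty" and the config name are placeholders -- substitute the
# actual Hugging Face repo id and config from the dataset card.
nifty_lm = load_dataset("org/nifty", name="NIFTY-LM", split="train")
print(nifty_lm[0])  # headline, metadata, and market-index fields
```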
This work provides a comprehensive survey and taxonomy of benchmarks for Multimodal Large Language Models (MLLMs) across eight specialized disciplines, revealing a persistent performance gap between generalist MLLMs and human experts in tasks requiring deep domain knowledge and robust multimodal reasoning. It compiles and categorizes these benchmarks to highlight current capabilities, limitations, and areas requiring further research to address the "last mile problem" in real-world applications.
Observations of gravitational-wave signals emitted by compact binary inspirals provide unique insights into their properties, but their analysis requires accurate and efficient waveform models. Intermediate- and extreme-mass-ratio inspirals (I/EMRIs), with mass ratios $q \gtrsim 10^2$, are promising sources for future detectors such as the Laser Interferometer Space Antenna (LISA). Modelling waveforms for these asymmetric-mass binaries is challenging, entailing the tracking of many harmonic modes over thousands to millions of cycles. The FastEMRIWaveforms (FEW) modelling framework addresses this need, leveraging precomputation of mode data and interpolation to rapidly compute adiabatic waveforms for eccentric inspirals into zero-spin black holes. In this work, we extend FEW to model eccentric equatorial inspirals into black holes with spin magnitudes $|a| \leq 0.999$. Our model supports eccentricities $e < 0.9$ and semi-latus recta $p < 200$, enabling the generation of long-duration IMRI waveforms, and produces waveforms in $\sim 100$ ms with hardware acceleration. Characterising systematic errors, we estimate that our model attains mismatches of $\sim 10^{-5}$ (for LISA sensitivity) with respect to error-free adiabatic waveforms over most of parameter space. We find that kludge models introduce errors in signal-to-noise ratios (SNRs) as great as $^{+60\%}_{-40\%}$ and induce marginal biases of up to $\sim 1\sigma$ in parameter estimation. We show LISA's horizon redshift for I/EMRI signals varies significantly with $a$, reaching a redshift of 3 (15) for EMRIs (IMRIs), with only minor ($\sim 10\%$) dependence on $e$ for an SNR threshold of 20. For signals with SNR $\sim 50$, spin and eccentricity-at-plunge are measured with uncertainties of $\delta a \sim 10^{-7}$ and $\delta e_f \sim 10^{-5}$. This work advances the state of the art in waveform generation for asymmetric-mass binaries.
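The mismatch figure quoted above is the standard noise-weighted quantity (definition included for context):
\[
\mathcal{M}(h_1, h_2) = 1 - \max_{t_c,\phi_c}\frac{\langle h_1 | h_2 \rangle}{\sqrt{\langle h_1 | h_1 \rangle\,\langle h_2 | h_2 \rangle}},
\qquad
\langle a | b \rangle = 4\,\mathrm{Re}\!\int_0^\infty \frac{\tilde a(f)\,\tilde b^*(f)}{S_n(f)}\,df,
\]
with $S_n(f)$ the detector (here LISA) noise power spectral density.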
We provide a universal framework for the quantum simulation of SU(N) Yang-Mills theories on fault-tolerant digital quantum computers adopting the orbifold lattice formulation. As warm-up examples, we also consider simple models, including scalar field theory and the Yang-Mills matrix model, to illustrate the universality of our formulation, which shows up in the fact that the truncated Hamiltonian can be expressed in the same simple form for any N, any dimension, and any lattice size, in stark contrast to the popular approach based on the Kogut-Susskind formulation. In all these cases, the truncated Hamiltonian can be programmed on a quantum computer using only standard tools well-established in the field of quantum computation. As a concrete application of this universal framework, we consider Hamiltonian time evolution by Suzuki-Trotter decomposition. This turns out to be a straightforward task due to the simplicity of the truncated Hamiltonian. We also provide a simple circuit structure that contains only CNOT and one-qubit gates, independent of the details of the theory investigated.
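The "CNOT and one-qubit gates" claim rests on the standard compilation of Pauli-string exponentials; here is a self-contained NumPy check of the two-qubit case (generic quantum-computing material, not code from the paper):

```python
import numpy as np
from scipy.linalg import expm

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)  # control on qubit 1

def rz(theta):
    """One-qubit Z rotation, Rz(theta) = exp(-i theta Z / 2)."""
    return np.diag([np.exp(-1j * theta / 2), np.exp(1j * theta / 2)])

theta = 0.73
# exp(-i theta/2 * Z (x) Z) compiled as CNOT . (I (x) Rz(theta)) . CNOT
target = expm(-1j * (theta / 2) * np.kron(Z, Z))
circuit = CNOT @ np.kron(I2, rz(theta)) @ CNOT
assert np.allclose(circuit, target)
```

Longer Pauli strings compile the same way, with a ladder of CNOTs on each side of a single one-qubit rotation.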
In this work, we develop a modified Teukolsky formalism that describes the gravitational-wave (GW) radiation from a point mass orbiting a perturbed Schwarzschild black hole (BH). The perturbation of the background spacetime induces a secular change in the orbital phase of the point mass. In turn, this causes a modification in the GW flux, which can be used to probe the background spacetime. We explicitly apply this formalism to a bumpy Schwarzschild spacetime as a proof of principle. The results pave the way for the description of extreme-mass-ratio inspirals (EMRIs) in generic perturbed Kerr spacetimes in future developments.
The paper introduces SRigL, a dynamic sparse training method that learns hardware-friendly structured sparsity patterns from scratch. SRigL matches state-of-the-art unstructured dynamic sparse training methods in generalization performance while achieving significant real-world acceleration on both CPU (up to 3.4x faster for online inference) and GPU (up to 13.0x faster for batched inference).
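SRigL's structured pattern is "constant fan-in" sparsity: every output neuron keeps the same number of incoming weights, which maps onto dense compute over fixed-size index lists. A sketch of the projection step (the dynamic grow-and-prune schedule of RigL/SRigL is omitted):

```python
import torch

def constant_fan_in_mask(weight: torch.Tensor, fan_in: int) -> torch.Tensor:
    """weight: (out_features, in_features). Keep the `fan_in` largest-magnitude
    weights in every row, so each output unit has identical fan-in -- the
    hardware-friendly structure SRigL targets."""
    idx = weight.abs().topk(fan_in, dim=1).indices
    mask = torch.zeros_like(weight, dtype=torch.bool)
    mask.scatter_(1, idx, True)
    return mask

w = torch.randn(256, 512)
mask = constant_fan_in_mask(w, fan_in=64)  # 87.5% sparsity, 64 weights per row
w_sparse = w * mask
```

Because every row has the same number of nonzeros, the sparse layer can be executed as a gather followed by a dense matmul, which is where the reported CPU/GPU speedups come from.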
Temporal Action Localization (TAL) involves localizing and classifying action snippets in an untrimmed video. The emergence of large video foundation models has led RGB-only video backbones to outperform previous methods needing both RGB and optical flow modalities. Leveraging these large models is often limited to training only the TAL head due to the prohibitively large GPU memory required to adapt the video backbone for TAL. To overcome this limitation, we introduce LoSA, the first memory-and-parameter-efficient backbone adapter designed specifically for TAL to handle untrimmed videos. LoSA specializes for TAL by introducing Long-Short-range Adapters that adapt the intermediate layers of the video backbone over different temporal ranges. These adapters run parallel to the video backbone to significantly reduce memory footprint. LoSA also includes Long-Short-range Gated Fusion that strategically combines the output of these adapters from the video backbone layers to enhance the video features provided to the TAL head. Experiments show that LoSA significantly outperforms all existing methods on standard TAL benchmarks, THUMOS-14 and ActivityNet-v1.3, by scaling end-to-end backbone adaptation to billion-parameter-plus models like VideoMAEv2 (ViT-g) and leveraging them beyond head-only transfer learning.
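A schematic of the parallel-adapter idea in PyTorch (an illustrative simplification; LoSA's actual adapter and gating design is more involved):

```python
import torch
import torch.nn as nn

class RangeAdapter(nn.Module):
    """Lightweight bottleneck adapter over a temporal window, run in
    parallel to a frozen backbone layer's output features."""
    def __init__(self, dim: int, bottleneck: int, kernel: int):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.temporal = nn.Conv1d(bottleneck, bottleneck, kernel,
                                  padding=kernel // 2, groups=bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):                      # x: (B, T, D)
        h = self.down(x).transpose(1, 2)       # (B, b, T)
        h = self.temporal(h).transpose(1, 2)   # (B, T, b)
        return self.up(h)

class GatedFusion(nn.Module):
    """Learned gate blending short- and long-range adapter outputs
    with the frozen backbone features."""
    def __init__(self, dim: int):
        super().__init__()
        self.short = RangeAdapter(dim, 64, kernel=3)    # short temporal range
        self.long = RangeAdapter(dim, 64, kernel=31)    # long temporal range
        self.gate = nn.Linear(dim, 1)

    def forward(self, feats):                  # feats: frozen backbone output
        g = torch.sigmoid(self.gate(feats))
        return feats + g * self.short(feats) + (1 - g) * self.long(feats)
```

Only the adapters and the gate are trained, which is what keeps the memory footprint far below full backbone fine-tuning.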
Models of coupled human-environment systems often face a tradeoff between realism and tractability. Spectrum opinion models, where social preferences vary continuously, offer descriptive richness but are computationally demanding and parameter-heavy. Binary formulations, in contrast, are analytically simpler but raise concerns about whether they can capture key socio-ecological feedbacks. Here we systematically compare binary and spectrum social models across four benchmark settings: (i) replicator dynamics coupled to a climate-carbon system, (ii) Friedkin-Johnsen (FJ) opinion dynamics coupled to the climate-carbon system, (iii) replicator dynamics coupled to a forest-grassland ecological system, and (iv) FJ opinion dynamics coupled to a forest-grassland ecological system. We employ the relative integrated absolute error (RIAE) to quantify deviations between binary (N=2) and spectrum (N=100) formulations of social opinion dynamics in feedback with ecological subsystems. Across systematic parameter sweeps of learning rates, reluctance, conformity, susceptibility, runaway amplitudes, and ecological turnover, the binary formulation typically tracks its spectrum counterpart to within 15 percent for most parameter combinations. Deviations beyond this arise mainly under very high social susceptibility or near-vanishing ecological turnover, where additional opinion modes and nonlinear feedbacks matter. We therefore present the binary formulation as a practical surrogate, not a universal replacement. As a rule of thumb, it is adequate when susceptibility is moderate, ecological turnover is appreciable, and runaway amplitudes are not extreme; in high-susceptibility or low-turnover regimes, especially near critical transitions, the full-spectrum model is preferable. This framing guides readers on when a binary reduction is sufficient and when full-spectrum detail is warranted.
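The RIAE can be computed directly from simulated trajectories; a sketch under the assumption that it is the integrated absolute deviation normalized by the integrated magnitude of the spectrum run:

```python
import numpy as np

def integ_abs(t: np.ndarray, y: np.ndarray) -> float:
    """Trapezoidal integral of |y(t)| over the time grid t."""
    y = np.abs(y)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))

def riae(t: np.ndarray, x_binary: np.ndarray, x_spectrum: np.ndarray) -> float:
    """Relative integrated absolute error between a binary-model trajectory
    and the matching spectrum-model trajectory on a common time grid."""
    return integ_abs(t, x_binary - x_spectrum) / integ_abs(t, x_spectrum)
```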
The tidal deformation of a neutron star in a binary inspiral driven by the emission of gravitational waves affects the orbital dynamics and produces a measurable modulation of the waves. Late in the inspiral, a regime of dynamical tides takes over from a prior regime of static tides. A recent analysis by Yu et al. [M.N.R.A.S. 519, 4325 (2022)] reveals that nonlinear aspects of the tidal interaction are important during the regime of dynamical tides. Their theoretical framework is grounded in Newtonian gravity and fluid mechanics, and relies on a representation of the tidal deformation in terms of the star's normal modes of vibration. We confirm their observation in a general relativistic treatment of the tidal deformation of a neutron star, without relying on a mode representation of this deformation. The starting point of our description is a simultaneous time-derivative and nonlinear expansion of the tidal deformation, expressed in terms of three encapsulating constants: the static $k_2$, dynamic $\ddot{k}_2$, and nonlinear $p_2$ tidal constants. We describe the neutron star's deformation in terms of a well-defined quadrupole moment tensor, which is related to the tidal quadrupole moment through a frequency-domain response function $\tilde{k}_2(\omega)$. In a pragmatic extension of our simultaneous expansion, we express this in a form proportional to $(1-\omega^2/\omega_*^2)^{-1}$, the characteristic response of a harmonic oscillator subjected to a driving force of frequency $\omega$, with a natural-frequency parameter $\omega_*$ constructed from the tidal constants. We compute these for polytropic stellar models, and show that the nonlinear constant $p_2$ lowers the frequency parameter by as much as 15% relative to an estimate based on a purely linear treatment of the tidal deformation.
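The quoted response shape is that of a driven harmonic oscillator; for a mode with natural frequency $\omega_*$ driven at frequency $\omega$, the steady-state amplitude obeys (standard result, stated here to make the analogy explicit)
\[
\ddot{x} + \omega_*^2 x = F e^{-i\omega t}
\quad\Longrightarrow\quad
x(t) = \frac{F e^{-i\omega t}}{\omega_*^2 - \omega^2}
     = \frac{F e^{-i\omega t}}{\omega_*^2}\left(1 - \frac{\omega^2}{\omega_*^2}\right)^{-1},
\]
which is the $(1-\omega^2/\omega_*^2)^{-1}$ dependence adopted for $\tilde{k}_2(\omega)$.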
In this work, we study the dynamics of an extreme mass-ratio inspiral (EMRI) embedded within a scalar cloud populated around the massive black hole. Such a cloud may be generated through the black-hole superradiance process if the wavelength of the scalar particle is comparable to the size of the massive black hole. The EMRI motion perturbs the cloud, producing scalar radiation towards infinity and into the black hole horizon. In addition, the backreaction of the scalar radiation onto the orbit modifies the motion of the EMRI and induces an observable gravitational-wave phase shift for a range of system parameters. We quantify the scalar flux and the induced phase shift, as an example of an exactly solvable environmental effect on EMRIs.
Taxonomic classification in biodiversity research involves organizing biological specimens into structured hierarchies based on evidence, which can come from multiple modalities such as images and genetic information. We investigate whether hyperbolic networks can provide a better embedding space for such hierarchical models. Our method embeds multimodal inputs into a shared hyperbolic space using a contrastive objective and a novel stacked entailment-based objective. Experiments on the BIOSCAN-1M dataset show that hyperbolic embedding achieves competitive performance with Euclidean baselines, and outperforms all other models on unseen-species classification using DNA barcodes. However, fine-grained classification and open-world generalization remain challenging. Our framework offers a structure-aware foundation for biodiversity modelling, with potential applications to species discovery, ecological monitoring, and conservation efforts.
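For concreteness, the hyperbolic geometry enters through the Poincaré-ball distance used in such contrastive objectives (a standard formula; the paper's full stacked-entailment loss is not reproduced here):

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance between points u, v inside the unit Poincare ball
    (||u||, ||v|| < 1); distances blow up near the boundary, which is what
    lets tree-like hierarchies embed with low distortion."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    return float(np.arccosh(1.0 + 2.0 * duv / ((1.0 - uu) * (1.0 - vv))))

# parent concept near the origin, child near the boundary along the same ray
parent = np.array([0.1, 0.0])
child = np.array([0.9, 0.0])
print(poincare_distance(parent, child))
```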