Newcastle University
This research systematically assesses the security of Hierarchical Federated Learning (HFL) across various architectural depths and attack types, revealing its robustness against untargeted training-time attacks but significant vulnerability to targeted backdoor attacks, particularly when malicious clients are strategically located in overlapping coverage areas. It demonstrates that adversarial training and Neural Cleanse are effective defense mechanisms for specific adversarial threats in HFL.
· +1
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods, and sociotechnical challenges. Based on the identified challenges, we pose 200+200+ concrete research questions.
National Physical Laboratory-led research comprehensively reviews and categorizes over 30 metrics and benchmarks for quantum computers, including non-gate-based architectures, providing consistent definitions and linking them to open-source software for reproducibility. This work establishes a structured framework for evaluating quantum hardware performance and proposes a roadmap for international standardization.
Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel' gradient discrepancy (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point several novel algorithms are proposed and studied, including a natural generalisation of Stein variational gradient descent, with applications to mean-field neural networks and predictively oriented posteriors presented. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.
Infographic charts are a powerful medium for communicating abstract data by combining visual elements (e.g., charts, images) with textual information. However, their visual and structural richness poses challenges for large vision-language models (LVLMs), which are typically trained on plain charts. To bridge this gap, we introduce ChartGalaxy, a million-scale dataset designed to advance the understanding and generation of infographic charts. The dataset is constructed through an inductive process that identifies 75 chart types, 440 chart variations, and 68 layout templates from real infographic charts and uses them to create synthetic ones programmatically. We showcase the utility of this dataset through: 1) improving infographic chart understanding via fine-tuning, 2) benchmarking code generation for infographic charts, and 3) enabling example-based infographic chart generation. By capturing the visual and structural complexity of real design, ChartGalaxy provides a useful resource for enhancing multimodal reasoning and generation in LVLMs.
220
Weak gravitational lensing observations of galaxy clusters are sensitive to all mass along the line-of-sight, introducing systematic and additional statistical uncertainties due to intervening structures. We quantify their impact on the recovery of mass density profile parameters using 967 clusters from the highest-resolution FLAMINGO simulation. We construct mock weak lensing maps, including both single source plane mocks and Euclid-like mocks with a realistic source redshift distribution. Applying Bayesian inference with Nautilus, we fit spherical and elliptical Navarro-Frenk-White models to recover the cluster mass, concentration, axis ratio, and centre, which we use to measure the brightest cluster galaxy (BCG) offset from the potential centre, or `BCG wobble'. We find that the spherical model fits clusters along under-dense sight-lines better than those along over-dense ones. This introduces a positive skew in the relative error distributions for mass and concentration, which increases with source redshift. In Euclid-like mocks, this results in a mean mass bias of +5.3±1.4+5.3\pm1.4% (significant at 3.5σ3.5\sigma) when assuming a spherical NFW model. We also detect a mean axis ratio bias of 2.0±0.7-2.0\pm0.7% (2.9σ2.9\sigma), with no significant bias in concentration. We measure a BCG wobble of ~14 kpc in our Euclid-like mocks, with negligible contribution from line-of-sight structure. Furthermore, we predict the scatter in mass estimates from future weak lensing surveys that have mean source redshifts zs1.2z_\text s \gtrsim 1.2 (such as the Nancy Grace Roman Space Telescope), will be dominated by line-of-sight structure and hence assuming a diagonal covariance matrix will lead to overestimating the precision. We conclude that cluster weak lensing pipelines should be calibrated on simulations with lightcone data in order to properly account for the significant impact of line-of-sight structure.
MACE-OFF introduces a new family of purely short-range machine learning force fields based on the MACE architecture, achieving near-DFT accuracy for diverse organic molecules, molecular crystals, liquids, and biomolecular systems. The models demonstrate high transferability, accurately reproducing free energy surfaces, IR spectra, and enabling stable simulations of solvated proteins at force field speeds.
InstanSeg, an embedding-based instance segmentation algorithm, was developed to accurately and efficiently identify cells and nuclei in microscopy images. It consistently achieved state-of-the-art accuracy while dramatically reducing processing time and enabling broad portability through TorchScript compilation and QuPath integration.
This paper aims to gather together some of the basic ideas behind the theory of false vacuum decay in quantum Ising models, focusing on the application of spin chains as analogue systems to false vacuum decay in elementary particle theory. Elementary results on quantum Ising models are rewritten to more closely resemble the original literature on false vacuum decay. A highly speculative conjecture for the false vacuum decay rate in a two dimensional quantum Ising model is also put forward.
Flexible slender structures such as rods, ribbons, plates, and shells exhibit extreme nonlinear responses bending, twisting, buckling, wrinkling, and self contact, that defy conventional simulation frameworks. Discrete Differential Geometry (DDG) has emerged as a geometry first, structure preserving paradigm for modeling such behaviors. Unlike finite element or mass spring methods, DDG discretizes geometry rather than governing equations, allowing curvature, twist, and strain to be defined directly on meshes. This approach yields robust large deformation dynamics, accurate handling of contact, and differentiability essential for inverse design and learning based control. This review consolidates the rapidly expanding landscape of DDG models across 1D and 2D systems, including discrete elastic rods, ribbons, plates, and shells, as well as multiphysics extensions to contact, magnetic actuation, and fluid structure interaction. We synthesize applications spanning mechanics of nonlinear instabilities, biological morphogenesis, functional structures and devices, and robotics from manipulation to soft machines. Compared with established approaches, DDG offers a unique balance of geometric fidelity, computational efficiency, and algorithmic differentiability, bridging continuum rigor with real time, contact rich performance. We conclude by outlining opportunities for multiphysics coupling, hybrid physics data pipelines, and scalable GPU accelerated solvers, and by emphasizing DDG role in enabling digital twins, sim to real transfer, and intelligent design of next generation flexible systems.
Deep neural networks based purely on attention have been successful across several domains, relying on minimal architectural priors from the designer. In Human Action Recognition (HAR), attention mechanisms have been primarily adopted on top of standard convolutional or recurrent layers, improving the overall generalization capability. In this work, we introduce Action Transformer (AcT), a simple, fully self-attentional architecture that consistently outperforms more elaborated networks that mix convolutional, recurrent and attentive layers. In order to limit computational and energy requests, building on previous human action recognition research, the proposed approach exploits 2D pose representations over small temporal windows, providing a low latency solution for accurate and effective real-time performance. Moreover, we open-source MPOSE2021, a new large-scale dataset, as an attempt to build a formal training and evaluation benchmark for real-time, short-time HAR. The proposed methodology was extensively tested on MPOSE2021 and compared to several state-of-the-art architectures, proving the effectiveness of the AcT model and laying the foundations for future work on HAR.
79
This survey offers a comprehensive review of text-to-video generation, examining the field through the lens of OpenAI's Sora model. It details the advancements in video duration and resolution, while systematically identifying persistent challenges related to physical realism, object consistency, and complex scene interactions.
Creating aesthetically pleasing data visualizations remains challenging for users without design expertise or familiarity with visualization tools. To address this gap, we present DataWink, a system that enables users to create custom visualizations by adapting high-quality examples. Our approach combines large multimodal models (LMMs) to extract data encoding from existing SVG-based visualization examples, featuring an intermediate representation of visualizations that bridges primitive SVG and visualization programs. Users may express adaptation goals to a conversational agent and control the visual appearance through widgets generated on demand. With an interactive interface, users can modify both data mappings and visual design elements while maintaining the original visualization's aesthetic quality. To evaluate DataWink, we conduct a user study (N=12) with replication and free-form exploration tasks. As a result, DataWink is recognized for its learnability and effectiveness in personalized authoring tasks. Our results demonstrate the potential of example-driven approaches for democratizing visualization creation.
Accurate pulsar astrometric estimates play an essential role in almost all high-precision pulsar timing experiments. Traditional pulsar timing techniques refine these estimates by including them as free parameters when fitting a model to observed pulse time-of-arrival measurements. However, reliable sub-milliarcsecond astrometric estimations require years of observations and, even then, power from red noise can be inadvertently absorbed into astrometric parameter fits, biasing the resulting estimations and reducing our sensitivity to red noise processes, including gravitational waves (GWs). In this work, we seek to mitigate these shortcomings by using pulsar astrometric estimates derived from Very Long Baseline Interferometry (VLBI) as priors for the timing fit. First, we calibrated a frame tie to account for the offsets between the reference frames used in VLBI and timing. Then, we used the VLBI-informed priors and timing-based likelihoods of several astrometric solutions consistent with both techniques to obtain a maximum-posterior astrometric solution. We found offsets between our results and the timing-based astrometric solutions, which, if real, would lead to absorption of spectral power at frequencies of interest for single-source GW searches. However, we do not find significant power absorption due to astrometric fitting at the low-frequency domain of the GW background.
We propose a resilience-based framework for computing feasible assume-guarantee contracts that ensure the satisfaction of temporal specifications in interconnected discrete-time systems. Interconnection effects are modeled as structured disturbances. We use a resilience metric, the maximum disturbance under which local specifications hold, to refine assumptions and guarantees across subsystems iteratively. For two subsystems, we demonstrate correctness, monotone refinement of guarantees, and that the resulting assumptions are maximal within ball-shaped sets. Additionally, we extend our approach to general networks of L subsystems using weighted combinations of interconnection effects. We instantiate the framework on linear systems by meeting finite-horizon safety, exact-time reachability, and finite-time reachability specifications, and on nonlinear systems by fulfilling general finite-horizon specifications. Our approach is demonstrated through numerical linear examples, and a nonlinear DC Microgrid case study, showcasing the impact of our framework in verifying temporal logic specifications with compositional reasoning.
A multi-institution research team develops a three-part framework for AI image provenance detection that combines adversarially robust perceptual hashing (DinoHash), multi-party fully homomorphic encryption, and AI detection models, achieving a 12% improvement in hash accuracy and 25% better classification performance while preserving privacy during provenance lookups.
14
Sampling algorithms drive probabilistic machine learning, and recent years have seen an explosion in the diversity of tools for this task. However, the increasing sophistication of sampling algorithms is correlated with an increase in the tuning burden. There is now a greater need than ever to treat the tuning of samplers as a learning task in its own right. In a conceptual breakthrough, Wang et al (2025) formulated Metropolis-Hastings as a Markov decision process, opening up the possibility for adaptive tuning using Reinforcement Learning (RL). Their emphasis was on theoretical foundations; realising the practical benefit of Reinforcement Learning Metropolis-Hastings (RLMH) was left for subsequent work. The purpose of this paper is twofold: First, we observe the surprising result that natural choices of reward, such as the acceptance rate, or the expected squared jump distance, provide insufficient signal for training RLMH. Instead, we propose a novel reward based on the contrastive divergence, whose superior performance in the context of RLMH is demonstrated. Second, we explore the potential of RLMH and present adaptive gradient-based samplers that balance flexibility of the Markov transition kernel with learnability of the associated RL task. A comprehensive simulation study using the posteriordb benchmark supports the practical effectiveness of RLMH.
Small-scale winds driven from accretion discs surrounding active galactic nuclei (AGN) are expected to launch kpc-scale outflows into their host galaxies. However, the ways in which the structure of the interstellar medium (ISM) affects the multiphase content and impact of the outflow remains uncertain. We present a series of numerical experiments featuring a realistic small-scale AGN wind with velocity 5×103104 km/s5\times 10^3-10^4\ \rm{km/s} interacting with an isolated galaxy disc with a manually-controlled clumpy ISM, followed at sub-pc resolution. Our simulations are performed with AREPO and probe a wide range of AGN luminosities (L=104347 erg/sL=10^{43-47}\ \rm{erg/s}) and ISM substructures. In homogeneous discs, the AGN wind sweeps up an outflowing, cooling shell, where the emerging cold phase dominates the mass and kinetic energy budgets, reaching a momentum flux p˙7 L/c\dot{p} \approx 7\ L/c. However, when the ISM is clumpy, outflow properties are profoundly different. They contain small, long-lived (> 5\ \rm{Myr}), cold (T<10^{4.5}\ \rm{K}) cloudlets entrained in the faster, hot outflow phase, which are only present in the outflow if radiative cooling is included in the simulation. While the cold phase dominates the mass of the outflow, most of the kinetic luminosity is now carried by a tenuous, hot phase with T > 10^7 \ \rm K. While the hot phases reaches momentum fluxes p˙(15) L/c\dot{p} \approx (1 - 5)\ L/c, energy-driven bubbles couple to the cold phase inefficiently, producing modest momentum fluxes \dot{p} < L/c in the fast-outflowing cold gas. These low momentum fluxes could lead to the outflows being misclassified as momentum-driven using common observational diagnostics. We also show predictions for scaling relations between outflow properties and AGN luminosity and discuss the challenges in constraining outflow driving mechanisms and kinetic coupling efficiencies using observed quantities.
In recent years, text generation tools utilizing Artificial Intelligence (AI) have occasionally been misused across various domains, such as generating student reports or creative writings. This issue prompts plagiarism detection services to enhance their capabilities in identifying AI-generated content. Adversarial attacks are often used to test the robustness of AI-text generated detectors. This work proposes a novel textual adversarial attack on the detection models such as Fast-DetectGPT. The method employs embedding models for data perturbation, aiming at reconstructing the AI generated texts to reduce the likelihood of detection of the true origin of the texts. Specifically, we employ different embedding techniques, including the Tsetlin Machine (TM), an interpretable approach in machine learning for this purpose. By combining synonyms and embedding similarity vectors, we demonstrates the state-of-the-art reduction in detection scores against Fast-DetectGPT. Particularly, in the XSum dataset, the detection score decreased from 0.4431 to 0.2744 AUROC, and in the SQuAD dataset, it dropped from 0.5068 to 0.3532 AUROC.
There are no more papers matching your filters at the moment.