CETINIA, Universidad Rey Juan Carlos
In text classification, creating an adversarial example means subtly perturbing a few words in a sentence without changing its meaning, causing it to be misclassified by a classifier. A concerning observation is that a significant portion of adversarial examples generated by existing methods change only one word. This single-word perturbation vulnerability represents a significant weakness in classifiers, which malicious users can exploit to efficiently create a multitude of adversarial examples. This paper studies this problem and makes the following key contributions: (1) We introduce a novel metric $\rho$ to quantitatively assess a classifier's robustness against single-word perturbation. (2) We present the SP-Attack, designed to exploit the single-word perturbation vulnerability, achieving a higher attack success rate, better preserving sentence meaning, while reducing computation costs compared to state-of-the-art adversarial methods. (3) We propose SP-Defense, which aims to improve $\rho$ by applying data augmentation in learning. Experimental results on 4 datasets and BERT and distilBERT classifiers show that SP-Defense improves $\rho$ by 14.6% and 13.9% and decreases the attack success rate of SP-Attack by 30.4% and 21.2% on two classifiers respectively, and decreases the attack success rate of existing attack methods that involve multiple-word perturbations.
The CTGAN model from the MIT LIDS Data to AI Lab generates high-fidelity synthetic tabular data by tackling challenges such as mixed data types, multimodal distributions, and imbalanced categorical features. It consistently outperforms other deep generative models and traditional baselines in machine learning efficacy on real-world datasets, while also providing a comprehensive open-source benchmarking framework.
Generative AI models provide a wide range of tools capable of performing complex tasks in a fraction of the time it would take a human. Among these, Large Language Models (LLMs) stand out for their ability to generate diverse texts, from literary narratives to specialized responses in different fields of knowledge. This paper explores the use of fine-tuned LLMs to identify physical descriptions of people, and subsequently create accurate representations of avatars using the SMPL-X model by inferring shape parameters. We demonstrate that LLMs can be trained to understand and manipulate the shape space of SMPL, allowing the control of 3D human shapes through natural language. This approach promises to improve human-machine interaction and opens new avenues for customization and simulation in virtual environments.
Phonons have long been thought to be incapable of explaining key phenomena in strange metals, including linear-in-\textit{T} Planckian resistivity from high to very low temperatures. We argue that these conclusions were based on static, perturbative approaches that overlooked essential time-dependent and nonperturbative electron-lattice physics. In fact ``phonons'' are not the best target for discussion, just like ``photons'' are not the best way to think about Maxwell's equations. Quantum optics connects photons and electromagnetism, as developed 60 years ago by Glauber and others. We have been developing the parallel world of quantum acoustics. Far from being only of academic interest, the new tools are rapidly exposing the secrets of the strange metals, revealing strong vibronic (vibration-electronic) interactions playing a crucial role forming polarons and charge density waves, linear-in-\textit{T} resistivity at the Planckian rate over thousands of degrees, resolution of the Drude peak infrared anomaly, and the absence of a $T^4$ low-temperature resistivity rise in 2D systems, and of a Mott-Ioffe-Regel resistivity saturation. We derive Planckian transport, polarons, CDWs, and pseudogaps from the Fröhlich model. The ``new physics'' has been hiding in this model all along, in the right parameter regime, if it is treated nonperturbatively. In the course of this work we have uncovered the generalization of Anderson localization to dynamic media: a universal Planckian diffusion emerges, a ``ghost'' of Anderson localization. Planckian diffusion is clearly defined and is more fundamental than the popular but elusive, model-dependent concept of ``Planckian speed limit''.
Consider the following truncated Freud linear functional $\mathbf{u}_z$ depending on a parameter $z$, $\langle\mathbf{u}_z,p\rangle=\int_0^\infty p(x)e^{-zx^4}dx,\quad z>0.$ The aim of this work is to analyze the properties of the sequence of orthogonal polynomials $(P_n)_{n\geq 0}$ with respect to $\mathbf{u}_z$. Such a linear functional is semiclassical and, as a consequence, we get the system of nonlinear difference equations (Laguerre-Freud equations) that the coefficients of the three-term recurrence satisfy. The asymptotic behavior of such coefficients is given. On the other hand, the raising and lowering operators associated with such a linear functional are obtained, and thus a second-order linear differential equation of holonomic type that $(P_n)_{n\geq 0}$ satisfies is deduced. From this fact, an electrostatic interpretation of their zeros is given. Finally, some illustrative numerical tests concerning the behavior of the least and greatest zeros of these polynomials are presented.
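As a small worked illustration (not taken from the paper itself), the moments of $\mathbf{u}_z$ follow in closed form from the substitution $t = zx^4$, and one integration by parts gives a Pearson-type equation, which is one way to see why the functional is semiclassical. The symbols $\mu_n$ (the $n$-th moment) and $D$ (the distributional derivative) are notation assumed for this sketch only.

\[
\mu_n = \langle \mathbf{u}_z, x^n \rangle = \int_0^\infty x^n e^{-zx^4}\,dx = \frac{1}{4}\, z^{-(n+1)/4}\, \Gamma\!\left(\frac{n+1}{4}\right),
\qquad
\mu_{n+4} = \frac{n+1}{4z}\,\mu_n ,
\]
\[
D\left(x\,\mathbf{u}_z\right) = \left(1 - 4zx^4\right)\mathbf{u}_z .
\]

The two lines are consistent with each other: testing the second identity against $x^n$ gives $-n\mu_n = \mu_n - 4z\,\mu_{n+4}$, which is the moment recurrence above.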
Physical systems ranging from elastic bodies to kinematic linkages are defined on high-dimensional configuration spaces, yet their typical low-energy configurations are concentrated on much lower-dimensional subspaces. This work addresses the challenge of identifying such subspaces automatically: given as input an energy function for a high-dimensional system, we produce a low-dimensional map whose image parameterizes a diverse yet low-energy submanifold of configurations. The only additional input needed is a single seed configuration for the system to initialize our procedure; no dataset of trajectories is required. We represent subspaces as neural networks that map a low-dimensional latent vector to the full configuration space, and propose a training scheme to fit network parameters to any system of interest. This formulation is effective across a very general range of physical systems; our experiments demonstrate not only nonlinear and very low-dimensional elastic body and cloth subspaces, but also more general systems like colliding rigid bodies and linkages. We briefly explore applications built on this formulation, including manipulation, latent interpolation, and sampling.
We present a self-supervised method to learn dynamic 3D deformations of garments worn by parametric human bodies. State-of-the-art data-driven approaches to model 3D garment deformations are trained using supervised strategies that require large datasets, usually obtained by expensive physics-based simulation methods or professional multi-camera capture setups. In contrast, we propose a new training scheme that removes the need for ground-truth samples, enabling self-supervised training of dynamic 3D garment deformations. Our key contribution is to realize that physics-based deformation models, traditionally solved on a frame-by-frame basis by implicit integrators, can be recast as an optimization problem. We leverage this optimization-based scheme to formulate a set of physics-based loss terms that can be used to train neural networks without precomputing ground-truth data. This allows us to learn models for interactive garments, including dynamic deformations and fine wrinkles, with a two-orders-of-magnitude speedup in training time compared to state-of-the-art supervised methods.
We present SeamlessGAN, a method capable of automatically generating tileable texture maps from a single input exemplar. In contrast to most existing methods, focused solely on solving the synthesis problem, our work tackles both problems, synthesis and tileability, simultaneously. Our key idea is to realize that tiling a latent space within a generative network trained using adversarial expansion techniques produces outputs with continuity at the seam intersection that can then be turned into tileable images by cropping the central area. Since not every value of the latent space is valid to produce high-quality outputs, we leverage the discriminator as a perceptual error metric capable of identifying artifact-free textures during a sampling process. Further, in contrast to previous work on deep texture synthesis, our model is designed and optimized to work with multi-layered texture representations, enabling textures composed of multiple maps such as albedo, normals, etc. We extensively test our design choices for the network architecture, loss function and sampling parameters. We show qualitatively and quantitatively that our approach outperforms previous methods and works for textures of different types.
This is an up-to-date introduction to, and overview of, marginal likelihood computation for model selection and hypothesis testing. Computing normalizing constants of probability models (or ratio of constants) is a fundamental issue in many applications in statistics, applied mathematics, signal processing and machine learning. This article provides a comprehensive study of the state-of-the-art of the topic. We highlight limitations, benefits, connections and differences among the different techniques. Problems and possible solutions with the use of improper priors are also described. Some of the most relevant methodologies are compared through theoretical comparisons and numerical experiments.
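To make the quantity concrete, here is a minimal sketch of the simplest possible estimator of a marginal likelihood: average the likelihood over draws from the prior. The toy model (one observation `y ~ N(theta, sigma^2)` with prior `theta ~ N(0, tau^2)`), the function names, and all numeric values are assumptions chosen for illustration because the exact answer is known in closed form; they are not taken from the article.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def naive_mc_evidence(y, sigma, tau, n_samples=100_000, seed=0):
    """Estimate Z = integral of p(y | theta) p(theta) d(theta) by prior sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(0.0, tau)           # draw from the prior
        total += normal_pdf(y, theta, sigma)  # evaluate the likelihood
    return total / n_samples


def exact_evidence(y, sigma, tau):
    """Closed form for this conjugate toy model: y ~ N(0, sigma^2 + tau^2)."""
    return normal_pdf(y, 0.0, math.sqrt(sigma ** 2 + tau ** 2))


if __name__ == "__main__":
    y, sigma, tau = 1.0, 1.0, 2.0
    print("naive Monte Carlo:", naive_mc_evidence(y, sigma, tau))
    print("exact            :", exact_evidence(y, sigma, tau))
```

This estimator is unbiased but tends to have very high variance when the prior is much wider than the posterior, which is one reason more elaborate schemes exist.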
This article investigates how a uniform high frequency (HF) drive applied to each site of a weakly-coupled discrete nonlinear resonator array can modulate the onsite natural stiffness and damping and thereby facilitate the active tunability of the nonlinear response and the phonon dispersion relation externally. Starting from a canonical model of a parametrically excited \textit{van der Pol-Duffing} chain of oscillators with nearest neighbor coupling, a systematic expansion in two widely separated time scales (\textit{Direct Partition of Motion}) has been employed, in the backdrop of Blekhman's perturbation scheme. This procedure eliminates the fast scale and yields the effective collective dynamics of the array with renormalized stiffness and damping, modified by the high-frequency drive. The resulting dispersion shift controls which normal modes enter the parametric resonance window, allowing highly selective activation of specific bulk modes through external HF tuning. The collective resonant response to the parametric excitation and mode-selection by the HF drive has been analyzed and validated by detailed numerical simulations. The results offer a straightforward, experimentally tractable route to actively controlling the response and channeling energy through selective mode activation in MEMS/NEMS arrays and related resonator platforms.
We propose a CNN-based approach for 3D human body pose estimation from single RGB images that addresses the issue of limited generalizability of models trained solely on the starkly limited publicly available 3D pose data. Using only the existing 3D pose data and 2D pose data, we show state-of-the-art performance on established benchmarks through transfer of learned features, while also generalizing to in-the-wild scenes. We further introduce a new training set for human body pose estimation from monocular images of real humans that has the ground truth captured with a multi-camera marker-less motion capture system. It complements existing corpora with greater diversity in pose, human appearance, clothing, occlusion, and viewpoints, and enables an increased scope of augmentation. We also contribute a new benchmark that covers outdoor and indoor scenes, and demonstrate that our 3D pose dataset shows better in-the-wild performance than existing annotated data, which is further improved in conjunction with transfer learning from 2D pose data. All in all, we argue that the use of transfer learning of representations in tandem with algorithmic and data contributions is crucial for general 3D body pose estimation.
We introduce TexTile, a novel differentiable metric to quantify the degree to which a texture image can be concatenated with itself without introducing repeating artifacts (i.e., the tileability). Existing methods for tileable texture synthesis focus on general texture quality, but lack explicit analysis of the intrinsic repeatability properties of a texture. In contrast, our TexTile metric effectively evaluates the tileable properties of a texture, opening the door to more informed synthesis and analysis of tileable textures. Under the hood, TexTile is formulated as a binary classifier carefully built from a large dataset of textures of different styles, semantics, regularities, and human annotations. Key to our method is a set of architectural modifications to baseline pre-trained image classifiers to overcome their shortcomings at measuring tileability, along with a custom data augmentation and training regime aimed at increasing robustness and accuracy. We demonstrate that TexTile can be plugged into different state-of-the-art texture synthesis methods, including diffusion-based strategies, and generate tileable textures while keeping or even improving the overall texture quality. Furthermore, we show that TexTile can objectively evaluate any tileable texture synthesis method, whereas the current mix of existing metrics produces uncorrelated scores, which heavily hinders progress in the field.
We show that the work done by the non-conservative forces along a stable limit cycle attractor of a dissipative dynamical system is always equal to zero. Thus, mechanical energy is preserved on average along periodic orbits. This balance between energy gain and energy loss along different phases of the self-sustained oscillation is responsible for the existence of quantized orbits in such systems. Furthermore, we show that the instantaneous preservation of projected phase space areas along quantized orbits describes the neutral dynamics of the phase, allowing us to derive from this equation the Wilson-Sommerfeld-like quantization condition. We apply our general results to near-Hamiltonian systems, identifying the fixed points of the Krylov-Bogoliubov radial equation governing the dynamics of the limit cycles with the zeros of the Melnikov function. Moreover, we relate the instantaneous preservation of the phase space area along the quantized orbits to the second Krylov-Bogoliubov equation describing the dynamics of the phase. We test the two quantization conditions in the context of hydrodynamic quantum analogs, where megastable spectra of quantized orbits have recently been discovered. Specifically, we use a generalized pilot-wave model for a walking droplet confined in a harmonic potential, and find a countably infinite set of nested limit cycle attractors representing a classical analog of quantized orbits. We compute the energy spectrum and the eigenfunctions of this self-excited system.
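The zero-work statement can be checked numerically on a textbook example. The sketch below uses the van der Pol oscillator `x'' - mu (1 - x^2) x' + x = 0`, which is an assumption made for illustration and not the pilot-wave model of the paper. The non-conservative force is `mu (1 - x^2) x'`, and its work is accumulated over one period of the limit cycle alongside the state with a hand-written RK4 integrator. All function names and parameter values are hypothetical.

```python
def vdp_rhs(state, mu):
    """Right-hand side for (x, v, net work, gross |work|)."""
    x, v, _, _ = state
    force = mu * (1.0 - x * x) * v   # non-conservative force
    power = force * v                # instantaneous power it delivers
    return (v, force - x, power, abs(power))


def rk4_step(state, dt, mu):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))

    k1 = vdp_rhs(state, mu)
    k2 = vdp_rhs(shifted(k1, dt / 2), mu)
    k3 = vdp_rhs(shifted(k2, dt / 2), mu)
    k4 = vdp_rhs(shifted(k3, dt), mu)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def work_over_one_period(mu=1.0, dt=1e-3, t_transient=100.0):
    """Return (period, net work, gross work) over one limit-cycle period."""
    state = (0.5, 0.0, 0.0, 0.0)
    for _ in range(int(t_transient / dt)):   # let the orbit settle
        state = rk4_step(state, dt, mu)

    t, crossings = 0.0, []
    while len(crossings) < 2:                # two upward zero crossings of x
        new = rk4_step(state, dt, mu)
        if state[0] < 0.0 <= new[0]:
            frac = -state[0] / (new[0] - state[0])   # linear interpolation
            crossings.append((
                t + frac * dt,
                state[2] + frac * (new[2] - state[2]),
                state[3] + frac * (new[3] - state[3]),
            ))
        state, t = new, t + dt

    (t0, w0, g0), (t1, w1, g1) = crossings
    return t1 - t0, w1 - w0, g1 - g0


if __name__ == "__main__":
    period, net, gross = work_over_one_period()
    print(f"period = {period:.4f}")
    print(f"net work = {net:.2e}, gross exchanged = {gross:.4f}")
```

The net work should come out negligible compared with the gross energy exchanged, reflecting the balance between the energy-gain phase (`|x| < 1`) and the energy-loss phase (`|x| > 1`) of the cycle.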
We investigate the escape dynamics in an open circular billiard under the influence of a uniform gravitational field. The system properties are investigated as a function of the particle total energy and the size of two symmetrically placed holes in the boundary. Using a suite of quantitative tools including escape basins, basin entropy ($S_b$), mean escape time ($\bar{\tau}$), and survival probability ($P(n)$), we characterize a system that transitions from a fully chaotic, hyperbolic regime at low energies to a non-hyperbolic, mixed phase space at higher energies. Our results demonstrate that this transition is marked by the emergence of Kolmogorov-Arnold-Moser (KAM) islands. We show that both the basin entropy and the mean escape time are sensitive to this transition, with the former peaking and the latter increasing sharply as the sticky KAM islands appear. The survival probability analysis confirms this dynamical picture, shifting from a pure exponential decay in the hyperbolic regime to a power-law-like decay with a saturation plateau in the mixed regime, which directly quantifies the measure of trapped orbits. In the high-energy limit, the system dynamics approaches an integrable case, leading to a corresponding decrease in complexity as measured by both $S_b$ and $\bar{\tau}$.
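For readers unfamiliar with basin entropy, the following is a minimal sketch of how it is commonly computed: the grid of initial conditions is cut into boxes, the Gibbs entropy of the basin labels is evaluated in each box, and the result is averaged over boxes. The function name, the list-of-lists grid layout, and the box size are assumptions for illustration; the paper's own discretization may differ.

```python
import math
from collections import Counter


def basin_entropy(grid, box=5):
    """Average per-box Gibbs entropy of basin labels on a 2D grid.

    grid: list of rows, each entry a hashable basin label
          (for example 0 or 1 for the hole an orbit escapes through).
    box:  side length of the square boxes, in grid cells.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    size = box * box
    total, n_boxes = 0.0, 0
    # Tile the grid with non-overlapping boxes; ragged borders are skipped.
    for r0 in range(0, n_rows - box + 1, box):
        for c0 in range(0, n_cols - box + 1, box):
            counts = Counter(
                grid[r][c]
                for r in range(r0, r0 + box)
                for c in range(c0, c0 + box)
            )
            total += -sum(
                (n / size) * math.log(n / size) for n in counts.values()
            )
            n_boxes += 1
    return total / n_boxes


if __name__ == "__main__":
    single_basin = [[0] * 10 for _ in range(10)]
    checkerboard = [[(r + c) % 2 for c in range(10)] for r in range(10)]
    print(basin_entropy(single_basin, box=5))
    print(basin_entropy(checkerboard, box=2))
```

A grid with a single basin gives zero, while a grid in which every box is evenly split between two basins gives `log 2`, the maximum for two escape routes.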
This paper studies the capacity of massive random-access cellular networks, modeled as a MIMO fading channel with an infinite number of interfering cells. To characterize the symmetric sum rate of the network, a random-coding argument is invoked together with the assumption that in all cells users draw their codebooks according to the same distribution. This can be viewed as a generalization of the assumption of Gaussian codebooks, often encountered in the literature. The network is further assumed to be noncoherent: the transmitters and receivers are cognizant of the statistics of the fading coefficients, but are ignorant of their realizations. Finally, it is assumed that the users access the network at random. For the considered channel model, rigorous bounds on the capacity are derived. The behavior of these bounds depends critically on the path loss from signals transmitted in interfering cells to the intended cell. In particular, if the fading coefficients of the interferers (ordered according to their distance to the receiver) decay exponentially or more slowly, then the capacity is bounded in the transmit power. This confirms that the saturation regime in interference-limited networks -- observed by Lozano, Heath, and Andrews ("Fundamental limits of cooperation", IEEE Trans. Inf. Theory, Sept. 2013) -- cannot be avoided by random user activity or by using channel inputs beyond the scale family. In contrast, if the fading coefficients decay faster than double-exponentially, then the capacity is unbounded in the transmit power. Proving an unbounded capacity is nontrivial even if the number of interfering cells is finite, since the condition that the users' codebooks follow the same distribution prevents interference-avoiding strategies such as time- or frequency-division multiple access. We obtain this result by using bursty signaling together with treating interference as noise.
Generalizing deepfake detection to unseen manipulations remains a key challenge. A recent approach to tackle this issue is to train a network with pristine face images that have been manipulated with hand-crafted artifacts to extract more generalizable clues. While effective for static images, extending this to the video domain is an open issue. Existing methods model temporal artifacts as frame-to-frame instabilities, overlooking a key vulnerability: the violation of natural motion dependencies between different facial regions. In this paper, we propose a synthetic video generation method that creates training data with subtle kinematic inconsistencies. We train an autoencoder to decompose facial landmark configurations into motion bases. By manipulating these bases, we selectively break the natural correlations in facial movements and introduce these artifacts into pristine videos via face morphing. A network trained on our data learns to spot these sophisticated biomechanical flaws, achieving state-of-the-art generalization results on several popular benchmarks.
A classical particle in a harmonic potential gives rise to a continuous energy spectrum, whereas the corresponding quantum particle exhibits countably infinite quantized energy levels. In recent years, classical non-Markovian wave-particle entities that materialize as walking droplets have been shown to exhibit various hydrodynamic quantum analogs, including quantization in a harmonic potential by displaying a few coexisting limit cycle orbits. By considering a truncated-memory stroboscopic pilot-wave model of the system in the low dissipation regime, we obtain a classical harmonic oscillator perturbed by oscillatory non-conservative forces that displays countably infinite coexisting limit-cycle states, also known as \emph{megastability}. Using averaging techniques in the low-memory regime, we derive analytical approximations of the orbital radii, orbital frequency and Lyapunov energy function of this megastable spectrum, and further show average energy conservation along these quantized states. Our formalism extends to a general class of self-excited oscillators and can be used to construct megastable spectra with different energy-frequency relations.
Reconstructing two-hand interactions from a single image is a challenging problem due to ambiguities that stem from projective geometry and heavy occlusions. Existing methods are designed to estimate only a single pose, despite the fact that there exist other valid reconstructions that fit the image evidence equally well. In this paper we propose to address this issue by explicitly modeling the distribution of plausible reconstructions in a conditional normalizing flow framework. This allows us to directly supervise the posterior distribution through a novel determinant magnitude regularization, which is key to varied 3D hand pose samples that project well into the input image. We also demonstrate that metrics commonly used to assess reconstruction quality are insufficient to evaluate pose predictions under such severe ambiguity. To address this, we release the first dataset with multiple plausible annotations per image called MultiHands. The additional annotations enable us to evaluate the estimated distribution using the maximum mean discrepancy metric. Through this, we demonstrate the quality of our probabilistic reconstruction and show that explicit ambiguity modeling is better-suited for this challenging problem.
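As a rough illustration of the evaluation metric mentioned above, the sketch below computes a biased (V-statistic) estimate of the squared maximum mean discrepancy between two sample sets with a Gaussian kernel. The kernel choice, the bandwidth, the function names, and the 2D toy data are all assumptions for this example; the paper's exact MMD configuration is not stated here.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of MMD^2 between samples xs and ys."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def gaussian_cloud(rng, n, center):
    """n points drawn from an isotropic unit Gaussian around center."""
    return [tuple(rng.gauss(c, 1.0) for c in center) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    reference = gaussian_cloud(rng, 100, (0.0, 0.0))
    matching = gaussian_cloud(rng, 100, (0.0, 0.0))
    shifted = gaussian_cloud(rng, 100, (3.0, 3.0))
    print("same distribution   :", mmd_squared(reference, matching))
    print("shifted distribution:", mmd_squared(reference, shifted))
```

Samples drawn from the same distribution should yield a value near zero, while a clearly shifted set yields a much larger one, which is what makes the metric usable for comparing a predicted pose distribution against multiple annotations.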
Researchers from UT Dallas developed AutoConcierge, a neuro-symbolic conversational agent that integrates Large Language Models with Answer Set Programming to achieve deep understanding and provide reliably correct, explainable responses in domain-specific dialogs. The system demonstrated superior proactivity and consistency compared to purely LLM-based systems in restaurant recommendation tasks, while maintaining practical response times.
Graph theory is now becoming a standard tool in system-level neuroscience. However, endowing observed brain anatomy and dynamics with a complex network structure does not entail that the brain actually works as a network. Asking whether the brain behaves as a network means asking whether network properties count. From the viewpoint of neurophysiology and, possibly, of brain physics, the most substantial issues a network structure may be instrumental in addressing relate to the influence of network properties on brain dynamics and to whether these properties ultimately explain some aspects of brain function. Here, we address the dynamical implications of complex networks, examining which aspects and scales of brain activity may be understood to genuinely behave as a network. To do so, we first define the meaning of networkness, and analyse some of its implications. We then examine ways in which brain anatomy and dynamics can be endowed with a network structure and discuss possible ways in which network structure may be shown to represent a genuine organisational principle of brain activity, rather than just a convenient description of its anatomy and dynamics.