San Diego Supercomputer Center, UC San Diego
M+ extends the MemoryLLM architecture, enabling Large Language Models to retain and recall information over sequence lengths exceeding 160,000 tokens, a substantial improvement over MemoryLLM's previous 20,000 token limit. This is achieved through a scalable long-term memory mechanism and a co-trained retriever that efficiently retrieves relevant hidden states, while maintaining competitive GPU memory usage.
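A minimal numpy sketch may make the retrieval idea concrete: pick the memory hidden states most similar to the current query. The function and parameter names here are hypothetical illustrations, not M+'s actual API.

```python
import numpy as np

def retrieve_memory(query, memory_keys, memory_values, k=8):
    """Hypothetical sketch: select the k memory hidden states most similar
    to the current query vector (cosine similarity), in the spirit of a
    co-trained retriever over latent long-term memory."""
    q = query / np.linalg.norm(query)
    keys = memory_keys / np.linalg.norm(memory_keys, axis=1, keepdims=True)
    scores = keys @ q                      # cosine similarity per memory slot
    top = np.argsort(scores)[-k:][::-1]    # indices of the k best slots
    return memory_values[top], scores[top]

# Toy usage: 10,000 memory slots of dimension 256
rng = np.random.default_rng(0)
mem_k = rng.standard_normal((10_000, 256))
mem_v = rng.standard_normal((10_000, 256))
vals, scores = retrieve_memory(rng.standard_normal(256), mem_k, mem_v)
```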
Researchers from UCSD, UCLA, and Amazon developed MEMORYLLM, a Large Language Model architecture designed for continuous self-updatability by integrating a fixed-size memory pool directly into the transformer's latent space. The model efficiently absorbs new knowledge, retains previously learned information, and maintains performance integrity over extensive update cycles, demonstrating superior capabilities in model editing and long-context understanding while remaining robust after nearly a million updates.
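To make the fixed-size-pool idea concrete, here is a hedged numpy sketch of one plausible update rule (random eviction plus injection, so the pool never grows); MemoryLLM's actual compression and slot-selection details are not reproduced here.

```python
import numpy as np

def update_memory_pool(memory, new_tokens, rng):
    """Illustrative sketch of a fixed-size memory update: randomly evict as
    many old memory tokens as we inject. The eviction rule and the way new
    hidden states are produced are assumptions, not the paper's exact method."""
    n_new = new_tokens.shape[0]
    keep = rng.choice(memory.shape[0], memory.shape[0] - n_new, replace=False)
    return np.concatenate([memory[keep], new_tokens], axis=0)

rng = np.random.default_rng(0)
pool = rng.standard_normal((512, 64))   # fixed-size latent memory pool
new = rng.standard_normal((32, 64))     # hidden states carrying new knowledge
pool = update_memory_pool(pool, new, rng)
assert pool.shape == (512, 64)          # pool size is invariant across updates
```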
Text-to-audio systems, while increasingly performant, are slow at inference time, making their latency impractical for many creative applications. We present Adversarial Relativistic-Contrastive (ARC) post-training, the first adversarial acceleration algorithm for diffusion/flow models not based on distillation. While past adversarial post-training methods have struggled to compare against their expensive distillation counterparts, ARC post-training is a simple procedure that (1) extends a recent relativistic adversarial formulation to diffusion/flow post-training and (2) combines it with a novel contrastive discriminator objective to encourage better prompt adherence. We pair ARC post-training with a number of optimizations to Stable Audio Open and build a model capable of generating ≈12s of 44.1kHz stereo audio in ≈75ms on an H100, and in ≈7s on a mobile edge-device, the fastest text-to-audio model to our knowledge.
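The two ingredients named above can be sketched in a few lines of PyTorch. This is an illustrative rendering under stated assumptions, not ARC's training code: the contrastive term is written as a generic InfoNCE objective over matched audio/prompt embeddings, which may differ from the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def relativistic_d_loss(d_real, d_fake):
    # Discriminator wants real scores to exceed fake scores pairwise.
    return -F.logsigmoid(d_real - d_fake).mean()

def relativistic_g_loss(d_real, d_fake):
    # Generator wants the opposite ordering.
    return -F.logsigmoid(d_fake - d_real).mean()

def contrastive_d_loss(audio_emb, text_emb, temp=0.07):
    """Illustrative InfoNCE-style objective: each audio embedding should
    match its own prompt embedding better than the other prompts in the
    batch, encouraging prompt adherence."""
    audio = F.normalize(audio_emb, dim=-1)
    text = F.normalize(text_emb, dim=-1)
    logits = audio @ text.t() / temp
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage with random scores/embeddings
d_real, d_fake = torch.randn(16), torch.randn(16)
a, t = torch.randn(16, 128), torch.randn(16, 128)
loss = relativistic_d_loss(d_real, d_fake) + contrastive_d_loss(a, t)
```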
Researchers at University College Cork and Bar-Ilan University quantitatively mapped the efficiency and timescales of population inversion formation in relativistic plasmas through nonresonant interactions with Alfvén waves. This study, crucial for synchrotron maser emission (SME) in Fast Radio Bursts (FRBs), finds that up to 30% of particle energy can be funneled into the inversion, particularly in highly magnetized plasmas, supporting the generation of FRB signals at GHz frequencies.
Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show that each yields best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz audio, 15x faster than comparable SOTA) -- the fastest high-quality TTM to our knowledge. Sound examples can be found at this https URL
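The "preserving hidden state variance" idea lends itself to a short sketch. The loss below is an assumption-laden illustration: it simply adds a penalty on the gap between student and teacher per-feature standard deviations alongside the usual matching term, which may differ from the paper's exact mechanism.

```python
import torch

def variance_preserving_layer_loss(student_h, teacher_h):
    """Hedged sketch: match the teacher's hidden states while explicitly
    penalizing variance collapse in the student layer."""
    match = torch.mean((student_h - teacher_h) ** 2)
    var_gap = torch.mean((student_h.std(dim=0) - teacher_h.std(dim=0)) ** 2)
    return match + var_gap

s = torch.randn(32, 768) * 0.5   # a student layer with collapsed variance
t = torch.randn(32, 768)
print(variance_preserving_layer_loss(s, t))
```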
Researchers from UC San Diego and collaborators introduced WIKIDYK, a real-world, expert-curated benchmark for evaluating Large Language Models' (LLMs) ability to acquire and retain new facts. The study found that Bidirectional Language Models (BiLMs) outperform Causal Language Models (CLMs) in knowledge memorization, leading to a proposed modular ensemble framework for efficient knowledge injection.
Shape restrictions have played a central role in economics as both testable implications of theory and sufficient conditions for obtaining informative counterfactual predictions. In this paper we provide a general procedure for inference under shape restrictions in identified and partially identified models defined by conditional moment restrictions. Our test statistics and proposed inference methods are based on the minimum of the generalized method of moments (GMM) objective function with and without shape restrictions. Uniformly valid critical values are obtained through a bootstrap procedure that approximates a subset of the true local parameter space. In an empirical analysis of the effect of childbearing on female labor supply, we show that employing shape restrictions in linear instrumental variables (IV) models can lead to shorter confidence regions for both local and average treatment effects. Other applications we discuss include inference for the variability of quantile IV treatment effects and for bounds on average equivalent variation in a demand model with general heterogeneity. We find in Monte Carlo examples that the critical values are conservatively accurate and that tests about objects of interest have good power relative to unrestricted GMM.
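The core test statistic, the gap between the restricted and unrestricted GMM minima, can be sketched in a toy linear IV example. Everything below is simplified for illustration: identity weighting, a sign restriction standing in for a shape restriction, and no bootstrap step for critical values.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 500
z = rng.standard_normal((n, 2))                    # instruments
x = z @ np.array([1.0, 0.5]) + rng.standard_normal(n)
y = 0.8 * x + rng.standard_normal(n)

def gmm_objective(theta, y, x, z):
    g = z.T @ (y - x * theta[0]) / len(y)          # sample moment conditions
    return float(g @ g)                            # identity weighting for brevity

# Unrestricted vs. restricted (here: a sign restriction theta >= 0) minima
q_unres = minimize(gmm_objective, x0=[0.0], args=(y, x, z)).fun
q_res = minimize(gmm_objective, x0=[0.0], args=(y, x, z),
                 bounds=[(0.0, None)]).fun
stat = q_res - q_unres                             # restricted-minus-unrestricted gap
```

In this toy the restriction is consistent with the data, so the statistic is near zero; the paper's bootstrap over a subset of the local parameter space would supply uniformly valid critical values.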
This research explains why VGG networks excel at neural style transfer while modern architectures struggle, attributing the issue to the low-entropy activation distributions caused by residual connections. The proposed solution, Stylization With Activation smoothinG (SWAG), applies a softmax transformation to feature activations during loss calculation, enabling modern networks like ResNet to achieve visual quality that matches or exceeds VGG’s performance in style transfer tasks.
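A minimal sketch of the SWAG idea follows: smooth each channel's spatial activations with a softmax before computing the Gram-based style loss, raising the entropy of peaky (e.g., ResNet) activation distributions. The axis and scaling choices here are illustrative assumptions rather than the paper's exact recipe.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def gram(features):
    c, hw = features.shape
    return features @ features.T / hw

def swag_style_loss(feat_gen, feat_style):
    """Apply a softmax over each channel's flattened spatial activations
    before the Gram comparison, per the SWAG smoothing idea."""
    g1 = gram(softmax(feat_gen, axis=1))
    g2 = gram(softmax(feat_style, axis=1))
    return float(((g1 - g2) ** 2).mean())

rng = np.random.default_rng(0)
f_gen = rng.standard_normal((64, 32 * 32))   # (channels, H*W) activations
f_sty = rng.standard_normal((64, 32 * 32))
print(swag_style_loss(f_gen, f_sty))
```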
A performance analysis on AMD's MI300A APU reveals that GPU implementations of the PERMANOVA algorithm achieve 6x faster execution than CPU versions for large-scale microbiome data analysis; STREAM benchmark testing shows the GPU cores sustaining 3.0 TB/s of memory throughput versus 0.2 TB/s for the CPU.
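For reference, the statistic at the heart of that permutation loop can be written compactly. The numpy version below is a minimal CPU sketch of PERMANOVA (pseudo-F plus a permutation p-value), the computation the MI300A study accelerates.

```python
import numpy as np

def permanova_f(dist, groups):
    """Pseudo-F statistic from the standard squared-distance decomposition."""
    n = len(groups)
    d2 = dist ** 2
    ss_total = d2[np.triu_indices(n, 1)].sum() / n
    ss_within = 0.0
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        ss_within += d2[np.ix_(idx, idx)][np.triu_indices(len(idx), 1)].sum() / len(idx)
    a = len(np.unique(groups))
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova_pvalue(dist, groups, n_perm=999, seed=0):
    """Permutation test: reshuffle group labels and count more extreme F values."""
    rng = np.random.default_rng(seed)
    f_obs = permanova_f(dist, groups)
    hits = sum(permanova_f(dist, rng.permutation(groups)) >= f_obs
               for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)

# Toy usage: two separated clusters of samples
rng = np.random.default_rng(1)
pts = rng.standard_normal((30, 5)); pts[:15] += 1.0
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
labels = np.array([0] * 15 + [1] * 15)
print(permanova_pvalue(dist, labels))
```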
This research from Kim, Diagne, and Krstić at UC San Diego develops a robust control strategy that guarantees safety for high relative degree systems operating in the presence of unknown moving obstacles. The method combines Robust Control Barrier Functions (RCBFs) with CBF backstepping, introducing a novel smooth formulation (sRCBF) to overcome non-smoothness issues, and successfully avoids collisions in simulations where standard methods fail.
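The base safety-filter idea underneath CBF methods can be sketched with a single-constraint quadratic program, which has a closed-form solution. This shows only the relative-degree-1 case; the paper's sRCBF backstepping for high relative degree and unknown moving obstacles is not reproduced here.

```python
import numpy as np

def cbf_safety_filter(u_nom, grad_h, f, g, h, alpha=1.0):
    """Minimal relative-degree-1 CBF filter, solved in closed form:
        minimize ||u - u_nom||^2  s.t.  grad_h . (f + g u) >= -alpha * h."""
    a = g.T @ grad_h                       # constraint is a.u + b >= 0
    b = grad_h @ f + alpha * h
    slack = a @ u_nom + b
    if slack >= 0:
        return u_nom                       # nominal input is already safe
    return u_nom - (slack / (a @ a)) * a   # minimal correction onto the boundary

# Toy usage: single integrator x' = u kept outside the unit disk (h = ||x||^2 - 1)
x = np.array([1.5, 0.0])
u = cbf_safety_filter(u_nom=np.array([-2.0, 0.0]),
                      grad_h=2 * x, f=np.zeros(2), g=np.eye(2),
                      h=float(x @ x - 1.0))
```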
We investigate applying 3D deep convolutional neural networks as fast surrogate models of the formation and feedback effects of primordial stars in hydrodynamic cosmological simulations of the first galaxies. Here, we present the surrogate model to predict localized primordial star formation; the feedback model will be presented in a subsequent paper. The star formation prediction model consists of two sub-models: the first is a 3D volume classifier that predicts which (10 comoving kpc)^3 volumes will host star formation, followed by a 3D Inception-based U-net voxel segmentation model that predicts which voxels will form primordial stars. We find that the combined model predicts primordial star forming volumes with high skill, with F_1 > 0.995 and true skill score > 0.994. The star formation is localized within the volume to ≲5^3 voxels (∼1.6 comoving kpc^3) with F_1 > 0.399 and true skill score > 0.857. Applied to simulations with low spatial resolution, the model predicts star forming regions in the same locations and at similar redshifts as sites in resolved full-physics simulations that explicitly model primordial star formation and feedback. When applied to simulations with lower mass resolution, we find that the model predicts star forming regions at later redshift due to delayed structure formation resulting from lower mass resolution. Our model predicts primordial star formation without halo finding, so will be useful in spatially under-resolved simulations that cannot resolve primordial star forming halos. To our knowledge, this is the first model that can predict primordial star forming regions that match highly-resolved cosmological simulations.
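The two-stage inference pipeline is easy to sketch: a cheap volume classifier flags candidate subvolumes, and the voxel segmenter runs only on those. `classify` and `segment` below are stand-ins for the trained 3D CNNs, and the threshold rule is purely illustrative.

```python
import numpy as np

def predict_star_formation(density_field, classify, segment, sub=16):
    """Sketch of the two-stage pipeline: stage 1 flags subvolumes that may
    host star formation; stage 2 segments voxels only inside flagged ones."""
    n = density_field.shape[0]
    mask = np.zeros_like(density_field, dtype=bool)
    for i in range(0, n, sub):
        for j in range(0, n, sub):
            for k in range(0, n, sub):
                vol = density_field[i:i+sub, j:j+sub, k:k+sub]
                if classify(vol):                                   # stage 1
                    mask[i:i+sub, j:j+sub, k:k+sub] = segment(vol)  # stage 2
    return mask

# Toy stand-ins: flag overdense volumes, then mark their overdense voxels
field = np.random.default_rng(0).lognormal(size=(64, 64, 64))
out = predict_star_formation(field,
                             classify=lambda v: v.max() > 8.0,
                             segment=lambda v: v > 8.0)
```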
As demand for AI literacy and data science education grows, there is a critical need for infrastructure that bridges the gap between research data, computational resources, and educational experiences. To address this gap, we developed a first-of-its-kind Education Hub within the National Data Platform. This hub enables seamless connections between collaborative research workspaces, classroom environments, and data challenge settings. Early use cases demonstrate the effectiveness of the platform in supporting complex and resource-intensive educational activities. Ongoing efforts aim to enhance the user experience and expand adoption by educators and learners alike.
Generative Adversarial Networks (GANs) are a widely-used tool for generative modeling of complex data. Despite their empirical success, the training of GANs is not fully understood due to the min-max optimization of the generator and discriminator. This paper analyzes these joint dynamics when the true samples, as well as the generated samples, are discrete, finite sets, and the discriminator is kernel-based. A simple yet expressive framework for analyzing training called the Isolated Points Model is introduced. In the proposed model, the distance between true samples greatly exceeds the kernel width, so each generated point is influenced by at most one true point. Our model enables precise characterization of the conditions for convergence, both to good and bad minima. In particular, the analysis explains two common failure modes: (i) an approximate mode collapse and (ii) divergence. Numerical simulations are provided that predictably replicate these behaviors.
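A toy simulation in the spirit of the Isolated Points Model illustrates both failure modes. The dynamics below are an illustrative Gaussian-kernel gradient flow, not the paper's exact parameterization: true points sit far apart relative to the kernel width, so each generated point feels at most one true point.

```python
import numpy as np

def kernel_flow(gen, true, sigma=0.1, lr=0.05, steps=2000):
    """Toy kernel-based dynamics: generated points are attracted to nearby
    true points and repelled by nearby generated points. With an unlucky
    start, several generated points lock onto the same true point
    (approximate mode collapse) or feel no attraction at all and never
    converge (divergence)."""
    k = lambda a, b: np.exp(-np.sum((a[:, None] - b[None]) ** 2, -1) / (2 * sigma**2))
    for _ in range(steps):
        attract = k(gen, true)[..., None] * (true[None] - gen[:, None])
        repel = k(gen, gen)[..., None] * (gen[None] - gen[:, None])
        gen = gen + lr * (attract.mean(1) - repel.mean(1)) / sigma**2
    return gen

true_pts = np.array([[0.0, 0.0], [5.0, 5.0]])         # separation >> sigma
gen_pts = np.array([[0.3, 0.2], [0.2, 0.4], [9.0, 9.0]])
print(kernel_flow(gen_pts, true_pts))
```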
Domain adaptation (DA) is a technique that transfers predictive models trained on a labeled source domain to an unlabeled target domain, with the core difficulty of resolving distributional shift between domains. Currently, most popular DA algorithms are based on distributional matching (DM). However, in practice, realistic domain shifts (RDS) may violate their basic assumptions, and as a result these methods will fail. In this paper, in order to devise robust DA algorithms, we first systematically analyze the limitations of DM based methods, and then build new benchmarks with more realistic domain shifts to evaluate the well-accepted DM methods. We further propose InstaPBM, a novel Instance-based Predictive Behavior Matching method for robust DA. Extensive experiments on both conventional and RDS benchmarks demonstrate both the limitations of DM methods and the efficacy of InstaPBM: compared with the best baselines, InstaPBM improves the classification accuracy by 4.5% and 3.9% on Digits5 and VisDA2017, respectively, and by 2.2%, 2.9%, and 3.6% on DomainNet-LDS, DomainNet-ILDS, and ID-TwO. We hope our intuitive yet effective method will serve as a useful new direction and increase the robustness of DA in real scenarios. Code will be available at anonymous link: this https URL
The origin of supermassive black holes (SMBHs) that inhabit the centers of massive galaxies is largely unconstrained. Remnants from supermassive stars (SMSs) with masses around 10,000 solar masses provide the ideal seed candidates, known as direct collapse black holes. However, their very existence and formation environment in the early Universe are still under debate, with their supposed rarity further exacerbating the problem of modeling their ab-initio formation. SMS models have shown that rapid collapse, with an infall rate above a critical value, in metal-free haloes is a requirement for the formation of a proto-stellar core which will then form an SMS. Using a radiation hydrodynamics simulation of early galaxy formation, we show the natural emergence of metal-free haloes both massive enough, and with sufficiently high infall rates, to form an SMS. We find that haloes that are exposed to both a Lyman-Werner intensity of J_LW ~ 3 J_21 and that undergo at least one period of rapid growth early in their evolution are ideal cradles for SMS formation. This rapid growth induces substantial dynamical heating, amplifying the existing Lyman-Werner suppression originating from a group of young galaxies 20 kiloparsecs away. Our results strongly indicate that structure formation dynamics, rather than a critical Lyman-Werner (LW) flux, may be the main driver of massive black hole formation in the early Universe. We find that massive black hole seeds may be much more common in overdense regions of the early Universe than previously considered with a comoving number density up to 10^-3 Mpc^-3.
Modern large-scale physics experiments create datasets with sizes and streaming rates that can exceed those from industry leaders such as Google Cloud and Netflix. Fully processing these datasets requires both sufficient compute power and efficient workflows. Recent advances in Machine Learning (ML) and Artificial Intelligence (AI) can either improve or replace existing domain-specific algorithms to increase workflow efficiency. Not only can these algorithms improve the physics performance of current algorithms, but they can often be executed more quickly, especially when run on coprocessors such as GPUs or FPGAs. In the winter of 2023, MIT hosted the Accelerating Physics with ML at MIT workshop, which brought together researchers from gravitational-wave physics, multi-messenger astrophysics, and particle physics to discuss and share current efforts to integrate ML tools into their workflows. The following white paper highlights examples of algorithms and computing frameworks discussed during this workshop and summarizes the expected computing needs for the immediate future of the involved fields.
Optimization problems pervade essentially every scientific discipline and industry. Many such problems require finding a solution that maximizes the number of constraints satisfied. Often, these problems are particularly difficult to solve because they belong to the NP-hard class, meaning that no algorithms are known that always find a solution in polynomial time. Over the past decades, research has focused on developing heuristic approaches that attempt to find an approximation to the solution. However, despite numerous research efforts, in many cases even approximations to the optimal solution are hard to find, as the computational time for further refining a candidate solution grows exponentially with input size. Here, we show a non-combinatorial approach to hard optimization problems that achieves an exponential speed-up and finds better approximations than the current state-of-the-art. First, we map the optimization problem into a boolean circuit made of specially designed, self-organizing logic gates, which can be built with (non-quantum) electronic components; the equilibrium points of the circuit represent the approximation to the problem at hand. Then, we solve its associated non-linear ordinary differential equations numerically, towards the equilibrium points. We demonstrate this exponential gain by comparing a sequential MATLAB implementation of our solver with the winners of the 2016 Max-SAT competition on a variety of hard optimization instances. We show empirical evidence that our solver scales linearly with the size of the problem, both in time and memory, and argue that this property derives from the collective behavior of the simulated physical circuit. Our approach can be applied to other types of optimization problems and the results presented here have far-reaching consequences in many fields.
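A toy version of the continuous-time idea: relax boolean variables to (-1, 1), give each clause an "unsatisfaction" energy, and integrate gradient-descent ODEs toward an equilibrium whose signs encode an assignment. This is a plain relaxation for illustration, not the paper's self-organizing logic-gate circuit.

```python
import numpy as np
from scipy.integrate import solve_ivp

# A small MAX-3SAT instance: positive integers are variables, negatives are
# negations (variables 1..3).
clauses = [(1, -2, 3), (-1, 2, -3), (1, 2, 3), (-1, -2, 3)]

def energy_grad(s):
    """Gradient of the total clause 'unsatisfaction' sum over s in (-1, 1),
    where each clause contributes prod_j (1 - sign_j * s_j) / 2."""
    grad = np.zeros_like(s)
    for clause in clauses:
        miss = [(1 - np.sign(l) * s[abs(l) - 1]) / 2 for l in clause]
        for j, l in enumerate(clause):
            others = np.prod([m for i, m in enumerate(miss) if i != j])
            grad[abs(l) - 1] += -np.sign(l) / 2 * others
    return grad

def rhs(t, v):
    s = np.tanh(v)            # confine logical variables to (-1, 1)
    return -energy_grad(s)    # descend the energy toward an equilibrium

sol = solve_ivp(rhs, (0, 50.0), y0=np.array([0.1, -0.2, 0.05]), rtol=1e-8)
assignment = np.sign(np.tanh(sol.y[:, -1]))   # read off the equilibrium
```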
This report presents the takeaways of the inaugural Workshop on Generative AI and Law (GenLaw), held in July 2023. A cross-disciplinary group of practitioners and scholars from computer science and law convened to discuss the technical, doctrinal, and policy challenges presented by law for Generative AI, and by Generative AI for law, with an emphasis on U.S. law in particular. We begin the report with a high-level statement about why Generative AI is both immensely significant and immensely challenging for law. To meet these challenges, we conclude that there is an essential need for 1) a shared knowledge base that provides a common conceptual language for experts across disciplines; 2) clarification of the distinctive technical capabilities of generative-AI systems, as compared and contrasted to other computer and AI systems; 3) a logical taxonomy of the legal issues these systems raise; and, 4) a concrete research agenda to promote collaboration and knowledge-sharing on emerging issues at the intersection of Generative AI and law. In this report, we synthesize the key takeaways from the GenLaw workshop that begin to address these needs. All of the listed authors contributed to the workshop upon which this report is based, but they and their organizations do not necessarily endorse all of the specific claims in this report.
We study the problem of PAC learning homogeneous halfspaces in the presence of Tsybakov noise. In the Tsybakov noise model, the label of every sample is independently flipped with an adversarially controlled probability that can be arbitrarily close to 1/2 for a fraction of the samples. We give the first polynomial-time algorithm for this fundamental learning problem. Our algorithm learns the true halfspace within any desired accuracy ε and succeeds under a broad family of well-behaved distributions including log-concave distributions. Prior to our work, the only previous algorithm for this problem required quasi-polynomial runtime in 1/ε. Our algorithm employs a recently developed reduction [DKTZ20b] from learning to certifying the non-optimality of a candidate halfspace. This prior work developed a quasi-polynomial time certificate algorithm based on polynomial regression. The main technical contribution of the current paper is the first polynomial-time certificate algorithm. Starting from a non-trivial warm-start, our algorithm performs a novel "win-win" iterative process which, at each step, either finds a valid certificate or improves the angle between the current halfspace and the true one. Our warm-start algorithm for isotropic log-concave distributions involves a number of analytic tools that may be of broader interest. These include a new efficient method for reweighting the distribution in order to recenter it and a novel characterization of the spectrum of the degree-2 Chow parameters.
We present an approach to designing 3D Iterated Function Systems (IFS) within the Unity Editor, rendered in VR in real time. Objects are modeled as a hierarchical tree of primitive shapes and operators, editable through a graphical user interface that lets artists develop psychedelic scenes with little to no coding knowledge; the system is also easily extensible, allowing more advanced users to add their own primitive shapes and operators.
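The hierarchical tree of primitives and operators maps naturally onto composable signed-distance functions. The Python sketch below mirrors the data structure only; the actual system is built in the Unity Editor and rendered in VR.

```python
import numpy as np

# Leaves are primitive signed-distance functions; interior nodes are operators.

def sphere(radius):
    return lambda p: np.linalg.norm(p) - radius

def translate(child, offset):
    off = np.asarray(offset, dtype=float)
    return lambda p: child(p - off)

def union(*children):
    return lambda p: min(c(p) for c in children)

def mod_repeat(child, period):
    # IFS-style operator: tile the child infinitely along each axis.
    return lambda p: child((p + period / 2) % period - period / 2)

# A scene: an infinitely repeated union of two spheres
scene = mod_repeat(union(sphere(0.4),
                         translate(sphere(0.2), [0.5, 0.0, 0.0])),
                   period=2.0)
print(scene(np.array([2.0, 0.0, 0.0])))   # signed distance sampled at a point
```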