This systematic literature review examines the application of Large Language Models (LLMs) to time series forecasting and anomaly detection. It consolidates current methodologies, identifies capabilities like multi-modal data processing and natural language explanations, and outlines challenges such as data requirements, model hallucinations, and computational demands.
Quantitatively characterizing the spatial organization of cells and their interaction is essential for understanding cancer progression and immune response. Recent advances in machine intelligence have enabled large-scale segmentation and classification of cell nuclei from digitized histopathology slides, generating massive point pattern and marked point pattern datasets. However, accessible tools for quantitative analysis of such complex cellular spatial organization remain limited. In this paper, we first review 27 traditional spatial summary statistics, areal indices, and topological features applicable to point pattern data. Then, we introduce SASHIMI (Spatial Analysis for Segmented Histopathology Images using Machine Intelligence), a browser-based tool for real-time spatial analysis of artificial intelligence (AI)-segmented histopathology images. SASHIMI computes a comprehensive suite of mathematically grounded descriptors, including spatial statistics, proximity-based measures, grid-level similarity indices, spatial autocorrelation measures, and topological descriptors, to quantify cellular abundance and cell-cell interaction. Applied to two cancer datasets, oral potentially malignant disorders (OPMD) and non-small-cell lung cancer (NSCLC), SASHIMI identified multiple spatial features significantly associated with patient survival outcomes. SASHIMI provides an accessible and reproducible platform for single-cell-level spatial profiling of tumor morphological architecture, offering a robust framework for quantitative exploration of tissue organization across cancer types.
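As an illustration of the kind of proximity-based descriptor such tools compute, the sketch below measures the mean nearest-neighbor distance from one cell type to another on a toy marked point pattern. The function name, toy data, and use of a k-d tree are illustrative assumptions, not SASHIMI's implementation.

```python
# Hypothetical illustration of one proximity-based measure on a marked point
# pattern: the mean nearest-neighbor distance from each tumor cell to the
# closest lymphocyte. Not SASHIMI's code; a generic sketch.
import numpy as np
from scipy.spatial import cKDTree

def mean_nn_distance(source_xy: np.ndarray, target_xy: np.ndarray) -> float:
    """Mean distance from each source point to its nearest target point."""
    tree = cKDTree(target_xy)
    distances, _ = tree.query(source_xy, k=1)
    return float(distances.mean())

# Toy marked point pattern: (x, y) centroids from AI-segmented nuclei, in microns.
rng = np.random.default_rng(0)
tumor_cells = rng.uniform(0, 1000, size=(500, 2))
lymphocytes = rng.uniform(0, 1000, size=(200, 2))

print(mean_nn_distance(tumor_cells, lymphocytes))
```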
Since its introduction, softmax attention has become the backbone of modern transformer architectures due to its expressiveness and scalability across a wide range of tasks. However, the main drawback of softmax attention is the quadratic memory requirement and computational complexity with respect to the sequence length. By replacing the softmax nonlinearity, linear attention and similar methods have been introduced to avoid the quadratic bottleneck of softmax attention. Although these linear forms of attention are derived from the original softmax formulation, they typically lag behind in downstream accuracy. While intuition suggests that applying the softmax nonlinearity to the query-key inner product has desirable properties compared to other nonlinearities, the question of why this accuracy gap exists remains unanswered. This work demonstrates that linear attention is an approximation of softmax attention by deriving the recurrent form of softmax attention. Using this form, each part of softmax attention can be described in the language of recurrent neural networks (RNNs). Describing softmax attention as an RNN allows for the ablation of the components of softmax attention to understand the importance of each part and how they interact. In this way, our work helps explain why softmax attention is more expressive than its counterparts.
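To make the comparison concrete, here is a minimal sketch (not the paper's derivation) of causal softmax attention next to the kernelized recurrence used by linear attention, in which exp(q·k) is replaced by a feature-map product phi(q)·phi(k) so the context can be carried as a running state; the specific feature map is an illustrative assumption.

```python
# Contrast between quadratic causal softmax attention and the linear-attention
# recurrence, where a running sum S_t and normalizer z_t replace the T x T
# attention matrix.
import numpy as np

def softmax_attention(Q, K, V):
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        scores = Q[t] @ K[: t + 1].T / np.sqrt(d)   # causal: keys up to t
        w = np.exp(scores - scores.max())
        out[t] = (w / w.sum()) @ V[: t + 1]
    return out

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    S = np.zeros((Q.shape[1], V.shape[1]))   # running sum of phi(k_t) v_t^T
    z = np.zeros(Q.shape[1])                  # running normalizer
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        S += np.outer(phi(K[t]), V[t])
        z += phi(K[t])
        out[t] = (phi(Q[t]) @ S) / (phi(Q[t]) @ z)
    return out
```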
The stock market's ascent typically mirrors a flourishing economy, whereas its decline often signals an economic downturn. For this reason, factors correlated with trends in financial stock markets have long been widely discussed, and interest in financial text mining continues to grow. The inherent instability of stock prices makes them acutely responsive to fluctuations within the financial markets. In this article, we use deep learning networks to predict stock prices from the history of stock prices together with financial, business, and technology news articles that carry market information. We illustrate the enhancement of predictive precision achieved by integrating weighted news categories into the forecasting model. We employ FinBERT, a pre-trained NLP model designed to discern the sentiment of financial texts, and extend it with a Long Short-Term Memory (LSTM) architecture to construct the FinBERT-LSTM model. This model uses news categories that follow the stock market's structural hierarchy, namely market-, industry-, and stock-related news, combined with the previous week's stock prices, for prediction. We selected NASDAQ-100 index stock data, trained the model on Benzinga news articles, and used Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Accuracy as the key metrics for assessing and comparing model performance. The results indicate that FinBERT-LSTM performs best, followed by LSTM, with the DNN model ranking third.
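A rough sketch of the idea, assuming a simple fusion of daily sentiment scores with the previous week's prices; the exact FinBERT-LSTM architecture, news weighting, and hyperparameters in the paper may differ.

```python
# Illustrative fusion model: per-day sentiment scores (e.g., from a
# FinBERT-style classifier) for market-, industry-, and stock-level news are
# concatenated with the previous week's closes and fed to an LSTM.
import torch
import torch.nn as nn

class SentimentPriceLSTM(nn.Module):
    def __init__(self, n_news_categories=3, hidden=64):
        super().__init__()
        # per-day features: 1 price + one sentiment score per news category
        self.lstm = nn.LSTM(1 + n_news_categories, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, prices, sentiments):
        # prices: (batch, 5, 1) previous week's closes
        # sentiments: (batch, 5, 3) daily market/industry/stock sentiment
        x = torch.cat([prices, sentiments], dim=-1)
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])          # predicted next-day close

model = SentimentPriceLSTM()
pred = model(torch.randn(8, 5, 1), torch.rand(8, 5, 3))
```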
Recent observations of the cosmic microwave background (CMB) and baryon acoustic oscillations (BAO) show some tension with a $\Lambda$CDM cosmology. For one, the cosmological parameters determined by the CMB are at odds with the expansion history determined by the latest BAO measurements. In addition, the combined data have placed uncomfortably strong constraints on neutrino mass. Both effects can be interpreted as negative neutrino mass, one describing the change to the expansion history and the other describing enhanced lensing. In this paper, we show the current tensions can be resolved with a single change to either the lensing of the CMB or the expansion of the universe. We show additional lensing could arise from a variety of models with new light fields. However, these models rarely give the same signal in temperature and polarization, giving a concrete test of the scenario. Alternatively, dark sector models can explain the changes to the expansion by changing the evolution of the matter density. These models introduce new forces, giving rise to long-range signals in the three-point statistics of galaxies. We discuss a range of other examples that all illustrate the pattern that additional signals should appear if these tensions are explained by beyond the Standard Model physics.
Understanding whether cosmic acceleration arises from a cosmological constant or a dynamical component is a central goal of cosmology, and the Dark Energy Spectroscopic Instrument (DESI) enables stringent tests with high-precision distance measurements. We analyze baryon acoustic oscillation (BAO) measurements from DESI Data Release 1 (DR1) and Data Release 2 (DR2), combined with Type Ia supernovae and a cosmic microwave background (CMB) distance prior. With the larger statistical power and wider redshift coverage of DR2, the preference for dynamical dark energy does not diminish relative to DR1. Using both a shape-function reconstruction and non-parametric approaches with a Horndeski-motivated correlation prior, we find that the dark-energy equation of state $w(z)$ varies with redshift. BAO data alone yield modest constraints, but in combination with independent supernova compilations and the CMB prior they strengthen the evidence for dynamics. Bayesian model comparison shows moderate support for departures from $\Lambda$CDM when multiple degrees of freedom in $w(z)$ are allowed, corresponding to $\approx 3\sigma$ tension with $\Lambda$CDM (and higher for some data sets). Despite methodological differences, our results are consistent with companion DESI papers, underscoring the complementarity of approaches. Possible systematics remain under study; forthcoming DESI, \emph{Euclid}, and next-generation CMB data will provide decisive tests.
Prediction of stock price movements presents a formidable challenge in financial analytics due to the inherent volatility, non-stationarity, and nonlinear characteristics of market data. This paper introduces SPH-Net (Stock Price Prediction Hybrid Neural Network), an innovative deep learning framework designed to enhance the accuracy of time series forecasting in financial markets. The proposed architecture employs a novel co-attention mechanism that initially processes temporal patterns through a Vision Transformer, followed by refined feature extraction via an attention mechanism, thereby capturing both global and local dependencies in market data. To rigorously evaluate the model's performance, we conduct comprehensive experiments on eight diverse stock datasets: AMD, eBay, Facebook, FirstService Corp, Tesla, Google, Mondi ADR, and Matador Resources. Each dataset is standardized using six fundamental market indicators: Open, High, Low, Close, Adjusted Close, and Volume, representing a complete set of features for comprehensive market analysis. Experimental results demonstrate that SPH-Net consistently outperforms existing stock prediction models across all evaluation metrics. The model's superior performance stems from its ability to effectively capture complex temporal patterns while maintaining robustness against market noise. By significantly improving prediction accuracy in financial time series analysis, SPH-Net provides valuable decision-support capabilities for investors and financial analysts, potentially enabling more informed investment strategies and risk assessment in volatile market conditions.
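For orientation only, the sketch below shows one way a transformer encoder with attention pooling can consume windows of the six indicators; it is a simplified stand-in, not SPH-Net's co-attention or Vision Transformer architecture.

```python
# Simplified stand-in: a 30-day window of six indicators (Open, High, Low,
# Close, Adj Close, Volume) is embedded per day, passed through a transformer
# encoder, and attention-pooled into a next-day prediction.
import torch
import torch.nn as nn

class TinyStockTransformer(nn.Module):
    def __init__(self, n_features=6, d_model=32, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(n_features, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.pool = nn.Linear(d_model, 1)     # attention-pooling scores
        self.head = nn.Linear(d_model, 1)     # next-day close

    def forward(self, x):                     # x: (batch, 30, 6)
        h = self.encoder(self.embed(x))
        w = torch.softmax(self.pool(h), dim=1)
        return self.head((w * h).sum(dim=1))

pred = TinyStockTransformer()(torch.randn(16, 30, 6))
```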
We present spectroscopic data of strong lenses and their source galaxies using the Keck Near-Infrared Echellette Spectrometer (NIRES) and the Dark Energy Spectroscopic Instrument (DESI), providing redshifts necessary for nearly all strong-lensing applications with these systems, especially the extraction of physical parameters from lensing modeling. These strong lenses were found in the DESI Legacy Imaging Surveys using Residual Neural Networks (ResNet) and followed up by our Hubble Space Telescope program, with all systems displaying unambiguous lensed arcs. With NIRES, we target eight lensed sources at redshifts difficult to measure in the optical range and determine the source redshifts for six, between $z_s = 1.675$ and $3.332$. DESI observed one of the remaining source redshifts, as well as an additional source redshift within the six systems. The two systems with non-detections by NIRES were observed for a considerably shorter 600 s at high airmass. Combining NIRES infrared spectroscopy with optical spectroscopy from our DESI Strong Lensing Secondary Target Program, these results provide the complete lens and source redshifts for six systems, a resource for refining automated strong lens searches in future deep- and wide-field imaging surveys and addressing a range of questions in astrophysics and cosmology.
Natural Language Processing (NLP) offers new avenues for personality assessment by leveraging rich, open-ended text, moving beyond traditional questionnaires. In this study, we address the challenge of modeling long narrative interviews, each exceeding 2,000 tokens, to predict Five-Factor Model (FFM) personality traits. We propose a two-step approach: first, we extract contextual embeddings using sliding-window fine-tuning of pretrained language models; then, we apply Recurrent Neural Networks (RNNs) with attention mechanisms to integrate long-range dependencies and enhance interpretability. This hybrid method effectively bridges the strengths of pretrained transformers and sequence modeling to handle long-context data. Through ablation studies and comparisons with state-of-the-art long-context models such as LLaMA and Longformer, we demonstrate improvements in prediction accuracy, efficiency, and interpretability. Our results highlight the potential of combining language-based features with long-context modeling to advance personality assessment from life narratives.
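A hedged sketch of the two-step pipeline: overlapping token windows are encoded with a pretrained model, and a GRU with additive attention aggregates the window embeddings into five trait scores. The window size, stride, pooling, and recurrent head are illustrative assumptions, not the paper's exact configuration.

```python
# Step 1: encode overlapping 512-token windows of a long interview with a
# pretrained encoder.  Step 2: run a bidirectional GRU with additive attention
# over the window embeddings to predict the five FFM traits.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased")

def window_embeddings(text, size=512, stride=256):
    ids = tok(text, add_special_tokens=False)["input_ids"]
    vecs = []
    with torch.no_grad():
        for start in range(0, len(ids), stride):
            chunk = ids[start:start + size]
            out = enc(torch.tensor([chunk]))
            vecs.append(out.last_hidden_state.mean(dim=1))  # mean-pooled window
    return torch.cat(vecs).unsqueeze(0)                     # (1, n_windows, 768)

class AttnGRUHead(nn.Module):
    def __init__(self, d_in=768, hidden=128, n_traits=5):
        super().__init__()
        self.gru = nn.GRU(d_in, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.out = nn.Linear(2 * hidden, n_traits)

    def forward(self, x):
        h, _ = self.gru(x)
        w = torch.softmax(self.attn(h), dim=1)
        return self.out((w * h).sum(dim=1))                 # five FFM scores
```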
The baryon acoustic oscillation (BAO) analysis from the first year of data from the Dark Energy Spectroscopic Instrument (DESI), when combined with data from the cosmic microwave background (CMB), has placed an upper limit on the sum of neutrino masses, $\sum m_\nu < 70$ meV (95%). In addition to excluding the minimum sum associated with the inverted hierarchy, the posterior is peaked at $\sum m_\nu = 0$ and is close to excluding even the minimum sum of 58 meV at $2\sigma$. In this paper, we explore the implications of this data for cosmology and particle physics. The sum of neutrino masses is determined in cosmology from the suppression of clustering in the late universe. Allowing the clustering to be enhanced, we extend the DESI analysis to $\sum m_\nu < 0$ and find $\sum m_\nu = -160 \pm 90$ meV (68%), and that the suppression of power from the minimum sum of neutrino masses is excluded at 99% confidence. We show this preference for negative masses makes it challenging to explain the result by a shift of cosmological parameters, such as the optical depth or matter density. We then show how a result of $\sum m_\nu = 0$ could arise from new physics in the neutrino sector, including decay, cooling, and/or time-dependent masses. These models are consistent with current observations but imply new physics that is accessible in a wide range of experiments. In addition, we discuss how an apparent signal with $\sum m_\nu < 0$ can arise from new long-range forces in the dark sector or from a primordial trispectrum that resembles the signal of CMB lensing.
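For reference, the standard linear-theory relations behind the statement that cosmology constrains the mass sum through suppressed late-time clustering are approximately:

```latex
% Approximate linear-theory relations (not specific to the DESI analysis):
% the neutrino energy density and the small-scale suppression of the matter
% power spectrum both scale with the mass sum.
\Omega_\nu h^2 \simeq \frac{\sum m_\nu}{93.14\ \mathrm{eV}}, \qquad
f_\nu \equiv \frac{\Omega_\nu}{\Omega_m}, \qquad
\frac{\Delta P(k)}{P(k)} \approx -8 f_\nu \quad \text{for } k \gg k_{\mathrm{fs}},
```

so enhanced, rather than suppressed, clustering maps onto an effectively negative $\sum m_\nu$ in this parametrization.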
Mammalian muscle progenitor cells exhibit a light-dependent magnetic compass, driven by electron spin dynamics, which guides their migration during tissue regeneration. This interdisciplinary work from institutions including Harvard Medical School, MIT, and University of Pittsburgh identifies a quantum-based mechanism for cellular motility, challenging the belief that internal cells are immune to weak magnetic fields and ambient light.
We present the Dark Energy Spectroscopic Instrument (DESI) Strong Lensing Secondary Target Program. This is a spectroscopic follow-up program for strong gravitational lens candidates found in the DESI Legacy Imaging Surveys footprint. Spectroscopic redshifts for the lenses and lensed source are crucial for lens modeling to obtain physical parameters. The spectroscopic catalog in this paper consists of 73 candidate systems from the DESI Early Data Release (EDR). We have confirmed 20 strong lensing systems and determined four to not be lenses. For the remaining systems, more spectroscopic data from ongoing and future observations will be presented in future publications. We discuss the implications of our results for lens searches with neural networks in existing and future imaging surveys as well as for lens modeling. This Strong Lensing Secondary Target Program is part of the DESI Strong Lens Foundry project, and this is Paper II of a series on this project.
We present constraints on low-mass dark matter-electron scattering and absorption interactions using a SuperCDMS high-voltage eV-resolution (HVeV) detector. Data were taken underground in the NEXUS facility located at Fermilab, with an overburden of 225 meters of water equivalent. The experiment benefits from minimizing the luminescence from the printed circuit boards in the detector holder used in all previous HVeV studies. A blind analysis of $6.1\,\mathrm{g\cdot days}$ of exposure produces exclusion limits for dark matter-electron scattering cross-sections for masses as low as $1\,\mathrm{MeV}/c^2$, as well as on the photon-dark photon mixing parameter and the coupling constant between axion-like particles and electrons for particles with masses $>1.2\,\mathrm{eV}/c^2$ probed via absorption processes.
The bilevel variational inequality (BVI) problem is a general model that captures various optimization problems, including VI-constrained optimization and equilibrium problems with equilibrium constraints (EPECs). This paper introduces a first-order method for smooth or nonsmooth BVI with stochastic monotone operators at the inner and outer levels. Our novel method, called Regularized Operator Extrapolation (\texttt{R-OpEx}), is a single-loop algorithm that combines Tikhonov's regularization with operator extrapolation. This method needs only one evaluation of each operator per iteration and tracks a single sequence of iterates. We show that \texttt{R-OpEx} gives $\mathcal{O}(\epsilon^{-4})$ complexity for nonsmooth stochastic monotone BVI, where $\epsilon$ is the error in the inner and outer levels. Using a mini-batching scheme, we improve the outer-level complexity to $\mathcal{O}(\epsilon^{-2})$ while maintaining the $\mathcal{O}(\epsilon^{-4})$ complexity in the inner level when the inner level is smooth and stochastic. Moreover, if the inner level is smooth and deterministic, we show a complexity of $\mathcal{O}(\epsilon^{-2})$. Finally, in case the outer level is strongly monotone, we improve these rates to $\mathcal{O}(\epsilon^{-4/5})$ for general BVI and $\mathcal{O}(\epsilon^{-2/3})$ when the inner level is smooth and deterministic. To our knowledge, this is the first work that investigates nonsmooth stochastic BVI with the best-known convergence guarantees. We verify our theoretical results with numerical experiments.
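The sketch below illustrates the two named ingredients, Tikhonov regularization and operator extrapolation, combined in a single loop; the step sizes, regularization schedule, projection, and toy operators are illustrative assumptions and not the paper's exact R-OpEx update.

```python
# Illustrative single-loop scheme: a vanishing Tikhonov weight rho_k biases the
# iterates toward inner-level solutions, while the extrapolation term reuses
# the previous operator evaluation (one new evaluation per operator per step).
import numpy as np

def regularized_operator_extrapolation(H, F, project, x0, steps=1000):
    x, g_prev = x0.copy(), None
    for k in range(1, steps + 1):
        rho = 1.0 / np.sqrt(k)          # illustrative vanishing regularization
        gamma = 1.0 / np.sqrt(k)        # illustrative step size
        g = H(x) + rho * F(x)           # Tikhonov-regularized operator
        extrap = g if g_prev is None else 2 * g - g_prev   # operator extrapolation
        x = project(x - gamma * extrap)
        g_prev = g
    return x

# Toy usage: inner VI given by a monotone linear operator on the box [0, 1]^2,
# outer operator pulling toward the origin among inner solutions.
H = lambda x: np.array([[2.0, 1.0], [-1.0, 2.0]]) @ x
F = lambda x: x
print(regularized_operator_extrapolation(H, F, lambda x: np.clip(x, 0, 1),
                                          np.ones(2)))
```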
This research introduces Procedural Artificial Narrative using Generative AI (PANGeA), a structured approach for leveraging large language models (LLMs), guided by a game designer's high-level criteria, to generate narrative content for turn-based role-playing video games (RPGs). Distinct from prior applications of LLMs to video game design, PANGeA innovates by not only generating game level data (which includes, but is not limited to, setting, key items, and non-playable characters (NPCs)), but also by fostering dynamic, free-form interactions between the player and the environment that align with the procedural game narrative. The NPCs generated by PANGeA are personality-biased and express traits from the Big 5 Personality Model in their generated responses. PANGeA addresses the challenges of ingesting free-form text input, which can prompt LLM responses beyond the scope of the game narrative. A novel validation system that uses the LLM's intelligence evaluates text input and aligns generated responses with the unfolding narrative. To make these interactions possible, PANGeA is supported by a server that hosts a custom memory system supplying context to augment generated responses and thus align them with the procedural narrative. For broad applicability, the server provides a REST interface enabling any game engine to integrate directly with PANGeA, as well as an LLM interface adaptable to local or private LLMs. PANGeA's ability to foster dynamic narrative generation by aligning responses with the procedural narrative is demonstrated through an empirical study and an ablation test of two versions of a demo game: a custom, browser-based GPT and a Unity demo. As the results show, PANGeA holds potential to assist game designers in using LLMs to generate narrative-consistent content even when given varied, unpredictable, free-form text input.
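Purely as an illustration of the validation pattern described above, the following pseudocode gates free-form player input with a scoped LLM prompt before generating a response; the function names, prompts, and JSON shape are hypothetical and are not PANGeA's API.

```python
# Hypothetical validation-then-generation loop: a scoped prompt first asks the
# LLM whether the player's free-form input fits the current narrative; only
# in-scope input is passed on for response generation.
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any local or hosted LLM here")

def respond_to_player(player_input: str, narrative_state: dict) -> str:
    validation_prompt = (
        "You are validating input for a turn-based RPG.\n"
        f"Current narrative: {json.dumps(narrative_state)}\n"
        f"Player input: {player_input!r}\n"
        'Reply with JSON {"in_scope": true/false, "reason": "..."}.'
    )
    verdict = json.loads(call_llm(validation_prompt))
    if not verdict["in_scope"]:
        return "That doesn't seem possible right now."   # deflect out-of-scope input
    generation_prompt = (
        f"Continue the narrative {json.dumps(narrative_state)} in character, "
        f"responding to: {player_input}"
    )
    return call_llm(generation_prompt)
```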
Solving high-dimensional parabolic partial differential equations (PDEs) with deep learning methods is often computationally and memory intensive, primarily due to the need for automatic differentiation (AD) to compute large Hessian matrices in the PDE. In this work, we propose a deep random difference method (DRDM) that addresses these issues by approximating the convection-diffusion operator using only first-order differences and the solution by deep neural networks, thus avoiding explicit Hessian computation. When incorporated into a Galerkin framework, the DRDM eliminates the need for pointwise evaluation of expectations, resulting in an efficient implementation. We further extend the approach to Hamilton-Jacobi-Bellman (HJB) equations. Notably, the DRDM recovers existing martingale deep learning methods for PDEs (Cai et al., 2024, arXiv:2405.03169) without using the tools of stochastic calculus. The proposed method offers two main advantages: it removes the dependence on AD for PDE derivatives and enables parallel computation of the loss function in both time and space. We provide rigorous error estimates for the DRDM in the linear case, which show first-order accuracy in the step size $\Delta t$ used in the Euler-Maruyama sampling of the paths. Numerical experiments demonstrate that the method can efficiently and accurately solve quasilinear parabolic PDEs and HJB equations in dimensions up to $10^4$ and $10^5$, respectively.
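A sketch of the standard identity that makes a first-order "random difference" possible for an Itô diffusion $dX_t = b\,dt + \sigma\,dW_t$ discretized by the Euler-Maruyama scheme; the paper's precise estimator and Galerkin formulation may differ.

```latex
% One Euler-Maruyama step from (t, x):
%   X_{t+\Delta t} = x + b(t,x)\,\Delta t + \sigma(t,x)\,\Delta W.
% Taylor expansion with E[\Delta W] = 0 and E[\Delta W \Delta W^\top] = \Delta t\, I gives
\mathbb{E}\!\left[\,u(t+\Delta t, X_{t+\Delta t}) - u(t,x)\ \middle|\ X_t = x\,\right]
 = \Delta t\left(\partial_t u + b\cdot\nabla u
   + \tfrac{1}{2}\operatorname{Tr}\!\big(\sigma\sigma^{\top}\nabla^{2} u\big)\right)\!(t,x)
   + \mathcal{O}(\Delta t^{2}),
% so the convection-diffusion operator is recovered from a first-order difference of u
% along sampled paths, with no explicit Hessian of the network required.
```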
Attention mechanisms, particularly softmax attention, have been instrumental in the success of transformer-based models such as GPT. However, the quadratic memory complexity of softmax attention with respect to sequence length poses significant challenges for processing longer sequences. We introduce Cottention, a novel attention mechanism that replaces the softmax operation with cosine similarity. By leveraging the properties of cosine similarity and rearranging the attention equation, Cottention achieves native linear memory complexity with respect to sequence length, making it inherently more memory-efficient than softmax attention. We demonstrate that Cottention can be reformulated as a recurrent neural network (RNN) with a finite hidden state, allowing for constant memory usage during inference. We evaluate Cottention on both the bidirectional BERT and causal GPT tasks, demonstrating comparable performance to softmax attention while significantly reducing memory requirements. To ensure efficient computation, we develop a custom CUDA kernel for Cottention. Our results show that Cottention is a promising alternative to softmax attention, enabling the processing of longer sequences without sacrificing performance, due to its native linear memory complexity and ability to maintain a constant memory footprint during inference.
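A minimal sketch of the rearrangement that gives cosine-similarity attention linear memory: with row-normalized queries and keys, associativity lets (Q_n K_n^T) V be computed as Q_n (K_n^T V) without materializing the T x T score matrix. Output scaling and causal handling in Cottention itself may differ.

```python
# Cosine-similarity attention written two ways: the quadratic form and the
# associativity-rearranged form with linear memory in the sequence length.
import numpy as np

def cosine_attention_quadratic(Q, K, V):
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    return (Qn @ Kn.T) @ V            # materializes a T x T score matrix

def cosine_attention_linear(Q, K, V):
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    return Qn @ (Kn.T @ V)            # d x d_v state, linear in T

T, d = 128, 16
Q, K, V = (np.random.randn(T, d) for _ in range(3))
assert np.allclose(cosine_attention_quadratic(Q, K, V),
                   cosine_attention_linear(Q, K, V))
```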
In this paper, a multi-scale Fourier neural operator (MscaleFNO) is proposed to reduce the spectral bias of the FNO in learning mappings between highly oscillatory functions, with application to the nonlinear mapping between the coefficient of the Helmholtz equation and its solution. The MscaleFNO consists of a series of parallel normal FNOs with scaled inputs of the function and the spatial variable, and their outputs are shown to capture the various high-frequency components of the mapping's image. Numerical results demonstrate the substantial improvement of the MscaleFNO over the normal FNO with a similar number of network parameters for the problem of wave scattering in the high-frequency regime.
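A schematic of the parallel multi-scale structure described above, with a placeholder branch network standing in for each FNO; the scale values, branch architecture, and output combination are illustrative assumptions.

```python
# Each branch sees the spatial variable and input function multiplied by its
# own scale, and the branch outputs are summed to form the prediction.
import torch
import torch.nn as nn

class BranchNet(nn.Module):                    # stand-in for one normal FNO
    def __init__(self, width=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, width), nn.GELU(),
                                 nn.Linear(width, 1))

    def forward(self, x, v):                   # x: grid points, v: coefficient values
        return self.net(torch.cat([x, v], dim=-1))

class MultiScaleOperator(nn.Module):
    def __init__(self, scales=(1, 2, 4, 8, 16)):
        super().__init__()
        self.scales = scales
        self.branches = nn.ModuleList([BranchNet() for _ in scales])

    def forward(self, x, v):
        # sum of branch outputs on scaled copies of (x, v)
        return sum(branch(c * x, c * v)
                   for c, branch in zip(self.scales, self.branches))

u = MultiScaleOperator()(torch.rand(32, 256, 1), torch.rand(32, 256, 1))
```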
The most precise determination of the sum of neutrino masses from cosmological data, derived from analysis of the cosmic microwave background (CMB) and baryon acoustic oscillations (BAO) from the Dark Energy Spectroscopic Instrument (DESI), favors a value below the minimum inferred from neutrino flavor oscillation experiments. We explore which data are most responsible for this puzzling aspect of the current constraints on neutrino mass and whether it is related to other anomalies in cosmology. We demonstrate conclusively that the preference for negative neutrino masses is a consequence of larger-than-expected lensing of the CMB in both the two- and four-point lensing statistics. Furthermore, we show that this preference is robust to changes in the likelihoods of the BAO and CMB optical depth analyses given the available data. We then show that this excess clustering is not easily explained by changes to the expansion history and is likely distinct from the preference for dynamical dark energy in DESI BAO data. Finally, we discuss how future data may impact these results, including an analysis of Planck CMB with mock DESI 5-year data. We conclude that the negative neutrino mass preference is likely to persist even as more cosmological data are collected in the near future.
This research introduces TGN-TRec, a Temporal Graph Neural Network-Transformer-based model for scientific paper recommendation that accounts for the dynamic nature of citation networks. The system learns and updates paper embeddings over time, predicting future citation probabilities more accurately than static graph neural network methods, particularly when initialized with semantic information from SciBERT.