Drake University
Large multimodal models (LMMs) are increasingly integrated into autonomous driving systems for user interaction. However, their limitations in fine-grained spatial reasoning pose challenges for system interpretability and user trust. We introduce Logic-RAG, a novel Retrieval-Augmented Generation (RAG) framework that improves LMMs' spatial understanding in driving scenarios. Logic-RAG constructs a dynamic knowledge base (KB) about object-object relationships in first-order logic (FOL) using a perception module, a query-to-logic embedder, and a logical inference engine. We evaluated Logic-RAG on visual-spatial queries using both synthetic and real-world driving videos. When using popular LMMs (GPT-4V, Claude 3.5) as proxies for an autonomous driving system, these models achieved only 55% accuracy on synthetic driving scenes and under 75% on real-world driving scenes. Augmenting them with Logic-RAG increased their accuracies to over 80% and 90%, respectively. An ablation study showed that even without logical inference, the fact-based context constructed by Logic-RAG alone improved accuracy by 15%. Logic-RAG is extensible: it allows seamless replacement of individual components with improved versions and enables domain experts to compose new knowledge in both FOL and natural language. In sum, Logic-RAG addresses critical spatial reasoning deficiencies in LMMs for autonomous driving applications. Code and data are available at this https URL
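To make the idea of a first-order-logic knowledge base more concrete, here is a minimal forward-chaining sketch. The predicate names (`left_of`, `right_of`), the object names, and the two rules are hypothetical illustrations. They are not Logic-RAG's actual schema, embedder, or inference engine. The sketch only shows how facts from a perception module, combined with rules, can yield derived spatial facts that are rendered as natural-language context for an LMM prompt.

```python
from itertools import product

# A ground fact: (predicate, subject, object), e.g. ("left_of", "car_1", "truck_2").
# Predicates and rules below are hypothetical examples, not Logic-RAG's own.
Fact = tuple[str, str, str]


def forward_chain(facts: set[Fact]) -> set[Fact]:
    """Apply two illustrative rules until a fixpoint is reached:
    1. left_of(x, y) -> right_of(y, x)                (inverse relation)
    2. left_of(x, y) & left_of(y, z) -> left_of(x, z) (transitivity)
    """
    kb = set(facts)
    while True:
        new: set[Fact] = {("right_of", y, x) for (p, x, y) in kb if p == "left_of"}
        lefts = [(x, y) for (p, x, y) in kb if p == "left_of"]
        for (x, y), (y2, z) in product(lefts, lefts):
            if y == y2 and x != z:
                new.add(("left_of", x, z))
        if new <= kb:  # nothing new was derived: fixpoint reached
            return kb
        kb |= new


def to_context(kb: set[Fact]) -> str:
    """Render facts as sorted natural-language lines for an LMM prompt."""
    return "\n".join(f"{s} is {p.replace('_', ' ')} {o}." for (p, s, o) in sorted(kb))


kb = forward_chain({("left_of", "car_1", "truck_2"), ("left_of", "truck_2", "cyclist_3")})
print(to_context(kb))
```

Here, two perceived facts produce four derived ones, including `car_1 is left of cyclist_3.`, which the model would otherwise have to infer unaided.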
This paper introduces a dataset for improving real-time object recognition systems to aid blind and low-vision (BLV) individuals in navigation tasks. The dataset comprises 21 videos of BLV individuals navigating outdoor spaces, and a taxonomy of 90 objects crucial for BLV navigation, refined through a focus group study. We also provide object labeling for the 90 objects across 31 video segments created from the 21 videos. A deeper analysis reveals that most contemporary datasets used in training computer vision models contain only a small subset of the taxonomy in our dataset. Preliminary evaluation of state-of-the-art computer vision models on our dataset highlights shortcomings in accurately detecting key objects relevant to BLV navigation, emphasizing the need for specialized datasets. We make our dataset publicly available, offering valuable resources for developing more inclusive navigation systems for BLV individuals.
This paper presents a curated list of 90 objects essential for the navigation of blind and low-vision (BLV) individuals, encompassing road, sidewalk, and indoor environments. We develop the initial list by analyzing 21 publicly available videos featuring BLV individuals navigating various settings. Then, we refine the list through feedback from a focus group study involving blind, low-vision, and sighted companions of BLV individuals. A subsequent analysis reveals that most contemporary datasets used to train recent computer vision models contain only a small subset of the objects in our proposed list. Furthermore, we provide detailed object labeling for these 90 objects across 31 video segments derived from the original 21 videos. Finally, we make the object list, the 21 videos, and object labeling in the 31 video segments publicly available. This paper aims to fill the existing gap and foster the development of more inclusive and effective navigation aids for the BLV community.
Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present GC-MVSNet, a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at different scales during learning (see Fig. 1). We find that adding this geometric consistency loss significantly accelerates learning by explicitly penalizing geometrically inconsistent pixels, reducing the number of training iterations required to nearly half that of other MVS methods. Our extensive experiments show that our approach achieves a new state of the art on the DTU and BlendedMVS datasets and competitive results on the Tanks and Temples benchmark. To the best of our knowledge, GC-MVSNet is the first attempt to enforce multi-view, multi-scale geometric consistency during learning.
Standard vector autoregressive (VAR) models suffer from overparameterization, a serious issue for high-dimensional time series data because it restricts the number of variables and lags that can be incorporated into the model. Several statistical methods, such as the reduced-rank model for multivariate (multiple) time series (Velu et al., 1986; Reinsel and Velu, 1998; Reinsel et al., 2022) and the envelope VAR model (Wang and Ding, 2018), reduce the dimension of the VAR parameter space. However, these methods can be inefficient: they either fail to distinguish between relevant and irrelevant information in complex data, or they do not adequately address the rank deficiency problem. We incorporate the idea of envelope models into the reduced-rank VAR model to tackle both challenges simultaneously, and propose a new parsimonious version of the classical VAR model called the reduced-rank envelope VAR (REVAR) model. The REVAR model combines the strengths of the reduced-rank VAR and envelope VAR models and leads to significant gains in efficiency and accuracy. The asymptotic properties of the proposed estimators are established under different error assumptions. Simulation studies and real data analysis are conducted to evaluate and illustrate the proposed method.
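The overparameterization is easy to quantify. A VAR($p$) in $k$ variables has $p k^2$ autoregressive coefficients, ignoring intercepts and the error covariance. If the stacked $k \times kp$ coefficient matrix $[A_1 \cdots A_p]$ is instead constrained to rank $r$, the number of free parameters drops to $r(k + kp - r)$, which is the dimension of the set of rank-$r$ matrices of that shape. The sketch below evaluates both counts. It does not model the envelope part of REVAR, whose parameter count depends on an envelope dimension that the abstract does not specify.

```python
def full_var_params(k: int, p: int) -> int:
    """Autoregressive coefficients in an unrestricted VAR(p) with k series."""
    return p * k * k


def reduced_rank_var_params(k: int, p: int, r: int) -> int:
    """Free parameters when the stacked k x (k*p) matrix [A_1 ... A_p] has rank r."""
    if not 0 <= r <= k:
        raise ValueError("rank must lie in [0, k]")
    return r * (k + k * p - r)


print(full_var_params(10, 4), reduced_rank_var_params(10, 4, 2))  # 400 96
print(full_var_params(40, 4), reduced_rank_var_params(40, 4, 2))  # 6400 396
```

The full count grows quadratically in $k$, whereas the rank-$r$ count grows only linearly in $k$ for fixed $r$ and $p$.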
We report a joint experimental and theoretical study of a three-sideband (3-SB) modification of the "reconstruction of attosecond beating by interference of two-photon transitions" (RABBIT) setup. The 3-SB RABBIT scheme makes it possible to investigate phases resulting from interference between transitions of different orders in the continuum. A further strength of the method is its ability to isolate the atomic phases, independent of any chirp in the harmonics, by comparing the RABBIT phases extracted from specific SB groups formed by two adjacent harmonics. We verify earlier predictions that the phases and the corresponding time delays in the three SBs extracted from angle-integrated measurements become similar with increasing photoelectron energy. A variation in the angle dependence of the RABBIT phases in the three SBs results from the distinct Wigner and continuum-continuum coupling phases associated with the individual angular momentum channels. We attempt a qualitative explanation of this dependence by invoking a propensity rule. Comparison between the experimental data and predictions from a calculation using the R-matrix (close-coupling) with time dependence method shows qualitative agreement in the observed trends.
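A RABBIT phase is typically read off from the oscillation of a sideband yield with IR delay $\tau$, modeled as $S(\tau) = A + B\cos(2\omega\tau - \varphi)$, where $\omega$ is the IR angular frequency. Sign conventions for $\varphi$ vary, and the one used here is an assumption. The sketch below extracts $\varphi$ by projecting sampled data onto $\cos$ and $\sin$ at $2\omega$. The 3-SB analysis then compares phases obtained in this way from different sidebands. The specific SB groupings that cancel the harmonic chirp are described in the paper and are not reproduced here.

```python
import math


def rabbit_phase(delays, signal, omega):
    """Estimate phi in S(tau) = A + B*cos(2*omega*tau - phi).

    Projects the sampled sideband yield onto cos and sin at 2*omega (a single
    discrete Fourier component). Assumes uniformly spaced delays that span an
    integer number of 2*omega periods, so the constant A drops out.
    Returns phi wrapped to [-pi, pi].
    """
    c = sum(s * math.cos(2 * omega * t) for t, s in zip(delays, signal))
    q = sum(s * math.sin(2 * omega * t) for t, s in zip(delays, signal))
    return math.atan2(q, c)


# Synthetic check: 4 periods of the 2*omega beat, 64 samples, arbitrary units.
omega = 1.0
period = math.pi / omega  # period of the 2*omega oscillation
delays = [4 * period * n / 64 for n in range(64)]
signal = [3.0 + 1.5 * math.cos(2 * omega * t - 0.7) for t in delays]
print(round(rabbit_phase(delays, signal, omega), 6))  # 0.7
```

With real data, a least-squares fit of the same model is often preferred when the delay scan does not cover whole periods.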
Recent research into analog computing has introduced new notions of computing real numbers. Huang, Klinge, Lathrop, Li, and Lutz defined a notion of computing real numbers in real time with chemical reaction networks (CRNs), introducing the classes $\mathbb{R}_\text{LCRN}$ (the class of all Lyapunov CRN-computable real numbers) and $\mathbb{R}_\text{RTCRN}$ (the class of all real-time CRN-computable real numbers). In their paper, they show the inclusions $ALG \subseteq \mathbb{R}_\text{LCRN} \subseteq \mathbb{R}_\text{RTCRN}$, where $ALG$ is the class of real algebraic numbers, and that $ALG \subsetneqq \mathbb{R}_\text{RTCRN}$, but leave open which of these inclusions is proper. In this paper, we resolve this open problem and show that $ALG = \mathbb{R}_\text{LCRN} \subsetneqq \mathbb{R}_\text{RTCRN}$. However, their definition of real-time computation is fragile in the sense that it is sensitive to perturbations in initial conditions. To resolve this flaw, we further require a CRN to withstand these perturbations. In doing so, we arrive at a discrete model of memory. This approach has several benefits. First, a bounded CRN may compute values approximately in finite time. Second, a CRN can tolerate small perturbations of its species' concentrations. Third, taking a measurement of a CRN's state only requires precision proportional to the exactness of these approximations. Lastly, if a CRN requires only finite memory, this model and Turing machines are equivalent under real-time simulations.
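As a standard illustration of a CRN that computes an algebraic number (not an example taken from the paper), consider the reactions $\emptyset \to X$ with rate constant 2 and $2X \to X$ with rate constant 1. Under mass-action kinetics these give $dx/dt = 2 - x^2$, and starting from $x(0) = 0$ the solution is $x(t) = \sqrt{2}\tanh(\sqrt{2}\,t)$. The concentration stays bounded and converges to $\sqrt{2}$ exponentially fast. The sketch integrates the ODE numerically and compares the error with a bound of the form $2^{-t}$, used here only to convey the flavor of "real-time" convergence. It does not check every technical condition of the Huang et al. definition.

```python
import math


def sqrt2_crn(t_end, dt=1e-3):
    """Concentration x(t_end) for the mass-action CRN
         0 -> X   (rate constant 2):  contributes +2
        2X -> X   (rate constant 1):  contributes -x**2 (one X consumed per firing)
    i.e. dx/dt = 2 - x**2 with x(0) = 0, integrated with classical RK4."""
    def f(x):
        return 2.0 - x * x

    x = 0.0
    for _ in range(round(t_end / dt)):
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        x += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x


for t in range(1, 6):
    err = abs(sqrt2_crn(t) - math.sqrt(2))
    print(t, err, err <= 2.0 ** -t)
```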
Traditional multi-view stereo (MVS) methods primarily depend on photometric and geometric consistency constraints. In contrast, modern learning-based algorithms often rely on the plane sweep algorithm to infer 3D geometry, applying explicit geometric consistency (GC) checks only as a post-processing step, with no impact on the learning process itself. In this work, we introduce GC-MVSNet++, a novel approach that actively enforces geometric consistency of reference view depth maps across multiple source views (multi-view) and at various scales (multi-scale) during the learning phase (see Fig. 1). This integrated GC check significantly accelerates the learning process by directly penalizing geometrically inconsistent pixels, effectively halving the number of training iterations compared to other MVS methods. Furthermore, we introduce a densely connected cost regularization network with two distinct block designs, simple and feature-dense, optimized to harness dense feature connections for enhanced regularization. Extensive experiments demonstrate that our approach achieves a new state of the art on the DTU and BlendedMVS datasets and secures second place on the Tanks and Temples benchmark. To our knowledge, GC-MVSNet++ is the first method to enforce multi-view, multi-scale supervised geometric consistency during learning. Our code is available.
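The geometric consistency check referred to here is commonly a forward-backward reprojection test. A reference pixel is lifted to 3D using its predicted depth and projected into a source view. The source depth at that location is then lifted back and reprojected into the reference view, and the pixel counts as consistent if it lands close to where it started at a similar depth. The sketch below applies this test to one pixel and one source view, using a shared pinhole camera. The thresholds of 1 pixel and 1% relative depth are illustrative assumptions, and so are all function names. The method described above applies such checks across many views and scales during training and penalizes the inconsistent pixels, which this sketch does not attempt.

```python
import math


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def transpose(m):
    return [[m[j][i] for j in range(3)] for i in range(3)]


def backproject(intr, u, v, depth):
    """Pixel (u, v) with depth -> 3D point in that camera's frame (pinhole model)."""
    f, cx, cy = intr
    return [(u - cx) * depth / f, (v - cy) * depth / f, depth]


def project(intr, X):
    """3D point in a camera frame -> (u, v, depth)."""
    f, cx, cy = intr
    return f * X[0] / X[2] + cx, f * X[1] / X[2] + cy, X[2]


def is_consistent(intr, R, t, depth_ref, depth_src, u, v,
                  pix_thresh=1.0, rel_depth_thresh=0.01):
    """Forward-backward geometric consistency check of one reference pixel.

    R (a rotation) and t map reference-camera coordinates to source-camera
    coordinates, X_src = R X_ref + t; both cameras share intr = (f, cx, cy).
    Depth maps are 2D lists indexed [row][col]."""
    d = depth_ref[v][u]
    X_src = [a + b for a, b in zip(matvec(R, backproject(intr, u, v, d)), t)]
    if X_src[2] <= 0:
        return False  # point lies behind the source camera
    us, vs, _ = project(intr, X_src)
    ui, vi = round(us), round(vs)  # nearest-neighbour depth lookup
    if not (0 <= vi < len(depth_src) and 0 <= ui < len(depth_src[0])):
        return False  # projects outside the source image
    # Lift the source depth to 3D and map it back into the reference frame
    # (R^T is the inverse rotation).
    Y_src = backproject(intr, us, vs, depth_src[vi][ui])
    Y_ref = matvec(transpose(R), [a - b for a, b in zip(Y_src, t)])
    u2, v2, d2 = project(intr, Y_ref)
    reproj_err = math.hypot(u2 - u, v2 - v)
    return reproj_err < pix_thresh and abs(d2 - d) / d < rel_depth_thresh
```

For a fronto-parallel plane seen by two cameras that differ by a small sideways translation, each pixel passes the check. If the source depth map is wrong by a factor of two, the pixel fails on the relative depth test even though it reprojects to the same location.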
In the context of a vector autoregression (VAR) model, or any multivariate regression model, the number of relevant predictors may be small relative to the information set available from which to build a prediction equation. It is well known that forecasts based on (unpenalized) least squares estimates can overfit the data and lead to poor predictions. Since the Minnesota prior was proposed (Doan et al. (1984)), many methods have been developed to improve prediction performance. In this paper we propose the horseshoe prior (Carvalho et al. (2010), Carvalho et al. (2009)) in the context of a Bayesian VAR. The horseshoe prior is a distinctive shrinkage prior in that it shrinks irrelevant signals strongly toward 0 while allowing large signals to remain large and practically unshrunk. In an empirical study, we show that the horseshoe prior competes favorably with shrinkage schemes commonly used in Bayesian VAR models as well as with a prior that imposes true sparsity in the coefficient vector. Additionally, we propose the use of particle Gibbs with backwards simulation (Lindsten et al. (2012), Andrieu et al. (2010)) for the estimation of the time-varying volatility parameters. We provide a detailed description of all MCMC methods used in the supplementary material, which is available online.
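In the horseshoe prior, $\beta_j \mid \lambda_j, \tau \sim N(0, \lambda_j^2\tau^2)$ with local scales $\lambda_j \sim C^+(0, 1)$, where $C^+$ denotes the half-Cauchy distribution. The "shrink noise, keep signals" behavior shows up in the shrinkage factor $\kappa_j = 1/(1 + \lambda_j^2\tau^2)$. For an observation $y_j \sim N(\beta_j, 1)$, the posterior mean is $(1 - \kappa_j)\,y_j$. When $\tau = 1$, $\kappa_j \sim \mathrm{Beta}(1/2, 1/2)$, a U-shaped density that puts about 20% of its mass in each of the bands $\kappa < 0.1$ and $\kappa > 0.9$, but only about 6% in a band of the same width around $0.5$. The sketch below draws from the prior by Monte Carlo to illustrate this. It is not the paper's Bayesian VAR sampler, which places such priors on VAR coefficients and estimates them by MCMC.

```python
import math
import random


def sample_horseshoe(n, tau=1.0, rng=None):
    """Draw n coefficients from the horseshoe prior and their shrinkage factors.

    beta_j | lambda_j ~ N(0, lambda_j^2 tau^2),  lambda_j ~ half-Cauchy(0, 1).
    kappa_j = 1 / (1 + lambda_j^2 tau^2) is the fraction by which an observation
    y_j ~ N(beta_j, 1) is shrunk toward 0 in the posterior mean (1 - kappa_j) y_j.
    """
    if rng is None:
        rng = random.Random()
    betas, kappas = [], []
    for _ in range(n):
        lam = math.tan(math.pi * rng.random() / 2)  # half-Cauchy via inverse CDF
        betas.append(rng.gauss(0.0, lam * tau))
        kappas.append(1.0 / (1.0 + (lam * tau) ** 2))
    return betas, kappas


_, kappas = sample_horseshoe(20_000, rng=random.Random(0))
near_0 = sum(k < 0.1 for k in kappas) / len(kappas)  # almost no shrinkage
near_1 = sum(k > 0.9 for k in kappas) / len(kappas)  # almost full shrinkage
middle = sum(0.45 < k < 0.55 for k in kappas) / len(kappas)
print(near_0, near_1, middle)
```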
We consider a susceptible, infected, and recovered infectious disease model which incorporates a vaccination rate. In particular, we study the problem of choosing the vaccination rate in order to reduce the number of infected individuals to a given threshold as quickly as possible. This is naturally a problem of time-optimal control. We interpret the optimal time as a solution of two dynamic programming equations and give necessary and sufficient conditions for a vaccination rate to be optimal.
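The abstract does not give the model equations. A common way to add vaccination to SIR, assumed here, is to move susceptibles directly to the recovered (immune) class at rate $u$: $S' = -\beta S I - uS$, $I' = \beta S I - \gamma I$, $R' = \gamma I + uS$. The sketch simulates this model with forward Euler for a constant rate $u$ and reports the first time $I(t)$ falls to the threshold. The paper optimizes over time-varying rates via dynamic programming, which this sketch does not attempt. All parameter values are illustrative.

```python
def time_to_threshold(u, beta=0.5, gamma=0.1, s0=0.7, i0=0.2,
                      threshold=0.01, dt=0.01, t_max=1000.0):
    """First time I(t) <= threshold for a constant vaccination rate u, or None.

    Forward-Euler integration of S' = -beta*S*I - u*S, I' = beta*S*I - gamma*I
    (fractions of the population; R = 1 - S - I is implied and not needed).
    """
    s, i, t = s0, i0, 0.0
    while t < t_max:
        if i <= threshold:
            return t
        new_infections = beta * s * i
        s, i = s + dt * (-new_infections - u * s), i + dt * (new_infections - gamma * i)
        t += dt
    return None


for rate in (0.0, 0.1, 0.5):
    print(rate, time_to_threshold(rate))
```

Comparing constant rates in this way only gives upper bounds on the optimal time. A time-optimal control can do at least as well as the best constant rate within the same admissible bounds.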
Let $G$ be a graph, and $Z$ a subset of its vertices, which we color black, while the remaining vertices are colored white. We define the skew color change rule as follows: if $u$ is a vertex of $G$ and exactly one of its neighbors, $v$, is white, then change the color of $v$ to black. A set $Z$ is a skew zero forcing set for $G$ if applying the skew color change rule (as many times as necessary) results in all the vertices of $G$ being colored black. A set $Z$ is a minimum skew zero forcing set for $G$ if it is a skew zero forcing set for $G$ of least cardinality. The skew zero forcing number $\mathrm{sZ}(G)$ is the minimum of $|Z|$ over all skew zero forcing sets $Z$ for $G$. In this paper we discuss graphs that have extreme skew zero forcing number. We characterize complete multipartite graphs in terms of $\mathrm{sZ}(G)$. We note relations between minimum skew zero forcing sets and matchings in some bipartite graphs and in unicyclic graphs. We establish that the elements in the set of minimum skew zero forcing sets in certain bipartite graphs are the bases of a matroid.
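The skew rule differs from standard zero forcing only in that the forcing vertex $u$ may itself be white. As a result, the definitions translate directly into a brute-force search. The sketch below runs in exponential time, so it suits only small graphs. It represents a graph as an adjacency dictionary, and the function names are illustrative. For example, the path on four vertices has $\mathrm{sZ} = 0$, the path on three vertices has $\mathrm{sZ} = 1$, and $K_4$ has $\mathrm{sZ} = 2$.

```python
from itertools import combinations


def skew_closure(adj, black):
    """Repeatedly apply the skew color change rule: any vertex u (black or white)
    with exactly one white neighbor v turns v black. Returns the final black set."""
    black = set(black)
    changed = True
    while changed:
        changed = False
        for u in adj:
            white = [w for w in adj[u] if w not in black]
            if len(white) == 1:
                black.add(white[0])
                changed = True
    return black


def skew_zero_forcing_number(adj):
    """Brute-force sZ(G): the smallest |Z| whose closure is every vertex."""
    vertices = list(adj)
    for k in range(len(vertices) + 1):
        for Z in combinations(vertices, k):
            if len(skew_closure(adj, Z)) == len(vertices):
                return k


def path(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def complete(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


print(skew_zero_forcing_number(path(4)))      # 0
print(skew_zero_forcing_number(path(3)))      # 1
print(skew_zero_forcing_number(complete(4)))  # 2
```

Because a vertex with exactly one white neighbor can only lose white neighbors as the process continues, the closure does not depend on the order in which forces are applied. This is why a single fixpoint loop suffices.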
High-precision variational calculations in Hylleraas coordinates are presented for all singlet and triplet $P$-states of helium up to principal quantum number $n = 35$ with a uniform accuracy of 1 part in $10^{22}$ for the nonrelativistic energy. Mass polarization, relativistic, and quantum electrodynamic effects are included to achieve a final accuracy of $\pm 1$ kHz or better for the ionization energy of the Rydberg states of $^4$He in the range $24 \le n \le 35$. The results are combined with 11 transition frequency measurements of Clausen et al., Phys. Rev. A 111, 012817 (2025), to obtain complementary measurements of the ionization energy of the $1s2s\;^3S_1$ state that do not depend on quantum defect extrapolations to the series limit. The result from the triplet spectrum yields an ionization energy of $1\,152\,842\,742.728(6)$ MHz, which agrees with, but is larger than, the experimental value by $14 \pm 17$ kHz. However, it confirms a much larger $9\sigma$ discrepancy of $0.468 \pm 0.055$ MHz with the theoretical ionization energy of Patkóš et al., Phys. Rev. A 103, 042809 (2021). The results provide a test of the quantum defect extrapolation method at the level of $\pm 17$ kHz. (11 pages, 1 figure).
We revisit time-resolved photoemission in neon atoms as probed by attosecond streaking. We calculate streaking time shifts for the emission of 2p and 2s electrons and compare the resulting relative delay with that measured in a recent experiment by Schultze et al. [Science 328, 1658 (2010)]. The B-spline R-matrix method is employed to calculate accurate Eisenbud-Wigner-Smith time delays from multi-electron dipole transition matrix elements for photoionization. The additional laser-field-induced time shifts in the exit channel are obtained from separate time-dependent simulations of a full streaking process by solving the time-dependent Schrödinger equation at the single-active-electron level. The resulting accurate total relative streaking time shifts between 2s and 2p emission lie well below the experimental data. We identify the presence of unresolved shake-up satellites in the experiment as a potential source of error in the determination of streaking time shifts.
We report a joint experimental and theoretical study using a combination of polarization-controlled free-electron-laser (FEL) and near-infrared (NIR) pulses in a synchronized two-color photoionization scheme. Excited He$^+$ ions, created in the oriented $3p\,(m=+1)$ state by circularly polarized extreme ultraviolet (XUV) radiation from the XUV-FEL FERMI, are exposed to circularly polarized 784-nm NIR radiation with peak intensities from $10^{12}\,\mathrm{W/cm^2}$ to $10^{13}\,\mathrm{W/cm^2}$. The angular distribution of the ejected electrons exhibits a strong dichroism that depends on the NIR intensity. While the co-rotating case is defined by a single pathway, the counter-rotating case involves two dominant pathways, whose relative strength and phase difference are determined.
Attosecond photoelectron interferometry based on the combination of an attosecond pulse train and a synchronized infrared field is a fundamental technique for the temporal characterization of attosecond waveforms and for the investigation of electron dynamics in the photoionization process. In this approach, the comb of extreme ultraviolet harmonics typically lies above the ionization threshold of the target under investigation, thus releasing a photoelectron by single-photon absorption. The interaction of the outgoing photoelectron with the infrared pulse results in the absorption or emission of infrared photons, thereby creating additional peaks in the photoelectron spectrum, referred to as sidebands. While, in the absence of resonances in the first ionization step, the phases imparted on the photoionization process evolve smoothly with the photon energy, the presence of intermediate resonances imprints a large additional phase on the outgoing photoelectron wave packet. In this work, using a comb of harmonics below and above the ionization threshold of neon, we investigate the effect of intermediate bound excited states on attosecond photoelectron interferometry. We show that the phase of the oscillations of the sidebands and their angular distributions are strongly affected by such resonances. By slightly tuning the photon energies of the extreme ultraviolet harmonics, we show how the contributions of selected resonances can be enhanced or suppressed.
In this paper we investigate the problem of localizing a mobile device based on readings from its embedded sensors using machine learning methods. We consider a real-world environment, collect a large dataset of 3110 datapoints, and examine the performance of a substantial number of machine learning algorithms in localizing a mobile device. We identify algorithms that achieve a mean error as low as 0.76 meters, outperforming other indoor localization systems reported in the literature. We also propose a hybrid instance-based approach that, in a live deployment, runs ten times faster than standard instance-based methods with no loss of accuracy, allowing for fast and accurate localization. Further, we determine how smaller datasets collected at lower density affect localization accuracy, which is important for use in real-world environments. Finally, we demonstrate that these approaches are appropriate for real-world deployment by evaluating their performance in an online, in-motion experiment.
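The abstract does not specify the sensors, features, or hybrid method. For orientation, the sketch below implements a generic k-nearest-neighbour fingerprinting baseline, one common instance-based approach. Readings collected offline at known positions are stored, and a new reading is localized as the mean position of its k closest stored readings. The data layout and function names are hypothetical. Each query compares against every stored fingerprint, so query cost grows linearly with the dataset size.

```python
import math


def knn_localize(fingerprints, reading, k=3):
    """Estimate (x, y) as the mean position of the k fingerprints whose sensor
    readings are closest (Euclidean distance) to the new reading.
    fingerprints: list of (sensor_vector, (x, y)) pairs collected offline."""
    nearest = sorted(fingerprints, key=lambda fp: math.dist(fp[0], reading))[:k]
    xs = [pos[0] for _, pos in nearest]
    ys = [pos[1] for _, pos in nearest]
    return (sum(xs) / len(nearest), sum(ys) / len(nearest))


def mean_error(fingerprints, test_set, k=3):
    """Mean Euclidean localization error (same units as positions, e.g. meters)."""
    errs = [math.dist(knn_localize(fingerprints, r, k), pos) for r, pos in test_set]
    return sum(errs) / len(errs)


# Hypothetical two-sensor readings at three surveyed positions (meters).
fps = [((-40.0, -70.0), (0.0, 0.0)),
       ((-60.0, -50.0), (5.0, 0.0)),
       ((-70.0, -40.0), (5.0, 5.0))]
print(knn_localize(fps, (-41.0, -69.0), k=1))
```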
We investigate the coherent control of the photoelectron angular distribution in bichromatic atomic ionization. Neon is selected as the target since it is one of the most popular systems in current gas-phase experiments with free-electron lasers (FELs). In particular, we tackle practical questions, such as the role of the fine-structure splitting, the pulse length, and the intensity. Time-dependent and stationary perturbation theory are employed, and we also solve the time-dependent Schrödinger equation in a single-active-electron model. We consider neon ionized by an FEL pulse whose fundamental frequency is in resonance with either the $2p$-$3s$ or the $2p$-$4s$ excitation. The contribution of the nonresonant two-photon process and its potential constructive or destructive role in quantum coherent control is investigated.
We analyze the photoelectron angular distribution in two-pathway interference between nonresonant one-photon and resonant two-photon ionization of neon. We consider a bichromatic femtosecond XUV pulse whose fundamental frequency is tuned near the $2p^5 3s$ atomic states of neon. The time-dependent Schrödinger equation is solved, and the results are used to compute the angular distribution and the associated anisotropy parameters at the main photoelectron line. We also employ a time-dependent perturbative approach, which provides information on the process for a wide range of pulse parameters, including the steady-state case of continuous radiation, i.e., an infinitely long pulse. The results from the two methods are in relatively good agreement over the domain of applicability of perturbation theory.
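For linearly polarized light, anisotropy parameters are usually defined through an expansion $I(\theta) \propto 1 + \sum_{L \ge 1} \beta_L P_L(\cos\theta)$. The normalization is assumed here, and the paper's convention may differ. Interference between the one-photon and two-photon amplitudes, which lead to continuum states of opposite parity, is what produces the odd-$L$ terms and hence a forward-backward asymmetry. By Legendre orthogonality, $\beta_L = (2L+1)\int_{-1}^{1} I\,P_L\,dx \big/ \int_{-1}^{1} I\,dx$ with $x = \cos\theta$. The sketch evaluates this projection numerically for any given distribution. It does not compute $I(\theta)$ from ionization amplitudes.

```python
def legendre(L, x):
    """P_L(x) via the Bonnet recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    if L == 0:
        return 1.0
    p_prev, p = 1.0, x
    for n in range(1, L):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def anisotropy(intensity, L_max=4, n=2000):
    """Estimate beta_1..beta_L_max from I(cos theta) by Legendre projection,
    using the midpoint rule on [-1, 1]."""
    h = 2.0 / n
    xs = [-1.0 + (j + 0.5) * h for j in range(n)]
    vals = [intensity(x) for x in xs]
    norm = sum(vals) * h
    return [(2 * L + 1) * sum(v * legendre(L, x) for v, x in zip(vals, xs)) * h / norm
            for L in range(1, L_max + 1)]


def example(x):
    """Hypothetical distribution with beta_1 = 0.3, beta_2 = 0.5, beta_3 = -0.2."""
    return 1 + 0.3 * legendre(1, x) + 0.5 * legendre(2, x) - 0.2 * legendre(3, x)


print([round(b, 4) for b in anisotropy(example)])
```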
Free-electron lasers (FELs) are the world's most brilliant light sources with rapidly evolving technological capabilities in terms of ultrabright and ultrashort pulses over a large range of accessible photon energies. Their revolutionary and innovative developments have opened new fields of science regarding nonlinear light-matter interaction, the investigation of ultrafast processes from specific observer sites, and approaches to imaging matter with atomic resolution. A core aspect of FEL science is the study of isolated and prototypical systems in the gas phase with the possibility of addressing well-defined electronic transitions or particular atomic sites in molecules. Notably for polarization-controlled short-wavelength FELs, the gas phase offers new avenues for investigations of nonlinear and ultrafast phenomena in spin-oriented systems, for decoding the function of the chiral building blocks of life, as well as for steering reactions and particle emission dynamics in otherwise inaccessible ways. This roadmap comprises descriptions of technological capabilities of facilities worldwide, innovative diagnostics and instrumentation, as well as recent scientific highlights, novel methodology, and mathematical modeling. The experimental and theoretical landscape of using polarization-controllable FELs for dichroic light-matter interaction in the gas phase will be discussed and comprehensively outlined to stimulate and strengthen global collaborative efforts of all disciplines.
We investigate the photoionization dynamics of atoms subjected to intense, ultrashort laser pulses through the use of quantum trajectories. This method provides a unique and consistent framework for examining electron dynamics within a time-dependent potential barrier. Our findings demonstrate that quantum trajectories offer additional insights into several key aspects of strong-field ionization, including the transition between ionization regimes, non-adiabatic effects under the barrier, the impact of the shape of the electronic potential, and the efficiency of over-the-barrier ionization.
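The abstract does not say which quantum-trajectory formulation is used. Assuming the common de Broglie-Bohm (Bohmian) form, each trajectory follows the velocity field $v = (\hbar/m)\,\mathrm{Im}(\partial_x\psi/\psi)$ of the time-dependent wavefunction. The 1D sketch below evaluates this field in atomic units and integrates a single trajectory with explicit Euler steps. In practice $\psi(x, t)$ would come from a time-dependent Schrödinger equation solver, which is not included here, so the example uses analytic wavefunctions. Function names are illustrative.

```python
import cmath


def bohm_velocity(psi, x, dx=1e-5):
    """v(x) = Im(psi'(x) / psi(x)) in atomic units (hbar = m = 1).

    psi is a callable returning a complex amplitude; psi' is approximated by a
    central finite difference, so psi must be nonzero at x.
    """
    dpsi = (psi(x + dx) - psi(x - dx)) / (2 * dx)
    return (dpsi / psi(x)).imag


def trajectory(psi_t, x0, t0, t1, steps=1000):
    """Integrate dx/dt = v(x, t) with explicit Euler for a wavefunction psi_t(x, t)."""
    x, dt = x0, (t1 - t0) / steps
    for n in range(steps):
        t = t0 + n * dt
        x += dt * bohm_velocity(lambda y: psi_t(y, t), x)
    return x


def packet(x):
    """Gaussian packet at t = 0 with mean momentum 1.5 a.u."""
    return cmath.exp(-x**2 / 4 + 1.5j * x)


print(round(bohm_velocity(packet, 0.3), 6))  # 1.5
```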