Technical University of Iasi
CLIP-DPO is a preference optimization method that leverages a pre-trained Vision-Language (VL) embedding model, CLIP, to generate preference data for Direct Preference Optimization (DPO), effectively reducing hallucinations in Large Vision Language Models (LVLMs). This approach eliminates the need for expensive external APIs or additional datasets, achieving significant improvements in hallucination reduction and object grounding on models like LLaVA-1.5 and MobileVLM-v2.
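As a rough illustration of the idea above, the sketch below builds one preference pair from precomputed image-text similarity scores and evaluates the standard DPO objective on it. The helper `build_preference_pair`, the "highest score is chosen, lowest is rejected" rule, and the value of `beta` are assumptions made for illustration; the abstract does not specify how CLIP-DPO ranks or filters candidates.

```python
import math


def build_preference_pair(captions, clip_scores):
    """Hypothetical helper: pick the best- and worst-scoring captions.

    clip_scores are assumed to be precomputed image-text similarities,
    one per caption (higher means better aligned with the image).
    """
    ranked = sorted(zip(clip_scores, captions))
    rejected = ranked[0][1]   # lowest similarity
    chosen = ranked[-1][1]    # highest similarity
    return chosen, rejected


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # Numerically stable evaluation of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

The loss equals log 2 when the policy and the reference model agree, and it decreases as the policy assigns relatively more probability to the chosen response than the reference does.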
VladVA, developed by Samsung AI Cambridge, is a discriminative fine-tuning framework that transforms generative Large Vision-Language Models into highly capable discriminative models. This method enables LVLMs to achieve state-of-the-art performance on image-text retrieval benchmarks such as Flickr30k and MS-COCO, and demonstrate advanced compositional understanding on SugarCrepe and Winoground, often outperforming larger, contrastively-trained models.
This paper presents a novel path-planning and task assignment algorithm for multi-robot systems that must fulfill a global Boolean specification. The proposed method is based on Integer Linear Programming (ILP) formulations, which are combined with structural insights from Petri nets to improve scalability and computational efficiency. By proving that the constraint matrix is totally unimodular (TU) for certain classes of problems, the ILP formulation can be relaxed into a Linear Programming (LP) problem without losing the integrality of the solution. This relaxation removes the need for complex combinatorial techniques, significantly reducing computational overhead and thus ensuring scalability for large-scale systems. Using the approach proposed in this paper, we can solve path-planning problems for teams of up to 500 robots. The method guarantees computational tractability, handles collision avoidance, and reduces computational demands through iterative LP optimization techniques. Case studies demonstrate the efficiency of the algorithm in generating scalable, collision-free paths for large robot teams navigating in complex environments. While the conservative nature of collision avoidance introduces additional constraints, and thus additional computational requirements, the solution remains practical for diverse applications. The algorithm is particularly applicable to real-world scenarios, including warehouse logistics, where autonomous robots must efficiently coordinate tasks, and search-and-rescue operations in various environments. This work contributes both theoretically and practically to scalable multi-robot path planning and task allocation, offering an efficient framework for coordinating autonomous agents in shared environments.
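To make the total unimodularity condition concrete, here is a minimal brute-force check that every square submatrix of a small integer matrix has determinant in {-1, 0, 1}. It is only a sketch for tiny matrices; the function names are illustrative and the abstract establishes TU by proof, not by enumeration.

```python
from itertools import combinations


def det(m):
    """Integer determinant by Laplace expansion (fine for tiny matrices)."""
    n = len(m)
    if n == 1:
        return m[0][0]
    total = 0
    for j in range(n):
        if m[0][j] == 0:
            continue
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * m[0][j] * det(minor)
    return total


def is_totally_unimodular(a):
    """Check all square submatrices; exponential cost, for illustration only."""
    rows, cols = len(a), len(a[0])
    for k in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), k):
            for c in combinations(range(cols), k):
                sub = [[a[i][j] for j in c] for i in r]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True


# Node-arc incidence matrix of a small directed graph (arcs 0->1, 1->2, 0->2).
# Such matrices are a classical TU example, chosen here as an assumption.
digraph_incidence = [[-1, 0, -1],
                     [1, -1, 0],
                     [0, 1, 1]]

# Edge incidence matrix of an undirected triangle: determinant 2, so not TU.
triangle_incidence = [[1, 1, 0],
                      [0, 1, 1],
                      [1, 0, 1]]
```

When the constraint matrix is TU and the right-hand side is integral, every vertex of the LP feasible region is integral, which is why the LP relaxation can return an integer solution without branching.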
Although noise and caption quality have been acknowledged as important factors in vision-language contrastive pre-training, in this paper we show that the full potential of improving the training process by addressing such issues is yet to be realized. Specifically, we first study and analyze two issues affecting training: incorrect assignment of negative pairs, and low caption quality and diversity. Then, we devise effective solutions for addressing both problems, which essentially require training with multiple true positive pairs. Finally, we propose training with sigmoid loss to address such a requirement. We show very large gains over the current state-of-the-art for both image recognition (~ +6% on average over 11 datasets) and image retrieval (~ +19% on Flickr30k and ~ +15% on MSCOCO).
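The reason a sigmoid loss accommodates multiple true positives is that each image-text pair is scored as an independent binary decision, so several entries in the same row can be labeled positive at once. The sketch below shows this on a plain similarity matrix; the `temperature` and `bias` values and the way positives are supplied are assumptions for illustration, not the paper's settings.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sigmoid_loss(sim, positives, temperature=10.0, bias=-5.0):
    """Pairwise sigmoid loss over an image-by-text similarity matrix.

    sim[i][j] is the similarity of image i and text j; positives is a set
    of (i, j) index pairs treated as matches. Every other pair is a negative.
    """
    total = 0.0
    for i, row in enumerate(sim):
        for j, s in enumerate(row):
            label = 1.0 if (i, j) in positives else -1.0
            total += -log_sigmoid(label * (temperature * s + bias))
    return total / len(sim)


# Image 0 matches both texts; treating text 1 as a negative for image 0
# would be an incorrectly assigned negative pair.
sim = [[0.9, 0.8],
       [0.1, 0.9]]
loss_multi = sigmoid_loss(sim, {(0, 0), (0, 1), (1, 1)})
loss_single = sigmoid_loss(sim, {(0, 0), (1, 1)})
```

In this toy case `loss_multi` is lower than `loss_single`, because the highly similar pair (0, 1) is no longer penalized as a negative.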
We propose that wave propagation through a class of mechanical metamaterials opens unprecedented avenues in seismic wave protection based on spectral properties of auxetic-like metamaterials. The elastic parameters of these metamaterials, such as the bulk and shear moduli, the mass density, and even the Poisson ratio, can exhibit negative values in elastic stop bands. We show here that the propagation of seismic waves with frequencies ranging from 1 Hz to 40 Hz can be influenced by a decameter-scale version of auxetic-like metamaterials buried in the soil, with the combined effects of impedance mismatch, local resonances and Bragg stop bands. More precisely, we numerically examine and illustrate the markedly different behaviors between the propagation of seismic waves through a homogeneous isotropic elastic medium (concrete) and an auxetic-like metamaterial plate consisting of 64 cells (40 m x 40 m x 40 m), utilized here as the foundation of a building one would like to protect from seismic site effects. This novel class of seismic metamaterials opens band gaps at frequencies compatible with seismic waves when they are designed appropriately, which makes them interesting candidates for seismic isolation structures.
In this paper we study the improvement of one-class support vector machines (OC-SVM) through sparse representation techniques for unsupervised anomaly detection. As Dictionary Learning (DL) has recently become a common analysis technique that reveals hidden sparse patterns in data, our approach uses this insight to endow unsupervised detection with more control over pattern finding and dimensions. We introduce a new anomaly detection model that unifies the OC-SVM and DL residual functions into a single composite objective, subsequently solved through K-SVD-type iterative algorithms. A closed form of the alternating K-SVD iteration is explicitly derived for the new composite model and practically implementable schemes are discussed. The standard DL model is adapted for the Dictionary Pair Learning (DPL) context, where the usual sparsity constraints are naturally eliminated. Finally, we extend both objectives to the more general setting that allows the use of kernel functions. The empirical convergence properties of the resulting algorithms are provided and an in-depth analysis of their parametrization is performed, while also demonstrating their numerical performance in comparison with existing methods.
It turns out that some empirical facts in Big Data are the effects of properties of large numbers. Zipf's law 'noise' is an example of such an artefact. We expose several properties of power law distributions and of similar distributions that occur when the population is finite and the ranks and counts of elements in the population are natural numbers. We are particularly concerned with the low-rank end of the graph of the law, the potential for noise in the law, and the approximation of the number of types of objects at various ranks. Approximations, rather than exact solutions, are the center of attention. Consequences for the interpretation of Zipf's law are discussed.
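One way to see how integer counts alone can produce structure in a Zipf plot is to take an ideal law with count C/r at rank r and round it down to a natural number. The sketch below does this and compares the number of types sharing a given count with the simple estimate C / (c (c + 1)). The floor rounding, the function names, and this particular estimate are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def zipf_integer_counts(c_max, n_types):
    """Counts floor(c_max / r) for ranks r = 1..n_types."""
    return [c_max // r for r in range(1, n_types + 1)]


def types_per_count(counts):
    """How many types (ranks) share each integer count."""
    return Counter(counts)


def approx_types(c_max, count):
    """Estimate of the number of ranks r with floor(c_max / r) == count.

    Those ranks lie in the interval (c_max / (count + 1), c_max / count],
    whose length is c_max / (count * (count + 1)).
    """
    return c_max / (count * (count + 1))


counts = zipf_integer_counts(1000, 1000)
tally = types_per_count(counts)
```

In this toy setting half of all types end up with a count of exactly 1, so the tail of the rank-count plot becomes a staircase of ties even though the underlying law is smooth.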