alphaXiv

History

Papers Benchmarks

ISTI-CNR

397

16 Sep 2025

computer-science artificial-intelligence computation-and-language

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

University of Pisa University of Modena and Reggio Emilia ISTI-CNR

Talk2DINO, developed by researchers at UNIMORE and ISTI-CNR, combines the fine-grained visual features of DINOv2 with the rich semantic understanding of CLIP for open-vocabulary segmentation. The approach uses a lightweight, learned projection to bridge their embedding spaces, reaching an average mIoU of 46.3 on eight unsupervised OVS benchmarks with significantly fewer trained parameters than existing methods.

134

05 Apr 2024

computer-science artificial-intelligence computer-vision-and-pattern-recognition

The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding

University of Pisa ISTI-CNR

Researchers at CNR-ISTI developed Fine-Grained Open-Vocabulary Object Detection (FG-OVD), a new benchmark and evaluation protocol, to assess how well open-vocabulary object detectors understand fine-grained object attributes. They found that state-of-the-art models exhibit a drastic performance drop when distinguishing subtle attribute differences, particularly for properties like transparency and pattern, revealing a critical gap in current OVD capabilities.

561

02 Apr 2025

ai-for-health computer-science computer-vision-and-pattern-recognition

Semi-Supervised Biomedical Image Segmentation via Diffusion Models and Teacher-Student Co-Training

ISTI-CNR

Gabriele Lagani

ISTI-CNR researchers develop a semi-supervised biomedical image segmentation framework that combines diffusion models with teacher-student co-training, achieving performance comparable to fully supervised approaches while using only 20% of labeled data across multiple medical imaging datasets including GlaS, PH2, HMEPS, and LA.

18 Mar 2021

computer-science computer-vision-and-pattern-recognition multimedia

The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval

ISTI-CNR

In this paper, we describe in details VISIONE, a video search system that allows users to search for videos using textual keywords, occurrence of objects and their spatial relationships, occurrence of colors and their spatial relationships, and image similarity. These modalities can be combined together to express complex queries and satisfy user needs. The peculiarity of our approach is that we encode all the information extracted from the keyframes, such as visual deep features, tags, color and object locations, using a convenient textual encoding indexed in a single text retrieval engine. This offers great flexibility when results corresponding to various parts of the query (visual, text and locations) have to be merged. In addition, we report an extensive analysis of the system retrieval performance, using the query logs generated during the Video Browser Showdown (VBS) 2019 competition. This allowed us to fine-tune the system by choosing the optimal parameters and strategies among the ones that we tested.

21 Jun 2018

computer-science artificial-intelligence computers-and-society

A Survey Of Methods For Explaining Black Box Models

University of Pisa ISTI-CNR

A survey by researchers at the University of Pisa and ISTI-CNR systematically reviews and classifies existing methods for explaining black box AI models, providing a taxonomy of problem types, explanation forms, and model characteristics. This work offers a structured understanding of the Explainable AI field and identifies key open research questions.

02 Oct 2025

computer-science computers-and-society physics

A computational framework for quantifying route diversification in road networks

UC Berkeley University of Pisa

Lawrence Berkeley National Laboratory ISTI-CNR Scuola Normale Superiore of Pisa

The structure of road networks impacts various urban dynamics, from traffic congestion to environmental sustainability and access to essential services. Recent studies reveal that most roads are underutilized, faster alternative routes are often overlooked, and traffic is typically concentrated on a few corridors. In this article, we examine how road network structure, and in particular the presence of mobility attractors (e.g., highways and ring roads), shapes the counterpart to traffic concentration: route diversification. To this end, we introduce DiverCity, a measure that quantifies the extent to which traffic can potentially be distributed across multiple, loosely overlapping near-shortest routes. Analyzing 56 diverse global cities, we find that DiverCity is influenced by network characteristics and is associated with traffic efficiency. Within cities, DiverCity increases with distance from the city center before stabilizing in the periphery, but declines in the proximity of mobility attractors. We demonstrate that strategic speed limit adjustments on mobility attractors can increase DiverCity while preserving travel efficiency. We isolate the complex interplay between mobility attractors and DiverCity through simulations in a controlled setting, confirming the patterns observed in real-world cities. DiverCity provides a practical tool for urban planners and policymakers to optimize road network design and balance route diversification, efficiency, and sustainability. We provide an interactive platform (this https URL) to visualize the spatial distribution of DiverCity across all considered cities.

24 May 2024

computer-science computer-vision-security computer-vision-and-pattern-recognition

Semantic Aware Diffusion Inverse Tone Mapping

University of Warwick ISTI-CNR

The range of real-world scene luminance is larger than the capture capability of many digital camera sensors which leads to details being lost in captured images, most typically in bright regions. Inverse tone mapping attempts to boost these captured Standard Dynamic Range (SDR) images back to High Dynamic Range (HDR) by creating a mapping that linearizes the well exposed values from the SDR image, and provides a luminance boost to the clipped content. However, in most cases, the details in the clipped regions cannot be recovered or estimated. In this paper, we present a novel inverse tone mapping approach for mapping SDR images to HDR that generates lost details in clipped regions through a semantic-aware diffusion based inpainting approach. Our method proposes two major contributions - first, we propose to use a semantic graph to guide SDR diffusion based inpainting in masked regions in a saturated image. Second, drawing inspiration from traditional HDR imaging and bracketing methods, we propose a principled formulation to lift the SDR inpainted regions to HDR that is compatible with generative inpainting methods. Results show that our method demonstrates superior performance across different datasets on objective metrics, and subjective experiments show that the proposed method matches (and in most cases outperforms) state-of-art inverse tone mapping operators in terms of objective metrics and outperforms them for visual fidelity.

29 Jul 2022

computer-science artificial-intelligence computation-and-language

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

ISTI-CNR Univ. of Modena and Reggio Emilia

ALADIN introduces a two-stage architecture that distills fine-grained alignment scores into an efficient common embedding space, enabling high-performance image-text matching and retrieval. The model achieves competitive recall while demonstrating up to a 90-fold increase in inference speed compared to entangled Vision-Language Transformers.

21 Sep 2025

attention-mechanisms computer-science computer-vision-and-pattern-recognition

AIM 2025 challenge on Inverse Tone Mapping Report: Methods and Results

Peng Cheng Laboratory University of Pisa University of Trento University of Würzburg ISTI-CNR MPI-Info

This paper presents a comprehensive review of the AIM 2025 Challenge on Inverse Tone Mapping (ITM). The challenge aimed to push forward the development of effective ITM algorithms for HDR image reconstruction from single LDR inputs, focusing on perceptual fidelity and numerical consistency. A total of \textbf{67} participants submitted \textbf{319} valid results, from which the best five teams were selected for detailed analysis. This report consolidates their methodologies and performance, with the lowest PU21-PSNR among the top entries reaching 29.22 dB. The analysis highlights innovative strategies for enhancing HDR reconstruction quality and establishes strong benchmarks to guide future research in inverse tone mapping.

02 Dec 2024

computer-science artificial-intelligence cryptography-and-security

Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

Sapienza University of Rome ISTI-CNR

Edoardo Gabrielli

FLANDERS (Federated Learning meets ANomaly DEtection for a Robust and Secure) introduces a pre-aggregation filter for Federated Learning that leverages multidimensional time series anomaly detection to combat extreme model poisoning attacks. The system achieves near-perfect detection accuracy and significantly boosts global model performance, even when 80% of clients are malicious, without requiring prior knowledge of attacker proportions.

6,200

25 Jan 2019

computer-science artificial-intelligence applications

PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach

University of Pisa ISTI-CNR Wyscout

The problem of evaluating the performance of soccer players is attracting the interest of many companies and the scientific community, thanks to the availability of massive data capturing all the events generated during a match (e.g., tackles, passes, shots, etc.). Unfortunately, there is no consolidated and widely accepted metric for measuring performance quality in all of its facets. In this paper, we design and implement PlayeRank, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We build our framework by deploying a massive dataset of soccer-logs and consisting of millions of match events pertaining to four seasons of 18 prominent soccer competitions. By comparing PlayeRank to known algorithms for performance evaluation in soccer, and by exploiting a dataset of players' evaluations made by professional soccer scouts, we show that PlayeRank significantly outperforms the competitors. We also explore the ratings produced by {\sf PlayeRank} and discover interesting patterns about the nature of excellent performances and what distinguishes the top players from the others. At the end, we explore some applications of PlayeRank -- i.e. searching players and player versatility --- showing its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.

29 Sep 2025

computer-science data-structures-and-algorithms information-retrieval

Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

Northeastern University University of Pisa ISTI-CNR

Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically-grounded sketching algorithm for sparse vectors to reduce their effective dimensionality while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.

1,099

29 Apr 2024

computer-science information-retrieval

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

University of Pisa ISTI-CNR Pinecone

The research introduces Seismic, a new inverted index structure designed for efficient approximate retrieval over learned sparse representations (LSRs). It achieves sub-millisecond query latencies and high recall, outperforming state-of-the-art inverted index and graph-based ANN methods by significant margins for sparse vector search.

156

20 Jan 2022

attention-mechanisms computer-science computer-vision-security

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

University of Pisa ISTI-CNR

Deepfakes are the result of digital manipulation to forge realistic yet fake imagery. With the astonishing advances in deep generative models, fake images or videos are nowadays obtained using variational autoencoders (VAEs) or Generative Adversarial Networks (GANs). These technologies are becoming more accessible and accurate, resulting in fake videos that are very difficult to be detected. Traditionally, Convolutional Neural Networks (CNNs) have been used to perform video deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we focus on video deep fake detection on faces, given that most methods are becoming extremely accurate in the generation of realistic human faces. Specifically, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining comparable results with some very recent methods that use Vision Transformers. Differently from the state-of-the-art approaches, we use neither distillation nor ensemble methods. Furthermore, we present a straightforward inference procedure based on a simple voting scheme for handling multiple faces in the same video shot. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC).

242

10 Mar 2025

computer-science computer-vision-security computer-vision-and-pattern-recognition

HDRT: A Large-Scale Dataset for Infrared-Guided HDR Imaging

East China University of Science and Technology

University of Warwick ISTI-CNR

Capturing images with enough details to solve imaging tasks is a long-standing challenge in imaging, particularly due to the limitations of standard dynamic range (SDR) images which often lose details in underexposed or overexposed regions. Traditional high dynamic range (HDR) methods, like multi-exposure fusion or inverse tone mapping, struggle with ghosting and incomplete data reconstruction. Infrared (IR) imaging offers a unique advantage by being less affected by lighting conditions, providing consistent detail capture regardless of visible light intensity. In this paper, we introduce the HDRT dataset, the first comprehensive dataset that consists of HDR and thermal IR images. The HDRT dataset comprises 50,000 images captured across three seasons over six months in eight cities, providing a diverse range of lighting conditions and environmental contexts. Leveraging this dataset, we propose HDRTNet, a novel deep neural method that fuses IR and SDR content to generate HDR images. Extensive experiments validate HDRTNet against the state-of-the-art, showing substantial quantitative and qualitative quality improvements. The HDRT dataset not only advances IR-guided HDR imaging but also offers significant potential for broader research in HDR imaging, multi-modal fusion, domain transfer, and beyond. The dataset is available at this https URL

30 Apr 2025

computer-science human-computer-interaction information-retrieval

Efficient Conversational Search via Topical Locality in Dense Retrieval

University of Pisa ISTI-CNR

An efficient method named TopLoc is introduced to reduce retrieval latency in dense retrieval conversational search systems by leveraging topical locality within multi-turn conversations. The approach integrates with standard ANN algorithms like IVF and HNSW, achieving up to 10.4x speedup and reducing average query response time to as low as 0.7 msec, while largely preserving retrieval effectiveness.

21 Oct 2025

agents computer-science continual-learning

A Compositional Paradigm for Foundation Models: Towards Smarter Robotic Agents

University of Pisa ISTI-CNR Luiss University

The birth of Foundation Models brought unprecedented results in a wide range of tasks, from language to vision, to robotic control. These models are able to process huge quantities of data, and can extract and develop rich representations, which can be employed across different domains and modalities. However, they still have issues in adapting to dynamic, real-world scenarios without retraining the entire model from scratch. In this work, we propose the application of Continual Learning and Compositionality principles to foster the development of more flexible, efficient and smart AI solutions.

01 Oct 2025

computer-science artificial-intelligence computer-vision-and-pattern-recognition

Batch-CAM: Introduction to better reasoning in convolutional deep learning models

University of Pisa ISTI-CNR

Understanding the inner workings of deep learning models is crucial for advancing artificial intelligence, particularly in high-stakes fields such as healthcare, where accurate explanations are as vital as precision. This paper introduces Batch-CAM, a novel training paradigm that fuses a batch implementation of the Grad-CAM algorithm with a prototypical reconstruction loss. This combination guides the model to focus on salient image features, thereby enhancing its performance across classification tasks. Our results demonstrate that Batch-CAM achieves a simultaneous improvement in accuracy and image reconstruction quality while reducing training and inference times. By ensuring models learn from evidence-relevant information,this approach makes a relevant contribution to building more transparent, explainable, and trustworthy AI systems.

18 Jul 2025

computer-science information-retrieval

PARK: Personalized academic retrieval with knowledge-graphs

ISTI-CNR University of Milan-Bicocca

Academic Search is a search task aimed to manage and retrieve scientific documents like journal articles and conference papers. Personalization in this context meets individual researchers' needs by leveraging, through user profiles, the user related information (e.g. documents authored by a researcher), to improve search effectiveness and to reduce the information overload. While citation graphs are a valuable means to support the outcome of recommender systems, their use in personalized academic search (with, e.g. nodes as papers and edges as citations) is still under-explored. Existing personalized models for academic search often struggle to fully capture users' academic interests. To address this, we propose a two-step approach: first, training a neural language model for retrieval, then converting the academic graph into a knowledge graph and embedding it into a shared semantic space with the language model using translational embedding techniques. This allows user models to capture both explicit relationships and hidden structures in citation graphs and paper content. We evaluate our approach in four academic search domains, outperforming traditional graph-based and personalized models in three out of four, with up to a 10\% improvement in MAP@100 over the second-best model. This highlights the potential of knowledge graph-based user models to enhance retrieval effectiveness.

23 Apr 2025

computer-science continual-learning computation-and-language

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

Sapienza University of Rome ISTI-CNR Babelscape ILC-CNR

The number of pretrained Large Language Models (LLMs) is increasing steadily, though the majority are designed predominantly for the English language. While state-of-the-art LLMs can handle other languages, due to language contamination or some degree of multilingual pretraining data, they are not optimized for non-English languages, leading to inefficient encoding (high token "fertility") and slower inference speed. In this work, we thoroughly compare a variety of vocabulary adaptation techniques for optimizing English LLMs for the Italian language, and put forward Semantic Alignment Vocabulary Adaptation (SAVA), a novel method that leverages neural mapping for vocabulary substitution. SAVA achieves competitive performance across multiple downstream tasks, enhancing grounded alignment strategies. We adapt two LLMs: Mistral-7b-v0.1, reducing token fertility by 25\%, and Llama-3.1-8B, optimizing the vocabulary and reducing the number of parameters by 1 billion. We show that, following the adaptation of the vocabulary, these models can recover their performance with a relatively limited stage of continual training on the target language. Finally, we test the capabilities of the adapted models on various multi-choice and generative tasks.

There are no more papers matching your filters at the moment.

Events

Personalize Your Feed

Install Browser Extension

We're hiring

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Talking to DINO: Bridging Self-Supervised Vision Backbones with Language for Open-Vocabulary Segmentation

The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding

Semi-Supervised Biomedical Image Segmentation via Diffusion Models and Teacher-Student Co-Training

The VISIONE Video Search System: Exploiting Off-the-Shelf Text Search Engines for Large-Scale Video Retrieval

A Survey Of Methods For Explaining Black Box Models

A computational framework for quantifying route diversification in road networks

Semantic Aware Diffusion Inverse Tone Mapping

ALADIN: Distilling Fine-grained Alignment Scores for Efficient Image-Text Matching and Retrieval

AIM 2025 challenge on Inverse Tone Mapping Report: Methods and Results

Protecting Federated Learning from Extreme Model Poisoning Attacks via Multidimensional Time Series Anomaly Detection

PlayeRank: data-driven performance evaluation and player ranking in soccer via a machine learning approach

Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

Combining EfficientNet and Vision Transformers for Video Deepfake Detection

HDRT: A Large-Scale Dataset for Infrared-Guided HDR Imaging

Efficient Conversational Search via Topical Locality in Dense Retrieval

A Compositional Paradigm for Foundation Models: Towards Smarter Robotic Agents

Batch-CAM: Introduction to better reasoning in convolutional deep learning models

PARK: Personalized academic retrieval with knowledge-graphs

Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

Events

AI for Law

Personalize Your Feed