LIPN, Université Sorbonne Paris Nord
We propose a unified framework that integrates object detection (OD) and visual grounding (VG) for remote sensing (RS) imagery. To support conventional OD and establish an intuitive prior for the VG task, we fine-tune an open-set object detector on referring expression data, framing the problem as partially supervised OD. In the first stage, we construct a graph representation of each image, comprising object queries, class embeddings, and proposal locations. A task-aware architecture then processes this graph to perform VG. The model consists of: (i) a multi-branch network that integrates spatial, visual, and categorical features to generate task-aware proposals, and (ii) an object reasoning network that assigns probabilities across proposals, followed by a soft selection mechanism for final referring object localization. Our model demonstrates superior performance on the OPT-RSVG and DIOR-RSVG datasets, achieving significant improvements over state-of-the-art methods while retaining classical OD capabilities. The code will be available in our repository: \url{this https URL}.
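A minimal sketch of one plausible reading of the "probabilities across proposals + soft selection" step, not the authors' code: a reasoning network scores each task-aware proposal against the referring expression, and the final box is the probability-weighted combination of proposal boxes, which keeps the selection differentiable. All names and shapes here are illustrative assumptions.

```python
import torch

def soft_select_box(proposal_feats, proposal_boxes, text_emb, reasoner):
    """proposal_feats: (N, D) task-aware proposal features
    proposal_boxes: (N, 4) boxes in (cx, cy, w, h)
    text_emb: (D,) pooled referring-expression embedding
    reasoner: scores each proposal against the expression."""
    logits = reasoner(proposal_feats, text_emb)        # (N,)
    probs = torch.softmax(logits, dim=0)               # distribution over proposals
    # Soft selection: probability-weighted average of proposal boxes,
    # differentiable w.r.t. the reasoning scores.
    return (probs.unsqueeze(-1) * proposal_boxes).sum(dim=0)

# Stand-in reasoner: scaled dot product between proposal and text features.
def dot_reasoner(feats, text):
    return (feats @ text) / feats.shape[-1] ** 0.5

N, D = 16, 256
box = soft_select_box(torch.randn(N, D), torch.rand(N, 4),
                      torch.randn(D), dot_reasoner)
print(box.shape)  # torch.Size([4])
```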
In this paper, we propose a novel method for joint entity and relation extraction from unstructured text by framing it as a conditional sequence generation problem. In contrast to conventional generative information extraction models that are left-to-right token-level generators, our approach is \textit{span-based}. It generates a linearized graph where nodes represent text spans and edges represent relation triplets. Our method employs a transformer encoder-decoder architecture with a pointing mechanism over a dynamic vocabulary of spans and relation types. Our model can capture the structural characteristics and boundaries of entities and relations through span representations while simultaneously grounding the generated output in the original text thanks to the pointing mechanism. Evaluation on benchmark datasets validates the effectiveness of our approach, demonstrating competitive results. Code is available at this https URL.
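A hedged sketch of one decoding step with a pointing mechanism over a dynamic vocabulary; it illustrates the general technique (scoring decoder state against candidate span embeddings plus relation-type embeddings), not the paper's exact model:

```python
import torch

def pointer_step(dec_state, span_embs, rel_embs):
    """dec_state: (D,) decoder hidden state at this step
    span_embs: (S, D) embeddings of candidate text spans (dynamic part)
    rel_embs:  (R, D) embeddings of relation types (static part)
    Returns a distribution over the S + R symbols the decoder may emit."""
    vocab = torch.cat([span_embs, rel_embs], dim=0)        # (S + R, D)
    scores = vocab @ dec_state / dec_state.shape[-1] ** 0.5
    return torch.softmax(scores, dim=0)

S, R, D = 20, 5, 128
dist = pointer_step(torch.randn(D), torch.randn(S, D), torch.randn(R, D))
next_symbol = dist.argmax().item()  # index < S: point to a span; else: emit a relation type
```

Because the span part of the vocabulary is built from the input text itself, every generated node is grounded in an actual span of the source.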
Researchers at Massachusetts General Hospital and collaborating institutions developed a region-adaptive MRI super-resolution system that combines mixture-of-experts with diffusion models. Three expert networks dynamically adapt to tissue characteristics, enabling specialized processing of distinct anatomical regions and achieving superior image quality metrics.
Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO). Unlike object detection in natural images, it faces a scarcity of annotated data and the presence of small objects represented by only a few pixels. Multi-modal fusion has been shown to enhance accuracy by combining data from multiple modalities such as RGB, infrared (IR), lidar, and synthetic aperture radar (SAR). To this end, fusing representations at the mid or late stage, as produced by parallel subnetworks, is the dominant approach, but it increases computational complexity in proportion to the number of modalities and creates additional engineering obstacles. Using the cross-attention mechanism, we propose a novel multi-modal fusion strategy that maps relationships between the different channels at the early stage, enabling the construction of a coherent input by aligning the different modalities. By addressing fusion at the early stage, as opposed to mid- or late-stage methods, our method achieves competitive and even superior performance compared to existing techniques. Additionally, we enhance the Swin transformer by integrating convolution layers into the feed-forward networks of its non-shifting blocks. This augmentation strengthens the model's capacity to merge separated windows through local attention, thereby improving small object detection. Extensive experiments demonstrate the effectiveness of the proposed multimodal fusion module and architecture, and their applicability to object detection in multimodal aerial imagery.
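A minimal sketch of early-stage fusion with cross-attention, in the spirit described above: tokens of one modality (say, RGB patches) attend to tokens of another (say, IR patches) to build a single aligned input for one backbone. The layer sizes and the residual fusion choice are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class EarlyCrossAttentionFusion(nn.Module):
    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, rgb_tokens, ir_tokens):
        # Query = RGB, Key/Value = IR: each RGB token gathers the IR
        # evidence aligned with it, yielding one coherent token stream.
        fused, _ = self.attn(rgb_tokens, ir_tokens, ir_tokens)
        return self.norm(rgb_tokens + fused)   # residual keeps RGB content

fuse = EarlyCrossAttentionFusion()
rgb = torch.randn(2, 1024, 96)   # (batch, tokens, dim) from RGB patch embedding
ir = torch.randn(2, 1024, 96)    # same token grid from the IR channel
x = fuse(rgb, ir)                # single input fed to one (Swin-style) backbone
```

Because fusion happens before the backbone, only one subnetwork is needed downstream, avoiding the per-modality branches of mid/late fusion.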
In recent years, multimodal learning has become essential in robotic vision and information fusion, especially for understanding human behavior in complex environments. However, current methods struggle to fully leverage the textual modality, relying on supervised pretrained models, which limits semantic extraction in unsupervised robotic environments, particularly with significant modality loss. These methods also tend to be computationally intensive, leading to high resource consumption in real-world applications. To address these challenges, we propose the Multi Modal Mamba Enhanced Transformer (M3ET), a lightweight model designed for efficient multimodal learning, particularly on mobile platforms. By incorporating the Mamba module and a semantic-based adaptive attention mechanism, M3ET optimizes feature fusion, alignment, and modality reconstruction. Our experiments show that M3ET improves cross-task performance, with a 2.3 times increase in pretraining inference speed. In particular, the core VQA task accuracy of M3ET remains at 0.74, while the model's parameter count is reduced by 0.67. Although performance on the EQA task is limited, M3ET's lightweight design makes it well suited for deployment on resource-constrained robotic platforms.
We propose a definition of quantum computable functions as mappings from superpositions of natural numbers to probability distributions of natural numbers. Each function is obtained as a limit of an infinite computation of a quantum Turing machine. The class of quantum computable functions is recursively enumerable, thus opening the door to a quantum computability theory which may follow some of the classical developments.
The GraphER model reframes information extraction as a Graph Structure Learning task, utilizing a Token Graph Transformer to explicitly model and refine the graph of entities and relations. The model achieved competitive performance on the ACE05 dataset and state-of-the-art results on the CoNLL04 and SciERC benchmarks for joint entity and relation extraction.
Biological control strategies against mosquito-borne diseases--such as the sterile insect technique (SIT), RIDL, and Wolbachia-based releases--require reliable estimates of dispersal and survival of released males. We propose a mechanistic--statistical framework for mark--release--recapture (MRR) data linking an individual-based 2D diffusion model with its reaction--diffusion limit. Inference is based on solving the macroscopic system and embedding it in a Poisson observation model for daily trap counts, with uncertainty quantified via a parametric bootstrap. We validate identifiability using simulated data and apply the model to an urban MRR campaign in El Cano (Havana, Cuba) involving four weekly releases of sterile Aedes aegypti males. The best-supported model suggests a mean life expectancy of about five days and a typical displacement of about 180 m. Unlike empirical fits of survival or dispersal, our mechanistic approach jointly estimates movement, mortality, and capture, yielding biologically interpretable parameters and a principled framework for designing and evaluating SIT-based interventions.
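A hedged numerical sketch of the mechanistic-statistical idea: a 2D diffusion-mortality kernel gives the expected male density at each trap, and daily counts are scored with a Poisson likelihood. The parameter names (D, mu, capture coefficient c) and the point-release simplification are generic assumptions, not the paper's notation.

```python
import numpy as np
from scipy.stats import poisson

def density(r, t, n_released, D, mu):
    """Point-release solution of u_t = D*Laplacian(u) - mu*u in 2D."""
    return n_released * np.exp(-r**2 / (4 * D * t) - mu * t) / (4 * np.pi * D * t)

def log_likelihood(counts, trap_dists, days, n_released, D, mu, c):
    """counts[i, j]: captures at trap i on day j; c: daily capture coefficient."""
    ll = 0.0
    for i, r in enumerate(trap_dists):
        for j, t in enumerate(days):
            lam = c * density(r, t, n_released, D, mu)
            ll += poisson.logpmf(counts[i, j], max(lam, 1e-12))
    return ll

# Toy use: 3 traps, 5 daily collections after a single release of 10,000 males.
rng = np.random.default_rng(0)
dists, days = np.array([50.0, 120.0, 200.0]), np.arange(1, 6)
counts = rng.poisson(5.0, size=(3, 5))
print(log_likelihood(counts, dists, days, 1e4, D=5e3, mu=0.2, c=10.0))
```

Maximizing this likelihood jointly over (D, mu, c) is what lets movement, mortality, and capture be estimated together rather than fitted separately.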
Even though Variational Autoencoders (VAEs) are widely used for semi-supervised learning, the reason why they work remains unclear. In fact, the addition of the unsupervised objective is most often vaguely described as a regularization. The strength of this regularization is controlled by down-weighting the objective on the unlabeled part of the training set. Through an analysis of the objective of semi-supervised VAEs, we observe that they use the posterior of the learned generative model to guide the inference model in learning the partially observed latent variable. We show that given this observation, it is possible to gain finer control on the effect of the unsupervised objective on the training procedure. Using importance weighting, we derive two novel objectives that prioritize either the partially observed latent variable or the unobserved latent variable. Experiments on the IMDB English sentiment analysis dataset and on the AG News topic classification dataset show the improvements brought by our prioritization mechanism and exhibit a behavior that is in line with our description of the inner workings of semi-supervised VAEs.
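For concreteness, a sketch of the standard semi-supervised VAE (M2-style) unlabeled objective that such an analysis starts from: the partially observed variable y is marginalized under the inference model q(y|x), so the generative model's fit for each candidate y is what steers q(y|x). This illustrates the baseline setup, not the paper's two new importance-weighted objectives; all networks are stand-ins.

```python
import torch

def unlabeled_elbo(x, q_y_logits, elbo_given_y):
    """q_y_logits: (B, C) logits of q(y|x);
    elbo_given_y(x, c) -> (B,) ELBO of the full VAE with y clamped to class c."""
    q_y = torch.softmax(q_y_logits, dim=-1)                      # (B, C)
    per_class = torch.stack([elbo_given_y(x, c)
                             for c in range(q_y.shape[-1])], dim=-1)
    entropy = -(q_y * torch.log(q_y + 1e-8)).sum(-1)
    # Expected ELBO under q(y|x) plus its entropy: the generative model's
    # preference over y trains the classifier on unlabeled data.
    return (q_y * per_class).sum(-1) + entropy

B, C = 8, 2
x = torch.randn(B, 16)
fake = lambda x, c: -((x.mean(-1) - c) ** 2)   # toy per-class ELBO
print(unlabeled_elbo(x, torch.randn(B, C), fake).mean())
```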
The Unified Modeling Language (UML) is a standard for modeling dynamic systems. UML behavioral state machines are used for modeling the dynamic behavior of object-oriented designs. The UML specification, maintained by the Object Management Group (OMG), is documented in natural language (in contrast to formal language). The inherent ambiguity of natural languages may introduce inconsistencies in the resulting state machine model. Formalizing UML state machine specification aims at solving the ambiguity problem and at providing a uniform view to software designers and developers. Such a formalization also aims at providing a foundation for automatic verification of UML state machine models, which can help to find software design vulnerabilities at an early stage and reduce the development cost. We provide here a comprehensive survey of existing work from 1997 to 2021 related to formalizing UML state machine semantics for the purpose of conducting model checking at the design stage.
The Game Description Language (GDL) is a widely used formalism for specifying the rules of general games. Writing correct GDL descriptions can be challenging, especially for non-experts. Automated theorem proving has been proposed to assist game design by verifying if a GDL description satisfies desirable logical properties. However, when a description is proved to be faulty, the repair task itself can only be done manually. Motivated by the work on repairing unsolvable planning domain descriptions, we define a more general problem of finding minimal repairs for GDL descriptions that violate formal requirements, and we provide complexity results for various computational problems related to minimal repair. Moreover, we present an Answer Set Programming-based encoding for solving the minimal repair problem and demonstrate its application for automatically repairing ill-defined game descriptions.
Diffusion models have emerged as a leading framework for high-quality image generation, offering stable training and strong performance across diverse domains. However, they remain computationally intensive, particularly during the iterative denoising process. Latent-space models like Stable Diffusion alleviate some of this cost by operating in compressed representations, though at the expense of fine-grained detail. More recent approaches such as Retrieval-Augmented Diffusion Models (RDM) address efficiency by conditioning denoising on similar examples retrieved from large external memory banks. While effective, these methods introduce drawbacks: they require costly storage and retrieval infrastructure, depend on static vision-language models like CLIP for similarity, and lack adaptability during training. We propose the Prototype Diffusion Model (PDM), a method that integrates prototype learning directly into the diffusion process for efficient and adaptive visual conditioning - without external memory. Instead of retrieving reference samples, PDM constructs a dynamic set of compact visual prototypes from clean image features using contrastive learning. These prototypes guide the denoising steps by aligning noisy representations with semantically relevant visual patterns, enabling efficient generation with strong semantic grounding. Experiments show that PDM maintains high generation quality while reducing computational and storage overhead, offering a scalable alternative to retrieval-based conditioning in diffusion models.
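A hedged sketch of conditioning a denoiser on a learned prototype bank instead of retrieved neighbors: noisy-latent tokens cross-attend to a small set of trainable prototype vectors. Sizes are illustrative, and this is one reading of the idea rather than the paper's implementation; in PDM the prototypes are additionally trained contrastively against clean image features.

```python
import torch
import torch.nn as nn

class PrototypeConditioning(nn.Module):
    def __init__(self, dim=256, n_prototypes=64, heads=4):
        super().__init__()
        # Compact, trainable visual prototypes: no external memory bank,
        # no retrieval infrastructure at sampling time.
        self.prototypes = nn.Parameter(torch.randn(n_prototypes, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, noisy_tokens):
        # Align noisy representations with semantically relevant prototypes.
        proto = self.prototypes.unsqueeze(0).expand(noisy_tokens.shape[0], -1, -1)
        guided, _ = self.attn(noisy_tokens, proto, proto)
        return noisy_tokens + guided          # residual guidance signal

cond = PrototypeConditioning()
z = torch.randn(4, 16 * 16, 256)              # noisy latent tokens at one step
print(cond(z).shape)                          # torch.Size([4, 256, 256])
```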
Several methods for triclustering three-dimensional data require the cluster size or the number of clusters in each dimension to be specified. To address this issue, Multi-Slice Clustering (MSC) for 3-order tensors finds signal slices that lie in a low-dimensional subspace of a rank-one tensor dataset and identifies a cluster based on a similarity threshold. We propose an extension algorithm, MSC-DBSCAN, to extract the different clusters of slices that lie in different subspaces when the dataset is a sum of r rank-one tensors (r > 1). Our algorithm takes the same input as MSC and finds the same solution as MSC on rank-one tensor data.
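A hedged sketch of the underlying idea: summarize each slice of a 3-order tensor by a spectral signature, then let DBSCAN group slices whose signatures lie in the same subspace, so clusters coming from several rank-one components (r > 1) can be separated. The signature choice and the eps value are assumptions for illustration, not the algorithm's exact definitions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def slice_signatures(T):
    """T: (n1, n2, n3) tensor; one signature per mode-1 slice."""
    sigs = []
    for i in range(T.shape[0]):
        # Leading right singular vector of the slice, sign-normalized.
        _, _, vt = np.linalg.svd(T[i], full_matrices=False)
        v = vt[0]
        sigs.append(v * np.sign(v[np.argmax(np.abs(v))]))
    return np.array(sigs)

rng = np.random.default_rng(1)
u1 = rng.normal(size=20); u1 /= np.linalg.norm(u1)
u2 = rng.normal(size=20); u2 /= np.linalg.norm(u2)
# Two rank-one signal groups plus pure-noise slices (a sum of 2 components).
T = rng.normal(scale=0.1, size=(30, 20, 20))
T[:10] += 2.0 * np.outer(u1, u1)
T[10:20] += 2.0 * np.outer(u2, u2)

labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(slice_signatures(T))
print(labels)   # two dense clusters of slices; noise slices labeled -1
```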
Parametric timed automata are a powerful formalism for reasoning about concurrent real-time systems with unknown or uncertain timing constants. Reducing their state space is a significant way to reduce the inherently large analysis times. We present here different merging reduction techniques based on the convex union of constraints (parametric zones), which decrease the number of states while preserving the correctness of verification and synthesis results. We perform extensive experiments and identify the best heuristics in practice, obtaining a significant decrease in computation time on a benchmark library.
Failure injection in distributed systems has been an important issue for experimenting with robust, resilient distributed systems. To reproduce real-life conditions, parts of the application must be killed without letting the operating system close the existing network communications in a "clean" way; when a process is simply killed, the OS closes them. SystemTap is an infrastructure that probes the Linux kernel's internal calls. If processes are killed at kernel level, they can be destroyed without letting the OS do anything else. In this paper, we present a kernel-level failure injection system based on SystemTap and show how it can be used to implement deterministic and probabilistic failure scenarios.
We prove that, for $X$, $Y$, $A$ and $B$ matrices with entries in a non-commutative ring such that $[X_{ij}, Y_{k\ell}] = -A_{i\ell} B_{kj}$, satisfying suitable commutation relations (in particular, $X$ is a Manin matrix), the following identity holds: $\mathrm{coldet}\, X \,\mathrm{coldet}\, Y = \langle 0 |\, \mathrm{coldet}\big(a A + X (I - a^{\dagger} B)^{-1} Y\big) \,| 0 \rangle$. Furthermore, if also $Y$ is a Manin matrix, $\mathrm{coldet}\, X \,\mathrm{coldet}\, Y = \int \mathcal{D}(\psi, \psi^{\dagger}) \, \exp\big[ \sum_{k \geq 0} \frac{1}{k+1} (\psi^{\dagger} A \psi)^{k} (\psi^{\dagger} X B^{k} Y \psi) \big]$. Notation: $\langle 0 |$ and $| 0 \rangle$ are respectively the bra and the ket of the ground state, $a^{\dagger}$ and $a$ the creation and annihilation operators of a quantum harmonic oscillator, while $\psi^{\dagger}_i$ and $\psi_i$ are Grassmann variables in a Berezin integral. These results should be seen as a generalization of the classical Cauchy-Binet formula, in which $A$ and $B$ are null matrices, and of the non-commutative generalization, the Capelli identity, in which $A$ and $B$ are identity matrices and $[X_{ij}, X_{k\ell}] = [Y_{ij}, Y_{k\ell}] = 0$.
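For orientation, the commutative special case recovered when $A$ and $B$ vanish is the classical Cauchy-Binet formula; its standard statement (quoted here for reference, not from the abstract) is:

```latex
% Classical Cauchy-Binet formula: for X of size m x n and Y of size n x m
% with commuting entries and m <= n,
\[
  \det(XY) \;=\; \sum_{\substack{S \subseteq \{1,\dots,n\} \\ |S| = m}}
  \det\!\big(X_{[m],\,S}\big)\,\det\!\big(Y_{S,\,[m]}\big),
\]
% where X_{[m],S} keeps the columns of X indexed by S, and Y_{S,[m]} the
% corresponding rows of Y.
```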
Real-world tabular learning production scenarios typically involve evolving data streams, where data arrives continuously and its distribution may change over time. In such a setting, most studies in the supervised learning literature favor instance incremental algorithms due to their ability to adapt to changes in the data distribution. Another significant reason for choosing these algorithms is to \textit{avoid storing observations in memory}, as is commonly done in batch incremental settings. However, the design of instance incremental algorithms often assumes immediate availability of labels, which is an optimistic assumption. In many real-world scenarios, such as fraud detection or credit scoring, labels may be delayed. Consequently, batch incremental algorithms are widely used in many real-world tasks. This raises an important question: "In delayed settings, is instance incremental learning the best option regarding predictive performance and computational efficiency?" Unfortunately, this question has not been studied in depth, probably due to the scarcity of real datasets containing delayed information. In this study, we conduct a comprehensive empirical evaluation and analysis of this question using a real-world fraud detection problem and commonly used generated datasets. Our findings indicate that instance incremental learning is not the superior option when considering, on one side, state-of-the-art models such as Adaptive Random Forest (ARF) and, on the other side, batch learning models such as XGBoost. Additionally, when considering the interpretability of the learning systems, batch incremental solutions tend to be favored. Code: \url{this https URL}
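A hedged sketch of the delayed-label prequential protocol such a study evaluates: every example is scored on arrival, but its label only becomes available a fixed number of steps later, and only then may the model learn from it. The river-style predict_one/learn_one interface and the fixed delay are illustrative assumptions; a batch incremental baseline would instead refit periodically on the released pairs.

```python
from collections import deque

def prequential_with_delay(stream, model, delay):
    """stream: iterable of (x, y) in arrival order; y is hidden for `delay` steps."""
    pending, correct, seen = deque(), 0, 0
    for t, (x, y) in enumerate(stream):
        correct += int(model.predict_one(x) == y)   # test-then-train: test part
        seen += 1
        pending.append((t, x, y))
        # Release labels whose delay has elapsed; only now can the model train.
        while pending and pending[0][0] + delay <= t:
            _, x_old, y_old = pending.popleft()
            model.learn_one(x_old, y_old)
    return correct / max(seen, 1)
```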
Aperiodic tilings are non-periodic tilings characterized by local constraints. They play a key role in the proof of the undecidability of the domino problem (1964) and naturally model quasicrystals (discovered in 1982). A central question is to characterize, among a class of non-periodic tilings, the aperiodic ones. In this paper, we answer this question for the well-studied class of non-periodic tilings obtained by digitizing irrational vector spaces. Namely, we prove that such tilings are aperiodic if and only if the digitized vector spaces are computable.
Plastic flow is conventionally treated as continuous in finite element (FE) codes, whether in isotropic plasticity, anisotropic plasticity, or crystal plasticity. This approach, derived from continuum mechanics, contradicts the intermittent nature of plasticity at the elementary scale. Understanding crystal plasticity at the micro-scale opens the door to new engineering applications, such as microscale machining. In this work, a new approach is proposed to account for the intermittence of plastic deformation while remaining within the framework of continuum mechanics. We introduce a material parameter, the plastic deformation threshold, denoted $\Delta p_{min}$, corresponding to the plastic deformation carried by the minimal plastic deformation burst within the material. The incremental model is based on the traditional predictor-corrector algorithm to calculate the elastoplastic behavior of a material subjected to any external loading. The model is presented within the framework of small deformations for von Mises plasticity. To highlight the main features of the approach, the plastic strain increment is calculated using the normality rule and consistency conditions, and is accepted only if it exceeds $\Delta p_{min}$. To achieve this, a time-discontinuous generalization of the Karush-Kuhn-Tucker (KKT) conditions is proposed. The simulations show that the introduction of the plastic threshold allows for the reproduction of the spatiotemporal intermittence of plastic flow, capturing the self-organization of plastic flow in complex loading scenarios within an FE model.
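A hedged 1D sketch of the thresholded predictor-corrector described above: the usual elastic predictor / plastic corrector return mapping computes a candidate plastic increment, but the correction is only accepted once it reaches the burst size $\Delta p_{min}$; below it, the step stays elastic. Linear isotropic hardening and all material numbers are illustrative assumptions, not the paper's values.

```python
import numpy as np

def step(stress, p, d_eps, E=200e3, H=1e3, sigma_y=250.0, dp_min=1e-4):
    """One strain increment d_eps (uniaxial von Mises, small strains, MPa units).
    Returns updated (stress, accumulated plastic strain p)."""
    trial = stress + E * d_eps                    # elastic predictor
    f = abs(trial) - (sigma_y + H * p)            # yield function
    if f <= 0.0:
        return trial, p                           # elastic step
    dp = f / (E + H)                              # standard return-mapping increment
    if dp < dp_min:                               # burst below the material threshold:
        return trial, p                           # plastic flow is *not* triggered yet
    # Accepted burst: correct the stress along the normality direction.
    return trial - E * dp * np.sign(trial), p + dp

stress, p = 0.0, 0.0
for d_eps in np.full(200, 2e-5):                  # monotonic tension
    stress, p = step(stress, p, d_eps)
print(round(stress, 1), p)
```

Running this shows the intended behavior: stress ratchets above the yield surface until the candidate increment reaches dp_min, then relaxes in a discrete burst, producing an intermittent staircase in p instead of smooth flow.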
Communication-avoiding algorithms use redundant computations to minimize the number of inter-process communications. In this paper, we propose to exploit this redundancy for fault-tolerance purposes. We illustrate this idea with the QR factorization of tall and skinny matrices, and we evaluate the number of failures our algorithm can tolerate under different semantics.
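A minimal sketch of the tall-and-skinny QR (TSQR) reduction this builds on: each process factors its row block locally, then the small R factors are combined by a second QR. Since any local R can be recomputed from its row block alone, the reduction can tolerate the loss of a process when the block is available elsewhere, which is the redundancy exploited for fault tolerance. This serial NumPy version only illustrates the algebra, not the distributed implementation.

```python
import numpy as np

def tsqr_R(A, n_blocks):
    """R factor of a tall-skinny A via one level of TSQR reduction."""
    blocks = np.array_split(A, n_blocks, axis=0)
    Rs = [np.linalg.qr(b, mode="r") for b in blocks]   # local, independent QRs
    return np.linalg.qr(np.vstack(Rs), mode="r")       # combine the small R factors

rng = np.random.default_rng(2)
A = rng.normal(size=(4000, 8))                          # tall and skinny
R = tsqr_R(A, n_blocks=4)
# Same triangular factor (up to row signs) as a direct QR:
R_ref = np.linalg.qr(A, mode="r")
print(np.allclose(np.abs(R), np.abs(R_ref), atol=1e-6))
```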