Xiangtan University
Transformer-based networks have achieved strong performance in low-level vision tasks like image deraining by utilizing spatial or channel-wise self-attention. However, irregular rain patterns and complex geometric overlaps challenge single-paradigm architectures, necessitating a unified framework to integrate complementary global-local and spatial-channel representations. To address this, we propose a novel Cross Paradigm Representation and Alignment Transformer (CPRAformer). Its core idea is the hierarchical representation and alignment, leveraging the strengths of both paradigms (spatial-channel and global-local) to aid image reconstruction. It bridges the gap within and between paradigms, aligning and coordinating them to enable deep interaction and fusion of features. Specifically, we use two types of self-attention in the Transformer blocks: sparse prompt channel self-attention (SPC-SA) and spatial pixel refinement self-attention (SPR-SA). SPC-SA enhances global channel dependencies through dynamic sparsity, while SPR-SA focuses on spatial rain distribution and fine-grained texture recovery. To address the feature misalignment and knowledge differences between them, we introduce the Adaptive Alignment Frequency Module (AAFM), which aligns and interacts with features in a two-stage progressive manner, enabling adaptive guidance and complementarity. This reduces the information gap within and between paradigms. Through this unified cross-paradigm dynamic interaction framework, we achieve the extraction of the most valuable interactive fusion information from the two paradigms. Extensive experiments demonstrate that our model achieves state-of-the-art performance on eight benchmark datasets and further validates CPRAformer's robustness in other image restoration tasks and downstream applications.
A dual-task learning framework called ChemDual leverages large language models for chemical reaction and retrosynthesis prediction by combining a 4.4M molecule synthetic dataset with correlated forward/backward task training, achieving state-of-the-art performance on USPTO-50K and ChemLLMBench while generating chemically valid compounds with strong target binding affinity.
Inverse problems involving partial differential equations (PDEs) with discontinuous coefficients are fundamental challenges in modeling complex spatiotemporal systems with heterogeneous structures and uncertain dynamics. Traditional numerical and machine learning approaches often face limitations in addressing these problems due to high dimensionality, inherent nonlinearity, and discontinuous parameter spaces. In this work, we propose a novel computational framework that integrates physics-informed deep learning with Bayesian inference for accurate parameter identification in PDEs with jump discontinuities in coefficients. The core innovation of our framework lies in a dual-network architecture employing a gradient-adaptive weighting strategy: a main network approximates PDE solutions while a sub-network samples the PDE coefficients. To effectively identify mixture structures in parameter spaces, we employ Markovian dynamics methods to capture hidden state transitions of complex spatiotemporal systems. The framework has applications in the reconstruction of solutions and the identification of parameter-varying regions. Comprehensive numerical experiments on various PDEs with jump-varying coefficients demonstrate the framework's adaptability, accuracy, and robustness compared to existing methods. This study provides a generalizable computational approach to parameter identification for PDEs with discontinuous parameter structures, particularly in non-stationary or heterogeneous systems.
We derive an exact formula for the probability that a Brownian path on an annulus does not disconnect the two boundary components of the annulus. The leading asymptotic behavior of this probability is governed by the disconnection exponent obtained by Lawler-Schramm-Werner (2001) using the connection to Schramm-Loewner evolution (SLE). The derivation of our formula is based on this connection and the coupling with Liouville quantum gravity (LQG). As byproducts of our proof, we obtain a precise relation between Brownian motion on a disk stopped upon hitting the boundary and the SLE$_{8/3}$ loop measure on the disk; we also obtain a detailed description of the LQG surfaces cut by the outer boundary of stopped Brownian motion on a $\sqrt{8/3}$-LQG disk.
The discovery of flat-bands in magic-angle twisted bilayer graphene has underscored the potential of moire engineering for correlated states, but such phases are notoriously difficult to realize and highly fragile against perturbations. Here, we propose an alternative route to flat-bands by introducing sp3 hybridization in twisted graphite. Instead of relying on fine-tuned magic angles, our approach identifies flat-band states at relatively large twist angles with short moire periods. In this regime, sp3-induced reconstructions generate electronic states that, once formed, are locked by substantial energy barriers, rendering them robust against external perturbations. Using twisted graphite as a prototype, we uncover a series of moire-diamonds that host two-dimensional flat conduction or valence bands, where carriers are localized within specific momentum planes but remain dispersive along orthogonal directions. The emergence of dimensional flat-bands opens a new platform for flat-band-driven correlated physics and suggests opportunities for designing quantum materials with highly directional electronic functionalities.
RadioLAM presents a Large AI Model designed for fine-grained 3D radio map estimation, inferring high-resolution maps across varying altitudes from ultra-sparse sensor data. The system demonstrates robust performance with sampling rates as low as 0.1%, significantly outperforming existing 3D mapping techniques.
Concept erasure aims to remove harmful, inappropriate, or copyrighted content from text-to-image diffusion models while preserving non-target semantics. However, existing methods either rely on costly fine-tuning or apply coarse semantic separation, often degrading unrelated concepts and lacking adaptability to evolving concept sets. To alleviate this issue, we propose Graph-Guided Online Concept Erasure (GrOCE), a training-free framework that performs precise and adaptive concept removal through graph-based semantic reasoning. GrOCE models concepts and their interrelations as a dynamic semantic graph, enabling principled reasoning over dependencies and fine-grained isolation of undesired content. It comprises three components: (1) Dynamic Topological Graph Construction for incremental graph building, (2) Adaptive Cluster Identification for multi-hop traversal with similarity-decay scoring, and (3) Selective Edge Severing for targeted edge removal while preserving global semantics. Extensive experiments demonstrate that GrOCE achieves state-of-the-art performance on Concept Similarity (CS) and Fréchet Inception Distance (FID) metrics, offering efficient, accurate, and stable concept erasure without retraining.
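The abstract names multi-hop traversal with similarity-decay scoring and selective edge severing but gives no formulas. The following is only one plausible reading, sketched under stated assumptions: the graph is a dictionary of weighted edges, a node's score is the product of edge similarities along the path multiplied by a per-hop `decay`, and "severing" drops edges that cross the cluster boundary. The function names, `decay`, `threshold`, and `max_hops` are all hypothetical and are not taken from GrOCE.

```python
from collections import deque


def identify_cluster(graph, target, decay=0.8, threshold=0.3, max_hops=3):
    """Breadth-first traversal from `target` with similarity-decay scoring.

    Hypothetical scoring rule (an assumption, not the paper's formula):
    score(neighbor) = score(node) * similarity(node, neighbor) * decay.
    Nodes whose best score stays below `threshold` are left out.
    """
    scores = {target: 1.0}
    queue = deque([(target, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbor, sim in graph.get(node, {}).items():
            score = scores[node] * sim * decay
            # Keep a neighbor only if it clears the threshold and improves its score.
            if score >= threshold and score > scores.get(neighbor, 0.0):
                scores[neighbor] = score
                queue.append((neighbor, hops + 1))
    return scores


def sever_edges(graph, cluster):
    """Remove only the edges that connect a cluster node to a non-cluster node."""
    return {
        u: {v: s for v, s in nbrs.items() if (u in cluster) == (v in cluster)}
        for u, nbrs in graph.items()
    }


# Toy graph with made-up similarities between placeholder concepts.
toy_graph = {
    "target": {"near": 0.9, "far": 0.3},
    "near": {"target": 0.9, "other": 0.5},
    "far": {"target": 0.3},
    "other": {"near": 0.5},
}
cluster = identify_cluster(toy_graph, "target")
pruned = sever_edges(toy_graph, cluster)
```

In this toy graph only `target` and `near` clear the threshold, so the edges to `far` and `other` are cut while the edge inside the cluster is kept.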
The pursuit of materials combining light constituent elements with ultralow lattice thermal conductivity ($\kappa_{\mathrm{L}}$) is crucial to advancing technologies like thermoelectrics and thermal barrier coatings, yet it remains a formidable challenge to date. Herein, we achieve ultralow $\kappa_{\mathrm{L}}$ in lightweight cyanide-bridged framework materials (CFMs) through the rational integration of properties such as the hierarchical vibrations exhibited in superatomic structures and rotational dynamics exhibited in perovskites. Unique hierarchical rotation behavior leads to multiple negative peaks in Grüneisen parameters across a wide frequency range, thereby inducing pronounced negative thermal expansion and strong cubic anharmonicity in CFMs. Meanwhile, the synergistic effect between large four-phonon scattering phase space (induced by phonon quasi-flat bands and wide bandgaps) and strong quartic anharmonicity (associated with rotation modes) leads to giant quartic anharmonic scattering rates in these materials. Consequently, the $\kappa_{\mathrm{L}}$ of these CFMs decreases by one to two orders of magnitude compared to the known perovskites or perovskite-like materials with equivalent average atomic masses. For instance, Cd(CN)$_{2}$, NaB(CN)$_{4}$, LiIn(CN)$_{4}$, and AgX(CN)$_{4}$ (X = B, Al, Ga, In) exhibit ultralow room-temperature $\kappa_{\mathrm{L}}$ values ranging from 0.35 to 0.81 W/mK. This work not only establishes CFMs as a novel and rich platform for studying extreme phonon anharmonicity, but also provides a new paradigm for achieving ultralow thermal conductivity in lightweight materials via the conscious integration of hierarchical and rotational dynamics.
Large Language Models (LLMs) have shown promise in assisting molecular property prediction tasks but often rely on human-crafted prompts and chain-of-thought templates. While recent advanced large reasoning models like DeepSeek-R1 employ reinforcement learning for an extended "thinking" process, their reasoning can be verbose and lack relevance. We introduce AttriLens-Mol, an attribute-guided reinforcement learning framework for molecular property prediction with LLMs. AttriLens-Mol steers the model's reasoning by using: (1) a format reward encouraging attribute-based structured output, (2) a count reward to avoid enumerating irrelevant attributes, and (3) a rationality reward using advanced LLMs and RDKit to verify the relatedness of the generated attributes. This approach implicitly elicits the model's inherent knowledge of relevant molecular attributes during reasoning, enabling more effective prediction of molecular properties. Experiments on both in-distribution and out-of-distribution datasets show that training both 7B-size R1-Distilled-Qwen2.5 and R1-Distilled-LLaMA3.1 models on 4,000 samples with our proposed AttriLens-Mol method significantly boosts performance, yielding results comparable to or better than those of supervised fine-tuning models (Mol-Instructions, ChemDFM, etc.) and advanced models (GPT-3.5, GPT-4o, DeepSeek-V3, DeepSeek-R1, etc.). Further, our extracted attributes for the target property, when used as features for an interpretable decision tree model, yield superior performance compared to attributes generated by prompting LLMs. This shows that AttriLens-Mol effectively elicits more relevant and predictive molecular attributes, leading to enhanced interpretability and performance for property prediction. We release the code at this https URL.
A generic knowledge distillation framework, HeteroAKD, enables effective knowledge transfer between heterogeneous deep learning architectures for semantic segmentation tasks. It consistently outperforms state-of-the-art methods, achieving up to a 3.37% mIoU gain on Cityscapes and allowing student models to sometimes exceed their teacher's performance.
Researchers developed PENet, a framework for few-shot 3D point cloud semantic segmentation that expands prototype capacity by integrating features from a conventional learner and a re-purposed diffusion model encoder. The method, which includes a novel iterative assimilation module, achieved 70.04% mIoU on S3DIS for 2-way 1-shot learning, outperforming prior state-of-the-art by 3.63%.
Inductive Knowledge Graph Completion (KGC) aims to infer missing facts between newly emerged entities within knowledge graphs (KGs), posing a significant challenge. While recent studies have shown promising results in inferring such entities through knowledge subgraph reasoning, they suffer from (i) the semantic inconsistencies of similar relations, and (ii) noisy interactions inherent in KGs due to the presence of unconvincing knowledge for emerging entities. To address these challenges, we propose a Semantic Structure-aware Denoising Network (S$^2$DN) for inductive KGC. Our goal is to learn adaptable general semantics and reliable structures to distill consistent semantic knowledge while preserving reliable interactions within KGs. Specifically, we introduce a semantic smoothing module over the enclosing subgraphs to retain the universal semantic knowledge of relations. We incorporate a structure refining module to filter out unreliable interactions and offer additional knowledge, retaining robust structure surrounding target links. Extensive experiments conducted on three benchmark KGs demonstrate that S$^2$DN surpasses the performance of state-of-the-art models. These results demonstrate the effectiveness of S$^2$DN in preserving semantic consistency and enhancing the robustness of filtering out unreliable interactions in contaminated KGs.
Face recognition has long been an active and vital topic in the computer vision community. Previous research has mainly focused on the loss functions used for the facial feature extraction network, among which improvements to softmax-based loss functions have greatly promoted the performance of face recognition. However, the contradiction between the drastically increasing number of face identities and the shortage of GPU memory is gradually becoming irreconcilable. In this paper, we thoroughly analyze the optimization goal of softmax-based loss functions and the difficulty of training massive identities. We find that the importance of negative classes in the softmax function for face representation learning is not as high as previously thought. Experiments demonstrate no loss of accuracy when training with only 10\% randomly sampled classes for the softmax-based loss functions, compared with training with full classes using state-of-the-art models on mainstream benchmarks. We also implement a very efficient distributed sampling algorithm, taking into account model accuracy and training efficiency, which uses only eight NVIDIA RTX2080Ti GPUs to complete classification tasks with tens of millions of identities. The code of this paper has been made available at this https URL.
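The class-sampling idea can be illustrated with a small single-machine sketch in NumPy. It assumes that the classes present in the batch are always kept and that the remaining slots are filled with uniformly sampled negatives; the abstract only states that 10% of classes are randomly sampled, so that retention rule, the function names, and the toy shapes are assumptions. The distributed implementation described in the abstract is not reproduced here.

```python
import numpy as np


def sample_classes(labels, num_classes, rate, rng):
    """Pick the class subset used in the softmax for this batch.

    Assumption: classes that appear in `labels` are always kept, and the
    rest of the budget is filled with uniformly sampled negative classes.
    """
    positives = np.unique(labels)
    num_keep = max(int(num_classes * rate), positives.size)
    is_negative = np.ones(num_classes, dtype=bool)
    is_negative[positives] = False
    negatives = np.flatnonzero(is_negative)
    extra = rng.choice(negatives, size=num_keep - positives.size, replace=False)
    return np.sort(np.concatenate([positives, extra]))


def sampled_softmax_loss(features, weights, labels, rate=0.1, seed=0):
    """Cross-entropy computed over the sampled classes only."""
    rng = np.random.default_rng(seed)
    kept = sample_classes(labels, weights.shape[0], rate, rng)
    logits = features @ weights[kept].T                # (batch, num_kept)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    local = np.searchsorted(kept, labels)              # global label -> sampled index
    loss = -log_probs[np.arange(labels.size), local].mean()
    return loss, kept


# Toy example: 1000 identities, 8-dimensional features, batch of 4.
demo_rng = np.random.default_rng(1)
demo_features = demo_rng.normal(size=(4, 8))
demo_weights = demo_rng.normal(size=(1000, 8))
demo_labels = np.array([3, 500, 3, 999])
demo_loss, demo_kept = sampled_softmax_loss(demo_features, demo_weights, demo_labels)
```

Only the rows of `weights` listed in `kept` enter the matrix product, which is where the memory saving comes from: the logits matrix has 100 columns here instead of 1000.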
The Schrödingerization method converts linear partial and ordinary differential equations with non-unitary dynamics into systems of Schrödinger-type equations with unitary evolution. It does so via the so-called warped phase transformation that maps the original equation into a Schrödinger-type equation in one higher dimension \cite{Schrshort,JLY22SchrLong}. The original proposal used a particular initial function in the auxiliary space that did not achieve optimal scaling in precision. Here we show that, by choosing smoother initial functions in auxiliary space, Schrödingerization \textit{can} in fact achieve near optimal and even optimal scaling in matrix queries. We construct three necessary criteria that the initial auxiliary state must satisfy to achieve optimality. This paper presents detailed implementation of four smooth initializations for the Schrödingerization method: (a) the error function and related functions, (b) the cut-off function, (c) the higher-order polynomial interpolation, and (d) Fourier transform methods. Method (a) achieves optimality and methods (b), (c) and (d) can achieve near-optimality. A detailed analysis of key parameters affecting time complexity is conducted.
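For readers unfamiliar with the warped phase transformation, a brief sketch of the standard construction follows. The sign conventions, the Fourier convention, and the symbols $H_1$, $H_2$, $w$, $\psi$ are choices made here for illustration and may differ from those of the cited papers.

```latex
% Linear system with non-unitary dynamics:
\frac{du}{dt} = A u, \qquad A = H_1 + i H_2, \qquad
H_1 = \frac{A + A^\dagger}{2}, \quad H_2 = \frac{A - A^\dagger}{2i},
% with H_1 and H_2 Hermitian (here H_1 is assumed negative semi-definite).

% Warped phase transformation: w(t,p) = e^{-p} u(t) for p > 0, so that \partial_p w = -w and
\partial_t w = -H_1\, \partial_p w + i H_2\, w .

% Fourier transform in p, \hat w(t,\xi) = \int e^{-i \xi p}\, w(t,p)\, dp, gives for each \xi
i\, \partial_t \hat w = \left( \xi H_1 - H_2 \right) \hat w ,
% a Schr\"odinger-type equation with Hermitian Hamiltonian \xi H_1 - H_2.

% Initial data in the auxiliary variable: w(0,p) = \psi(p)\, u_0, with \psi(p) = e^{-p} for p > 0.
% The choice \psi(p) = e^{-|p|} is only continuous at p = 0; smoother extensions to p < 0
% make \hat\psi(\xi) decay faster in \xi.
```

The solution is recovered as $u(t) = e^{p} w(t,p)$ for $p > 0$. The smoothness of $\psi$ across $p = 0$ controls how quickly its Fourier transform decays, which is the kind of property the smoother initializations discussed above are designed to improve.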
In recent years, significant developments have been made in both video retrieval and video moment retrieval tasks, which respectively retrieve complete videos or moments for a given text query. These advancements have greatly improved user satisfaction during the search process. However, previous work has failed to establish meaningful "interaction" between the retrieval system and the user, and its one-way retrieval paradigm can no longer fully meet the personalization and dynamic needs of at least 80.8\% of users. In this paper, we introduce the Interactive Video Corpus Retrieval (IVCR) task, a more realistic setting that enables multi-turn, conversational, and realistic interactions between the user and the retrieval system. To facilitate research on this challenging task, we introduce IVCR-200K, a high-quality, bilingual, multi-turn, conversational, and abstract semantic dataset that supports video retrieval and even moment retrieval. Furthermore, we propose a comprehensive framework based on multi-modal large language models (MLLMs) to help users interact in several modes with more explainable solutions. The extensive experiments demonstrate the effectiveness of our dataset and framework.
We introduce an NP-complete graph decision problem, the "Multi-stage graph Simple Path" (abbr. MSP) problem, which focuses on determining the existence of specific "global paths" in a graph $G$. We show that the MSP problem can be solved in polynomial ($O(|E|^9)$) time, by proposing a polynomial-time graph algorithm and the proof of its correctness. Our result implies NP $=$ P. The algorithm leverages the data structure of reachable-path edge-set $R(e)$. By establishing the interplay between preceding decisions and subsequent decisions, the information computed for $R(e)$ (in a monotonically decreasing manner) carries all necessary contextual information, and can be utilized to summarize the "history" and to detect the "future" for searching "global paths". The relation of $R(e)$ of different stages in the multi-stage graph resembles the state-transition equation in dynamic programming, though it is much more convoluted. To avoid exponential complexity, paths are always treated as a collection of edge sets. Our proof of the algorithm is built upon a mathematical induction-based proving framework, which relies on a crucial structural property of the MSP problem: all MSP instances are arranged into the sequence $\{G_0, G_1, G_2, \ldots\}$, and each $G_{j}$ ($j>0$) in the sequence must have some $G_{i}(0\leq i
Chaotic behavior arises from the very simple non-linear dynamical equation of the logistic map, which is why the map has often been used in designing chaotic image encryption schemes. However, some properties of chaotic maps can also facilitate cryptanalysis, especially when they are implemented in the digital domain. Utilizing the stable distribution of the chaotic states generated by iterating the logistic map, this paper presents a typical example to show the insecurity of an image encryption scheme using the chaotic logistic map. This work should encourage encryption and chaos to be combined in a more effective way.
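As background, the logistic map is the recurrence x_{n+1} = r * x_n * (1 - x_n). The sketch below iterates it in double precision and bins the resulting states. The control parameter r = 4, the seed 0.123, and the bin count are illustrative assumptions and are not the parameters of the scheme analysed in the paper. At r = 4 the invariant density of the map is 1 / (pi * sqrt(x * (1 - x))), so the states are far from uniform and pile up near 0 and 1.

```python
def logistic_orbit(x0, r=4.0, n=10000, burn_in=100):
    """Iterate x -> r * x * (1 - x) and return n states after a burn-in."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def histogram(values, bins=10):
    """Count how many values fall into each of `bins` equal sub-intervals of [0, 1]."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # v == 1.0 goes into the last bin
        counts[idx] += 1
    return counts


orbit = logistic_orbit(0.123)
counts = histogram(orbit)
```

A key stream derived directly from such states inherits this skewed distribution, which is the general kind of statistical regularity the abstract refers to when it mentions the stable distribution of the chaotic states.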
The Landau-Brazovskii model provides a theoretical framework for describing various phases arising from competing short- and long-range interactions in many physical systems. In this work, we investigate phase transitions among various ordered phases within the three-dimensional Landau-Brazovskii model. We construct the phase diagram of this model, which encompasses eight distinct phases, and systematically compute the transition pathways connecting various metastable and stable states using the Landau-Brazovskii saddle dynamics. Along each transition pathway, the critical nucleus is identified with some detailed analyses of its shape, energy barrier, and Hessian eigenvalues. Furthermore, we explore how the transition state is influenced by model parameters, revealing systematic trends in critical nucleus sizes and energy barrier heights. Our results provide a comprehensive characterization of the nucleation mechanisms within the Landau-Brazovskii model and offer valuable insights into the structural transformations of modulated-phase systems.
02 May 2025
We develop a quantum algorithm for solving high-dimensional fractional Poisson equations. By applying the Caffarelli-Silvestre extension, the $d$-dimensional fractional equation is reformulated as a local partial differential equation in $d+1$ dimensions. We propose a quantum algorithm for the finite element discretization of this local problem, by capturing the steady-state of the corresponding differential equations using the Schrödingerization approach from \cite{JLY22SchrShort, JLY22SchrLong, analogPDE}. The Schrödingerization technique transforms general linear partial and ordinary differential equations into Schrödinger-type systems, making them suitable for quantum simulation. This is achieved through the warped phase transformation, which maps the equation into a higher-dimensional space. We provide detailed implementations of the method and conduct a comprehensive complexity analysis, which can show up to exponential advantage -- with respect to the inverse of the mesh size in high dimensions -- compared to its classical counterpart. Specifically, while the classical method requires $\widetilde{\mathcal{O}}(d^{1/2} 3^{3d/2} h^{-d-2})$ operations, the quantum counterpart requires $\widetilde{\mathcal{O}}(d 3^{3d/2} h^{-2.5})$ queries to the block-encoding input models, with the quantum complexity being independent of the dimension $d$ in terms of the inverse mesh size $h^{-1}$. Numerical experiments are conducted to verify the validity of our formulation.
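To make the scaling gap concrete, the two bounds quoted above can be divided term by term. This is a rough illustration only: it ignores the logarithmic factors hidden by $\widetilde{\mathcal{O}}$, treats constants as equal, and compares classical operations with quantum queries, which are not the same unit of cost. The values $d = 10$ and $h = 0.1$ are chosen here purely as an example.

```latex
\frac{d^{1/2}\, 3^{3d/2}\, h^{-d-2}}{d\, 3^{3d/2}\, h^{-2.5}}
  = d^{-1/2}\, h^{-(d - 1/2)},
\qquad
d = 10,\ h = 0.1:\quad 10^{-1/2} \cdot 10^{9.5} = 10^{9}.
```

The common factor $3^{3d/2}$ cancels, so the ratio grows like $h^{-(d-1/2)}$, that is, exponentially in $d$ for fixed $h < 1$.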
Large language models (LLMs) are widely applied in various natural language processing tasks such as question answering and machine translation. However, due to the lack of labeled data and the difficulty of manual annotation for biochemical properties, their performance on molecule generation tasks is still limited, especially for tasks involving multi-property constraints. In this work, we present a two-step framework PEIT (Property Enhanced Instruction Tuning) to improve LLMs for molecular-related tasks. In the first step, we use textual descriptions, SMILES, and biochemical properties as multimodal inputs to pre-train a model called PEIT-GEN, by aligning multi-modal representations to synthesize instruction data. In the second step, we fine-tune existing open-source LLMs with the synthesized data; the resulting PEIT-LLM can handle molecule captioning, text-based molecule generation, molecular property prediction, and our newly proposed multi-constraint molecule generation tasks. Experimental results show that our pre-trained PEIT-GEN outperforms MolT5 and BioT5 in molecule captioning, demonstrating that the modalities align well between textual descriptions, structures, and biochemical properties. Furthermore, PEIT-LLM shows promising improvements in multi-task molecule generation, indicating the scalability of the PEIT framework for various molecular tasks. We release the code, constructed instruction data, and model checkpoints at this https URL