Qingdao University
NeurCross presents the first self-supervised neural network for generating cross fields for quad mesh generation, leveraging an optimizable neural Signed Distance Function (SDF) to robustly derive implicit principal curvature guidance. This joint optimization approach yields high-quality quadrilateral meshes, outperforming state-of-the-art methods on ShapeNet and Thingi10K datasets in terms of area distortion, angle distortion, Chamfer distance, and Jacobian ratio, while also exhibiting robustness to noise.
9
This study systematically investigates how different electron velocity distribution shapes influence bremsstrahlung radiation power in fusion plasmas. The work confirms that for electron-ion bremsstrahlung, the effect of distribution shape is relatively modest, typically causing less than 10% deviation from Maxwellian results, largely validating a key assumption in fusion plasma models; however, it also shows that anisotropic distributions can significantly reduce electron-electron bremsstrahlung, which becomes dominant at higher temperatures.
Vision-Language Models (VLMs) based on Mixture-of-Experts (MoE) architectures have emerged as a pivotal paradigm in multimodal understanding, offering a powerful framework for integrating visual and linguistic information. However, the increasing complexity and diversity of tasks present significant challenges in coordinating load balancing across heterogeneous visual experts, where optimizing one specialist's performance often compromises others' capabilities. To address task heterogeneity and expert load imbalance, we propose Astrea, a novel multi-expert collaborative VLM architecture based on progressive pre-alignment. Astrea introduces three key innovations: 1) A heterogeneous expert coordination mechanism that integrates four specialized models (detection, segmentation, classification, captioning) into a comprehensive expert matrix covering essential visual comprehension elements; 2) A dynamic knowledge fusion strategy featuring progressive pre-alignment to harmonize experts within the VLM latent space through contrastive learning, complemented by probabilistically activated stochastic residual connections to preserve knowledge continuity; 3) An enhanced optimization framework utilizing momentum contrastive learning for long-range dependency modeling and adaptive weight allocators for real-time expert contribution calibration. Extensive evaluations across 12 benchmark tasks spanning VQA, image captioning, and cross-modal retrieval demonstrate Astrea's superiority over state-of-the-art models, achieving an average performance gain of +4.7\%. This study provides the first empirical demonstration that progressive pre-alignment strategies enable VLMs to overcome task heterogeneity limitations, establishing new methodological foundations for developing general-purpose multimodal agents.
Medical image super-resolution (SR) is essential for enhancing diagnostic accuracy while reducing acquisition cost and scanning time. However, modeling both long-range anatomical structures and fine-grained frequency details with low computational overhead remains challenging. We propose FGMamba, a novel frequency-aware gated state-space model that unifies global dependency modeling and fine-detail enhancement into a lightweight architecture. Our method introduces two key innovations: a Gated Attention-enhanced State-Space Module (GASM) that integrates efficient state-space modeling with dual-branch spatial and channel attention, and a Pyramid Frequency Fusion Module (PFFM) that captures high-frequency details across multiple resolutions via FFT-guided fusion. Extensive evaluations across five medical imaging modalities (Ultrasound, OCT, MRI, CT, and Endoscopic) demonstrate that FGMamba achieves superior PSNR/SSIM while maintaining a compact parameter footprint (<0.75M), outperforming CNN-based and Transformer-based SOTAs. Our results validate the effectiveness of frequency-aware state-space modeling for scalable and accurate medical image enhancement.
Multimodal Knowledge Graph Completion (MMKGC) aims to address the critical issue of missing knowledge in multimodal knowledge graphs (MMKGs) for their better applications. However, both the previous MMGKC and negative sampling (NS) approaches ignore the employment of multimodal information to generate diverse and high-quality negative triples from various semantic levels and hardness levels, thereby limiting the effectiveness of training MMKGC models. Thus, we propose a novel Diffusion-based Hierarchical Negative Sampling (DHNS) scheme tailored for MMKGC tasks, which tackles the challenge of generating high-quality negative triples by leveraging a Diffusion-based Hierarchical Embedding Generation (DiffHEG) that progressively conditions on entities and relations as well as multimodal semantics. Furthermore, we develop a Negative Triple-Adaptive Training (NTAT) strategy that dynamically adjusts training margins associated with the hardness level of the synthesized negative triples, facilitating a more robust and effective learning procedure to distinguish between positive and negative triples. Extensive experiments on three MMKGC benchmark datasets demonstrate that our framework outperforms several state-of-the-art MMKGC models and negative sampling techniques, illustrating the effectiveness of our DHNS for training MMKGC models. The source codes and datasets of this paper are available at this https URL.
6
In surveillance scenarios, varying camera distances cause significant differences among pedestrian image resolutions, making it hard to match low-resolution (LR) images with high-resolution (HR) counterparts, limiting the performance of Re-Identification (ReID) tasks. Most existing Cross-Resolution ReID (CR-ReID) methods rely on super-resolution (SR) or joint learning for feature compensation, which increases training and inference complexity and has reached a performance bottleneck in recent studies. Inspired by semantic directions in the word embedding space, we empirically discover that semantic directions implying resolution differences also emerge in the feature space of ReID, and we substantiate this finding from a statistical perspective using Canonical Correlation Analysis and Pearson Correlation Analysis. Based on this interesting finding, we propose a lightweight and effective Vector Panning Feature Alignment (VPFA) framework, which conducts CR-ReID from a novel perspective of modeling the resolution-specific feature discrepancy. Extensive experimental results on multiple CR-ReID benchmarks show that our method significantly outperforms previous state-of-the-art baseline models while obtaining higher efficiency, demonstrating the effectiveness and superiority of our model based on the new finding in this paper.
3
Background Major depressive disorder (MDD) is a leading cause of global disability, yet current diagnostic approaches often rely on subjective assessments and lack the ability to integrate multimodal clinical information. Large language models (LLMs) hold promise for enhancing diagnostic accuracy through advanced reasoning but face challenges in interpretability, hallucination, and reliance on synthetic data. Methods We developed MDD-Thinker, an LLM-based diagnostic framework that integrates supervised fine-tuning (SFT) with reinforcement learning (RL) to strengthen reasoning ability and interpretability. Using the UK Biobank dataset, we generated 40,000 reasoning samples, supplemented with 10,000 samples from publicly available mental health datasets. The model was fine-tuned on these reasoning corpora, and its diagnostic and reasoning performance was evaluated against machine learning, deep learning, and state-of-the-art LLM baselines. Findings MDD-Thinker achieved an accuracy of 0.8268 and F1-score of 0.8081, significantly outperforming traditional baselines such as SVM and MLP, as well as general-purpose LLMs. Incorporating both SFT and RL yielded the greatest improvements, with relative gains of 29.0% in accuracy, 38.1% in F1-score, and 34.8% in AUC. Moreover, the model demonstrated comparable reasoning performance compared to much larger LLMs, while maintaining computational efficiency. Interpretation This study presents the first reasoning-enhanced LLM framework for MDD diagnosis trained on large-scale real-world clinical data. By integrating SFT and RL, MDD-Thinker balances accuracy, interpretability, and efficiency, offering a scalable approach for intelligent psychiatric diagnostics. These findings suggest that reasoning-oriented LLMs can provide clinically reliable support for MDD detection and may inform broader applications in mental health care.
Modern energy applications, especially electric vehicles, demand high energy batteries. However, despite decades of intensive efforts, the highest energy density and commercially viable batteries are still based on LiCoO2, the very first generation of cathode materials. The technical bottleneck is the stability of oxide-based cathodes at high operating voltages. The fundamental puzzle is that we actually never understood the redox mechanism of LiCoO2. Conventional wisdom generally defines redox to be centered on cations at low voltages, and on anions, i.e. oxygen, at high voltages by forming oxidized chemical states like O2 or peroxo-species. Here, through in-situ and ex-situ spectroscopy coupled with theoretical calculations, we show that high-energy layered cathodes, represented by LiCoO2 and LiNiO2, operate through enhancement of negative charge transfer (NCT) ground states upon charging throughout the whole voltage range - i.e., NCT evolution itself is the intrinsic redox mechanism regardless of voltage ranges. NCT inherently engages high covalency and oxygen holes, leading to optimized performance without conventional redox centers in LiCoO2. The level of NCT, i.e., number of ligand holes, naturally explains many seemingly controversial results. The redefinition of redox mechanism reveals the pathway toward viable high energy battery electrodes.
Synthetic Aperture Radar (SAR), with its all- weather and wide-area observation capabilities, serves as a crucial tool for wake detection. However, due to its complex imaging mechanism, wake features in SAR images often appear abstract and noisy, posing challenges for accurate annotation. In contrast, optical images provide more distinct visual cues, but models trained on optical data suffer from performance degradation when applied to SAR images due to domain shift. To address this cross-modal domain adaptation challenge, we propose a Similarity-Guided and Memory-Guided Domain Adap- tation (termed SimMemDA) framework for unsupervised domain adaptive ship wake detection via instance-level feature similarity filtering and feature memory guidance. Specifically, to alleviate the visual discrepancy between optical and SAR images, we first utilize WakeGAN to perform style transfer on optical images, generating pseudo-images close to the SAR style. Then, instance-level feature similarity filtering mechanism is designed to identify and prioritize source samples with target-like dis- tributions, minimizing negative transfer. Meanwhile, a Feature- Confidence Memory Bank combined with a K-nearest neighbor confidence-weighted fusion strategy is introduced to dynamically calibrate pseudo-labels in the target domain, improving the reliability and stability of pseudo-labels. Finally, the framework further enhances generalization through region-mixed training, strategically combining source annotations with calibrated tar- get pseudo-labels. Experimental results demonstrate that the proposed SimMemDA method can improve the accuracy and robustness of cross-modal ship wake detection tasks, validating the effectiveness and feasibility of the proposed method.
Underwater images often suffer from various issues such as low brightness, color shift, blurred details, and noise due to light absorption and scattering caused by water and suspended particles. Previous underwater image enhancement (UIE) methods have primarily focused on spatial domain enhancement, neglecting the frequency domain information inherent in the images. However, the degradation factors of underwater images are closely intertwined in the spatial domain. Although certain methods focus on enhancing images in the frequency domain, they overlook the inherent relationship between the image degradation factors and the information present in the frequency domain. As a result, these methods frequently enhance certain attributes of the improved image while inadequately addressing or even exacerbating other attributes. Moreover, many existing methods heavily rely on prior knowledge to address color shift problems in underwater images, limiting their flexibility and robustness. In order to overcome these limitations, we propose the Embedding Frequency and Dual Color Encoder Network (FDCE-Net) in our paper. The FDCE-Net consists of two main structures: (1) Frequency Spatial Network (FS-Net) aims to achieve initial enhancement by utilizing our designed Frequency Spatial Residual Block (FSRB) to decouple image degradation factors in the frequency domain and enhance different attributes separately. (2) To tackle the color shift issue, we introduce the Dual-Color Encoder (DCE). The DCE establishes correlations between color and semantic representations through cross-attention and leverages multi-scale image features to guide the optimization of adaptive color query. The final enhanced images are generated by combining the outputs of FS-Net and DCE through a fusion network. These images exhibit rich details, clear textures, low noise and natural colors.
3
Quantum Dempster-Shafer Theory (QDST) uses quantum interference effects to derive a quantum mass function (QMF) as a fuzzy metric type from information obtained from various data sources. In addition, QDST uses quantum parallel computing to speed up computation. Nevertheless, the effective management of conflicts between multiple QMFs in QDST is a challenging question. This work aims to address this problem by proposing a Quantum Conflict Indicator (QCI) that measures the conflict between two QMFs in decision-making. Then, the properties of the QCI are carefully investigated. The obtained results validate its compliance with desirable conflict measurement properties such as non-negativity, symmetry, boundedness, extreme consistency and insensitivity to refinement. We then apply the proposed QCI in conflict fusion methods and compare its performance with several commonly used fusion approaches. This comparison demonstrates the superiority of the QCI-based conflict fusion method. Moreover, the Class Description Domain Space (C-DDS) and its optimized version, C-DDS+ by utilizing the QCI-based fusion method, are proposed to address the Out-of-Distribution (OOD) detection task. The experimental results show that the proposed approach gives better OOD performance with respect to several state-of-the-art baseline OOD detection methods. Specifically, it achieves an average increase in Area Under the Receiver Operating Characteristic Curve (AUC) of 1.2% and a corresponding average decrease in False Positive Rate at 95% True Negative Rate (FPR95) of 5.4% compared to the optimal baseline method.
While contrastive multi-view clustering has achieved remarkable success, it implicitly assumes balanced class distribution. However, real-world multi-view data primarily exhibits class imbalance distribution. Consequently, existing methods suffer performance degradation due to their inability to perceive and model such imbalance. To address this challenge, we present the first systematic study of imbalanced multi-view clustering, focusing on two fundamental problems: i. perceiving class imbalance distribution, and ii. mitigating representation degradation of minority samples. We propose PROTOCOL, a novel PaRtial Optimal TranspOrt-enhanced COntrastive Learning framework for imbalanced multi-view clustering. First, for class imbalance perception, we map multi-view features into a consensus space and reformulate the imbalanced clustering as a partial optimal transport (POT) problem, augmented with progressive mass constraints and weighted KL divergence for class distributions. Second, we develop a POT-enhanced class-rebalanced contrastive learning at both feature and class levels, incorporating logit adjustment and class-sensitive learning to enhance minority sample representations. Extensive experiments demonstrate that PROTOCOL significantly improves clustering performance on imbalanced multi-view data, filling a critical research gap in this field.
Learning from ambiguous labels is a long-standing problem in practical machine learning applications. The purpose of \emph{partial label learning} (PLL) is to identify the ground-truth label from a set of candidate labels associated with a given instance. Inspired by the remarkable performance of diffusion models in various generation tasks, this paper explores their potential to denoise ambiguous labels through the reverse denoising process. Therefore, this paper reformulates the label disambiguation problem from the perspective of generative models, where labels are generated by iteratively refining initial random guesses. This perspective enables the diffusion model to learn how label information is generated stochastically. By modeling the generation uncertainty, we can use the maximum likelihood estimate of the label for classification inference. However, such ambiguous labels lead to a mismatch between instance and label, which reduces the quality of generated data. To address this issue, this paper proposes a \emph{diffusion disambiguation model for PLL} (DDMP), which first uses the potential complementary information between instances and labels to construct pseudo-clean labels for initial diffusion training. Furthermore, a transition-aware matrix is introduced to estimate the potential ground-truth labels, which are dynamically updated during the diffusion generation. During training, the ground-truth label is progressively refined, improving the classifier. Experiments show the advantage of the DDMP and its suitability for PLL.
As a special type of collinear antiferromagnetism (AFM), altermagnetism has garnered significant research interest recently. Altermagnets exhibit broken parity-time symmetry and zero net magnetization in real space, leading to substantial band splitting in momentum space even in the absence of spin-orbit coupling. Meanwhile, parity-time symmetry breaking always induce nontrivial band topology such as Weyl nodes. While Weyl semimetal states and nodal lines have been theoretically proposed in altermagnets, rare reports of experimental observation have been made up to this point. Using ARPES and first-principles calculations, we systematically studied the electronic structure of the room-temperature altermagnet candidate CrSb. At generic locations in momentum space, we clearly observed band spin splitting. Furthermore, we identified discrete surface Fermi arcs on the (100) cleaved side surface close to the Fermi level originating from bulk band topology. Our results imply that CrSb contains interesting nontrivial topological Weyl physics, in addition to being an excellent room temperature altermagnet.
In the transfer-based adversarial attacks, adversarial examples are only generated by the surrogate models and achieve effective perturbation in the victim models. Although considerable efforts have been developed on improving the transferability of adversarial examples generated by transfer-based adversarial attacks, our investigation found that, the big deviation between the actual and steepest update directions of the current transfer-based adversarial attacks is caused by the large update step length, resulting in the generated adversarial examples can not converge well. However, directly reducing the update step length will lead to serious update oscillation so that the generated adversarial examples also can not achieve great transferability to the victim models. To address these issues, a novel transfer-based attack, namely direction tuning attack, is proposed to not only decrease the update deviation in the large step length, but also mitigate the update oscillation in the small sampling step length, thereby making the generated adversarial examples converge well to achieve great transferability on victim models. In addition, a network pruning method is proposed to smooth the decision boundary, thereby further decreasing the update oscillation and enhancing the transferability of the generated adversarial examples. The experiment results on ImageNet demonstrate that the average attack success rate (ASR) of the adversarial examples generated by our method can be improved from 87.9\% to 94.5\% on five victim models without defenses, and from 69.1\% to 76.2\% on eight advanced defense methods, in comparison with that of latest gradient-based attacks.
Crystalline symmetries determine the linear and nonlinear response of materials to external stimuli such as mechanical pressure and electromagnetic fields, governing phenomena such as piezoelectricity, optical activity, and multiple wave mixing with wide ranging technological applications. Altermagnets present a new class of materials with magnetic crystalline order where specific crystal symmetry operations connect antiferromagnetic sublattices, leading to non-relativistic spin-splitting of the electronic band structure. Hence, the electric material properties of altermagnets should uniquely mirror these underlying symmetry properties, potentially giving rise to novel phenomena in response to external driving fields. Here, we report the discovery of a broadband third-order nonlinear anomalous Hall effect in altermagnetic CrSb at room temperature. The comparison of our observations with symmetry analyses and model calculations shows that this nonlinear Hall response is induced by the nonlinear electric susceptibility of a Berry curvature quadrupole, which exists within the spin-split band structure of CrSb and is characterized by the underlying crystalline and magnetic symmetries. We then utilize this third-order nonlinear electric susceptibility of CrSb to realize a multiple wave mixing device with pronounced four wave mixing output, which could, in principle, be extended to THz frequencies. Our study discovers that the crystalline magnetic order of altermagnets determines their nonlinear electric material properties, which could facilitate applications in high-frequency electronics, THz generation, communication networks, and energy harvesting.
Sleep staging is a key method for assessing sleep quality and diagnosing sleep disorders. However, current deep learning methods face challenges: 1) postfusion techniques ignore the varying contributions of different modalities; 2) unprocessed sleep data can interfere with frequency-domain information. To tackle these issues, this paper proposes a gated multimodal temporal neural network for multidomain sleep data, including heart rate, motion, steps, EEG (Fpz-Cz, Pz-Oz), and EOG from WristHR-Motion-Sleep and SleepEDF-78. The model integrates: 1) a pre-processing module for feature alignment, missing value handling, and EEG de-trending; 2) a feature extraction module for complex sleep features in the time dimension; and 3) a dynamic fusion module for real-time modality weighting.Experiments show classification accuracies of 85.03% on SleepEDF-78 and 94.54% on WristHR-Motion-Sleep datasets. The model handles heterogeneous datasets and outperforms state-of-the-art models by 1.00%-4.00%.
MGCR-Net introduces a multimodal graph-conditioned vision-language reconstruction network to enhance remote sensing change detection by fusing visual and linguistic features. The framework leverages multimodal large language models to generate semantic cues and achieves state-of-the-art accuracy on four public datasets.
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Despite the remarkable progress of deep learning in recent years, most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling particularly in the wavelet domain an amplify fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain. Thus, we propose a method called Wavelet-Guided Dual-Frequency Encoding (WGDF). Specifically, we first apply Discrete Wavelet Transform (DWT) to decompose the input images into high-frequency and low-frequency components, which are used to model local details and global structures, respectively. In the high-frequency branch, we design a Dual-Frequency Feature Enhancement (DFFE) module to strengthen edge detail representation and introduce a Frequency-Domain Interactive Difference (FDID) module to enhance the modeling of fine-grained changes. In the low-frequency branch, we exploit Transformers to capture global semantic relationships and employ a Progressive Contextual Difference Module (PCDM) to progressively refine change regions, enabling precise structural semantic characterization. Finally, the high- and low-frequency features are synergistically fused to unify local sensitivity with global discriminability. Extensive experiments on multiple remote sensing datasets demonstrate that WGDF significantly alleviates edge ambiguity and achieves superior detection accuracy and robustness compared to state-of-the-art methods. The code will be available at this https URL.
An end-to-end pipeline generates realistic, textured 3D faces from skull scans by combining biological profile-guided initial face generation with anatomy-guided adaptation based on learned tissue depth distributions. The method achieves a mean reconstruction error of 1.526% on a test dataset, outperforming previous methods and allowing interactive exploration of plausible facial appearances.
18
There are no more papers matching your filters at the moment.