alphaXiv


University of Tehran

2,143

03 Dec 2024

ai-for-health computer-science artificial-intelligence

Segmentation of Coronary Artery Stenosis in X-ray Angiography using Mamba Models

University of Tehran

Researchers from the University of Tehran developed Mamba-based deep learning models for automated segmentation of coronary artery stenosis in X-ray angiography images. The U-Mamba BOT model achieved an F1-score of 68.79%, improving upon prior semi-supervised approaches by 11.8% and significantly outperforming a Transformer-based baseline, while a lightweight variant maintained efficiency.

2,027

05 Dec 2024

computer-science computer-vision-security computer-vision-and-pattern-recognition

CNN-based Labelled Crack Detection for Image Annotation

Florida International University Islamic Azad University University of Tehran University of Isfahan

A CNN-based system accurately detects and classifies surface cracks and other defects in Additive Manufacturing components, achieving over 99% accuracy across various defect types. The approach leverages a custom-annotated dataset created with LabelImg and demonstrates robust performance for industrial quality control.

5,178

02 Jun 2025

computer-science artificial-intelligence computation-and-language

Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation

Sharif University of Technology Qatar Computing Research Institute University of Tehran K.N. Toosi University of Technology

A comprehensive survey systematically reviews over 100 recent papers in Multimodal Retrieval-Augmented Generation (RAG), proposing an innovation-driven taxonomy that categorizes methods across retrieval, fusion, augmentation, generation, and training strategies, and outlines open challenges and future research directions.

384

160

30 Aug 2025

computer-science robotics electrical-engineering

Gray-Box Computed Torque Control for Differential-Drive Mobile Robot Tracking

University of Tehran

This study presents a learning-based nonlinear algorithm for tracking control of differential-drive mobile robots. The Computed Torque Method (CTM) suffers from inaccurate knowledge of system parameters, while Deep Reinforcement Learning (DRL) algorithms are known for sample inefficiency and weak stability guarantees. The proposed method replaces the black-box policy network of a DRL agent with a gray-box Computed Torque Controller (CTC) to improve sample efficiency and ensure closed-loop stability. This approach enables finding an optimal set of controller parameters for an arbitrary reward function using only a few short learning episodes. The Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used for this purpose. Additionally, some controller parameters are constrained to lie within known value ranges, ensuring the RL agent learns physically plausible values. A technique is also applied to enforce a critically damped closed-loop time response. The controller's performance is evaluated on a differential-drive mobile robot simulated in the MuJoCo physics engine and compared against the raw CTC and a conventional kinematic controller.

12 Sep 2025

computer-science computation-and-language data-curation

SearchInstruct: Enhancing Domain Adaptation via Retrieval-Based Instruction Dataset Creation

Iran University of Science and Technology University of Tehran

Supervised Fine-Tuning (SFT) is essential for training large language models (LLMs), significantly enhancing critical capabilities such as instruction following and in-context learning. Nevertheless, creating suitable training datasets tailored for specific domains remains challenging due to unique domain constraints and data scarcity. In this paper, we propose SearchInstruct, an innovative method explicitly designed to construct high-quality instruction datasets for SFT. Our approach begins with a limited set of domain-specific, human-generated questions, which are systematically expanded using a large language model. Subsequently, domain-relevant resources are dynamically retrieved to generate accurate and contextually appropriate answers for each augmented question. Experimental evaluation demonstrates that SearchInstruct enhances both the diversity and quality of SFT datasets, leading to measurable improvements in LLM performance within specialized domains. Additionally, we show that beyond dataset generation, the proposed method can also effectively facilitate tasks such as model editing, enabling efficient updates to existing models. To facilitate reproducibility and community adoption, we provide full implementation details, the complete set of generated instruction-response pairs, and the source code in a publicly accessible Git repository: this https URL

982

07 Mar 2025

causal-inference computer-science computation-and-language

Counterfactuals As a Means for Evaluating Faithfulness of Attribution Methods in Autoregressive Language Models

University of Tehran Khatam University

Yadollah Yaghoobzadeh

Sepehr Kamahi

A novel protocol evaluates the faithfulness of attribution methods in autoregressive language models by using counterfactual generation to create fluent, in-distribution inputs. The approach reliably assesses how accurately attribution methods identify crucial tokens that influence model predictions, particularly for off-the-shelf models that are highly sensitive to out-of-distribution inputs.

25 Jun 2025

computer-science computer-vision-and-pattern-recognition image-and-video-processing

WaRA: Wavelet Low Rank Adaptation

University of British Columbia Iran University of Science and Technology University of Tehran

Parameter-efficient fine-tuning (PEFT) has gained widespread adoption across various applications. Among PEFT techniques, Low-Rank Adaptation (LoRA) and its extensions have emerged as particularly effective, allowing efficient model adaptation while significantly reducing computational overhead. However, existing approaches typically rely on global low-rank factorizations, which overlook local or multi-scale structure and therefore fail to capture complex patterns in the weight updates. To address this, we propose WaRA, a novel PEFT method that leverages wavelet transforms to decompose the weight update matrix into a multi-resolution representation. By performing low-rank factorization in the wavelet domain and reconstructing updates through an inverse transform, WaRA obtains compressed adaptation parameters that harness multi-resolution analysis, enabling it to capture both coarse and fine-grained features while providing greater flexibility and sparser representations than standard LoRA. Through comprehensive experiments and analysis, we demonstrate that WaRA achieves superior performance on diverse vision tasks, including image generation, classification, and semantic segmentation, significantly enhancing generated image quality while reducing computational complexity. Although WaRA was primarily designed for vision tasks, we further showcase its effectiveness in language tasks, highlighting its broader applicability and generalizability. The code is publicly available on GitHub at this https URL.
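The idea of factorizing in the wavelet domain and reconstructing through an inverse transform can be sketched as follows. This is only an illustration under stated assumptions: it uses a single-level orthonormal Haar transform and hypothetical helper names (`haar_matrix`, `wavelet_lowrank_update`), none of which come from the abstract, and it is not the authors' implementation.

```python
import math


def haar_matrix(n):
    """Single-level orthonormal Haar analysis matrix for even n (assumed wavelet)."""
    if n % 2:
        raise ValueError("n must be even")
    s = 1.0 / math.sqrt(2.0)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n // 2):
        # Top half: local averages; bottom half: local differences.
        H[i][2 * i] = s
        H[i][2 * i + 1] = s
        H[n // 2 + i][2 * i] = s
        H[n // 2 + i][2 * i + 1] = -s
    return H


def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def wavelet_lowrank_update(B, A):
    """Rebuild a weight update from rank-r factors that live in the wavelet domain.

    B is m x r and A is r x n. Their product is treated as Haar coefficients,
    and the inverse transform H_m^T (B A) H_n returns the update in weight space.
    """
    coeffs = matmul(B, A)
    Hm = haar_matrix(len(coeffs))
    Hn = haar_matrix(len(coeffs[0]))
    return matmul(matmul(transpose(Hm), coeffs), Hn)
```

In this sketch only `B` and `A` would be trained, so the parameter count matches a LoRA update of the same rank; the difference is that a single coefficient can express a spatially localized pattern in the reconstructed update.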

03 Oct 2025

adversarial-attacks adversarial-robustness computer-science

Zero-Shot Robustness of Vision Language Models Via Confidence-Aware Weighting

University of Tehran

Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work, we propose Confidence-Aware Weighting (CAW) to enhance zero-shot robustness in vision-language models. CAW consists of two components: (1) a Confidence-Aware loss that prioritizes uncertain adversarial examples by scaling the KL divergence between clean and adversarial predictions, and (2) a feature alignment regularization that preserves semantic consistency by minimizing the distance between frozen and fine-tuned image encoder features on adversarial inputs. These components work jointly to improve both clean and robust accuracy without sacrificing generalization. Extensive experiments on TinyImageNet and 14 additional datasets show that CAW outperforms recent methods such as PMG-AFT and TGA-ZSR under strong attacks like AutoAttack, while using less memory.

02 Jan 2025

high-energy-astrophysical-phenomena high-energy-physics-phenomenology nuclear-theory

Impact of QCD sum rules coupling constants on neutron stars structure

Centro Brasileiro de Pesquisas Físicas University of Tehran Doğuş University

We present a detailed investigation on the structure of neutron stars, incorporating the presence of hyperons within a relativistic model under the mean-field approximation. Employing coupling constants derived from QCD sum rules, we explore the particle fraction in beta equilibrium and establish the mass-radius relationship for neutron stars with hyperonic matter. Additionally, we compute the stellar Love number ($\mathcal{K}_{2}$) and the tidal deformability parameter ($\varLambda$), providing valuable insights into the dynamical properties of these celestial objects. Through comparison with theoretical predictions and observational data, our results exhibit good agreement, affirming the validity of our approach. These findings contribute significantly to refining the understanding of neutron star physics, particularly in environments containing hyperons, and offer essential constraints on the equation of state governing such extreme astrophysical conditions.

14 Oct 2025

computer-science artificial-intelligence human-computer-interaction

ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis

University of Tehran

Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for text-to-speech (TTS) applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary search boundary optimization method for precise audio-text alignment, and audio-text quality assessment frameworks tailored to Persian. The pipeline processes 2,000 audiobooks, yielding 3,526 hours of clean speech, which was further filtered into a 1,804-hour high-quality subset suitable for TTS, featuring more than 470 speakers. To validate the dataset, we fine-tuned XTTS for Persian, achieving a naturalness Mean Opinion Score (MOS) of 3.6/5 and a Speaker Similarity Mean Opinion Score (SMOS) of 4.0/5, demonstrating ParsVoice's effectiveness for training multi-speaker TTS systems. ParsVoice is the largest high-quality Persian speech dataset, offering speaker diversity and audio quality comparable to major English corpora. The complete dataset has been made publicly available to accelerate the development of Persian speech technologies. The ParsVoice dataset is publicly available at: this https URL.

17 Sep 2025

ai-for-health computer-science machine-learning

Multimodal signal fusion for stress detection using deep neural networks: a novel approach for converting 1D signals to unified 2D images

University of Tehran California Polytechnic State University

This study introduces a novel method that transforms multimodal physiological signals, namely photoplethysmography (PPG), galvanic skin response (GSR), and acceleration (ACC), into 2D image matrices to enhance stress detection using convolutional neural networks (CNNs). Unlike traditional approaches that process these signals separately or rely on fixed encodings, our technique fuses them into structured image representations that enable CNNs to capture temporal and cross-signal dependencies more effectively. This image-based transformation not only improves interpretability but also serves as a robust form of data augmentation. To further enhance generalization and model robustness, we systematically reorganize the fused signals into multiple formats, combining them in a multi-stage training pipeline. This approach significantly boosts classification performance. While demonstrated here in the context of stress detection, the proposed method is broadly applicable to any domain involving multimodal physiological signals, paving the way for more accurate, personalized, and real-time health monitoring through wearable technologies.

106

20 Jan 2019

computer-science artificial-intelligence neural-and-evolutionary-computing

Deep Learning in Spiking Neural Networks

CNRS

Monash University University of Tehran University of Louisiana at Lafayette Université de Toulouse 3

This review synthesizes advancements in deep learning applied to spiking neural networks (SNNs), demonstrating that SNNs are increasingly achieving performance comparable to traditional deep neural networks on tasks like image recognition while maintaining superior energy efficiency and hardware friendliness through methods like ANN-to-SNN conversion and direct training approaches.

1,449

16 Sep 2025

computer-science artificial-intelligence mathematics

Exact alternative optima for nonlinear optimization problems defined with maximum component objective function constrained by the Sugeno-Weber fuzzy relational inequalities

University of Tehran

In this paper, we study a latticized optimization problem with fuzzy relational inequality constraints, where the feasible region is formed as the intersection of two fuzzy inequality systems and the Sugeno-Weber family of t-norms is used as the fuzzy composition. The Sugeno-Weber family of t-norms and t-conorms is one of the most widely applied in fuzzy modelling problems. This family of t-norms and t-conorms was suggested by Weber for modeling the intersection and union of fuzzy sets. The t-conorms were also suggested by Sugeno as addition rules for so-called alpha-fuzzy measures. We first investigate the resolution of the feasible region when it is defined with the max-Sugeno-Weber composition and present a necessary and sufficient condition for determining feasibility. Then, based on some theoretical properties of the problem, an algorithm is presented for solving this nonlinear problem. It is proved that the algorithm finds the exact optimal solution, and an example is presented to illustrate the proposed algorithm.
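For orientation, the max-Sugeno-Weber composition can be sketched with the standard form of the Sugeno-Weber t-norm, T(x, y) = max(0, (x + y - 1 + λxy) / (1 + λ)) for λ > -1. The feasibility check below assumes one "at most" system and one "at least" system; the abstract does not state the inequality directions, so that split, along with all function and argument names, is a hypothetical illustration rather than the paper's algorithm.

```python
def sugeno_weber(x, y, lam):
    """Sugeno-Weber t-norm for lam > -1 (lam = 0 gives the Lukasiewicz t-norm)."""
    if lam <= -1:
        raise ValueError("lam must be greater than -1")
    return max(0.0, (x + y - 1.0 + lam * x * y) / (1.0 + lam))


def max_sw_compose(A, x, lam):
    """Max-T composition: entry i is the maximum over j of T(A[i][j], x[j])."""
    return [max(sugeno_weber(a, xj, lam) for a, xj in zip(row, x)) for row in A]


def is_feasible(A, b_upper, D, b_lower, x, lam, tol=1e-12):
    """Check x against two inequality systems (directions are an assumption)."""
    upper_ok = all(v <= b + tol for v, b in zip(max_sw_compose(A, x, lam), b_upper))
    lower_ok = all(v >= b - tol for v, b in zip(max_sw_compose(D, x, lam), b_lower))
    return upper_ok and lower_ok
```

Because every t-norm satisfies T(1, y) = y, a row of `A` containing a 1 in column j makes the composed value at least `x[j]`, which is a quick sanity check when experimenting with small instances.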

09 Sep 2025

active-learning ai-for-health computer-science

HiLWS: A Human-in-the-Loop Weak Supervision Framework for Curating Clinical and Home Video Data for Neurological Assessment

University of British Columbia University of Tehran

Video-based assessment of motor symptoms in conditions such as Parkinson's disease (PD) offers a scalable alternative to in-clinic evaluations, but home-recorded videos introduce significant challenges, including visual degradation, inconsistent task execution, annotation noise, and domain shifts. We present HiLWS, a cascaded human-in-the-loop weak supervision framework for curating and annotating hand motor task videos from both clinical and home settings. Unlike conventional single-stage weak supervision methods, HiLWS employs a novel cascaded approach: it first applies weak supervision to aggregate expert-provided annotations into probabilistic labels, which are then used to train machine learning models. Model predictions, combined with expert input, are subsequently refined through a second stage of weak supervision. The complete pipeline includes quality filtering, optimized pose estimation, and task-specific segment extraction, complemented by context-sensitive evaluation metrics that assess both visual fidelity and clinical relevance by prioritizing ambiguous cases for expert review. Our findings reveal key failure modes in home-recorded data and emphasize the importance of context-sensitive curation strategies for robust medical video analysis.

03 Oct 2025

general-relativity-and-quantum-cosmology high-energy-physics-phenomenology high-energy-physics-theory

Mass Varying Neutrino Oscillation in Scalar-Gauss-Bonnet Gravity

University of Tehran

We investigate how matter density affects neutrino oscillations by considering a mass-varying neutrino scenario where the neutrino mass depends on a scalar field. This scalar field is non-minimally coupled to the Gauss-Bonnet (GB) invariant, causing its profile to be implicitly influenced by the surrounding matter distribution. Using data from solar neutrino experiments, we derive constraints on the model parameters, providing new insights into the properties of mass-varying neutrinos within the Gauss-Bonnet scalar-tensor framework.

09 May 2025

computer-science computation-and-language information-retrieval

From Millions of Tweets to Actionable Insights: Leveraging LLMs for User Profiling

University of Tehran Institute for Research in Fundamental Sciences (IPM)

Researchers at the University of Tehran developed LLM-based methods for transforming millions of Persian political tweets into interpretable and adaptable user profiles. Their approach, combining semi-supervised filtering with abstractive and extractive profiling, achieved a Macro-F1 score of up to 0.67 for profile quality, surpassing traditional methods.

18 Nov 2025

computer-science artificial-intelligence machine-learning

MusRec: Zero-Shot Text-to-Music Editing via Rectified Flow and Diffusion Transformers

University of Tehran

Music editing has emerged as an important and practical area of artificial intelligence, with applications ranging from video game and film music production to personalizing existing tracks according to user preferences. However, existing models face significant limitations, such as being restricted to editing synthesized music generated by their own models, requiring highly precise prompts, or necessitating task-specific retraining, thus lacking true zero-shot capability. Leveraging recent advances in rectified flow and diffusion transformers, we introduce MusRec, a zero-shot text-to-music editing model capable of performing diverse editing tasks on real-world music efficiently and effectively. Experimental results demonstrate that our approach outperforms existing methods in preserving musical content, structural consistency, and editing fidelity, establishing a strong foundation for controllable music editing in real-world scenarios.

11 Nov 2025

computer-science computation-and-language data-curation

Isolating Culture Neurons in Multilingual Large Language Models

University of Southern Denmark University of Tehran

Researchers from the University of Tehran and the University of Southern Denmark identified "pure culture-specific neurons" in multilingual large language models, finding that 56.7% of cultural representations are encoded independently of language and can be selectively modulated with minimal cross-cultural interference.

10 Oct 2025

computer-science machine-learning ensemble-methods

Interpretable Machine Learning for Predicting Startup Funding, Patenting, and Exits

University of Tehran Drexel University

An interpretable machine learning framework unifies financial and innovation factors to predict key startup outcomes: subsequent funding, patenting activity, and eventual exits. This framework, developed by Drexel University and University of Tehran researchers, leverages leakage-safe methodologies to provide robust forecasts and actionable insights into the drivers of entrepreneurial success.

12 Sep 2025

computer-science machine-learning differential-privacy

FedRP: A Communication-Efficient Approach for Differentially Private Federated Learning Using Random Projection

University of Tehran

Federated learning (FL) offers an innovative paradigm for collaborative model training across decentralized devices, such as smartphones, balancing enhanced predictive performance with the protection of user privacy in sensitive areas like Internet of Things (IoT) and medical data analysis. Despite its advantages, FL encounters significant challenges related to user privacy protection against potential attacks and the management of communication costs. This paper introduces a novel federated learning algorithm called FedRP, which integrates random projection techniques with the Alternating Direction Method of Multipliers (ADMM) optimization framework. This approach enhances privacy by employing random projection to reduce the dimensionality of model parameters prior to their transmission to a central server, reducing the communication cost. The proposed algorithm offers a strong $(\epsilon, \delta)$-differential privacy guarantee, demonstrating resilience against data reconstruction attacks. Experimental results reveal that FedRP not only maintains high model accuracy but also outperforms existing methods, including conventional differential privacy approaches and FedADMM, in terms of both privacy preservation and communication efficiency.
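The dimensionality-reduction step described in this abstract can be illustrated with a generic Gaussian random projection. This is a minimal sketch under stated assumptions: the Gaussian matrix, the 1/sqrt(k) scaling, the seed handling, and the names `make_projection` and `project` are all hypothetical, and the code covers neither the ADMM updates nor the differential privacy analysis of FedRP.

```python
import math
import random


def make_projection(k, d, seed):
    """Build a k x d Gaussian projection matrix scaled by 1/sqrt(k).

    The same seed yields the same matrix (how a seed would be shared is an assumption).
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, w):
    """Map a d-dimensional parameter vector w to k dimensions before transmission."""
    return [sum(r * x for r, x in zip(row, w)) for row in R]
```

With k much smaller than d, each client in this sketch would transmit k numbers instead of d, which is the source of the communication saving; the projection on its own should not be read as providing a formal privacy guarantee.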
