University of Tehran
Researchers from the University of Tehran developed Mamba-based deep learning models for automated segmentation of coronary artery stenosis in X-ray angiography images. The U-Mamba BOT model achieved an F1-score of 68.79%, improving upon prior semi-supervised approaches by 11.8% and significantly outperforming a Transformer-based baseline, while a lightweight variant maintained efficiency.
A CNN-based system accurately detects and classifies surface cracks and other defects in Additive Manufacturing components, achieving over 99% accuracy across various defect types. The approach leverages a custom-annotated dataset created with LabelImg and demonstrates robust performance for industrial quality control.
A comprehensive survey systematically reviews over 100 recent papers in Multimodal Retrieval-Augmented Generation (RAG), proposing an innovation-driven taxonomy that categorizes methods across retrieval, fusion, augmentation, generation, and training strategies, and outlines open challenges and future research directions.
This study presents a learning-based nonlinear algorithm for tracking control of differential-drive mobile robots. The Computed Torque Method (CTM) suffers from inaccurate knowledge of system parameters, while Deep Reinforcement Learning (DRL) algorithms are known for sample inefficiency and weak stability guarantees. The proposed method replaces the black-box policy network of a DRL agent with a gray-box Computed Torque Controller (CTC) to improve sample efficiency and ensure closed-loop stability. This approach enables finding an optimal set of controller parameters for an arbitrary reward function using only a few short learning episodes. The Twin-Delayed Deep Deterministic Policy Gradient (TD3) algorithm is used for this purpose. Additionally, some controller parameters are constrained to lie within known value ranges, ensuring the RL agent learns physically plausible values. A technique is also applied to enforce a critically damped closed-loop time response. The controller's performance is evaluated on a differential-drive mobile robot simulated in the MuJoCo physics engine and compared against the raw CTC and a conventional kinematic controller.
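The abstract does not say which technique the authors use to enforce a critically damped response or to bound the learned parameters. A minimal sketch of one standard option follows, assuming the usual computed-torque error dynamics e'' + kd·e' + kp·e = 0 and a single learnable natural frequency. The function names and the clipping helper are hypothetical, not taken from the paper.

```python
import math


def critically_damped_gains(omega_n):
    """Return (kp, kd) that place a repeated real pole at -omega_n.

    For e'' + kd*e' + kp*e = 0, choosing kp = omega_n**2 and
    kd = 2*omega_n gives a damping ratio of exactly 1.
    """
    if omega_n <= 0:
        raise ValueError("omega_n must be positive")
    return omega_n ** 2, 2.0 * omega_n


def damping_ratio(kp, kd):
    """Damping ratio of the second-order error dynamics."""
    return kd / (2.0 * math.sqrt(kp))


def clip_to_range(value, low, high):
    """Keep a learned parameter inside a known physical range (assumed bounds)."""
    return min(max(value, low), high)


# Hypothetical usage: the agent proposes omega_n, the controller derives gains.
kp, kd = critically_damped_gains(3.0)
```

With this parameterization the agent searches over one scalar per axis instead of two independent gains, so every candidate it proposes is critically damped by construction.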
Supervised Fine-Tuning (SFT) is essential for training large language models (LLMs), significantly enhancing critical capabilities such as instruction following and in-context learning. Nevertheless, creating suitable training datasets tailored to specific domains remains challenging due to unique domain constraints and data scarcity. In this paper, we propose SearchInstruct, a method explicitly designed to construct high-quality instruction datasets for SFT. Our approach begins with a limited set of domain-specific, human-generated questions, which are systematically expanded using a large language model. Subsequently, domain-relevant resources are dynamically retrieved to generate accurate and contextually appropriate answers for each augmented question. Experimental evaluation demonstrates that SearchInstruct enhances both the diversity and quality of SFT datasets, leading to measurable improvements in LLM performance within specialized domains. Additionally, we show that beyond dataset generation, the proposed method can also effectively facilitate tasks such as model editing, enabling efficient updates to existing models. To facilitate reproducibility and community adoption, we provide full implementation details, the complete set of generated instruction-response pairs, and the source code in a publicly accessible Git repository: this https URL
A novel protocol evaluates the faithfulness of attribution methods in autoregressive language models by using counterfactual generation to create fluent, in-distribution inputs. The approach reliably assesses how accurately attribution methods identify crucial tokens that influence model predictions, particularly for off-the-shelf models that are highly sensitive to out-of-distribution inputs.
Parameter-efficient fine-tuning (PEFT) has gained widespread adoption across various applications. Among PEFT techniques, Low-Rank Adaptation (LoRA) and its extensions have emerged as particularly effective, allowing efficient model adaptation while significantly reducing computational overhead. However, existing approaches typically rely on global low-rank factorizations, which overlook local or multi-scale structure and fail to capture complex patterns in the weight updates. To address this, we propose WaRA, a novel PEFT method that leverages wavelet transforms to decompose the weight update matrix into a multi-resolution representation. By performing low-rank factorization in the wavelet domain and reconstructing updates through an inverse transform, WaRA obtains compressed adaptation parameters that harness multi-resolution analysis, enabling it to capture both coarse and fine-grained features while providing greater flexibility and sparser representations than standard LoRA. Through comprehensive experiments and analysis, we demonstrate that WaRA achieves superior performance on diverse vision tasks, including image generation, classification, and semantic segmentation, significantly enhancing generated image quality while reducing computational complexity. Although WaRA was primarily designed for vision tasks, we further showcase its effectiveness in language tasks, highlighting its broader applicability and generalizability. The code is publicly available on GitHub at this https URL.
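The idea of factorizing in the wavelet domain and reconstructing through an inverse transform can be sketched as below. This is an illustration under stated assumptions, not the WaRA implementation: it uses a hand-built single-level orthonormal Haar transform (the abstract does not name the wavelet), and the names `haar_matrix` and `wavelet_lowrank_update` are hypothetical.

```python
import numpy as np


def haar_matrix(n):
    """Single-level orthonormal Haar analysis matrix for even n.

    The first n/2 rows compute scaled pairwise averages (coarse part),
    the last n/2 rows compute scaled pairwise differences (detail part).
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = np.zeros((n, n))
    s = 1.0 / np.sqrt(2.0)
    for i in range(n // 2):
        h[i, 2 * i] = s
        h[i, 2 * i + 1] = s
        h[n // 2 + i, 2 * i] = s
        h[n // 2 + i, 2 * i + 1] = -s
    return h


def wavelet_lowrank_update(b, a):
    """Build a weight update whose wavelet coefficients are the product b @ a.

    b has shape (m, r) and a has shape (r, n). The 2D forward transform of a
    matrix W is H_m @ W @ H_n.T, so the inverse applied to the coefficients
    is H_m.T @ (b @ a) @ H_n.
    """
    m, n = b.shape[0], a.shape[1]
    coeffs = b @ a  # low-rank factorization lives in the wavelet domain
    return haar_matrix(m).T @ coeffs @ haar_matrix(n)


rng = np.random.default_rng(0)
b = rng.normal(size=(8, 2))
a = rng.normal(size=(2, 6))
delta_w = wavelet_lowrank_update(b, a)  # added to the frozen weight matrix
```

Only `b` and `a` would be trained in this sketch; the transform matrices are fixed.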
Vision-language models like CLIP demonstrate impressive zero-shot generalization but remain highly vulnerable to adversarial attacks. In this work, we propose Confidence-Aware Weighting (CAW) to enhance zero-shot robustness in vision-language models. CAW consists of two components: (1) a Confidence-Aware loss that prioritizes uncertain adversarial examples by scaling the KL divergence between clean and adversarial predictions, and (2) a feature alignment regularization that preserves semantic consistency by minimizing the distance between frozen and fine-tuned image encoder features on adversarial inputs. These components work jointly to improve both clean and robust accuracy without sacrificing generalization. Extensive experiments on TinyImageNet and 14 additional datasets show that CAW outperforms recent methods such as PMG-AFT and TGA-ZSR under strong attacks like AutoAttack, while using less memory.
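The two components described above can be sketched in NumPy as follows. The abstract does not give the exact weighting function, so the confidence weight used here (one minus the maximum adversarial class probability) and the squared-distance alignment term are assumptions, and `caw_style_loss` and `beta` are hypothetical names.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def caw_style_loss(clean_logits, adv_logits, frozen_feats, tuned_feats, beta=1.0):
    """Confidence-weighted KL term plus a feature alignment regularizer.

    All inputs are 2D arrays with one row per example.
    """
    p_clean = softmax(clean_logits)
    p_adv = softmax(adv_logits)
    eps = 1e-12
    # KL(clean || adversarial) per example.
    kl = np.sum(p_clean * (np.log(p_clean + eps) - np.log(p_adv + eps)), axis=1)
    # Assumed weighting: less confident adversarial predictions count more.
    weight = 1.0 - p_adv.max(axis=1)
    # Distance between frozen and fine-tuned encoder features on adversarial inputs.
    align = np.sum((frozen_feats - tuned_feats) ** 2, axis=1)
    return float(np.mean(weight * kl + beta * align))
```

In a training loop the same computation would be written in an autodiff framework; NumPy is used here only to keep the arithmetic visible.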
We present a detailed investigation on the structure of neutron stars, incorporating the presence of hyperons within a relativistic model under the mean-field approximation. Employing coupling constants derived from QCD sum rules, we explore the particle fraction in beta equilibrium and establish the mass-radius relationship for neutron stars with hyperonic matter. Additionally, we compute the stellar Love number ($\mathcal{K}_{2}$) and the tidal deformability parameter ($\varLambda$), providing valuable insights into the dynamical properties of these celestial objects. Through comparison with theoretical predictions and observational data, our results exhibit good agreement, affirming the validity of our approach. These findings contribute significantly to refining the understanding of neutron star physics, particularly in environments containing hyperons, and offer essential constraints on the equation of state governing such extreme astrophysical conditions.
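For reference, the dimensionless tidal deformability is conventionally related to the Love number as shown below. The symbols $M$ and $R$ (stellar mass and radius), $G$, and $c$ are standard notation that the abstract does not name, and the abstract does not state which convention the authors adopt.

```latex
\varLambda = \frac{2}{3}\,\mathcal{K}_{2}\left(\frac{R\,c^{2}}{G\,M}\right)^{5}
```

Because of the fifth power of the inverse compactness, small changes in radius at fixed mass shift $\varLambda$ strongly, which is why it constrains the equation of state.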
Existing Persian speech datasets are typically smaller than their English counterparts, which creates a key limitation for developing Persian speech technologies. We address this gap by introducing ParsVoice, the largest Persian speech corpus designed specifically for text-to-speech (TTS) applications. We created an automated pipeline that transforms raw audiobook content into TTS-ready data, incorporating components such as a BERT-based sentence completion detector, a binary search boundary optimization method for precise audio-text alignment, and audio-text quality assessment frameworks tailored to Persian. The pipeline processes 2,000 audiobooks, yielding 3,526 hours of clean speech, which was further filtered into a 1,804-hour high-quality subset suitable for TTS, featuring more than 470 speakers. To validate the dataset, we fine-tuned XTTS for Persian, achieving a naturalness Mean Opinion Score (MOS) of 3.6/5 and a Speaker Similarity Mean Opinion Score (SMOS) of 4.0/5, demonstrating ParsVoice's effectiveness for training multi-speaker TTS systems. ParsVoice is the largest high-quality Persian speech dataset, offering speaker diversity and audio quality comparable to major English corpora. The complete dataset has been made publicly available to accelerate the development of Persian speech technologies, at: this https URL.
This study introduces a novel method that transforms multimodal physiological signals, namely photoplethysmography (PPG), galvanic skin response (GSR), and acceleration (ACC), into 2D image matrices to enhance stress detection using convolutional neural networks (CNNs). Unlike traditional approaches that process these signals separately or rely on fixed encodings, our technique fuses them into structured image representations that enable CNNs to capture temporal and cross-signal dependencies more effectively. This image-based transformation not only improves interpretability but also serves as a robust form of data augmentation. To further enhance generalization and model robustness, we systematically reorganize the fused signals into multiple formats, combining them in a multi-stage training pipeline. This approach significantly boosts classification performance. While demonstrated here in the context of stress detection, the proposed method is broadly applicable to any domain involving multimodal physiological signals, paving the way for more accurate, personalized, and real-time health monitoring through wearable technologies.
This review synthesizes advancements in deep learning applied to spiking neural networks (SNNs), demonstrating that SNNs are increasingly achieving performance comparable to traditional deep neural networks on tasks like image recognition while maintaining superior energy efficiency and hardware friendliness through methods like ANN-to-SNN conversion and direct training approaches.
In this paper, we study a latticized optimization problem with fuzzy relational inequality constraints, where the feasible region is the intersection of two fuzzy inequality systems and the Sugeno-Weber family of t-norms is used as the fuzzy composition. The Sugeno-Weber family of t-norms and t-conorms is among the most widely applied in fuzzy modelling problems. Weber suggested this family for modeling the intersection and union of fuzzy sets, and Sugeno suggested the t-conorms as addition rules for so-called alpha-fuzzy measures. We first investigate the resolution of the feasible region under max-Sugeno-Weber composition and present a necessary and sufficient condition for determining feasibility. Then, based on theoretical properties of the problem, we present an algorithm for solving this nonlinear problem. We prove that the algorithm finds the exact optimal solution, and an example illustrates the proposed algorithm.
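To make the composition concrete, the sketch below evaluates the Sugeno-Weber t-norm, T(x, y) = max(0, (x + y - 1 + λxy) / (1 + λ)) for λ > -1, and the max-T composition of a matrix with a vector. The feasibility check assumes the two systems have the form A∘x ≤ b_upper and D∘x ≥ b_lower, which the abstract does not spell out. All function and variable names are hypothetical.

```python
import numpy as np


def sugeno_weber_tnorm(x, y, lam):
    """Sugeno-Weber t-norm for lam > -1 (lam = 0 gives the Lukasiewicz t-norm)."""
    if lam <= -1:
        raise ValueError("lam must be greater than -1")
    return np.maximum(0.0, (x + y - 1.0 + lam * x * y) / (1.0 + lam))


def max_sw_composition(a, x, lam):
    """Max-Sugeno-Weber composition: entry i is max over j of T(a[i, j], x[j])."""
    return sugeno_weber_tnorm(a, x[np.newaxis, :], lam).max(axis=1)


def is_feasible(x, a, b_upper, d, b_lower, lam):
    """Check the assumed constraint pair a∘x <= b_upper and d∘x >= b_lower."""
    upper_ok = np.all(max_sw_composition(a, x, lam) <= b_upper)
    lower_ok = np.all(max_sw_composition(d, x, lam) >= b_lower)
    return bool(upper_ok and lower_ok)


a = np.array([[1.0, 0.5], [0.2, 1.0]])
x = np.array([0.4, 0.9])
composed = max_sw_composition(a, x, lam=0.0)
```

This only tests a given candidate x; it does not reproduce the paper's resolution of the feasible region or its optimization algorithm.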
Video-based assessment of motor symptoms in conditions such as Parkinson's disease (PD) offers a scalable alternative to in-clinic evaluations, but home-recorded videos introduce significant challenges, including visual degradation, inconsistent task execution, annotation noise, and domain shifts. We present HiLWS, a cascaded human-in-the-loop weak supervision framework for curating and annotating hand motor task videos from both clinical and home settings. Unlike conventional single-stage weak supervision methods, HiLWS employs a novel cascaded approach: it first applies weak supervision to aggregate expert-provided annotations into probabilistic labels, which are then used to train machine learning models. Model predictions, combined with expert input, are subsequently refined through a second stage of weak supervision. The complete pipeline includes quality filtering, optimized pose estimation, and task-specific segment extraction, complemented by context-sensitive evaluation metrics that assess both visual fidelity and clinical relevance by prioritizing ambiguous cases for expert review. Our findings reveal key failure modes in home-recorded data and emphasize the importance of context-sensitive curation strategies for robust medical video analysis.
We investigate how matter density affects neutrino oscillations by considering a mass-varying neutrino scenario where the neutrino mass depends on a scalar field. This scalar field is non-minimally coupled to the Gauss-Bonnet (GB) invariant, causing its profile to be implicitly influenced by the surrounding matter distribution. Using data from solar neutrino experiments, we derive constraints on the model parameters, providing new insights into the properties of mass-varying neutrinos within the Gauss-Bonnet scalar-tensor framework.
Researchers at the University of Tehran developed LLM-based methods for transforming millions of Persian political tweets into interpretable and adaptable user profiles. Their approach, combining semi-supervised filtering with abstractive and extractive profiling, achieved a Macro-F1 score of up to 0.67 for profile quality, surpassing traditional methods.
Music editing has emerged as an important and practical area of artificial intelligence, with applications ranging from video game and film music production to personalizing existing tracks according to user preferences. However, existing models face significant limitations, such as being restricted to editing synthesized music generated by their own models, requiring highly precise prompts, or necessitating task-specific retraining, thus lacking true zero-shot capability. Leveraging recent advances in rectified flow and diffusion transformers, we introduce MusRec, a zero-shot text-to-music editing model capable of performing diverse editing tasks on real-world music efficiently and effectively. Experimental results demonstrate that our approach outperforms existing methods in preserving musical content, structural consistency, and editing fidelity, establishing a strong foundation for controllable music editing in real-world scenarios.
Researchers from the University of Tehran and the University of Southern Denmark identified "pure culture-specific neurons" in multilingual large language models, finding that 56.7% of cultural representations are encoded independently of language and can be selectively modulated with minimal cross-cultural interference.
An interpretable machine learning framework unifies financial and innovation factors to predict key startup outcomes: subsequent funding, patenting activity, and eventual exits. This framework, developed by Drexel University and University of Tehran researchers, leverages leakage-safe methodologies to provide robust forecasts and actionable insights into the drivers of entrepreneurial success.
Federated learning (FL) offers an innovative paradigm for collaborative model training across decentralized devices, such as smartphones, balancing enhanced predictive performance with the protection of user privacy in sensitive areas like Internet of Things (IoT) and medical data analysis. Despite its advantages, FL encounters significant challenges related to user privacy protection against potential attacks and the management of communication costs. This paper introduces a novel federated learning algorithm called FedRP, which integrates random projection techniques with the Alternating Direction Method of Multipliers (ADMM) optimization framework. This approach enhances privacy by employing random projection to reduce the dimensionality of model parameters prior to their transmission to a central server, which also reduces the communication cost. The proposed algorithm offers a strong $(\epsilon, \delta)$-differential privacy guarantee, demonstrating resilience against data reconstruction attacks. Experimental results reveal that FedRP not only maintains high model accuracy but also outperforms existing methods, including conventional differential privacy approaches and FedADMM, in terms of both privacy preservation and communication efficiency.
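The projection step alone can be sketched as follows, assuming a Gaussian random matrix scaled by 1/sqrt(k) and a seed that determines the matrix. The abstract does not specify the projection distribution or how the matrix is shared, and the ADMM updates and the differential privacy analysis are not shown. `make_projection` and `compress_parameters` are hypothetical names.

```python
import numpy as np


def make_projection(k, d, seed):
    """Gaussian random projection matrix of shape (k, d), scaled by 1/sqrt(k).

    With this scaling the squared norm of a projected vector equals the
    original squared norm in expectation.
    """
    rng = np.random.default_rng(seed)
    return rng.standard_normal((k, d)) / np.sqrt(k)


def compress_parameters(params, projection):
    """Project a flat parameter vector from d to k dimensions before sending."""
    return projection @ params


d, k = 2000, 500
params = np.random.default_rng(1).standard_normal(d)  # stand-in for model weights
projection = make_projection(k, d, seed=42)
compressed = compress_parameters(params, projection)
norm_ratio = np.linalg.norm(compressed) / np.linalg.norm(params)
```

In this example the client transmits 500 numbers instead of 2000, and the server never sees the full-dimensional vector.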