Friedrich-Alexander University Erlangen-Nuremberg
A self-supervised framework called Filter2Noise enables interpretable denoising of low-dose CT images through attention-guided bilateral filtering, achieving up to 4.59 dB PSNR improvement over existing methods while requiring only 3.6k parameters and operating on single images without clean training data.
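To make the attention-guided bilateral filtering concrete, here is a minimal sketch (the small convolutional head predicting per-pixel spatial and range bandwidths, and all layer sizes, are our assumptions rather than the paper's exact architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableBilateralFilter(nn.Module):
    """Sketch: a differentiable bilateral filter whose spatial/range
    bandwidths are predicted per pixel by a small learned head (a
    hypothetical stand-in for the paper's attention guidance)."""
    def __init__(self, ksize=5):
        super().__init__()
        self.ksize = ksize
        self.head = nn.Sequential(            # predicts the two bandwidths
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 2, 3, padding=1),
        )
        r = ksize // 2                        # squared offsets in the window
        yy, xx = torch.meshgrid(torch.arange(-r, r + 1),
                                torch.arange(-r, r + 1), indexing="ij")
        self.register_buffer("d2", (yy**2 + xx**2).float().reshape(-1))

    def forward(self, x):                     # x: (B, 1, H, W)
        B, _, H, W = x.shape
        sig = F.softplus(self.head(x)) + 1e-3          # positive sigmas
        sig_s, sig_r = sig[:, :1], sig[:, 1:]
        patches = F.unfold(x, self.ksize, padding=self.ksize // 2)
        patches = patches.view(B, 1, -1, H, W)         # (B, 1, k*k, H, W)
        w_spatial = torch.exp(-self.d2.view(1, 1, -1, 1, 1)
                              / (2 * sig_s.unsqueeze(2) ** 2))
        w_range = torch.exp(-(patches - x.unsqueeze(2)) ** 2
                            / (2 * sig_r.unsqueeze(2) ** 2))
        w = w_spatial * w_range
        return (w * patches).sum(2) / w.sum(2).clamp_min(1e-8)
```

A Noise2Noise-style self-supervised loss on the single noisy input (e.g., between filtered downsampled sub-images) could then train the head without clean targets; the paper's actual loss is not reproduced here.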
To manage the complexity of transformers in video compression, local attention mechanisms are a practical necessity. The common approach of partitioning frames into patches, however, creates architectural flaws like irregular receptive fields. When adapted for temporal autoregressive models, this paradigm, exemplified by the Video Compression Transformer (VCT), also necessitates computationally redundant overlapping windows. This work introduces 3D Sliding Window Attention (SWA), a patchless form of local attention. By enabling a decoder-only architecture that unifies spatial and temporal context processing, and by providing a uniform receptive field, our method significantly improves rate-distortion performance, achieving Bjøntegaard Delta-rate savings of up to 18.6% against the VCT baseline. Simultaneously, by eliminating the need for overlapping windows, our method reduces overall decoder complexity by a factor of 2.8, while its entropy model is nearly 3.5 times more efficient. We further analyze our model's behavior and show that while it benefits from long-range temporal context, excessive context can degrade performance.
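The sketch below illustrates what "patchless" local attention means: a 3D sliding-window mask built directly on the (time, height, width) token grid gives every token an identical centred receptive field, and temporal autoregression reduces to one causality constraint. The dense mask and window sizes are illustrative assumptions only; a real codec would need a far more efficient implementation.

```python
import torch

def sliding_window_mask(T, H, W, wt, wh, ww, causal_t=True):
    """Boolean mask for patchless 3D local attention: token (t, y, x) may
    attend within a centred (wt, wh, ww) window, optionally restricted to
    past-or-current frames for temporal autoregression."""
    coords = torch.stack(torch.meshgrid(
        torch.arange(T), torch.arange(H), torch.arange(W),
        indexing="ij"), dim=-1).reshape(-1, 3)            # (N, 3)
    d = (coords[:, None, :] - coords[None, :, :]).abs()   # pairwise offsets
    mask = (d[..., 0] <= wt) & (d[..., 1] <= wh) & (d[..., 2] <= ww)
    if causal_t:
        mask &= (coords[:, None, 0] - coords[None, :, 0]) >= 0
    return mask

# Usage: masked single-head attention over a tiny token grid.
T, H, W, D = 3, 4, 4, 8
x = torch.randn(T * H * W, D)
att = (x @ x.t()) / D**0.5
att = att.masked_fill(~sliding_window_mask(T, H, W, 1, 1, 1), float("-inf"))
y = att.softmax(-1) @ x                                   # (N, D)
```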
Sensitive attributes like gender or age can lead to unfair predictions in machine learning tasks such as predictive business process monitoring, particularly when used without considering context. We present FairLoop, a tool for human-guided bias mitigation in neural network-based prediction models. FairLoop distills decision trees from neural networks, allowing users to inspect and modify unfair decision logic, which is then used to fine-tune the original model towards fairer predictions. Compared to other approaches to fairness, FairLoop enables context-aware bias removal through human involvement, addressing the influence of sensitive attributes selectively rather than excluding them uniformly.
In myoelectric control, simultaneous control of multiple degrees of freedom can be challenging due to the dexterity of the human hand. Numerous studies have addressed hand functionality, but most consider only a few degrees of freedom. In this paper, a 3DCNN-MLP model is proposed that uses high-density sEMG signals to estimate 20 hand joint positions and grip force simultaneously. The deep learning model maps the muscle activity to the hand kinematics and kinetics. The proposed model's performance is also evaluated in estimating grip forces with real-time resolution. This paper investigates three individual dynamic hand movements (2pinch, 3pinch, and fist closing and opening) while forces are applied at 10% and 30% of the maximum voluntary contraction (MVC). The results demonstrate high accuracy in estimating kinetics and kinematics. The average Euclidean distance across all joints and subjects was 11.01 ± 2.22 mm, and the mean absolute errors for offline and real-time force estimation were 0.80 ± 0.33 N and 2.09 ± 0.90 N, respectively. The results demonstrate that by leveraging high-density sEMG and deep learning, it is possible to estimate human hand dynamics (kinematics and kinetics), a step toward practical prosthetic hands.
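A minimal sketch of such a 3DCNN-MLP regressor follows (layer sizes, window shape, and the xyz-per-joint output layout are assumptions, not the paper's specification):

```python
import torch
import torch.nn as nn

class SEMG3DCNNMLP(nn.Module):
    """Sketch of a 3DCNN-MLP regressor: 3D convolutions over
    (time, electrode rows, electrode cols) HD-sEMG windows, then an MLP
    emitting xyz coordinates for each joint plus one grip-force value."""
    def __init__(self, n_joints=20):
        super().__init__()
        self.n_joints = n_joints
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.mlp = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, 3 * n_joints + 1),   # joint xyz + grip force
        )

    def forward(self, x):                      # x: (B, 1, T, rows, cols)
        out = self.mlp(self.features(x))
        return out[:, :-1].view(-1, self.n_joints, 3), out[:, -1]

joints, force = SEMG3DCNNMLP()(torch.randn(2, 1, 32, 8, 8))
```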
The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture segmentation by benchmarking state-of-the-art algorithms on these complex tasks. A diverse dataset of 150 CT scans was collected from multiple clinical centers, and a large set of simulated X-ray images was generated using the DeepDRR method. Final submissions from 16 teams worldwide were evaluated under a rigorous multi-metric testing scheme. The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, demonstrating satisfactory accuracy. However, in the X-ray task, the best algorithm attained an IoU of 0.774, highlighting the greater challenges posed by overlapping anatomical structures. Beyond the quantitative evaluation, the challenge revealed methodological diversity in algorithm design. Variations in instance representation, such as primary-secondary classification versus boundary-core separation, led to differing segmentation strategies. Despite promising results, the challenge also exposed inherent uncertainties in fragment definition, particularly in cases of incomplete fractures. These findings suggest that interactive segmentation approaches, integrating human decision-making with task-relevant information, may be essential for improving model reliability and clinical applicability.
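One plausible reading of the fragment-wise IoU metric, sketched below, optimally matches predicted fragments to ground-truth fragments and averages the matched IoUs; the challenge's official protocol may differ, e.g., in how unmatched fragments are counted:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def fragment_wise_iou(gt, pred):
    """Average per-fragment IoU after optimal one-to-one matching between
    ground-truth and predicted instance labels (0 = background). Unmatched
    ground-truth fragments count as IoU 0."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pr_ids = [i for i in np.unique(pred) if i != 0]
    iou = np.zeros((len(gt_ids), len(pr_ids)))
    for a, g in enumerate(gt_ids):
        for b, p in enumerate(pr_ids):
            inter = np.logical_and(gt == g, pred == p).sum()
            union = np.logical_or(gt == g, pred == p).sum()
            iou[a, b] = inter / union if union else 0.0
    rows, cols = linear_sum_assignment(-iou)   # maximise total matched IoU
    return iou[rows, cols].sum() / max(len(gt_ids), 1)
```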
In this study, we introduce a Fourier series-based trainable filter for computed tomography (CT) reconstruction within the filtered backprojection (FBP) framework. This method overcomes limitations in noise reduction by optimizing Fourier series coefficients to construct the filter, maintaining computational efficiency with only a minimal increase in trainable parameters compared to other deep learning frameworks. Additionally, we propose a Gaussian edge-enhanced (GEE) loss function that prioritizes the L1 norm of high-frequency magnitudes, effectively countering the blurring problems prevalent in mean squared error (MSE) approaches. The model's foundation in the FBP algorithm ensures excellent interpretability, as it relies on a data-driven filter with all other parameters derived through rigorous mathematical procedures. Designed as a plug-and-play solution, our Fourier series-based filter can be easily integrated into existing CT reconstruction models, making it an adaptable tool for a wide range of practical applications. Code and data are available at this https URL.
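A minimal sketch of both ingredients follows: a trainable FBP filter whose 1D frequency response is a truncated cosine series, and a GEE-style loss taking the L1 difference of Gaussian-high-pass-weighted FFT magnitudes. The series length, the ramp initialization, and the exact frequency weighting are our assumptions:

```python
import torch
import torch.nn as nn

class FourierSeriesFilter(nn.Module):
    """Sketch of a trainable FBP filter: the 1D frequency response is a
    truncated cosine series with learnable coefficients, initialised by a
    least-squares fit to the ramp filter; the rest of FBP stays fixed."""
    def __init__(self, n_detector, n_terms=8):
        super().__init__()
        self.n = n_detector
        freqs = torch.fft.rfftfreq(n_detector)                 # 0 .. 0.5
        k = torch.arange(n_terms, dtype=torch.float32)
        self.register_buffer("basis", torch.cos(
            torch.pi * k[:, None] * freqs[None, :] / freqs.max()))
        self.coef = nn.Parameter(torch.zeros(n_terms))
        with torch.no_grad():                                  # ramp init
            sol = torch.linalg.lstsq(self.basis.t(), freqs.unsqueeze(1))
            self.coef.copy_(sol.solution.squeeze(1))

    def forward(self, sinogram):                 # (..., n_detector)
        S = torch.fft.rfft(sinogram, dim=-1)
        resp = (self.coef[:, None] * self.basis).sum(0)
        return torch.fft.irfft(S * resp, n=self.n, dim=-1)

def gee_loss(pred, target, sigma=0.1):
    """GEE-style loss sketch: L1 between FFT magnitudes weighted by a
    Gaussian high-pass, so high-frequency (edge) errors dominate."""
    P, T = torch.fft.rfft2(pred), torch.fft.rfft2(target)
    fy = torch.fft.fftfreq(pred.shape[-2], device=pred.device)[:, None]
    fx = torch.fft.rfftfreq(pred.shape[-1], device=pred.device)[None, :]
    w = 1 - torch.exp(-(fy**2 + fx**2) / (2 * sigma**2))
    return (w * (P.abs() - T.abs()).abs()).mean()
```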
We present seg2med, a modular framework for anatomy-driven multimodal medical image synthesis. The system integrates three components to enable high-fidelity, cross-modality generation of CT and MR images based on structured anatomical priors. First, anatomical maps are independently derived from three sources: real patient data, XCAT digital phantoms, and synthetic anatomies created by combining organs from multiple patients. Second, we introduce PhysioSynth, a modality-specific simulator that converts anatomical masks into prior volumes using tissue-dependent parameters (e.g., HU, T1, T2, proton density) and modality-specific signal models. It supports simulation of CT and multiple MR sequences including GRE, SPACE, and VIBE. Third, the synthesized anatomical priors are used to train 2-channel conditional denoising diffusion models, which take the anatomical prior as structural condition alongside the noisy image, enabling generation of high-quality, structurally aligned images. The framework achieves SSIM of 0.94 for CT and 0.89 for MR compared to real data, and FSIM of 0.78 for simulated CT. The generative quality is further supported by a Fréchet Inception Distance (FID) of 3.62 for CT synthesis. In modality conversion, seg2med achieves SSIM of 0.91 for MR to CT and 0.77 for CT to MR. Anatomical fidelity evaluation shows synthetic CT achieves mean Dice scores above 0.90 for 11 key abdominal organs, and above 0.80 for 34 of 59 total organs. These results underscore seg2med's utility in cross-modality synthesis, data augmentation, and anatomy-aware medical AI.
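The 2-channel conditioning can be sketched as follows: the denoiser receives the noisy image and the anatomical prior as two input channels and is trained to predict the injected noise. This is a toy DDPM-style step; the tiny network, the fixed schedule value, and the omitted timestep embedding are all simplifications:

```python
import torch
import torch.nn as nn

class CondDenoiser(nn.Module):
    """Toy stand-in for the 2-channel conditional denoiser: channel 0 is
    the noisy image x_t, channel 1 the anatomical prior, which stays clean
    and steers the generation structurally. Timestep embedding omitted."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x_t, prior):
        return self.net(torch.cat([x_t, prior], dim=1))   # predicts noise

model = CondDenoiser()
x0, prior = torch.randn(4, 1, 64, 64), torch.randn(4, 1, 64, 64)
abar = 0.5                          # assumed \bar{alpha}_t for a sampled t
eps = torch.randn_like(x0)
x_t = abar**0.5 * x0 + (1 - abar)**0.5 * eps
loss = ((model(x_t, prior) - eps) ** 2).mean()
loss.backward()
```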
High-resolution spectroscopy allows one to probe weak interactions and to detect subtle phenomena. While such measurements are routinely performed on atoms and molecules in the gas phase, spectroscopy of species adsorbed on surfaces faces considerable challenges. As a result, previous studies of surface-adsorbed molecules have fallen short of the ultimate resolution, where the transition linewidth is determined by the lifetime of the excited state. In this work, we conceive a new approach to surface deposition and report on Fourier-limited electronic transitions in single dibenzoterrylene molecules adsorbed on the surface of an anthracene crystal. By performing spectroscopy and super-resolution microscopy at liquid helium temperature, we shed light on various properties of the adsorbed molecules. Our experimental results pave the way for a new class of experiments in surface science, where high spatial and spectral resolution can be combined.
Predictive process monitoring enables organizations to proactively react and intervene in running instances of a business process. Given an incomplete process instance, predictions about the outcome, next activity, or remaining time are created. This is done by powerful machine learning models, which have shown impressive predictive performance. However, the data-driven nature of these models makes them susceptible to finding unfair, biased, or unethical patterns in the data. Such patterns lead to biased predictions based on so-called sensitive attributes, such as the gender or age of process participants. Previous work has identified this problem and offered solutions that mitigate biases by removing sensitive attributes entirely from the process instance. However, sensitive attributes can be used both fairly and unfairly in the same process instance. For example, during a medical process, treatment decisions could be based on gender, while the decision to accept a patient should not be based on gender. This paper proposes a novel, model-agnostic approach for identifying and rectifying biased decisions in predictive business process monitoring models, even when the same sensitive attribute is used both fairly and unfairly. The proposed method uses a human-in-the-loop approach to differentiate between fair and unfair decisions through simple alterations to a decision tree model distilled from the original prediction model. Our results show that the proposed approach achieves a promising tradeoff between fairness and accuracy in the presence of biased data. All source code and data are publicly available at this https URL.
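One mechanical realisation of the distil-edit-fine-tune loop is sketched below on toy data. FairLoop's actual editing interface and fine-tuning procedure are more elaborate; the in-place tree surgery shown here is only one way to neutralise an unfair split and relies on scikit-learn exposing the underlying tree arrays:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data: column 3 is the sensitive attribute, and the label is biased by it.
rng = np.random.default_rng(0)
X = rng.random((500, 4))
y = ((X[:, 0] + 0.5 * X[:, 3]) > 0.75).astype(int)

model = MLPClassifier(max_iter=300, random_state=0).fit(X, y)
tree = DecisionTreeClassifier(max_depth=3).fit(X, model.predict(X))  # distil

# "Human" edit: neutralise every split on the sensitive attribute by making
# its condition always false, so all samples follow the right child.
t = tree.tree_
for node in range(t.node_count):
    if t.feature[node] == 3:
        t.threshold[node] = -np.inf

model.partial_fit(X, tree.predict(X))   # fine-tune toward the edited logic
```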
Researchers developed a method to enhance Cone-Beam Computed Tomography (CBCT) reconstruction by integrating learnable, wavelet-sparse components into the traditional Feldkamp-Davis-Kress (FDK) algorithm. This approach quantitatively improved image quality metrics like PSNR and SSIM, reduced noise and artifacts, and maintained computational efficiency compared to standard FDK.
Multiexcitonic transitions and emission of several photons per excitation comprise a very attractive feature of semiconductor quantum dots for optoelectronics applications. However, these higher-order radiative processes are usually quenched in colloidal quantum dots by Auger and other non-radiative decay channels. To increase the multiexcitonic quantum efficiency, several groups have explored plasmonic enhancement, so far with moderate results. By controlled positioning of individual quantum dots in the near field of gold nanocone antennas, we enhance the radiative decay rates of monoexcitons and biexcitons 109-fold and 100-fold at quantum efficiencies of 60% and 70%, respectively, in very good agreement with the outcome of numerical calculations. We discuss the implications of our work for future fundamental and applied research in nano-optics.
As retailers around the world increase efforts in developing targeted marketing campaigns for different audiences, accurately predicting ahead of time which customers are most likely to churn is crucial for marketing teams seeking to increase business profits. This work presents a deep survival framework to predict which customers are at risk of ceasing to purchase from retail companies in non-contractual settings. By having recurrent neural networks learn the survival model parameters, we obtain individual-level survival models of purchasing behaviour based only on each customer's behaviour, avoiding the time-consuming feature engineering usually required when training machine learning models.
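In the spirit of this framework (the Weibull parameterization, layer sizes, and names here are our assumptions), a recurrent network can emit per-step distribution parameters for the time to the next purchase, trained with a censoring-aware likelihood:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SurvivalRNN(nn.Module):
    """Sketch: a GRU maps each customer's event sequence to per-step
    Weibull parameters (alpha, beta) for the time until the next purchase."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 2)

    def forward(self, x):                        # x: (B, T, F)
        h, _ = self.rnn(x)
        p = self.out(h)
        alpha = F.softplus(p[..., 0]) + 1e-6     # scale > 0
        beta = F.softplus(p[..., 1]) + 1e-6      # shape > 0
        return alpha, beta

def weibull_nll(alpha, beta, t, observed):
    """Censored Weibull negative log-likelihood: density term for observed
    next purchases, survival term for (right-)censored customers; t > 0."""
    z = (t / alpha) ** beta
    log_f = torch.log(beta / alpha) + (beta - 1) * torch.log(t / alpha) - z
    return -(observed * log_f + (1 - observed) * (-z)).mean()
```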
In this paper, we propose a new six-dimensional (6D) movable antenna (6DMA) system for future wireless networks to improve the communication performance. Unlike the traditional fixed-position antenna (FPA) and existing fluid antenna/two-dimensional (2D) movable antenna (FA/2DMA) systems that adjust the positions of antennas only, the proposed 6DMA system consists of distributed antenna surfaces with independently adjustable three-dimensional (3D) positions as well as 3D rotations within a given space. In particular, this paper applies the 6DMA to the base station (BS) in wireless networks to provide full degrees of freedom (DoFs) for the BS to adapt to the dynamic user spatial distribution in the network. However, a challenging new problem arises: how to optimally control the 6D positions and rotations of all 6DMA surfaces at the BS to maximize the network capacity based on the user spatial distribution, subject to the practical constraints on 6D antennas' movement. To tackle this problem, we first model the 6DMA-enabled BS and the user channels with the BS in terms of 6D positions and rotations of all 6DMA surfaces. Next, we propose an efficient alternating optimization algorithm to search for the best 6D positions and rotations of all 6DMA surfaces by leveraging the Monte Carlo simulation technique. Specifically, we sequentially optimize the 3D position/3D rotation of each 6DMA surface with those of the other surfaces fixed in an iterative manner. Numerical results show that our proposed 6DMA-BS can significantly improve the network capacity as compared to the benchmark BS architectures with FPAs or 6DMAs with limited/partial movability, especially when the user distribution is more spatially non-uniform.
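The alternating-optimization structure can be sketched as follows, with a deliberately toy channel model standing in for the paper's 6DMA channel: each surface's 3D position and 3D rotation are improved in turn over sampled candidates while the other surfaces stay fixed, and capacity is estimated by Monte Carlo over user drops:

```python
import numpy as np

rng = np.random.default_rng(0)

def capacity(surfaces, users):
    """Toy Monte Carlo capacity proxy (stand-in for the paper's channel
    model): mean log-rate of aggregate distance/orientation-weighted power."""
    gain = 0.0
    for pos, normal in surfaces:
        d = users - pos
        dist = np.linalg.norm(d, axis=1) + 1e-6
        align = np.clip((d / dist[:, None]) @ normal, 0, None)
        gain = gain + align / dist**2
    return np.log2(1 + gain).mean()

def alternating_opt(n_surf=4, n_users=200, iters=20, cand=50):
    users = rng.normal(size=(n_users, 3)) * [30, 30, 0] + [0, 0, -10]
    surfaces = [(rng.uniform(-5, 5, 3), np.array([0, 0, -1.0]))
                for _ in range(n_surf)]
    for _ in range(iters):
        for i in range(n_surf):              # one surface at a time
            best = capacity(surfaces, users)
            for _ in range(cand):            # sampled 6D candidates
                p = rng.uniform(-5, 5, 3)
                n = rng.normal(size=3); n /= np.linalg.norm(n)
                trial = surfaces[:i] + [(p, n)] + surfaces[i + 1:]
                c = capacity(trial, users)
                if c > best:
                    best, surfaces = c, trial
    return surfaces, best
```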
Pre-training on large-scale databases consisting of natural images and then fine-tuning them to fit the application at hand, or transfer-learning, is a popular strategy in computer vision. However, Kataoka et al., 2020 introduced a technique to eliminate the need for natural images in supervised deep learning by proposing a novel synthetic, formula-based method to generate 2D fractals as a training corpus. Using one synthetically generated fractal for each class, they achieved transfer learning results comparable to models pre-trained on natural images. In this project, we take inspiration from their work and build on this idea -- using 3D procedural object renders. Since the image formation process in the natural world is based on its 3D structure, we expect pre-training with 3D mesh renders to provide an implicit bias that leads to better generalization in a transfer learning setting, and that invariances to 3D rotation and illumination are easier to learn from 3D data. Similar to the previous work, our training corpus will be fully synthetic and derived from simple procedural strategies; we will go beyond classic data augmentation and also vary illumination and pose, which are controllable in our setting, and study their effect on transfer learning capabilities in the context of prior work. In addition, we will compare the 2D fractal and 3D procedural object networks to human and non-human primate brain data to learn more about the 2D vs. 3D nature of biological vision.
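For reference, the formula-driven fractal generation of Kataoka et al. can be sketched as sampling a random affine iterated function system (IFS) per class and iterating it; the parameter ranges below are our choices, kept small so the maps are contractive:

```python
import numpy as np

def ifs_fractal(n_points=50_000, n_maps=3, seed=0):
    """Sample a random affine iterated function system (IFS) and iterate
    it; each sampled IFS defines one synthetic 'class'. Entries are kept
    small so the maps are contractive and the attractor stays bounded."""
    rng = np.random.default_rng(seed)
    maps = [(rng.uniform(-0.5, 0.5, (2, 2)), rng.uniform(-1, 1, 2))
            for _ in range(n_maps)]
    p, pts = np.zeros(2), np.empty((n_points, 2))
    for i in range(n_points):
        A, b = maps[rng.integers(n_maps)]    # chaos-game iteration
        p = A @ p + b
        pts[i] = p
    return pts                               # rasterise to get an image
```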
The manufacturing of light-emitting diodes is a complex semiconductor-manufacturing process, interspersed with different measurements. Among the employed measurements, photoluminescence imaging has several advantages, namely being a non-destructive, fast and thus cost-effective measurement. On a photoluminescence measurement image of an LED wafer, every pixel corresponds to an LED chip's brightness after photo-excitation, revealing chip performance information. However, generating a chip-fine defect map of the LED wafer based on photoluminescence images proves challenging for multiple reasons: on the one hand, the measured brightness values vary from image to image, in addition to local spots of differing brightness. On the other hand, certain defect structures may assume multiple shapes, sizes and brightness gradients, where salient brightness values may correspond to defective LED chips, measurement artefacts or non-defective structures. In this work, we revisit the creation of chip-fine defect maps using fully convolutional networks and show that the problem of segmenting objects at multiple scales can be improved by incorporating densely connected convolutional blocks and atrous spatial pyramid pooling modules. We also share implementation details and our experiences with training networks on small datasets of measurement images. The proposed architecture significantly improves the segmentation accuracy of highly variable defect structures over our previous version.
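For readers unfamiliar with the module, an atrous spatial pyramid pooling block is sketched below; the dilation rates and channel counts are typical defaults, not necessarily those of the proposed architecture:

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """Atrous spatial pyramid pooling: parallel dilated convolutions see
    the input at several effective scales; a 1x1 convolution fuses them."""
    def __init__(self, c_in, c_out, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(c_in, c_out, 3, padding=r, dilation=r) for r in rates)
        self.fuse = nn.Conv2d(c_out * len(rates), c_out, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = ASPP(64, 32)(torch.randn(1, 64, 40, 40))   # -> (1, 32, 40, 40)
```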
In this paper, we consider the use of prior knowledge within neural networks. In particular, we investigate the effect of a known transform within the mapping from input data space to the output domain. We demonstrate that the use of known transforms is able to change maximal error bounds. To explore the effect further, we consider the problem of X-ray material decomposition as an example for incorporating additional prior knowledge. We demonstrate that the inclusion of a non-linear function known from the physical properties of the system reduces prediction errors, thereby improving prediction quality from SSIM values of 0.54 to 0.88. This approach is applicable to a wide set of applications in physics and signal processing that provide prior knowledge on such transforms. Maximal error estimation and network understanding could also be facilitated within the context of precision learning.
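A minimal sketch of the idea follows, with a hypothetical Beer-Lambert-style -log mapping standing in for the known transform; the paper's actual X-ray material decomposition transform differs:

```python
import torch
import torch.nn as nn

class KnownTransformNet(nn.Module):
    """Precision-learning sketch: a fixed, differentiable layer encoding a
    known physical relation sits inside an otherwise learned pipeline, so
    the network only models what physics does not already determine."""
    def __init__(self, n):
        super().__init__()
        self.pre = nn.Linear(n, n)
        self.post = nn.Linear(n, n)

    @staticmethod
    def physics(x):
        # Hypothetical Beer-Lambert-style mapping; no learned parameters.
        return -torch.log(x.clamp_min(1e-6))

    def forward(self, x):
        return self.post(self.physics(torch.sigmoid(self.pre(x))))
```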
The rapid evolution of forthcoming sixth-generation (6G) wireless networks necessitates the seamless integration of artificial intelligence (AI) with wireless communications to support emerging intelligent applications that demand both efficient communication and robust learning performance. This dual requirement calls for a unified framework of integrated learning and communication (ILAC), where AI enhances communication through intelligent signal processing and adaptive resource management, while wireless networks support AI model deployment by enabling efficient and reliable data exchanges. However, achieving this integration presents significant challenges in practice. Communication constraints, such as limited bandwidth and fluctuating channels, hinder learning accuracy and convergence. Simultaneously, AI-driven learning dynamics, including model updates and task-driven inference, introduce excessive burdens on communication systems, necessitating flexible, context-aware transmission strategies. We then present a case study on a cost-to-performance optimization problem, where task assignments, model size selection, bandwidth allocation, and transmission power control are jointly optimized, considering computational cost, communication efficiency, and inference accuracy. Leveraging the Dinkelbach and alternating optimization algorithms, we offer a practical and effective solution to achieve an optimal balance between learning performance and communication constraints.
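Dinkelbach's method, the fractional-programming core of the proposed solution, can be sketched as follows; the toy cost and performance functions and the grid-search inner solver are placeholders for the paper's joint task/model/bandwidth/power problem:

```python
import numpy as np

def dinkelbach(f, g, solve_inner, tol=1e-6, max_iter=50):
    """Dinkelbach's method for min_x f(x)/g(x) with g > 0: repeatedly solve
    x* = argmin_x f(x) - lam*g(x), set lam = f(x*)/g(x*), and stop once the
    parametric optimum f(x*) - lam*g(x*) reaches zero."""
    lam, x = 0.0, None
    for _ in range(max_iter):
        x = solve_inner(lam)
        val = f(x) - lam * g(x)
        lam = f(x) / g(x)
        if abs(val) < tol:
            break
    return x, lam

# Toy scalar instance of a cost-to-performance ratio, solved on a grid.
grid = np.linspace(0.1, 5, 1000)
f = lambda x: 1 + x**2                  # cost (e.g., power plus compute)
g = lambda x: np.log2(1 + 3 * x)        # performance (e.g., rate/accuracy)
solve_inner = lambda lam: grid[np.argmin(f(grid) - lam * g(grid))]
x_opt, ratio = dinkelbach(f, g, solve_inner)
```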
Detecting and analyzing various defect types in semiconductor materials is an important prerequisite for understanding the underlying mechanisms and tailoring the production processes. Analysis of microscopy images that reveal defects typically requires image analysis tasks such as segmentation and object detection. With the permanently increasing amount of data produced by experiments, handling these tasks manually becomes increasingly infeasible. In this work, we combine various image analysis and data mining techniques to create a robust, accurate, and automated image analysis pipeline. This allows for extracting the type and position of all defects in a microscopy image of a KOH-etched 4H-SiC wafer that was stitched together from approximately 40,000 individual images.
The correct interpretation of convolutional models is a hard problem for time series data. While saliency methods promise visual validation of predictions for image and language processing, they fall short when applied to time series, which tend to be less intuitive and highly diverse, as in the tool-use time series dataset. Furthermore, saliency methods often generate varied, conflicting explanations, undermining their reliability. Consequently, a rigorous objective assessment is necessary to establish trust in them. This paper investigates saliency methods on time series data to formulate recommendations for interpreting convolutional models and applies them to the tool-use time series problem. To achieve this, we first employ nine gradient-, propagation-, or perturbation-based post-hoc saliency methods across six varied and complex real-world datasets. Next, we evaluate these methods using five independent metrics to generate recommendations. Subsequently, we implement a case study focusing on tool-use time series using convolutional classification models. Our results support our recommendations: no saliency method consistently outperforms the others on all metrics, though some lead in particular settings. Our insights and step-by-step guidelines allow experts to choose suitable saliency methods for a given model and dataset.
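As a concrete example of the simplest member of the studied gradient-based family, vanilla gradient saliency for a 1D convolutional time-series classifier looks like this (the model and data are placeholders, not the paper's):

```python
import torch
import torch.nn as nn

# Placeholder 1D convolutional time-series classifier and random input.
model = nn.Sequential(
    nn.Conv1d(3, 16, 5, padding=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 4),
)
x = torch.randn(1, 3, 200, requires_grad=True)   # (batch, channels, time)
score = model(x)[0].max()                        # logit of the top class
score.backward()
saliency = x.grad.abs().squeeze(0)               # per-channel, per-step relevance
```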
Recent advances in generative adversarial networks (GANs) have led to remarkable achievements in face image synthesis. While methods that use style-based GANs can generate strikingly photorealistic face images, it is often difficult to control the characteristics of the generated faces in a meaningful and disentangled way. Prior approaches aim to achieve such semantic control and disentanglement within the latent space of a previously trained GAN. In contrast, we propose a framework that a priori models physical attributes of the face such as 3D shape, albedo, pose, and lighting explicitly, thus providing disentanglement by design. Our method, MOST-GAN, integrates the expressive power and photorealism of style-based GANs with the physical disentanglement and flexibility of nonlinear 3D morphable models, which we couple with a state-of-the-art 2D hair manipulation network. MOST-GAN achieves photorealistic manipulation of portrait images with fully disentangled 3D control over their physical attributes, enabling extreme manipulation of lighting, facial expression, and pose variations up to full profile view.