University of West Attica
Handwritten Text Generation (HTG) conditioned on text and style is a challenging task due to the variability of inter-user characteristics and the unlimited combinations of characters that form new words unseen during training. Diffusion Models have recently shown promising results in HTG but still remain under-explored. We present DiffusionPen (DiffPen), a 5-shot style handwritten text generation approach based on Latent Diffusion Models. By utilizing a hybrid style extractor that combines metric learning and classification, our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples. Moreover, we explore several data-variation strategies with multi-style mixtures and noisy embeddings, enhancing the robustness and diversity of the generated data. Extensive experiments on the IAM offline handwriting database show that our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems. The code is available at: this https URL.
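As a rough illustration of the hybrid style extractor described above, the sketch below combines a metric-learning (triplet) objective with a writer-classification objective on the same style embeddings. All module names, dimensions, and the unit loss weighting are hypothetical assumptions, not DiffusionPen's actual configuration.

```python
# Hypothetical sketch of a hybrid style-extractor loss: a triplet
# (metric-learning) term plus a writer-classification term, applied to
# embeddings produced by some image backbone (not shown).
import torch
import torch.nn as nn

class HybridStyleLoss(nn.Module):
    def __init__(self, num_writers: int, embed_dim: int = 256, margin: float = 0.2):
        super().__init__()
        self.classifier = nn.Linear(embed_dim, num_writers)
        self.triplet = nn.TripletMarginLoss(margin=margin)
        self.ce = nn.CrossEntropyLoss()

    def forward(self, anchor, positive, negative, writer_ids):
        # Metric-learning term: pull same-writer embeddings together,
        # push different-writer embeddings apart.
        metric_loss = self.triplet(anchor, positive, negative)
        # Classification term: keep embeddings writer-discriminative.
        cls_loss = self.ce(self.classifier(anchor), writer_ids)
        return metric_loss + cls_loss
```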
A comprehensive survey by the University of West Attica systematically compares hardware accelerators for Large Language Models by normalizing their performance and energy efficiency to a common 16nm technology node. The analysis reveals that while ASICs lead in raw performance, in-memory computing solutions achieve orders of magnitude higher energy efficiency, reaching up to 20,500 GOPS/W.
Diffusion-based Handwritten Text Generation (HTG) approaches achieve impressive results on frequent, in-vocabulary words observed at training time and on regular styles. However, they are prone to memorizing training samples and often struggle with style variability and generation clarity. In particular, standard diffusion models tend to produce artifacts or distortions that negatively affect the readability of the generated text, especially when the style is hard to produce. To tackle these issues, we propose a novel sampling guidance strategy, Dual Orthogonal Guidance (DOG), that leverages an orthogonal projection of a negatively perturbed prompt onto the original positive prompt. This approach helps steer the generation away from artifacts while maintaining the intended content, and encourages more diverse, yet plausible, outputs. Unlike standard Classifier-Free Guidance (CFG), which relies on unconditional predictions and produces noisy outputs at high guidance scales, DOG introduces a more stable, disentangled direction in the latent space. To control the strength of the guidance across the denoising process, we apply a triangular schedule: weak at the start and end of denoising, when the process is most sensitive, and strongest in the middle steps. Experimental results on the state-of-the-art DiffusionPen and One-DM models demonstrate that DOG improves both content clarity and style variability, even for out-of-vocabulary words and challenging writing styles.
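A minimal sketch of the DOG update as described above, assuming per-step noise predictions from a positive and a negatively perturbed prompt; the exact formulation in the paper may differ. It illustrates the two named ingredients: the orthogonal projection of the negative prediction and the triangular guidance schedule.

```python
# Hedged sketch of Dual Orthogonal Guidance (DOG); eps_pos / eps_neg are the
# denoiser's noise predictions for the positive and perturbed prompts.
import torch

def triangular_scale(t: int, T: int, w_max: float = 3.0) -> float:
    # Weak at the start and end of denoising, strongest in the middle steps.
    return w_max * (1.0 - abs(2.0 * t / (T - 1) - 1.0))

def dog_guidance(eps_pos: torch.Tensor, eps_neg: torch.Tensor,
                 t: int, T: int) -> torch.Tensor:
    # Project eps_neg onto eps_pos; the component of eps_neg orthogonal to
    # eps_pos is treated as the "artifact" direction to steer away from.
    dot = (eps_neg * eps_pos).sum()
    proj = dot / eps_pos.pow(2).sum().clamp_min(1e-8) * eps_pos
    orth = eps_neg - proj
    return eps_pos - triangular_scale(t, T) * orth
```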
Handwritten text recognition has developed rapidly in recent years, following the rise of deep learning and its applications. Though deep learning methods provide a notable boost in performance for text recognition, non-trivial deviations in performance can be detected even when small pre-processing or architectural/optimization elements are changed. This work follows a ``best practice'' rationale: it highlights simple yet effective empirical practices that can further help training and provide well-performing handwritten text recognition systems. Specifically, we consider three basic aspects of a deep HTR system and propose simple yet effective solutions: 1) retain the aspect ratio of the images in the preprocessing step, 2) use max-pooling for converting the 3D feature map of the CNN output into a sequence of features, and 3) assist the training procedure via an additional CTC loss which acts as a shortcut on the max-pooled sequential features. Using these simple modifications, one can attain close to state-of-the-art results with a basic convolutional-recurrent (CNN+LSTM) architecture on both the IAM and RIMES datasets. Code is available at this https URL
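A minimal sketch of practices 2) and 3) above, assuming PyTorch and a CNN backbone that emits a (batch, channels, height, width) feature map; layer sizes and class counts are illustrative, not the paper's configuration.

```python
# Collapse the 3D CNN feature map to a sequence by max-pooling over height,
# and attach a "shortcut" CTC head directly on that sequence, alongside the
# main CTC head after the recurrent stage.
import torch
import torch.nn as nn

class CNNLSTMHead(nn.Module):
    def __init__(self, channels: int = 256, hidden: int = 256, num_classes: int = 80):
        super().__init__()
        self.lstm = nn.LSTM(channels, hidden, num_layers=2,
                            bidirectional=True, batch_first=True)
        self.main_head = nn.Linear(2 * hidden, num_classes)
        self.shortcut_head = nn.Linear(channels, num_classes)  # auxiliary CTC branch

    def forward(self, feat):                # feat: (B, C, H, W) from the CNN
        seq = feat.max(dim=2).values        # max-pool over height -> (B, C, W)
        seq = seq.permute(0, 2, 1)          # (B, W, C): one feature per column
        shortcut_logits = self.shortcut_head(seq)        # CTC "shortcut"
        main_logits = self.main_head(self.lstm(seq)[0])  # main CTC branch
        return main_logits, shortcut_logits  # both trained with CTC loss
```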
In this work we numerically analyze a photonic unconventional accelerator based on the four-wave mixing (FWM) effect in highly nonlinear waveguides. The proposed scheme can act as a fully analogue system for nonlinear signal processing directly in the optical domain. By exploiting the rich Kerr-induced nonlinearities, multiple nonlinear transformations of an input signal can be generated and used for solving complex nonlinear tasks. We first evaluate the performance of our scheme on the Santa Fe chaotic time-series prediction task. The true power of this processor is revealed in all-optical nonlinearity compensation in an optical communication scenario, where we provide results superior to those offered by strong machine learning algorithms, with reduced power consumption and computational complexity. Finally, we showcase how the FWM module can be used as a reconfigurable nonlinear activation module capable of reproducing characteristic functions such as the sigmoid or the rectified linear unit.
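To make the processing principle concrete, here is a toy numerical sketch in the same spirit: a bank of fixed nonlinear transformations of a delayed input (standing in for the Kerr/FWM products) feeds a trained linear readout, which is the only learned element. The sin/tanh nonlinearities and tap count are purely illustrative assumptions, not the paper's optical model.

```python
# Toy stand-in for the FWM feature generator plus a ridge-regression readout.
import numpy as np

def nonlinear_features(x: np.ndarray, taps: int = 5) -> np.ndarray:
    # Build delayed copies of the signal, then apply toy nonlinearities
    # playing the role of the optically generated transformations.
    delayed = np.stack([np.roll(x, d) for d in range(taps)], axis=1)
    return np.concatenate([delayed, np.sin(delayed), np.tanh(delayed)], axis=1)

def ridge_readout(features: np.ndarray, target: np.ndarray,
                  lam: float = 1e-3) -> np.ndarray:
    # Closed-form ridge regression: only this readout is trained.
    A = features.T @ features + lam * np.eye(features.shape[1])
    return np.linalg.solve(A, features.T @ target)
```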
Text-to-Image synthesis is the task of generating an image according to a specific text description. Generative Adversarial Networks have been considered the standard method for image synthesis virtually since their introduction. Denoising Diffusion Probabilistic Models have recently set a new baseline, with remarkable results in Text-to-Image synthesis, among other fields. Aside from its usefulness per se, such a model can also be particularly relevant as a tool for data augmentation to aid training models for other document image processing tasks. In this work, we present a latent diffusion-based method for styled text-to-text-content-image generation at the word level. Our proposed method is able to generate realistic word image samples from different writer styles, by using class index styles and text content prompts without the need of adversarial training, writer recognition, or text recognition. We gauge system performance with the Fr\'echet Inception Distance, writer recognition accuracy, and writer retrieval. We show that the proposed model produces samples that are aesthetically pleasing, help boost text recognition performance, and achieve writer retrieval scores similar to real data. Code is available at: https://github.com/koninik/WordStylist.
We investigate the impact of several critical events associated with the Russo-Ukrainian war, which officially started on 24 February 2022 with the Russian invasion of Ukraine, on ten European electricity markets and two natural gas markets (the European reference trading hub TTF and the N.Y. NGNMX market), and how these markets interact with each other and with the USDRUB exchange rate, a financial market. We analyze the reactions of these markets, manifested as breakpoints attributed to these critical events, and their interaction, by using a set of three tools that can shed light on different aspects of this complex situation. We combine the concept of market efficiency, measured by quantifying the Efficient Market Hypothesis (EMH) via a rolling Hurst exponent, with structural breakpoints that occurred in the time series of the gas, electricity, and financial markets, detected using a Bayesian ensemble approach, the Bayesian Estimator of Abrupt change, Seasonal change and Trend (BEAST), a powerful tool that can effectively detect structural breakpoints, trends, seasonalities, and sudden abrupt changes in time series. The results show that the analyzed markets have exhibited different modes of reaction to the critical events, both with respect to the number, nature, and timing (leading, lagging, or concurrent with the dates of critical events) of breakpoints and to the dynamic behavior of their trend components.
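As a hedged illustration of the rolling-Hurst ingredient above, the sketch below estimates the exponent with classical rescaled-range (R/S) analysis in a sliding window. The window length and scale grid are assumptions; the paper's exact estimator may differ.

```python
# Rolling Hurst exponent via rescaled-range (R/S) analysis: the slope of
# log(R/S) against log(window size) estimates H (H ~ 0.5 suggests efficiency).
import numpy as np

def rs_hurst(series: np.ndarray, scales=(8, 16, 32, 64)) -> float:
    log_rs, log_n = [], []
    for n in scales:
        chunks = series[: len(series) // n * n].reshape(-1, n)
        dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
        R = dev.max(axis=1) - dev.min(axis=1)     # range of cumulative deviations
        S = chunks.std(axis=1)                    # per-chunk standard deviation
        rs = R / np.where(S > 0, S, np.nan)
        log_rs.append(np.log(np.nanmean(rs)))
        log_n.append(np.log(n))
    return np.polyfit(log_n, log_rs, 1)[0]        # slope = Hurst exponent

def rolling_hurst(series: np.ndarray, window: int = 256) -> np.ndarray:
    return np.array([rs_hurst(series[i : i + window])
                     for i in range(len(series) - window + 1)])
```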
This work introduces EnhancePPG, a method that improves heart rate estimation from photoplethysmography (PPG) signals by combining self-supervised learning with a specialized data augmentation strategy. The approach achieves a Mean Absolute Error of 3.54 BPM, a 12.2% improvement over the previous state-of-the-art, while maintaining low computational latency for wearable device deployment.
Human-centricity is the core value behind the evolution of manufacturing towards Industry 5.0. Nevertheless, there is a lack of architectures that consider safety, trustworthiness, and human-centricity at their core. Therefore, we propose an architecture that integrates Artificial Intelligence (Active Learning, Forecasting, Explainable Artificial Intelligence), simulated reality, decision-making, and users' feedback, focusing on synergies between humans and machines. Furthermore, we align the proposed architecture with the Big Data Value Association Reference Architecture Model. Finally, we validate it on three use cases from real-world case studies.
Global emissions from fossil fuel combustion and cement production rebounded in 2022, signaling a resurgence to pre-pandemic levels and providing an apodictic indication that emission peaks have not yet been achieved. Significant contributions to this upward trend are made by the Information and Communication Technology (ICT) industry due to its substantial energy consumption. This shows the need for further exploration of swarm intelligence applications to measure and optimize the carbon footprint within ICT. All causative factors are evaluated based on the quality of data collection; variations from each source are quantified; and an objective function related to the carbon footprint in ICT energy management is optimized. Emphasis is placed on the asyndetic integration of data sources to construct a convex optimization problem. An apodictic necessity to prevent the erosion of accuracy in carbon footprint assessments is addressed. Complexity percentages ranged from 5.25% for the Bat Algorithm to 7.87% for Fast Bacterial Swarming, indicating significant fluctuations in resource intensity among algorithms. These findings suggest that the environmental impact of various swarm algorithms can be quantified.
Object pose estimation is a task of central importance in 3D Computer Vision. Given a target image and a canonical pose, a single point estimate may very often be sufficient; however, a probabilistic pose output offers a number of benefits when the pose is ambiguous due to sensor and projection constraints or inherent object symmetries. With this paper, we explore the usefulness of the well-known Euler angle parameterisation as a basis for a Normalizing Flows model for pose estimation. Isomorphic to spatial rotation, 3D pose has been parameterized in a number of ways, both in and out of the context of parameter estimation. We explore the idea that Euler angles, despite their shortcomings, may lead to useful models in a number of aspects, compared to a model built on a more complex parameterisation.
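For orientation, a standard statement of the parameterisation in question, in one common convention among several: a rotation is composed of three elemental rotations, and the parameterisation degenerates (gimbal lock) when the middle angle reaches ±π/2, one of the shortcomings alluded to above.

```latex
% Z-Y-X (yaw-pitch-roll) convention:
R(\phi,\theta,\psi) = R_z(\phi)\,R_y(\theta)\,R_x(\psi),
\qquad
R_z(\phi) =
\begin{pmatrix}
\cos\phi & -\sin\phi & 0 \\
\sin\phi & \cos\phi  & 0 \\
0        & 0         & 1
\end{pmatrix}
```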
In this work, we present and experimentally validate a passive photonic-integrated neuromorphic accelerator that uses a hardware-friendly optical spectrum slicing technique through a reconfigurable silicon photonic mesh. The proposed scheme acts as an analogue convolutional engine, enabling information preprocessing in the optical domain, dimensionality reduction and extraction of spatio-temporal features. Numerical results demonstrate that utilizing only 7 passive photonic nodes, critical modules of a digital convolutional neural network can be replaced. As a result, a 98.6% accuracy on the MNIST dataset was achieved, with a power consumption reduction of at least 26% compared to digital CNNs. Experimental results confirm these findings, achieving 97.7% accuracy with only 3 passive nodes.
Recent advances in sensing technologies require the design and development of pattern recognition models capable of processing spatiotemporal data efficiently. In this study, we propose a spatially and temporally aware tensor-based neural network for human pose classification using three-dimensional skeleton data. Our model employs three novel components: first, an input layer capable of constructing highly discriminative spatiotemporal features; second, a tensor fusion operation that produces compact yet rich representations of the data; and third, a tensor-based neural network that processes the data representations in their original tensor form. Our model is end-to-end trainable and characterized by a small number of trainable parameters, making it suitable for problems where the annotated data is limited. Experimental evaluation of the proposed model indicates that it can achieve state-of-the-art performance.
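The tensor fusion component can be pictured with a one-line outer-product sketch: two modality vectors are fused into a joint matrix representation that downstream tensor layers can consume without flattening. The dimensions and the einsum-based formulation are illustrative assumptions, not the paper's exact operation.

```python
# Outer-product fusion of spatial and temporal feature vectors.
import torch

def tensor_fusion(spatial: torch.Tensor, temporal: torch.Tensor) -> torch.Tensor:
    # spatial: (B, d_s), temporal: (B, d_t) -> fused: (B, d_s, d_t)
    return torch.einsum('bi,bj->bij', spatial, temporal)

fused = tensor_fusion(torch.randn(4, 16), torch.randn(4, 8))
print(fused.shape)  # torch.Size([4, 16, 8])
```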
Handwritten Text Recognition (HTR) is a task of central importance in the field of document image understanding. State-of-the-art methods for HTR require the use of extensive annotated sets for training, making them impractical for low-resource domains like historical archives or limited-size modern collections. This paper introduces a novel framework that, unlike the standard HTR model paradigm, can leverage mild prior knowledge of lexical characteristics; this is ideal for scenarios where labeled data are scarce. We propose an iterative bootstrapping approach that aligns visual features extracted from unlabeled images with semantic word representations using Optimal Transport (OT). Starting with a minimal set of labeled examples, the framework iteratively matches word images to text labels, generates pseudo-labels for high-confidence alignments, and retrains the recognizer on the growing dataset. Numerical experiments demonstrate that our iterative visual-semantic alignment scheme significantly improves recognition accuracy on low-resource HTR benchmarks.
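A rough sketch of one bootstrapping round, under stated assumptions: pre-normalized embeddings for word images and candidate transcriptions are matched with a small entropic-OT (Sinkhorn) solver, and only matches with well-above-uniform transport mass become pseudo-labels. The solver, cosine cost, and confidence threshold are illustrative, not the paper's exact procedure.

```python
# One visual-semantic alignment round: Sinkhorn OT + confident pseudo-labels.
import numpy as np

def sinkhorn(C: np.ndarray, reg: float = 0.05, iters: int = 200) -> np.ndarray:
    K = np.exp(-C / reg)
    a = np.full(C.shape[0], 1.0 / C.shape[0])   # uniform image marginal
    b = np.full(C.shape[1], 1.0 / C.shape[1])   # uniform word marginal
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(iters):                       # alternating scaling updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    return np.diag(u) @ K @ np.diag(v)           # transport plan

def pseudo_label(img_emb: np.ndarray, word_emb: np.ndarray, conf: float = 2.0):
    C = 1.0 - img_emb @ word_emb.T               # cosine cost (unit-norm inputs)
    P = sinkhorn(C)
    best = P.argmax(axis=1)                      # most likely word per image
    keep = P.max(axis=1) > conf / P.size         # keep well-above-uniform mass
    return best, keep
```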
An increasing number of emerging applications in data science and engineering are based on multidimensional and structurally rich data. The irregularities, however, of high-dimensional data often compromise the effectiveness of standard machine learning algorithms. We hereby propose the Rank-R Feedforward Neural Network (FNN), a tensor-based nonlinear learning model that imposes a Canonical/Polyadic decomposition on its parameters, thereby offering two core advantages compared to typical machine learning methods. First, it handles inputs as multilinear arrays, bypassing the need for vectorization, and can thus fully exploit the structural information along every data dimension. Moreover, the number of the model's trainable parameters is substantially reduced, making it very efficient in small-sample settings. We establish the universal approximation and learnability properties of the Rank-R FNN, and we validate its performance on real-world hyperspectral datasets. Experimental evaluations show that the Rank-R FNN is a computationally inexpensive alternative to the ordinary FNN that achieves state-of-the-art performance on higher-order tensor data.
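A minimal sketch of the parameter-saving idea, assuming PyTorch: a rank-R CP factorization stores one factor matrix per input mode, so a 3-way weight costs R*(d1+d2+d3) parameters instead of d1*d2*d3, and the input is contracted mode-by-mode without vectorization. The toy single-output layer below is an illustration, not the paper's architecture.

```python
# Rank-R CP-factorized layer acting on a 3-way input tensor.
import torch
import torch.nn as nn

class RankRLayer(nn.Module):
    def __init__(self, dims=(8, 8, 8), rank: int = 4):
        super().__init__()
        # One factor matrix per input mode: R * (d1 + d2 + d3) parameters.
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(d, rank) * 0.1) for d in dims])

    def forward(self, x):                          # x: (B, d1, d2, d3)
        # Contract each mode with its factor; 'r' is the shared CP rank index.
        z = torch.einsum('bijk,ir,jr,kr->br', x, *self.factors)
        return torch.sigmoid(z.sum(dim=1, keepdim=True))
```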
In this work, the very deep super-resolution (VDSR) method is presented for improving the spatial resolution of remotely sensed (RS) images for a scale factor of 4. The VDSR net is re-trained with Sentinel-2 images and with drone aero orthophoto images, thus becoming RS-VDSR and Aero-VDSR, respectively. A novel loss function, the Var-norm estimator, is proposed in the regression layer of the convolutional neural network during re-training and prediction. According to numerical and optical comparisons, the proposed nets RS-VDSR and Aero-VDSR outperform VDSR during prediction with RS images. RS-VDSR outperforms VDSR by up to 3.16 dB in terms of PSNR on Sentinel-2 images.
One requirement of maintaining digital information is storage. With the latest advances in the digital world, new emerging media types have required even more storage space than before. In fact, in many cases larger amounts of storage are required to keep up with protocols that support more types of information at the same time. In response, compression algorithms have been integrated to facilitate the transfer of larger data. Numerical representations are construed as embodiments of information; however, this association could feasibly be inverted, so that a compact sequence signifies an elongated series of numerals. In this work, a novel mathematical paradigm was introduced to engineer a methodology reliant on iterative logarithmic transformations, finely tuned to numeric sequences. Through this fledgling approach, an intricate interplay of polymorphic numeric manipulations was conducted. By applying repeated logarithmic operations, the data were condensed into a minuscule representation, surpassing the ZIP compression method by approximately thirteen times. Such extreme compaction, achieved through iterative reduction of expansive integers until they manifested as single-digit entities, conferred a novel sense of informational embodiment. Instead of relegating data to classical discrete encodings, this method transformed them into a quasi-continuous, logarithmically scaled representation. The introduced approach revealed that morphing data into deeply compressed numerical substrata beyond conventional boundaries is feasible. A holistic perspective emerges, validating that numeric data can be recalibrated into ephemeral sequences of logarithmic impressions. It was not merely a matter of reducing digits, but of reinterpreting data through a resolute numeric vantage.
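A toy rendering of the headline mechanism only: logarithms are applied repeatedly to a large number until a single-digit value remains, keeping the reduction depth. As written this is lossy; whatever residual information the paper's full method stores to make the process invertible is not reproduced here.

```python
# Iterated-logarithm reduction of a large integer to a single-digit residual.
import math

def iterated_log(n: float, base: float = 10.0):
    count = 0
    while n >= base:                 # stop once a single-digit value remains
        n = math.log(n, base)
        count += 1
    return count, n                  # (depth of reduction, residual value)

print(iterated_log(10 ** 100))       # (2, 2.0): log10 twice, 10^100 -> 100 -> 2
```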
Document clustering is a traditional, efficient, and yet quite effective text mining technique when we need to gain better insight into the documents of a collection that could be grouped together. The K-Means algorithm and the Hierarchical Agglomerative Clustering (HAC) algorithm are two of the best-known and most commonly used clustering algorithms, the former due to its low time cost and the latter due to its accuracy. However, even the use of K-Means in text clustering over large-scale collections can lead to unacceptable time costs. In this paper we first address some of the most valuable approaches for document clustering over such 'big data' (large-scale) collections. We then present two very promising alternatives: (a) a variation of an existing K-Means-based fast clustering technique (known as BigKClustering - BKC) so that it can be applied to document clustering, and (b) a hybrid clustering approach based on a customized version of the Buckshot algorithm, which first applies a hierarchical clustering procedure on a sample of the input dataset and then uses the results as the initial centers for a K-Means-based assignment of the rest of the documents, with very few iterations. We also give highly efficient adaptations of the proposed techniques in the MapReduce model, which are then experimentally tested using Apache Hadoop and Spark over a real cluster environment. As the experiments show, both lead to acceptable clustering quality as well as to significant time improvements compared to K-Means (especially the Buckshot-based algorithm), thus constituting very promising alternatives for big document collections.
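A compact sketch of the Buckshot-style hybrid, assuming scikit-learn and an array X of document vectors (e.g., TF-IDF): HAC on a sqrt(k*n)-sized sample yields initial centers for a short K-Means pass over the whole collection. The sample size follows the classical Buckshot recipe; the iteration cap reflects the "very few iterations" noted above, and all parameters are illustrative.

```python
# Buckshot-style hybrid: HAC on a sample seeds K-Means over the full dataset.
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

def buckshot(X: np.ndarray, k: int, rng=np.random.default_rng(0)):
    sample_idx = rng.choice(len(X), size=int(np.sqrt(k * len(X))), replace=False)
    sample = X[sample_idx]
    hac = AgglomerativeClustering(n_clusters=k).fit(sample)
    # Cluster means of the HAC result become the K-Means initial centers.
    centers = np.stack([sample[hac.labels_ == c].mean(axis=0) for c in range(k)])
    return KMeans(n_clusters=k, init=centers, n_init=1, max_iter=5).fit(X)
```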
We investigate the existence of static soliton solutions of the inverse Ginzburg-Landau (G-L) free energy in phase transitions of $\phi^4$ theory. We calculate all the characteristics of these solitons, such as their localized structure, finite energy, and mass. We show that solitons appear in the spontaneous symmetry breaking (SSB) phenomenon only if the critical point is in an excited state. When time is introduced, we show that the static solitons of SSB remain solitons in Minkowski space-time, while the static solitons of the inverse G-L free energy are converted into instantons in Euclidean space. The violation of causality that appears in the tachyonic field can be addressed by passing from Minkowski solitons to Euclidean instantons.
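For reference, and only for the standard (non-inverse) case: the textbook one-dimensional $\phi^4$ Ginzburg-Landau free energy admits the familiar kink, the prototype of the static, localized, finite-energy solutions discussed above; the paper's "inverse" free energy differs in its sign conventions.

```latex
% Standard one-dimensional \phi^4 free energy and its kink solution:
F[\phi] = \int dx\,\Big[\tfrac{1}{2}\big(\partial_x\phi\big)^2
          - \tfrac{\mu^2}{2}\phi^2 + \tfrac{\lambda}{4}\phi^4\Big],
\qquad
\phi_{\mathrm{kink}}(x) = \frac{\mu}{\sqrt{\lambda}}
      \tanh\!\Big(\frac{\mu}{\sqrt{2}}\,x\Big)
```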
Nowadays, Heart Rate (HR) monitoring is a key feature of almost all wrist-worn devices exploiting photoplethysmography (PPG) sensors. However, arm movements affect the performance of PPG-based HR tracking. This issue is usually addressed by fusing the PPG signal with data produced by inertial measurement units. To this end, deep learning algorithms have been proposed, but they are considered too complex to deploy on wearable devices and lack the explainability of results. In this work, we present a new deep learning model, PULSE, which exploits temporal convolutions and multi-head cross-attention to improve the effectiveness of sensor fusion and to take a step towards explainability. We evaluate the performance of PULSE on three publicly available datasets, reducing the mean absolute error by 7.56% on the most extensive available dataset, PPG-DaLiA. Finally, we demonstrate the explainability of PULSE and the benefits of applying attention modules to PPG and motion data.
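A hedged sketch of the fusion mechanism named above, assuming PyTorch: temporal convolutions embed the PPG and accelerometer streams, and PPG features attend over motion features via multi-head cross-attention; the returned attention weights are the kind of signal that supports explainability. All sizes are hypothetical, not PULSE's actual configuration.

```python
# Cross-attention fusion of PPG and accelerometer features.
import torch
import torch.nn as nn

class CrossAttnFusion(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.ppg_conv = nn.Conv1d(1, dim, kernel_size=7, padding=3)
        self.acc_conv = nn.Conv1d(3, dim, kernel_size=7, padding=3)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ppg, acc):                 # ppg: (B,1,T), acc: (B,3,T)
        q = self.ppg_conv(ppg).permute(0, 2, 1)  # PPG features as queries
        kv = self.acc_conv(acc).permute(0, 2, 1) # motion features as keys/values
        fused, attn_w = self.attn(q, kv, kv)     # attn_w aids explainability
        return fused, attn_w
```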