Jheronimus Academy of Data Science
The generative dynamics of diffusion models are governed by spontaneous symmetry breaking, dividing the process into early linear and later attractor-driven phases. This theoretical insight allowed for a Gaussian late initialization scheme, which improved fast sampler performance and sample diversity in image generation.
Accurate power load forecasting is essential for the efficient operation and planning of electrical grids, particularly given the increased variability and complexity introduced by renewable energy sources. This paper introduces GAT-LSTM, a hybrid model that combines Graph Attention Networks (GAT) and Long Short-Term Memory (LSTM) networks. A key innovation of the model is the incorporation of edge attributes, such as line capacities and efficiencies, into the attention mechanism, enabling it to dynamically capture spatial relationships grounded in grid-specific physical and operational constraints. Additionally, by employing an early fusion of spatial graph embeddings and temporal sequence features, the model effectively learns and predicts complex interactions between spatial dependencies and temporal patterns, providing a realistic representation of the dynamics of power grids. Experimental evaluations on the Brazilian Electricity System dataset demonstrate that the GAT-LSTM model significantly outperforms state-of-the-art models, achieving reductions of 21.8% in MAE, 15.9% in RMSE, and 20.2% in MAPE. These results underscore the robustness and adaptability of the GAT-LSTM model, establishing it as a powerful tool for applications in grid management and energy planning.
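For readers less familiar with the three error metrics named above, the following is a minimal sketch of their standard definitions, together with one plausible reading of a "reduction" as a relative improvement over a baseline. The load values and the baseline/model errors are made up for illustration and are not taken from the paper; the helper names are hypothetical.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def relative_reduction(baseline_error, model_error):
    """Percentage by which model_error is lower than baseline_error (assumed reading)."""
    return 100.0 * (baseline_error - model_error) / baseline_error

# Hypothetical load values (e.g. in MW), invented for this example only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
# A hypothetical baseline MAE of 10.0 against a model MAE of 7.82.
print(relative_reduction(10.0, 7.82))
```

MAPE weights each error by the size of the true load, so it is the only one of the three that is scale-free; MAE and RMSE are in the units of the load itself.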
Negative Prompting (NP) is widely utilized in diffusion models, particularly in text-to-image applications, to prevent the generation of undesired features. In this paper, we show that conventional NP is limited by the assumption of a constant guidance scale, which may lead to highly suboptimal results, or even complete failure, due to the non-stationarity and state-dependence of the reverse process. Based on this analysis, we derive a principled technique called Dynamic Negative Guidance (DNG), which relies on a near-optimal time- and state-dependent modulation of the guidance without requiring additional training. Unlike NP, negative guidance requires estimating the posterior class probability during the denoising process, which is achieved with limited additional computational overhead by tracking the discrete Markov Chain during the generative process. We evaluate the performance of DNG class-removal on MNIST and CIFAR10, where we show that DNG leads to higher safety, preservation of class balance and image quality when compared with baseline methods. Furthermore, we show that it is possible to use DNG with Stable Diffusion to obtain more accurate and less invasive guidance than NP.
Darknet markets (DNMs) facilitate the trade of illegal goods on a global scale. Gathering data on DNMs is critical to ensuring law enforcement agencies can effectively combat crime. Manually extracting data from DNMs is an error-prone and time-consuming task. Aiming to automate this process, we develop a framework for extracting data from DNMs and evaluate the application of three state-of-the-art Named Entity Recognition (NER) models, ELMo-BiLSTM (Shah et al., 2022), UniversalNER (Zhou et al., 2024), and GLiNER (Zaratiana et al., 2023), on the task of extracting complex entities from DNM product listing pages. We propose a new annotated dataset, which we use to train, fine-tune, and evaluate the models. Our findings show that state-of-the-art NER models perform well in information extraction from DNMs, achieving 91% Precision, 96% Recall, and an F1 score of 94%. In addition, fine-tuning enhances model performance, with UniversalNER achieving the best performance.
As Artificial Intelligence (AI) systems, particularly those based on machine learning (ML), become integral to high-stakes applications, their probabilistic and opaque nature poses significant challenges to traditional verification and validation methods. These challenges are exacerbated in regulated sectors requiring tamper-proof, auditable evidence, as highlighted by apposite legal frameworks, e.g., the EU AI Act. Conversely, Zero-Knowledge Proofs (ZKPs) offer a cryptographic solution that enables provers to demonstrate, through verified computations, adherence to set requirements without revealing sensitive model details or data. Through a systematic survey of ZKP protocols, we identify five key properties (non-interactivity, transparent setup, standard representations, succinctness, and post-quantum security) critical for their application in AI validation and verification pipelines. Subsequently, we perform a follow-up systematic survey analyzing ZKP-enhanced ML applications across an adaptation of the Team Data Science Process (TDSP) model (Data & Preprocessing, Training & Offline Metrics, Inference, and Online Metrics), detailing verification objectives, ML models, and adopted protocols. Our findings indicate that current research on ZKP-Enhanced ML primarily focuses on inference verification, while the data preprocessing and training stages remain underexplored. Most notably, our analysis identifies a significant convergence within the research domain toward the development of a unified Zero-Knowledge Machine Learning Operations (ZKMLOps) framework. This emerging framework leverages ZKPs to provide robust cryptographic guarantees of correctness, integrity, and privacy, thereby promoting enhanced accountability, transparency, and compliance with Trustworthy AI principles.
We describe our third-place solution to the UKARA 1.0 challenge on automated essay scoring. The task consists of a binary classification problem on two datasets (answers from two different questions). We ended up using two different models for the two datasets. For task A, we applied a random forest algorithm on features extracted using unigrams with latent semantic analysis (LSA). For task B, on the other hand, we only used logistic regression on TF-IDF features. Our models achieve an F1 score of 0.812.
Many important social phenomena are characterized by repeated interactions among individuals over time, such as email exchanges in an organization or face-to-face interactions in a classroom. To understand the underlying mechanisms of social interaction dynamics, statistical techniques for simulating longitudinal network data at a fine temporal granularity are crucially important. This paper makes two contributions to the field. First, we present statistical frameworks to simulate relational event networks under dyadic and actor-oriented relational event models, which are implemented in a new R package 'remulate'. Second, we explain how the simulation framework can be used to address challenging problems in temporal social network analysis, such as model fit assessment, theory building, network intervention planning, making predictions, and understanding the impact of network structures, to name a few. This is shown in three extensive case studies. In the first study, we elaborate on why simulation-based techniques are crucial for relational event model assessment, illustrated for a network of criminal gangs. In the second study, we show how simulation techniques are important when building and extending theories about social phenomena, illustrated via optimal distinctiveness theory. In the third study, we demonstrate how simulation techniques contribute to a better understanding of the longevity and the potential effect sizes of network interventions. Through these case studies and software, researchers will be able to better understand social interaction dynamics using relational event data from real-life networks.
We present a new algorithm based on posterior sampling for learning in Constrained Markov Decision Processes (CMDP) in the infinite-horizon undiscounted setting. The algorithm achieves near-optimal regret bounds while being advantageous empirically compared to the existing algorithms. Our main theoretical result is a Bayesian regret bound for each cost component of $\tilde{O}(DS\sqrt{AT})$ for any communicating CMDP with $S$ states, $A$ actions, and diameter $D$. This regret bound matches the lower bound in order of time horizon $T$ and is the best-known regret bound for communicating CMDPs achieved by a computationally tractable algorithm. Empirical results show that our posterior sampling algorithm outperforms the existing algorithms for constrained reinforcement learning.
This vision paper presents initial research on assessing the robustness and reliability of AI-enabled systems, and key factors in ensuring their safety and effectiveness in practical applications, including a focus on accountability. By exploring evolving definitions of these concepts and reviewing current literature, the study highlights major challenges and approaches in the field. A case study is used to illustrate real-world applications, emphasizing the need for innovative testing solutions. The incorporation of accountability is crucial for building trust and ensuring responsible AI development. The paper outlines potential future research directions and identifies existing gaps, positioning robustness, reliability, and accountability as vital areas for the development of trustworthy AI systems of the future.
This case study describes challenges and lessons learned in building Ocean Guard: a Machine Learning-Enabled System (MLES) for anomaly detection in the maritime domain. First, the paper presents the system's specification and architecture. Ocean Guard was designed with a microservices architecture to enable multiple teams to work on the project in parallel. Then, the paper discusses how the developers adapted contract-based design to MLOps to achieve that goal. As an MLES, Ocean Guard employs code, model, and data contracts to establish guidelines between its services. This case study hopes to inspire software engineers, machine learning engineers, and data scientists to leverage similar approaches for their systems.
Deploying a Machine Learning (ML) training pipeline into production requires good software engineering practices. Unfortunately, the typical data science workflow often leads to code that lacks critical software quality attributes. This experience report investigates this problem in SPIRA, a project whose goal is to create an ML-Enabled System (MLES) to pre-diagnose respiratory insufficiency via speech analysis. This paper presents an overview of the architecture of the MLES, then compares three versions of its Continuous Training subsystem: from a proof-of-concept Big Ball of Mud (v1), to a design pattern-based Modular Monolith (v2), to a test-driven set of Microservices (v3). Each version improved its overall extensibility, maintainability, robustness, and resiliency. The paper shares challenges and lessons learned in this process, offering insights for researchers and practitioners seeking to productionize their pipelines.
Dynamic social networks can be conceptualized as sequences of dyadic interactions between individuals over time. The relational event model has been the workhorse to analyze such interaction sequences in empirical social network research. When addressing possible unobserved heterogeneity in the interaction mechanisms, standard approaches, such as the stochastic block model, aim to cluster the variation at the actor level. Though useful, the implied latent structure of the adjacency matrix is restrictive, which may lead to biased interpretations and insights. To address this shortcoming, we introduce a more flexible dyadic latent class relational event model (DLC-REM) that captures the unobserved heterogeneity at the dyadic level. Through numerical simulations, we provide a proof of concept demonstrating that this approach is more general than latent actor-level approaches. To illustrate the applicability of the model, we apply it to a dataset of militarized interstate conflicts between countries.
In medical, social, and behavioral research we often encounter datasets with a multilevel structure and multiple correlated dependent variables. These data are frequently collected from a study population that distinguishes several subpopulations with different (i.e., heterogeneous) effects of an intervention. Despite the frequent occurrence of such data, methods to analyze them are less common and researchers often resort to either ignoring the multilevel and/or heterogeneous structure, analyzing only a single dependent variable, or a combination of these. These analysis strategies are suboptimal: Ignoring multilevel structures inflates Type I error rates, while neglecting the multivariate or heterogeneous structure masks detailed insights. To analyze such data comprehensively, the current paper presents a novel Bayesian multilevel multivariate logistic regression model. The clustered structure of multilevel data is taken into account, such that posterior inferences can be made with accurate error rates. Further, the model shares information between different subpopulations in the estimation of average and conditional average multivariate treatment effects. To facilitate interpretation, multivariate logistic regression parameters are transformed to posterior success probabilities and differences between them. A numerical evaluation compared our framework to less comprehensive alternatives and highlighted the need to model the multilevel structure: Treatment comparisons based on the multilevel model had targeted Type I error rates, while single-level alternatives resulted in inflated Type I errors. Further, the multilevel model was more powerful than a single-level model when the number of clusters was higher. ...
In relational event networks, the tendency for actors to interact with each other depends greatly on the past interactions between the actors in a social network. Both the quantity of past interactions and the time that has elapsed since they occurred affect the actors' decisions to interact with other actors in the network. Recent events generally have a stronger influence on current interaction behavior than events that occurred a long time ago, a phenomenon known as "memory decay". Previous studies either predefined a short-run and long-run memory or fixed a parametric exponential memory using a predefined half-life period. In real-life relational event networks, however, it is generally unknown how the memory of actors about past events fades as time goes by. For this reason, it is not advisable to fix the decay in an ad hoc manner; instead, the shape of the memory decay should be learned from the observed data. In this paper, a novel semi-parametric approach based on Bayesian Model Averaging is proposed for learning the shape of the memory decay without requiring any parametric assumptions. The method is applied to relational event history data among socio-political actors in India.
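To make the fixed-half-life baseline mentioned above concrete, here is a minimal sketch of an exponentially decayed count of past events on a dyad. It illustrates only the predefined-half-life approach that the abstract argues against fixing ad hoc, not the proposed semi-parametric method. The function names, the `(time, sender, receiver)` event layout, and the toy history are assumptions made for this example.

```python
import math

def exponential_weight(elapsed, half_life):
    """Weight of an event that occurred `elapsed` time units ago.

    The weight halves every `half_life` time units.
    """
    return math.exp(-math.log(2) * elapsed / half_life)

def weighted_inertia(history, sender, receiver, now, half_life):
    """Decayed count of past events sent from `sender` to `receiver` before `now`.

    `history` is an iterable of (time, sender, receiver) tuples (assumed layout).
    """
    return sum(
        exponential_weight(now - t, half_life)
        for t, s, r in history
        if s == sender and r == receiver and t < now
    )

# Toy event history, invented for illustration.
history = [(0.0, "a", "b"), (6.0, "a", "b"), (8.0, "b", "a")]
inertia = weighted_inertia(history, "a", "b", now=10.0, half_life=4.0)
print(inertia)
```

With a half-life of 4, the event at time 6 contributes a weight of one half at time 10, while the event at time 0 contributes far less. A different half-life would change the statistic, which is why the choice matters when it cannot be justified from the data.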
Organizations, particularly medium and large enterprises, typically rely heavily on complex, distributed systems to deliver critical services and products. However, the growing complexity of these systems poses challenges in ensuring service availability, performance, and reliability. Traditional resilience testing methods often fail to capture the intricate interactions and failure modes of modern systems. Chaos Engineering addresses these challenges by proactively testing how systems in production behave under turbulent conditions, allowing developers to uncover and resolve potential issues before they escalate into outages. Though chaos engineering has received growing attention from researchers and practitioners alike, we observed a lack of reviews that synthesize insights from both academic and grey literature. Hence, we conducted a Multivocal Literature Review (MLR) on chaos engineering to address this research gap by systematically analyzing 96 academic and grey literature sources published between January 2016 and April 2024. We first used the chosen sources to derive a unified definition of chaos engineering and to identify key functionalities, components, and adoption drivers. We also developed a taxonomy for chaos engineering platforms and compared the relevant tools using it. Finally, we analyzed the current state of chaos engineering research and identified several open research issues.
Data originating from open-source software projects provide valuable information to enhance software quality. In the scope of Software Defect Prediction, one of the most challenging parts is extracting valid data about failure-prone software components from these repositories, which can help develop more robust software. In particular, collecting data, calculating metrics, and synthesizing results from these repositories is a tedious and error-prone task, which often requires understanding the programming languages involved in the mined repositories, eventually leading to a proliferation of language-specific data-mining software. This paper presents RepoMiner, a language-agnostic tool developed to support software engineering researchers in creating datasets to support any study on defect prediction. RepoMiner automatically collects failure data from software components, labels them as failure-prone or neutral, and calculates metrics to be used as ground truth for defect prediction models. We present its implementation and provide examples of its application.
The effects of treatments may differ between persons with different characteristics. Addressing such treatment heterogeneity is crucial to investigate whether patients with specific characteristics are likely to benefit from a new treatment. The current paper presents a novel Bayesian method for superiority decision-making in the context of randomized controlled trials with multivariate binary responses and heterogeneous treatment effects. The framework is based on three elements: a) Bayesian multivariate logistic regression analysis with a Pólya-Gamma expansion; b) a transformation procedure to transform the obtained regression coefficients to a more intuitive multivariate probability scale (i.e., success probabilities and the differences between them); and c) a compatible decision procedure for treatment comparison with prespecified decision error rates. Procedures for a priori sample size estimation under a non-informative prior distribution are included. A numerical evaluation demonstrated that decisions based on a priori sample size estimation resulted in anticipated error rates among the trial population as well as subpopulations. Further, average and conditional treatment effect parameters could be estimated unbiasedly when the sample was large enough. Illustration with the International Stroke Trial dataset revealed a trend towards heterogeneous effects among stroke patients: something that would have remained undetected when analyses were limited to average treatment effects.
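The transformation in element b) can be sketched for two binary outcomes as follows. This is an illustration under stated assumptions rather than the paper's exact procedure: it assumes the multivariate logistic model is a multinomial logit over the four joint response categories with the last category as reference, and that success probabilities are obtained by summing joint probabilities. All names and coefficient values are hypothetical.

```python
import math

# Joint response categories for two binary outcomes: (outcome 1, outcome 2).
CATEGORIES = [(1, 1), (1, 0), (0, 1), (0, 0)]

def joint_probabilities(coefs, treatment):
    """Softmax over joint categories.

    `coefs` holds one (intercept, treatment_effect) pair per category; the
    reference category (0, 0) is assumed to have both coefficients fixed at 0.
    """
    scores = [b0 + b1 * treatment for b0, b1 in coefs]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def success_probabilities(phi):
    """Marginal success probability per outcome, summed over joint categories."""
    return tuple(
        sum(p for p, cat in zip(phi, CATEGORIES) if cat[k] == 1)
        for k in range(2)
    )

# Hypothetical coefficients: treatment raises the log-odds of category (1, 1).
coefs = [(0.0, math.log(3)), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

theta_control = success_probabilities(joint_probabilities(coefs, treatment=0))
theta_treated = success_probabilities(joint_probabilities(coefs, treatment=1))
delta = tuple(t - c for t, c in zip(theta_treated, theta_control))
print(theta_control, theta_treated, delta)
```

In a Bayesian analysis, the same transformation would be applied to each posterior draw of the coefficients, yielding a posterior distribution of the success probabilities and their differences.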
We study a posterior sampling approach to efficient exploration in constrained reinforcement learning. As an alternative to existing algorithms, we propose two simple algorithms that are statistically more efficient, simpler to implement, and computationally cheaper. The first algorithm is based on a linear formulation of CMDP, and the second algorithm leverages the saddle-point formulation of CMDP. Our empirical results demonstrate that, despite its simplicity, posterior sampling achieves state-of-the-art performance and, in some cases, significantly outperforms optimistic algorithms.
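The core step shared by posterior sampling methods of this kind is drawing a plausible model from the posterior given the data seen so far. The sketch below shows only that step for a tabular transition model, assuming an independent Dirichlet prior per state-action pair (a common choice, but an assumption here). Solving the sampled CMDP, for example through its linear or saddle-point formulation, is deliberately left out, and all names are hypothetical.

```python
import random

def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def sample_transition_model(counts, rng, prior=1.0):
    """Sample P[s][a][s_next] from the posterior.

    `counts[s][a][s_next]` is the number of observed transitions; `prior` is
    the symmetric Dirichlet pseudo-count added to every entry (assumed).
    """
    return [
        [sample_dirichlet([prior + c for c in row], rng) for row in per_state]
        for per_state in counts
    ]

# Toy counts for 2 states and 2 actions, invented for illustration.
counts = [
    [[1000, 0], [3, 3]],
    [[0, 5], [2, 8]],
]
rng = random.Random(0)
sampled = sample_transition_model(counts, rng)
print(sampled[0][0])
```

State-action pairs with many observations yield samples close to the empirical frequencies, while rarely visited pairs yield widely varying samples. That variation is what drives exploration without explicit optimism bonuses.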
This paper presents a bidding system for sponsored search auctions under an unknown valuation model. This formulation assumes that the bidder's value is unknown, evolving arbitrarily, and observed only upon winning an auction. Unlike previous studies, we do not impose any assumptions on the nature of feedback and consider the problem of bidding in sponsored search auctions in its full generality. Our system is based on a bandit framework that is resilient to the black-box auction structure and delayed and batched feedback. To validate our proposed solution, we conducted a case study at Zalando, a leading fashion e-commerce company. We outline the development process and describe the promising outcomes of our bandits-based approach to increase profitability in sponsored search auctions. We discuss in detail the technical challenges that were overcome during the implementation, shedding light on the mechanisms that led to increased profitability.
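As a rough illustration of bandit-based bidding under batched feedback, the sketch below runs the classic EXP3 exponential-weights algorithm over a small set of discrete bid levels and applies updates only when a batch of outcomes arrives. This is a swapped-in textbook technique, not the system described in the paper; the bid levels, the reward function, the batch size, and the class name are all hypothetical, and rewards are assumed to be rescaled to [0, 1].

```python
import math
import random

class BatchedExp3:
    """EXP3 over discrete bid levels, with weights frozen within each batch."""

    def __init__(self, bids, gamma, rng):
        self.bids = list(bids)
        self.gamma = gamma  # exploration rate in (0, 1]
        self.rng = rng
        self.log_weights = [0.0] * len(self.bids)

    def probabilities(self):
        top = max(self.log_weights)  # subtract the maximum to avoid overflow
        weights = [math.exp(lw - top) for lw in self.log_weights]
        total = sum(weights)
        k = len(self.bids)
        return [(1 - self.gamma) * w / total + self.gamma / k for w in weights]

    def choose(self):
        """Return (arm index, probability with which it was chosen)."""
        probs = self.probabilities()
        arm = self.rng.choices(range(len(self.bids)), weights=probs)[0]
        return arm, probs[arm]

    def update_batch(self, feedback):
        """Apply importance-weighted updates for (arm, prob, reward) triples."""
        k = len(self.bids)
        for arm, prob, reward in feedback:
            self.log_weights[arm] += self.gamma * (reward / prob) / k

def toy_reward(arm):
    """Hypothetical environment: only the highest bid level earns reward."""
    return 1.0 if arm == 2 else 0.0

rng = random.Random(0)
learner = BatchedExp3(bids=[0.1, 0.2, 0.3], gamma=0.1, rng=rng)
for _ in range(50):  # 50 batches of delayed feedback
    pending = []
    for _ in range(20):  # 20 auctions per batch; weights stay frozen meanwhile
        arm, prob = learner.choose()
        pending.append((arm, prob, toy_reward(arm)))
    learner.update_batch(pending)

final_probs = learner.probabilities()
print(final_probs)
```

Storing the selection probability alongside each pending outcome is what makes the delayed update unbiased: the importance weight uses the probability in force when the bid was placed, not the one in force when the feedback finally arrives.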
The use of AI in microservices (MSs) is an emerging field, as indicated by a substantial number of surveys. However, these surveys each focus on a specific problem using specific AI techniques and therefore do not fully capture the growth of research and the rise and disappearance of trends. In our systematic mapping study, we take an exhaustive approach to reveal all possible connections between the use of AI techniques for improving any quality attribute (QA) of MSs during the DevOps phases. Our results include 16 research themes that connect to the intersection of particular QAs, AI domains, and DevOps phases. Moreover, by mapping identified future research challenges and relevant industry domains, we show that many studies aim to deliver prototypes to be automated at a later stage, with the goal of providing exploitable products in a number of key industry domains.