Decisions suggested by improperly designed software systems might be prone to discriminate against people based on protected characteristics, such as gender and ethnicity. Previous studies attribute such undesired behavior to flaws in algorithmic design or biased data. However, these studies ignore that discrimination is often the result of a lack of well-specified fairness requirements and their verification. The fact that experts' knowledge about fairness is often implicit makes the task of specifying precise and verifiable fairness requirements difficult. In related domains, such as security engineering, knowledge graphs have proven effective in formalizing knowledge to assist requirements specification and verification. To address the lack of formal mechanisms for specifying and verifying fairness requirements, we propose the development of a knowledge graph-based framework for fairness. In this paper, we discuss the challenges and research questions involved and lay out a road map towards addressing them.
Researchers from Fraunhofer FIT and FKIE developed a graph-based modeling approach that automatically generates integrated smart grid infrastructure models, combining electrical distribution grids with their associated ICT infrastructure. This methodology streamlines the setup of complex co-simulations and supports the development of process-aware security mechanisms.
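One way to picture such an integrated model is as a single multi-layer graph. The sketch below, using networkx, is a minimal illustration under assumed node and edge names; the actual Fraunhofer tooling and its generation rules are not described in this summary.

```python
# A minimal sketch of a combined power/ICT infrastructure graph,
# with illustrative (assumed) node names and attributes.
import networkx as nx

grid = nx.Graph()

# Electrical layer: a small feeder with a transformer and two loads.
grid.add_node("transformer_1", layer="power", kind="transformer")
grid.add_node("load_1", layer="power", kind="load")
grid.add_node("load_2", layer="power", kind="load")
grid.add_edge("transformer_1", "load_1", layer="power")
grid.add_edge("transformer_1", "load_2", layer="power")

# ICT layer: one RTU per electrical asset, all linked to a control center.
grid.add_node("control_center", layer="ict", kind="scada")
for asset in ["transformer_1", "load_1", "load_2"]:
    rtu = f"rtu_{asset}"
    grid.add_node(rtu, layer="ict", kind="rtu")
    grid.add_edge(asset, rtu, layer="coupling")   # cyber-physical coupling
    grid.add_edge(rtu, "control_center", layer="ict")

# A co-simulation setup can then feed each layer to the matching simulator:
# power-flow tools read the "power" subgraph, network simulators the "ict" one.
power_layer = grid.edge_subgraph(
    (u, v) for u, v, d in grid.edges(data=True) if d["layer"] == "power"
)
print(list(power_layer.edges()))
```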
RWTH Aachen University's Process and Data Science Chair introduces PM-LLM-Benchmark, a comprehensive framework for evaluating Large Language Models on various process mining tasks. The work evaluates state-of-the-art commercial and open-source LLMs using an LLM-as-a-Judge strategy, identifying current capabilities and limitations across different task categories.
The topic of fairness in AI, as debated in the FATE (Fairness, Accountability, Transparency, and Ethics in AI) communities, has sparked meaningful discussions in the past years. However, from a legal perspective, particularly from the perspective of European Union law, many open questions remain. Whereas algorithmic fairness aims to mitigate structural inequalities at the design level, European non-discrimination law is tailored to individual cases of discrimination after an AI model has been deployed. The AI Act might present a tremendous step towards bridging these two approaches by shifting non-discrimination responsibilities into the design stage of AI models. Based on an integrative reading of the AI Act, we comment on legal as well as technical enforcement problems and propose practical implications for bias detection and bias correction in order to specify and comply with specific technical requirements.
The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices, but also enables people to express anti-social behaviour like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize textual data for social and anti-social behaviour analysis by predicting the context, mostly for highly-resourced languages like English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, which lack computational resources for accurate natural language processing (NLP). In this paper, we propose an explainable approach for hate speech detection from the under-resourced Bengali language, which we call DeepHateExplainer. Bengali texts are first comprehensively preprocessed, before being classified into political, personal, geopolitical, and religious hates using a neural ensemble of transformer-based architectures (i.e., monolingual Bangla BERT-base, multilingual BERT-cased/uncased, and XLM-RoBERTa). The most and least important terms are then identified using sensitivity analysis and layer-wise relevance propagation (LRP), before providing human-interpretable explanations. Finally, we compute comprehensiveness and sufficiency scores to measure the quality of the explanations w.r.t. faithfulness. Evaluations against machine learning (linear and tree-based models) and neural network (i.e., CNN, Bi-LSTM, and Conv-LSTM with word embeddings) baselines yield F1-scores of 78%, 91%, 89%, and 84% for political, personal, geopolitical, and religious hates, respectively, outperforming both ML and DNN baselines.
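The ensemble step can be sketched as soft voting over the fine-tuned transformers, averaging class probabilities across members. The checkpoint paths below are placeholders, not the authors' published artifacts, and the sketch assumes the models were already fine-tuned on the four hate categories.

```python
# A minimal soft-voting ensemble sketch with Hugging Face transformers.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = [                      # hypothetical local fine-tuned checkpoints
    "path/to/finetuned-bangla-bert",
    "path/to/finetuned-mbert-cased",
    "path/to/finetuned-xlm-roberta",
]
LABELS = ["political", "personal", "geopolitical", "religious"]

members = [(AutoTokenizer.from_pretrained(c),
            AutoModelForSequenceClassification.from_pretrained(c).eval())
           for c in CHECKPOINTS]

def ensemble_predict(text: str) -> str:
    """Soft voting: average the class probabilities of all ensemble members."""
    probs = []
    for tok, model in members:
        inputs = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs.append(torch.softmax(model(**inputs).logits, dim=-1))
    return LABELS[int(torch.stack(probs).mean(dim=0).argmax())]
```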
This paper establishes a comprehensive framework for evaluating Large Language Models (LLMs) in process mining, defining four essential LLM capabilities and proposing a multi-faceted evaluation strategy that combines automatic, human, and self-evaluation methods for PM-specific tasks.
Power grids worldwide are increasingly the victims of cyberattacks, in which attackers can cause immense damage to critical infrastructure. The growing digitalization and networking in power grids combined with insufficient protection against cyberattacks further exacerbate this trend. Hence, security engineers and researchers must counter these new risks by continuously improving security measures. Data sets of real network traffic during cyberattacks play a decisive role in analyzing and understanding such attacks. Therefore, this paper presents PowerDuck, a publicly available security data set containing network traces of GOOSE communication in a physical substation testbed. The data set includes recordings of various scenarios with and without the presence of attacks. Furthermore, all network packets originating from the attacker are clearly labeled to facilitate their identification. We envision PowerDuck improving and complementing existing data sets of substations, which are often generated synthetically, thereby enhancing the security of power grids.
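Working with such captures typically starts by isolating the GOOSE frames, which are identified by EtherType 0x88B8 (possibly behind an 802.1Q VLAN tag). The scapy sketch below assumes a hypothetical pcap file name; the actual PowerDuck file layout may differ.

```python
# A minimal sketch for filtering GOOSE traffic from a PowerDuck-style capture.
from scapy.all import rdpcap
from scapy.layers.l2 import Ether, Dot1Q

GOOSE_ETHERTYPE = 0x88B8

packets = rdpcap("powerduck_scenario.pcap")   # hypothetical file name
goose = []
for pkt in packets:
    if Ether not in pkt:
        continue
    ethertype = pkt[Ether].type
    if ethertype == 0x8100 and Dot1Q in pkt:  # unwrap an 802.1Q VLAN tag
        ethertype = pkt[Dot1Q].type
    if ethertype == GOOSE_ETHERTYPE:
        goose.append(pkt)

print(f"{len(goose)} GOOSE frames out of {len(packets)} packets")
```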
Object-centric event logs, which allow events to be related to different objects of different object types, naturally represent the execution of business processes, such as ERP (O2C and P2P) and CRM processes. However, modeling such complex information requires novel process mining techniques and might result in complex sets of constraints. Object-centric anomaly detection exploits both the lifecycle of and the interactions between the different objects; anomalous patterns can therefore be proposed to the user without requiring the definition of object-centric process models. This paper proposes different methodologies for object-centric anomaly detection and discusses the role of domain knowledge for these methodologies. We discuss the advantages and limitations of Large Language Models (LLMs) in providing such domain knowledge. Drawing on our experience with a real-life P2P process, we also discuss the role of algorithms (dimensionality reduction + anomaly detection), suggest some pre-processing steps, and discuss the role of feature propagation.
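The "dimensionality reduction + anomaly detection" pipeline can be sketched on per-object feature vectors as follows; the feature encoding and hyperparameters are illustrative assumptions, not the paper's exact setup.

```python
# A minimal sketch: compress per-object features, then flag outliers.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# One row per object (e.g., a purchase order): lifecycle and interaction
# features such as number of events, duration, related objects per type.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))            # stand-in for real object features

reducer = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),                  # compress correlated features
)
Z = reducer.fit_transform(X)

detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(Z)          # -1 marks anomalous objects
anomalous = np.where(labels == -1)[0]
print(f"{len(anomalous)} anomalous objects flagged for review")
```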
Process discovery algorithms traditionally linearize events, failing to capture the inherent concurrency of real-world processes. While some techniques can handle partially ordered data, they often struggle with scalability on large event logs. We introduce a novel, scalable algorithm that directly leverages partial orders in process discovery. Our approach derives partially ordered traces from event data and aggregates them into a sound-by-construction, perfectly fitting process model. Our hierarchical algorithm preserves inherent concurrency while systematically abstracting exclusive choices and loop patterns, enhancing model compactness and precision. We have implemented our technique and demonstrated its applicability on complex real-life event logs. Our work contributes a scalable solution for a more faithful representation of process behavior, especially when concurrency is prevalent in event data.
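The derivation of partially ordered traces can be illustrated with interval-based events: two events of a case are ordered only if one completes before the other starts, so overlapping events remain unordered, i.e., concurrent. This sketch is a simplified illustration of that idea, not the paper's full algorithm.

```python
# A minimal sketch of deriving a partial order over one case's events.
from dataclasses import dataclass

@dataclass
class Event:
    activity: str
    start: float
    complete: float

def partial_order(case_events):
    """Return the set of ordered pairs (a precedes b) for one case."""
    order = set()
    for i, a in enumerate(case_events):
        for j, b in enumerate(case_events):
            if i != j and a.complete < b.start:
                order.add((a.activity, b.activity))
    return order

trace = [Event("register", 0, 1), Event("check A", 2, 5),
         Event("check B", 3, 6), Event("decide", 7, 8)]
print(partial_order(trace))
# "check A" and "check B" overlap in time, so neither precedes the other:
# the trace keeps their concurrency instead of forcing a linear order.
```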
In the landscape of generative artificial intelligence, diffusion-based models have emerged as a promising method for generating synthetic images. However, the application of diffusion models poses numerous challenges, particularly concerning data availability, computational requirements, and privacy. Traditional approaches to address these shortcomings, like federated learning, often impose significant computational burdens on individual clients, especially those with constrained resources. In response to these challenges, we introduce a novel approach for distributed collaborative diffusion models inspired by split learning. Our approach facilitates collaborative training of diffusion models while alleviating client computational burdens during image synthesis. This reduced computational burden is achieved by retaining data and computationally inexpensive processes locally at each client while outsourcing the computationally expensive processes to shared, more efficient server resources. Through experiments on the common CelebA dataset, our approach demonstrates enhanced privacy by reducing the necessity for sharing raw data. These capabilities hold significant potential across various application areas, including the design of edge computing solutions. Thus, our work advances distributed machine learning by contributing to the evolution of collaborative diffusion models.
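The split-learning idea behind this can be sketched in PyTorch: the client keeps its raw images and a cheap front part of the network, sends only intermediate activations to the server, and the server runs the expensive part. The module sizes and the denoising-style objective below are illustrative assumptions, not the paper's exact diffusion architecture.

```python
# A minimal split-learning sketch: only activations cross the client/server cut.
import torch
import torch.nn as nn

client_net = nn.Sequential(               # stays on the client device
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
)
server_net = nn.Sequential(               # runs on shared server hardware
    nn.Conv2d(16, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1),
)

def training_step(images, target, optimizer):
    optimizer.zero_grad()
    smashed = client_net(images)          # only this tensor leaves the client
    output = server_net(smashed)          # heavy computation, server side
    loss = nn.functional.mse_loss(output, target)
    loss.backward()                       # gradients flow back across the cut
    optimizer.step()
    return loss.item()

images = torch.randn(4, 3, 32, 32)        # stand-in batch, e.g. CelebA crops
optimizer = torch.optim.Adam(
    list(client_net.parameters()) + list(server_net.parameters()), lr=1e-4
)
print(training_step(images, images, optimizer))
```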
Smart grids increasingly rely on correctly functioning SCADA systems, yet these systems are prone to cyberattacks, posing risks to critical infrastructure and creating a pressing need for effective security measures. As host-based intrusion detection systems specifically designed for the stable nature of SCADA systems are lacking, the objective of this work is to propose a host-based intrusion detection system tailored for SCADA systems in smart grids. The proposed system utilizes USB device identification, flagging, and process memory scanning to monitor and detect anomalies in SCADA systems, providing enhanced security measures. Evaluation in three different scenarios demonstrates the tool's effectiveness in detecting and disabling malware. The proposed approach effectively identifies potential threats and enhances the security of SCADA systems in smart grids, offering a promising solution to protect against cyberattacks.
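The USB-identification component can be sketched on Linux with pyudev: flag any hot-plugged device whose vendor/product ID is not on a site-specific allowlist. The allowlist entries are placeholders, and the actual tool's internals are not described in this abstract.

```python
# A minimal sketch of allowlist-based USB device identification via udev events.
import pyudev

ALLOWLIST = {("0781", "5567")}            # hypothetical approved (VID, PID) pairs

context = pyudev.Context()
monitor = pyudev.Monitor.from_netlink(context)
monitor.filter_by(subsystem="usb")

for device in iter(monitor.poll, None):   # blocks until the next udev event
    if device.action != "add":
        continue
    vid = device.get("ID_VENDOR_ID")
    pid = device.get("ID_MODEL_ID")
    if vid and (vid, pid) not in ALLOWLIST:
        # In the full system this would trigger flagging and a process
        # memory scan; here we only log the suspicious device.
        print(f"ALERT: unapproved USB device {vid}:{pid} attached")
```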
This technical report describes the intersection of process mining and large language models (LLMs), specifically focusing on the abstraction of traditional and object-centric process mining artifacts into textual format. We introduce and explore various prompting strategies: direct answering, where the large language model directly addresses user queries; multi-prompt answering, which allows the model to incrementally build on the knowledge obtained through a series of prompts; and the generation of database queries, facilitating the validation of hypotheses against the original event log. Our assessment considers two large language models, GPT-4 and Google's Bard, under various contextual scenarios across all prompting strategies. Results indicate that these models exhibit a robust understanding of key process mining abstractions, with notable proficiency in interpreting both declarative and procedural process models. In addition, we find that both models demonstrate strong performance in the object-centric setting, which could significantly propel the advancement of the object-centric process mining discipline. Moreover, these models display a noteworthy capacity to evaluate various concepts of fairness in process mining. This opens the door to more rapid and efficient assessments of the fairness of process mining event logs, which has significant implications for the field. The integration of these large language models into process mining applications may open new avenues for exploration, innovation, and insight generation in the field.
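Two of these prompting strategies can be sketched as follows, assuming the event log has already been abstracted into a textual directly-follows summary; the prompt wording and the table schema are illustrative, not the report's exact prompts.

```python
# A minimal sketch of two prompting strategies over a textual log abstraction.
DFG_ABSTRACTION = """\
register order -> check credit (1200)
check credit -> approve (900)
check credit -> reject (300)
"""

def direct_answer_prompt(question: str) -> str:
    """Direct answering: the LLM answers from the abstraction alone."""
    return (f"Given this directly-follows summary of an event log:\n"
            f"{DFG_ABSTRACTION}\n{question}")

def query_generation_prompt(question: str) -> str:
    """Query generation: the LLM emits SQL to validate hypotheses on the log."""
    return (f"The event log is stored in a table event_log(case_id, "
            f"activity, timestamp).\nWrite a SQL query that answers: "
            f"{question}\nReturn only the query.")

print(direct_answer_prompt("Which transition is a potential bottleneck?"))
```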
Know your customer (KYC) processes place a great burden on banks, because they are costly, inefficient, and inconvenient for customers. While blockchain technology is often mentioned as a potential solution, it is not clear how to use the technology's advantages without violating data protection regulations and customer privacy. We demonstrate how blockchain-based self-sovereign identity (SSI) can solve the challenges of KYC. We follow a rigorous design science research approach to create a framework that utilizes SSI in the KYC process, deriving nascent design principles that theorize on blockchain's role for SSI.
The growing availability and use of deepfake technologies increases risks for democratic societies, e.g., for political communication on online platforms. The EU has responded with transparency obligations for providers and deployers of Artificial Intelligence (AI) systems and online platforms. This includes marking deepfakes during generation and labeling deepfakes when they are shared. However, the lack of industry and enforcement standards poses an ongoing challenge. Through a multivocal literature review, we summarize methods for marking, detecting, and labeling deepfakes and assess their effectiveness under EU regulation. Our results indicate that individual methods fail to meet regulatory and practical requirements. Therefore, we propose a multi-level strategy combining the strengths of existing methods. To account for the masses of content on online platforms, our multi-level strategy provides scalability and practicality via a simple scoring mechanism. At the same time, it is agnostic to types of deepfake technology and allows for context-specific risk weighting.
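The scoring mechanism can be sketched as a weighted combination of the available signals (provenance watermark, generation metadata, detector output), with a context-dependent risk factor. All weights, signal names, and the threshold below are illustrative assumptions, not values proposed in the paper.

```python
# A minimal sketch of a multi-signal deepfake labeling score.
SIGNAL_WEIGHTS = {
    "watermark_detected": 0.5,     # robust provenance mark found
    "metadata_marker": 0.3,        # C2PA-style generation metadata present
    "detector_score": 0.2,         # ML detector confidence in [0, 1]
}

def deepfake_score(signals: dict, context_risk: float = 1.0) -> float:
    """Combine signals into one score; context_risk allows, e.g.,
    stricter treatment of political content."""
    score = sum(SIGNAL_WEIGHTS[name] * signals.get(name, 0.0)
                for name in SIGNAL_WEIGHTS)
    return min(1.0, score * context_risk)

sample = {"watermark_detected": 0, "metadata_marker": 1, "detector_score": 0.8}
if deepfake_score(sample, context_risk=1.5) >= 0.5:   # assumed threshold
    print("label content as AI-generated")
```

Because the score is a cheap arithmetic combination of precomputed signals, it scales to the masses of content on online platforms while remaining agnostic to the underlying deepfake technology.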
Artificial intelligence (AI) systems have become an indispensable component of modern technology. However, research on human behavioral responses, in particular human reliance on AI advice (AI reliance), is lagging behind. Current shortcomings in the literature include unclear influences on AI reliance, a lack of external validity, conflicting approaches to measuring reliance, and disregard for changes in reliance over time. Promising avenues for future research include reliance on generative AI output and reliance in multi-user situations. In conclusion, we present a morphological box that serves as a guide for research on AI reliance.
Large amounts of geospatial data have been made available recently on the linked open data cloud and the portals of many national cartographic agencies (e.g., OpenStreetMap data, administrative geographies of various countries, or land cover/land use data sets). These datasets use various geospatial vocabularies and can be queried using SPARQL or its OGC-standardized extension GeoSPARQL. In this paper, we go beyond these approaches to offer a question-answering engine for natural language questions on top of linked geospatial data sources. Our system has been implemented as re-usable components of the Frankenstein question answering architecture. We give a detailed description of the system's architecture, its underlying algorithms, and its evaluation using a set of 201 natural language questions. The set of questions is offered to the research community as a gold standard dataset for the comparative evaluation of future geospatial question answering engines.
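The kind of GeoSPARQL query such an engine might generate from a natural-language question (e.g., "Which parks are in Bonn?") can be sketched as follows; the endpoint URL, resource IRIs, and data model are illustrative placeholders, not the system's actual configuration.

```python
# A minimal sketch of issuing a generated GeoSPARQL query with SPARQLWrapper.
from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?park WHERE {
  ?park a <http://example.org/Park> ;
        geo:hasGeometry/geo:asWKT ?parkGeom .
  <http://example.org/Bonn> geo:hasGeometry/geo:asWKT ?cityGeom .
  FILTER(geof:sfWithin(?parkGeom, ?cityGeom))   # parks within Bonn
}
"""

sparql = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint
sparql.setQuery(QUERY)
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["park"]["value"])
```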
In recent years, we have seen significant advances in the technology used to both publish and consume Linked Data. However, in order to support the next generation of e-business applications on top of interlinked machine-readable data, suitable forms of access control need to be put in place. Although a number of access control models and frameworks have been put forward, very little research has been conducted into the security implications associated with granting access to partial data or the correctness of the proposed access control mechanisms. Therefore, the contributions of this paper are twofold: we propose a query rewriting algorithm which can be used to partially restrict access to SPARQL 1.1 queries and updates; and we demonstrate how a set of criteria, originally used to verify that an access control policy holds over different database states, can be adapted to verify the correctness of access control via query rewriting.
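The query-rewriting idea can be sketched by wrapping a query's WHERE clause so its patterns only match triples in named graphs the requester may read. A real implementation would rewrite the query algebra rather than strings; this string-level version, with placeholder graph IRIs, only illustrates the transformation.

```python
# A minimal sketch of access control via SPARQL query rewriting.
ACCESSIBLE_GRAPHS = {
    "alice": ["<http://example.org/graphs/public>",   # hypothetical policy
              "<http://example.org/graphs/hr>"],
}

def rewrite_select(query: str, user: str) -> str:
    """Wrap the WHERE clause so patterns only match permitted graphs."""
    graphs = " ".join(ACCESSIBLE_GRAPHS.get(user, []))
    head, sep, body = query.partition("WHERE {")
    assert sep, "expected a SELECT ... WHERE { ... } query"
    return (f"{head}WHERE {{ VALUES ?__g {{ {graphs} }} "
            f"GRAPH ?__g {{{body} }}")

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o . }"
print(rewrite_select(query, "alice"))
# -> SELECT ?s ?p ?o WHERE { VALUES ?__g { ... } GRAPH ?__g { ?s ?p ?o . } }
```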
The transition to smart grids has increased the vulnerability of electrical power systems to advanced cyber threats. To safeguard these systems, comprehensive security measures, including preventive, detective, and reactive strategies, are necessary. As these systems are part of the critical infrastructure, securing them against cyberattacks is a major research focus. Many methods have been developed to detect anomalies and intrusions and to assess the damage potential of attacks. However, these methods require large amounts of data, which are often limited or private due to security concerns. We propose a co-simulation framework that employs an autonomous agent to execute modular cyberattacks within a configurable environment, enabling reproducible and adaptable data generation. The impact of virtual attacks is compared to that of attacks in a physical lab targeting real smart grids. We also investigate the use of large language models for automating attack generation, though current models running on consumer hardware remain unreliable. Our approach offers a flexible, versatile source of data generation, aiding faster prototyping and reducing development resources and time.
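The "autonomous agent with modular cyberattacks" idea can be sketched as attack scenarios composed from small, reusable steps that the agent replays against the simulation. The step names and the state-passing interface below are assumptions for illustration; the framework's real interfaces are not described in this abstract.

```python
# A minimal sketch of composing a reproducible attack scenario from modular steps.
from dataclasses import dataclass
from typing import Callable

@dataclass
class AttackStep:
    name: str
    run: Callable[[dict], dict]           # takes and returns simulation state

def scan_rtus(state: dict) -> dict:
    state["targets"] = ["rtu_3", "rtu_7"]  # devices discovered in simulation
    return state

def inject_false_commands(state: dict) -> dict:
    for target in state.get("targets", []):
        state.setdefault("log", []).append(f"spoofed breaker-open to {target}")
    return state

SCENARIO = [AttackStep("reconnaissance", scan_rtus),
            AttackStep("command injection", inject_false_commands)]

state: dict = {}
for step in SCENARIO:                     # the agent replays the same scenario
    state = step.run(state)               # deterministically for each data run
print(state["log"])
```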
The exponential growth of social media and micro-blogging sites not only provides platforms for empowering freedom of expression and individual voices but also enables people to express anti-social behaviour like online harassment, cyberbullying, and hate speech. Numerous works have been proposed to utilize these data for social and anti-social behaviour analysis, document characterization, and sentiment analysis by predicting the context, mostly for highly-resourced languages such as English. However, some languages are under-resourced, e.g., South Asian languages like Bengali, Tamil, Assamese, and Telugu, which lack computational resources for NLP tasks. In this paper, we provide several classification benchmarks for Bengali, an under-resourced language. We prepared three datasets covering expressions of hate, commonly used topics, and opinions for hate speech detection, document classification, and sentiment analysis, respectively. We built the largest Bengali word embedding models to date, based on 250 million articles, which we call BengFastText. We perform three different experiments covering document classification, sentiment analysis, and hate speech detection. We incorporate word embeddings into a Multichannel Convolutional-LSTM (MConv-LSTM) network for predicting different types of hate speech, document classification, and sentiment analysis. Experiments demonstrate that BengFastText can correctly capture the semantics of words from their respective contexts. Evaluations against several baseline embedding models, e.g., Word2Vec and GloVe, yield F1-scores of up to 92.30%, 82.25%, and 90.45% for document classification, sentiment analysis, and hate speech detection, respectively, in 5-fold cross-validation tests.
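The embedding step can be sketched with gensim's FastText implementation; the toy two-sentence corpus and the hyperparameters below are illustrative stand-ins for the 250 million articles and the authors' actual settings.

```python
# A minimal sketch of training BengFastText-style subword embeddings.
from gensim.models import FastText

corpus = [
    ["আমি", "বাংলায়", "গান", "গাই"],        # tokenized Bengali sentences
    ["বাংলা", "আমার", "মাতৃভাষা"],
]

model = FastText(
    sentences=corpus,
    vector_size=300,     # embedding dimensionality
    window=5,
    min_count=1,         # keep everything in this toy corpus
    sg=1,                # skip-gram variant
    epochs=10,
)

# Subword n-grams let FastText embed words it never saw during training,
# which matters for a morphologically rich language like Bengali.
print(model.wv["বাংলায়"][:5])
```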
pm4py is a process mining library for Python implementing several process mining (PM) artifacts and algorithms. It also offers methods to integrate PM with large language models (LLMs). This paper examines how the current paradigms of LLM-based PM are implemented in pm4py, identifying challenges such as privacy, hallucinations, and the context window limit.
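One such paradigm, abstracting the log into a compact textual artifact that fits an LLM's context window, can be sketched with pm4py's standard discovery calls; the log file name and prompt wording below are placeholders.

```python
# A minimal sketch: abstract a log into text, then build an LLM prompt.
import pm4py

log = pm4py.read_xes("order_process.xes")            # hypothetical log file
dfg, start_acts, end_acts = pm4py.discover_dfg(log)

# Keep only the most frequent edges so the abstraction stays small,
# one way to respect the context window limit mentioned above.
top_edges = sorted(dfg.items(), key=lambda kv: kv[1], reverse=True)[:20]
abstraction = "\n".join(f"{a} -> {b} ({count})" for (a, b), count in top_edges)

prompt = (
    "Given this directly-follows abstraction of an event log:\n"
    f"{abstraction}\n"
    "Which transitions look like bottlenecks, and why?"
)
print(prompt)   # send to the LLM of your choice
```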