Research Institute AImotion Bavaria
Memory efficiency is crucial when training deep learning networks on resource-restricted devices. During backpropagation, forward tensors are used to calculate gradients. Instead of keeping all of these dependencies in memory until they are reused during backpropagation, some forward tensors can be discarded and later recomputed from saved tensors, so-called checkpoints. This allows, in particular, resource-constrained heterogeneous environments to make use of all available compute devices. Unfortunately, choosing these checkpoints is a non-trivial problem and poses a challenge to the programmer: improper or excessive recomputation negates the benefit of checkpointing. In this article, we present XEngine, an approach that schedules network operators to heterogeneous devices in low-memory environments by determining checkpoints and recomputations of tensors. Our approach selects a suitable resource per timestep and operator and optimizes the end-to-end time of neural networks while taking the memory limit of each device into account. To this end, we formulate a mixed-integer quadratic program (MIQP) that schedules the operators of deep learning networks on heterogeneous systems. We compare our MIQP solver XEngine against Checkmate, a mixed-integer linear programming (MILP) approach that solves recomputation on a single device. Our solver finds solutions that are up to 22.5% faster than the fastest Checkmate schedule, in which the network is computed exclusively on a single device. We also find valid schedules that use both central processing units and graphics processing units when memory limits rule out scheduling exclusively to the graphics processing unit.
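XEngine's actual contribution is the MIQP scheduler described above, which is not reproduced here. As a minimal sketch of the underlying checkpoint/recompute trade-off the solver reasons about, the following uses PyTorch's stock checkpointing utility on a toy network; it illustrates the general technique only, not the authors' solver.

```python
# Minimal sketch of the memory/recompute trade-off behind checkpointing,
# using PyTorch's built-in utility. Illustrative only: XEngine's MIQP
# scheduler additionally decides per operator and device, which this
# sketch does not.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A toy deep network whose intermediate activations dominate memory.
layers = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                         for _ in range(32)])
x = torch.randn(64, 1024, requires_grad=True)

# Standard forward pass: all 32 intermediate activations are kept alive
# until they are reused during backpropagation.
layers(x).sum().backward()

# Checkpointed forward pass: only 4 segment boundaries ("checkpoints")
# are stored; activations inside each segment are discarded and
# recomputed during backward, trading compute time for peak memory.
x.grad = None
checkpoint_sequential(layers, 4, x).sum().backward()
```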
Background: Contract-based Design (CbD) is a valuable methodology for software design that allows code and architectural components to be annotated with contracts, thereby enhancing clarity and reliability in software development. It establishes rules that specify the behaviour of software components, their interfaces, and their interactions. This modular approach enables the design process to be split into smaller system components that can be developed, tested, and verified independently, ultimately leading to more robust and dependable software. Aim: Despite the significance and well-established theoretical background of CbD, a comprehensive systematic mapping study on its use for reliable software systems is still missing. Our study provides an evidence-based overview of the method and demonstrates its practical feasibility. Method: We systematically searched three databases using specially formulated queries, which initially yielded 1,221 primary studies. After voting, we focused on 288 primary studies for more detailed analysis. Finally, a collaborative review allowed us to gather the evidence and information needed to address our research questions. Results: Our findings suggest potential avenues for future research in CbD, emphasising its role in improving the dependability of software systems. We highlight maturity levels across different domains and identify areas that may benefit from further research. Conclusion: Although CbD is a well-established software design approach, a more comprehensive literature review was needed to clarify the state of its theory with respect to dependable systems. Our study addresses this gap by providing a detailed overview of CbD from various perspectives, identifying key gaps, and suggesting future research directions.
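As a concrete, deliberately tiny illustration of the idea behind CbD, the following hand-rolled Python sketch attaches a precondition and a postcondition to a component's interface. The names and the assert-based enforcement are our own; real CbD tooling and the formalisms surveyed in the study are considerably richer.

```python
# A minimal, hand-rolled sketch of Contract-based Design in Python:
# a precondition and a postcondition attached to a component interface.
from functools import wraps

def contract(pre=None, post=None):
    """Wrap a function with a precondition on its arguments and a
    postcondition on its result."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None:
                assert pre(*args, **kwargs), f"precondition violated: {func.__name__}"
            result = func(*args, **kwargs)
            if post is not None:
                assert post(result), f"postcondition violated: {func.__name__}"
            return result
        return wrapper
    return decorator

@contract(pre=lambda xs: len(xs) > 0 and all(v >= 0 for v in xs),
          post=lambda r: r >= 0.0)
def mean_speed(xs):
    """Interface: requires a non-empty list of non-negative speeds,
    guarantees a non-negative result."""
    return sum(xs) / len(xs)

print(mean_speed([3.0, 5.0]))  # satisfies the contract -> 4.0
# mean_speed([])               # would raise: precondition violated
```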
In the last two decades, the popularity of self-adaptive systems in the field of software and systems engineering has increased drastically. However, despite the extensive work on self-adaptive systems, the literature still lacks a common agreement on the definition of these systems. To this day, the notion of a self-adaptive system is mainly used intuitively, without a precise understanding of the terminology. Relying on intuition alone does not suffice, especially in engineering and science, where a more rigorous definition is necessary. In this paper, we investigate the existing formal definitions of self-adaptive systems and how these systems are characterised across the literature. Additionally, we analyse and summarise the limitations of the existing formal definitions in order to understand why none of them is used more broadly by the community. To achieve this, we conducted a systematic literature review in which we analysed over 1400 papers related to self-adaptive systems. Concretely, from an initial pool of 1493 papers, we selected 314 relevant papers, which resulted in nine primary studies whose primary objective was to define self-adaptive systems formally. Our systematic review reveals that although interest in self-adaptive systems has grown over the years, efforts to define these systems formally remain scarce. Finally, based on the analysed primary studies, we also elicit requirements for, and lay the foundation of, a potential (formal) definition that the community can accept more broadly in the future.
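For readers unfamiliar with the intuition the paper starts from, the following sketch shows the operational feedback-loop view of a self-adaptive system, loosely following the well-known MAPE-K reference model (Monitor, Analyse, Plan, Execute over shared Knowledge). All names and thresholds are hypothetical, and the sketch makes no attempt at the formal definitions the paper is concerned with.

```python
# A hedged, toy illustration of a self-adaptive feedback loop in the
# spirit of MAPE-K. All component names and values are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    target_latency_ms: float = 100.0
    replicas: int = 1
    history: list = field(default_factory=list)

def monitor(k: Knowledge, observed_latency_ms: float) -> None:
    k.history.append(observed_latency_ms)

def analyse(k: Knowledge) -> bool:
    # Adaptation is needed when the latest observation misses the goal.
    return bool(k.history) and k.history[-1] > k.target_latency_ms

def plan(k: Knowledge) -> int:
    return k.replicas + 1  # simplest possible plan: scale out by one

def execute(k: Knowledge, new_replicas: int) -> None:
    k.replicas = new_replicas

k = Knowledge()
for latency in (80.0, 130.0, 95.0):
    monitor(k, latency)
    if analyse(k):
        execute(k, plan(k))
print(k.replicas)  # -> 2: the system adapted itself once
```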
Background: Due to their diversity, complexity, and above all their importance, safety-critical and dependable systems must be developed with special diligence. Criticality increases further when these systems contain artificial intelligence (AI) components, which are known for their uncertainty. As software and reference architectures form the backbone of any successful system, including safety-critical dependable systems with learning-enabled components, choosing a suitable architecture that guarantees safety despite these uncertainties is of great importance. Aim: We aim to provide the missing overview of the existing architectures, their contribution to safety, and their level of maturity in AI-based safety-critical systems. Method: To achieve this aim, we report a systematic mapping study. From a set of 1,639 primary studies, we selected 38 relevant studies dealing with safety assurance through software architecture in AI-based safety-critical systems. The selected studies were then examined against various criteria to answer the research questions and identify gaps in this area of research. Results: Our findings show which architectures have been proposed and to what extent they have been implemented. Furthermore, we identified gaps in different application areas of those systems and explained these gaps with various arguments. Conclusion: As the AI trend continues to grow, system complexity will inevitably increase, too. To ensure the lasting safety of such systems, we provide an overview of the state of the art, intending to identify best practices and research gaps and to give future research a sharper focus.
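One recurring architectural pattern in this space is a runtime monitor that gates an uncertain learning-enabled component behind a simple, verifiable fallback, often called a simplex or safety-envelope architecture. The sketch below is a hypothetical illustration of that pattern only; the names and thresholds are ours, and the mapping study covers many more architectures.

```python
# Toy sketch of a runtime-monitor (simplex-style) safety architecture:
# an ML component's action is admitted only when it stays inside a
# statically checked safe envelope; otherwise a verified fallback acts.
# All names, values, and thresholds are hypothetical.
from typing import Tuple

def ml_controller(sensor: float) -> Tuple[float, float]:
    """Stand-in for a learning-enabled component: returns an action and
    a self-reported confidence in [0, 1]."""
    return 0.8 * sensor, 0.6

def safe_controller(sensor: float) -> float:
    """Simple, verifiable fallback with a conservative action."""
    return 0.0

def supervised_step(sensor: float, confidence_floor: float = 0.9) -> float:
    action, confidence = ml_controller(sensor)
    # Admit the ML action only with high confidence and within bounds.
    if confidence >= confidence_floor and abs(action) <= 1.0:
        return action
    return safe_controller(sensor)

print(supervised_step(0.5))  # low confidence -> fallback action 0.0
```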