A novel method for constructing a nonlinear fractal histopolation function associated with a given histogram is introduced in this paper. In contrast to classical fractal interpolation methods, which produce continuous, interpolatory functions, the proposed approach constructs a bounded, Riemann integrable function that is not necessarily continuous but preserves the areas of the given histogram. An iterated function system based on Rakotch contractions, a generalisation of Banach contractions, is utilised, thereby extending the theoretical framework for fractal histopolation. Unlike previous formulations, the proposed construction of nonlinear fractal functions allows vertical scaling factors greater than one. Conditions under which the nonlinear fractal function solves the histopolation problem are derived.
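For orientation, a construction in this spirit can be sketched as follows (index and notation conventions are assumed here, not taken from the paper): the histopolation function arises as the attractor of an iterated function system whose maps contract in the vertical coordinate in the Rakotch sense.

```latex
% Sketch with assumed conventions: an IFS on I \times \mathbb{R}, I = [x_0, x_N],
% whose attractor is the graph of the histopolation function f.
\begin{align*}
  w_n(x,y) &= \bigl(L_n(x),\, F_n(x,y)\bigr), \qquad n = 1,\dots,N, \\
  \lvert F_n(x,y) - F_n(x,y')\rvert &\le \alpha\bigl(\lvert y-y'\rvert\bigr)\,\lvert y-y'\rvert
  \quad \text{(Rakotch contraction in } y\text{)}, \\
  f\bigl(L_n(x)\bigr) &= F_n\bigl(x, f(x)\bigr)
  \quad \text{(self-referential equation for the attractor)}, \\
  \int_{x_{n-1}}^{x_n} f(t)\,\mathrm{d}t &= a_n
  \quad \text{(area of the $n$-th histogram bar is matched)},
\end{align*}
% where L_n maps I affinely onto [x_{n-1}, x_n] and \alpha : (0,\infty) \to [0,1)
% is non-increasing; a constant \alpha < 1 recovers the Banach case.
```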
High-performance computing (HPC) systems are becoming increasingly water-intensive due to their reliance on water-based cooling and on the water consumed in generating their electricity. However, the water footprint of HPC remains relatively underexplored, especially in contrast to the growing focus on carbon emissions. In this paper, we present ThirstyFLOPS, a comprehensive water footprint analysis framework for HPC systems. Our approach incorporates region-specific metrics, including Water Usage Effectiveness, Power Usage Effectiveness, and Energy Water Factor, to quantify water consumption using real-world data. Using four representative HPC systems (Marconi, Fugaku, Polaris, and Frontier) as examples, we discuss implications for HPC system planning and management. We explore the impact of regional water scarcity and nuclear-based energy strategies on HPC sustainability. Our findings aim to advance the development of water-aware, environmentally responsible computing infrastructures.
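As a hedged illustration of how such metrics combine, the sketch below uses a common accounting convention (assumed here, not necessarily the paper's exact model): on-site cooling water scales with IT energy through WUE, while off-site water embedded in electricity scales with facility energy (IT energy times PUE) through EWF.

```python
def water_footprint_liters(it_energy_kwh: float,
                           wue_l_per_kwh: float,   # on-site Water Usage Effectiveness
                           pue: float,             # Power Usage Effectiveness
                           ewf_l_per_kwh: float):  # off-site Energy Water Factor
    """Hedged sketch: operational water footprint of an HPC workload.

    on-site  = IT energy * WUE        (cooling water evaporated at the facility)
    off-site = IT energy * PUE * EWF  (water consumed generating the electricity)
    """
    onsite = it_energy_kwh * wue_l_per_kwh
    offsite = it_energy_kwh * pue * ewf_l_per_kwh
    return onsite + offsite

# Example with illustrative (not measured) numbers:
# a 1 MWh job at WUE = 1.8 L/kWh, PUE = 1.2, EWF = 2.0 L/kWh
print(water_footprint_liters(1000, 1.8, 1.2, 2.0))  # 4200.0 litres for this made-up case
```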
Despite being proposed as early as 1959, COBOL (Common Business-Oriented
Language) remains an integral part of the operations of many financial,
banking, and governmental organizations. To
support the inevitable modernization and maintenance of legacy systems written
in COBOL, it is essential for organizations, researchers, and developers to
understand the nature and source code of COBOL programs. However, to the best
of our knowledge, no existing dataset provides data on COBOL software projects,
motivating the need for such a dataset. Thus, to aid empirical
research on comprehending COBOL in open-source repositories, we constructed a
dataset of 84 COBOL repositories mined from GitHub, containing rich metadata on
the development cycle of the projects. We envision that researchers can utilize
our dataset to study the evolution and code properties of COBOL projects, and
to develop tools to support their development. Our dataset also provides 1255
COBOL files
present inside the mined repositories. The dataset and artifacts are available
at https://doi.org/10.5281/zenodo.7968845.
This work from Banaras Hindu University and Indian Institute of Technology Tirupati demonstrates that topological phase transitions in a Hermitian system systematically induce corresponding changes in the knot topology of a derived non-Hermitian Hamiltonian's complex eigenvalues. It introduces a "first-order knot transition" characterized by a discrete jump in eigenvalues at the transition point, occurring without the presence of Exceptional Points.
Generative Adversarial Networks (GANs) have swiftly evolved to imitate
increasingly complex image distributions. However, the majority of developments
focus on the performance of GANs on balanced datasets. We find that existing
GANs and their training regimes, which work well on balanced datasets, fail to
be effective in the case of imbalanced (i.e. long-tailed) datasets. In this work, we
introduce a novel theoretically motivated Class Balancing regularizer for
training GANs. Our regularizer makes use of the knowledge from a pre-trained
classifier to ensure balanced learning of all the classes in the dataset. This
is achieved via modelling the effective class frequency based on the
exponential forgetting observed in neural networks and encouraging the GAN to
focus on underrepresented classes. We demonstrate the utility of our
regularizer in learning representations for long-tailed distributions by
achieving better performance than existing approaches over multiple datasets.
Specifically, when applied to an unconditional GAN, it improves the FID from
13.03 to 9.01 on the long-tailed iNaturalist-2019 dataset.
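As a hedged sketch of the mechanism (the paper's exact regularizer may differ), one can track an effective class frequency with exponential forgetting from a pre-trained classifier's predictions on generated batches and penalize deviation from an inverse-frequency target distribution:

```python
import torch

class ClassBalanceRegularizer:
    """Illustrative sketch, not the paper's exact formulation.

    Maintains an exponentially forgotten 'effective' class frequency from a
    pre-trained classifier's predictions on generated samples, and pushes the
    generator toward classes that are currently under-represented.
    """

    def __init__(self, num_classes: int, forgetting: float = 0.99):
        self.freq = torch.full((num_classes,), 1.0 / num_classes)
        self.forgetting = forgetting  # exponential forgetting factor

    def __call__(self, classifier_logits: torch.Tensor) -> torch.Tensor:
        probs = torch.softmax(classifier_logits, dim=1)   # (batch, num_classes)
        batch_dist = probs.mean(dim=0)                    # predicted class mix of the batch
        with torch.no_grad():                             # update effective frequency (no grad)
            self.freq = self.forgetting * self.freq + (1 - self.forgetting) * batch_dist
        target = 1.0 / (self.freq + 1e-8)                 # inverse effective frequency
        target = target / target.sum()                    # renormalize to a distribution
        # KL(target || batch_dist): small when the generator covers rare classes
        return torch.sum(target * (torch.log(target + 1e-8) - torch.log(batch_dist + 1e-8)))

# usage (assumed names): loss_G = adversarial_loss + lam * reg(classifier(fake_images))
```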
Public transport administrators rely on efficient algorithms for various
problems that arise in public transport networks. In particular, our study
focuses on designing linear-time algorithms for two fundamental path problems:
the earliest arrival time (\textsc{eat}) and the fastest path duration
(\textsc{fpd}) on public transportation data. We conduct a comparative analysis
with state-of-the-art algorithms. The results are quite promising, indicating
substantial efficiency improvements. Specifically, the fastest path problem
shows a remarkable 34-fold speedup, while the earliest arrival time problem
exhibits an even more impressive 183-fold speedup. These findings highlight the
effectiveness of our algorithms in solving \textsc{eat} and \textsc{fpd}
problems in public transport, and can ultimately help public transport
administrators enrich the urban transport experience.
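For intuition, an earliest-arrival-time query over a timetable can be answered in linear time by a single pass over connections sorted by departure time, in the spirit of connection-scan approaches; the sketch below is illustrative and not necessarily the algorithm proposed in the paper.

```python
import math

def earliest_arrival(connections, source, target, start_time):
    """Single pass over connections sorted by departure time.

    connections: iterable of (dep_stop, arr_stop, dep_time, arr_time),
                 pre-sorted by dep_time. Runs in O(number of connections).
    """
    eat = {source: start_time}  # best known arrival time per stop
    for dep_stop, arr_stop, dep_time, arr_time in connections:
        # the connection is reachable if we can be at dep_stop before it departs
        if eat.get(dep_stop, math.inf) <= dep_time and arr_time < eat.get(arr_stop, math.inf):
            eat[arr_stop] = arr_time
    return eat.get(target, math.inf)

# Example: two connections A->B->C, departing at t=10 and t=25
conns = [("A", "B", 10, 20), ("B", "C", 25, 40)]
print(earliest_arrival(conns, "A", "C", 8))  # 40
```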
This work introduces UnityAI-Guard, a framework for binary toxicity classification targeting low-resource Indian languages. While existing systems predominantly cater to high-resource languages, UnityAI-Guard addresses this critical gap by developing state-of-the-art models for identifying toxic content across diverse Brahmic/Indic scripts. Our approach achieves an average F1-score of 84.23% across seven languages, leveraging a dataset of 567k training instances and 30k manually verified test instances. UnityAI-Guard advances multilingual content moderation for linguistically diverse regions and provides public API access to foster broader adoption and application.
Knowledge graphs (KGs) have attracted significant attention in recent years, particularly in the Semantic Web, while also gaining popularity in other application domains such as data mining and search engines. Simultaneously, there has been enormous progress in the development of different types of heterogeneous hardware, impacting the way KGs are processed. The aim of this paper is to provide a systematic literature review of knowledge graph hardware acceleration. To this end, we present a classification of the primary areas in knowledge graph technology that harness different hardware units for accelerating certain knowledge graph functionalities. We then extensively describe the respective works, focusing on how KG-related schemes harness modern hardware accelerators. Based on our review, we identify various research gaps and future exploratory directions that are anticipated to be of significant value to both academics and industry practitioners.
Driving a quantum system periodically in time can profoundly alter its long-time correlations and give rise to exotic quantum states of matter. The combination of many-body correlations and dynamic manipulation has the potential to uncover a whole field of new phenomena, but its complexity makes the theoretical and numerical understanding extremely difficult. We propose a promising numerical method by generalizing the density matrix renormalization group to a superposition of Fourier components of periodically driven many-body systems using Floquet theory. With this method we can study the full time-dependent quantum solution in a large parameter range for all evolution times, beyond the commonly used high-frequency approximations. Numerical results are presented for the isotropic Heisenberg antiferromagnetic spin-1/2 chain under both local (edge) and global driving for spin-spin correlations and temporal fluctuations. As the frequency is lowered, we demonstrate that more and more Fourier components become relevant and determine strong length- and frequency-dependent changes of the quantum correlations that cannot be described by effective static models.
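For orientation, the Fourier-component construction rests on standard Floquet theory (stated here with assumed conventions): a time-periodic Hamiltonian couples the harmonics of the periodic part of the Floquet states, and the generalized DMRG represents the superposition of these components.

```latex
% Standard Floquet setup for H(t+T) = H(t), \omega = 2\pi/T (conventions assumed):
\begin{align*}
  H(t) &= \sum_{m} H_m\, e^{i m \omega t}, \\
  |\psi_\alpha(t)\rangle &= e^{-i \epsilon_\alpha t}\,|\phi_\alpha(t)\rangle,
  \qquad |\phi_\alpha(t)\rangle = \sum_{n} e^{-i n \omega t}\,|\phi_\alpha^{(n)}\rangle, \\
  \bigl(H(t) - i\,\partial_t\bigr)\,|\phi_\alpha(t)\rangle &= \epsilon_\alpha\,|\phi_\alpha(t)\rangle
  \;\;\Longrightarrow\;\;
  \sum_{m} H_m\,|\phi_\alpha^{(n+m)}\rangle - n\omega\,|\phi_\alpha^{(n)}\rangle
  = \epsilon_\alpha\,|\phi_\alpha^{(n)}\rangle .
\end{align*}
% The lower the frequency \omega, the more Fourier components |\phi^{(n)}\rangle
% must be retained in the variational (DMRG) representation.
```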
While quantum circuits built from two-particle dual-unitary (maximally entangled) operators serve as minimal models of typically nonintegrable many-body systems, the construction and characterization of dual-unitary operators themselves are only partially understood. A nonlinear map on the space of unitary operators, proposed in Phys. Rev. Lett. 125, 070501 (2020), results in operators arbitrarily close to dual unitaries. Here we study the map analytically for the two-qubit case, describing the basins of attraction, fixed points, and rates of approach to dual unitaries. A subset of dual-unitary operators with maximum entangling power are 2-unitary operators or perfect tensors, which are equivalent to four-party absolutely maximally entangled states and are known to exist only if the local dimension is larger than d=2. We use the nonlinear map, and introduce stochastic variants of it, to construct explicit examples of new dual and 2-unitary operators. A necessary criterion for local unitary equivalence, used to distinguish classes, is also introduced and applied to obtain various concrete results and a conjecture in d=3. Orthogonal Latin squares are known to provide a ``classical combinatorial design'' for constructing permutations that are 2-unitary. We extend the underlying design from classical to genuinely quantum ones for general dual-unitary operators and give an example of what might be the smallest genuinely quantum design of a 2-unitary in d=4.
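For reference, with one common index convention (assumed here; the paper's convention may differ), dual unitarity and 2-unitarity are unitarity conditions on rearrangements of the gate's matrix elements, and the nonlinear map can be read, roughly, as "realign, then project back to the closest unitary via polar decomposition".

```latex
% One common convention (assumed). For U on C^d \otimes C^d with elements
% U^{ij}_{kl} = \langle ij|U|kl\rangle, define the realignment U^R and the
% partial transpose U^{\Gamma} by
\begin{equation*}
  (U^{R})^{ik}_{\;jl} = U^{ij}_{\;kl}, \qquad
  (U^{\Gamma})^{ij}_{\;kl} = U^{il}_{\;kj}.
\end{equation*}
% U is dual unitary if both U and U^R are unitary; it is 2-unitary (a perfect
% tensor, equivalent to a four-party AME state) if U, U^R, and U^{\Gamma} are
% all unitary. The nonlinear map iterates, roughly, U \mapsto the unitary factor
% of the polar decomposition of U^R, driving generic U toward dual unitarity.
```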
Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state-of-the-art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the underlying grammar of the programming language. Current LLMs do not exploit this property: they treat code as a sequence of tokens and overlook key structural and semantic properties that can be extracted from code-views such as the Control Flow Graph (CFG), Data Flow Graph (DFG), and Abstract Syntax Tree (AST). Unfortunately, the process of generating and integrating code-views for every programming language is cumbersome and time-consuming. To overcome this barrier, we propose COMEX, a framework that allows researchers and developers to create and combine multiple code-views which can be used by machine learning (ML) models for various SE tasks. Some salient features of our tool are: (i) it works directly on source code (which need not be compilable), (ii) it currently supports Java and C#, (iii) it can analyze both method-level and program-level snippets using both intra-procedural and inter-procedural analysis, and (iv) it is easily extendable to other languages as it is built on tree-sitter, a widely used incremental parser that supports over 40 languages. We believe this easy-to-use code-view generation and customization tool will give impetus to research in source code representation learning methods and ML4SE.
Tool: this https URL - GitHub: this https URL - Demo: this https URL
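COMEX itself builds its code-views for Java and C# on tree-sitter; purely as a language-agnostic illustration of what a structural code-view is, the sketch below uses Python's built-in ast module (an assumption for demonstration, not COMEX's implementation) to extract parent-child AST edges from a method-level snippet.

```python
import ast

def ast_edges(source: str):
    """Toy 'code view': parent -> child AST edges from a source snippet.

    Illustrative stand-in for COMEX's tree-sitter-based views (Java/C#);
    uses Python's ast module purely for demonstration.
    """
    tree = ast.parse(source)
    edges = []
    for parent in ast.walk(tree):
        for child in ast.iter_child_nodes(parent):
            edges.append((type(parent).__name__, type(child).__name__))
    return edges

snippet = "def f(x):\n    if x > 0:\n        return x\n    return -x\n"
for edge in ast_edges(snippet):
    print(edge)  # e.g. ('FunctionDef', 'If'), ('If', 'Return'), ...
```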
In the rapidly evolving landscape of modern data-driven technologies, software relies on large datasets and constant data center operations, using various database systems to support computation-intensive tasks. As energy consumption in software systems becomes a growing concern, selecting the right database from an energy-efficiency perspective is also critical. To address this, we introduce \textbf{\textit{DBJoules}}, a tool that measures the energy consumption of activities in database systems. \textit{DBJoules} supports energy measurement of CRUD operations for four popular databases. Through evaluations on two widely-used datasets, we identify disparities of 7\% to 38\% in the energy consumption of these databases. Hence, the goal is to raise developer awareness about the energy impact of running queries on different databases, enabling developers to select an appropriate database for sustainable usage. The tool's demonstration is available at \url{this https URL} and related artifacts at \url{this https URL}.
Readme files in GitHub repositories serve as a preliminary source of information and thus help developers understand projects for reuse or extension. Different types of contextual and structural content, which we refer to as categories of the content and features in the content respectively, are present in readme files and could determine the extent to which a project is comprehended. Consequently, the structural and contextual aspects of the content could impact project popularity. Studying the correlation between readme content and project popularity could help in focusing, while designing readme files, on the aspects that could improve popularity. However, existing studies explore the categories of content and the types of features in readme files, but do not explore their usefulness towards project popularity. Hence, we present an empirical study to understand the correlation between readme file content and project popularity. We perform the study on 1950 readme files of public GitHub projects spanning ten programming languages, and observe that readme files in the majority of popular projects are well organised using lists and images and comprise links to external sources. Also, repositories with readme files containing contribution guidelines and references were observed to be associated with higher popularity.
In this work, we present a detailed thermodynamic analysis of a bound quantum system, the Morse oscillator, within the framework of Tsallis nonextensive statistics. Using the fact that the bound spectrum of the Morse potential is limited from above by the bond dissociation energy, we analytically derive the generalized partition function. We present results for both the high- and low-temperature limits. We propose the effective number of accessible states as a measure of nonextensivity. The calculation shows that the nonextensive framework further restricts the number of accessible states. We also derive the generalized internal energy and entropy and examine their dependence on temperature and the nonextensivity parameter q. Numerical results confirm the strong effect of nonextensive behavior in the low- to moderate-temperature regime, where the ratio of the generalized internal energy to the internal energy calculated from the Boltzmann-Gibbs (BG) formula develops a nontrivial dip structure for q < 1. Moreover, the generalized specific heat shows a Schottky-type anomaly. We extend our study by deriving the specific heat of solids with BG and Tsallis statistics using the anharmonic energy levels of the Morse oscillator. This study suggests that the Morse oscillator is a solvable and physically meaningful testing ground for exploring the thermodynamics of quantum systems governed by nonextensive statistics, with implications for the vibrational thermodynamics of non-equilibrium molecular systems, especially diatomic molecules.
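For reference, the standard ingredients behind such an analysis (notation assumed; the paper's conventions may differ) are the anharmonic Morse spectrum, truncated at the dissociation energy, and the Tsallis q-generalized Boltzmann weight:

```latex
\begin{align*}
  E_n &= \hbar\omega_e\Bigl(n+\tfrac{1}{2}\Bigr)
        - \hbar\omega_e x_e\Bigl(n+\tfrac{1}{2}\Bigr)^{2},
        \qquad n = 0,1,\dots,n_{\max}, \\
  Z_q &= \sum_{n=0}^{n_{\max}}
        \bigl[\,1-(1-q)\,\beta E_n\,\bigr]_{+}^{\frac{1}{1-q}}
        \;\xrightarrow{\;q\to 1\;}\;
        \sum_{n=0}^{n_{\max}} e^{-\beta E_n},
\end{align*}
% where the cutoff n_max is fixed by the dissociation energy D_e (only states
% with E_n < D_e are bound) and [x]_+ = max(x,0) enforces the Tsallis cutoff;
% the q-generalized internal energy and entropy follow from Z_q as usual.
```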
Fluorinated compounds, often referred to as forever chemicals, are critical
in various steps of semiconductor fabrication like lithography, etching,
chamber cleaning, and others. Forever chemical emissions can exhibit global
warming potentials thousands of times greater than carbon dioxide and persist
in the atmosphere for millennia. Despite their severe impact, most
sustainability work in computer systems has focused on carbon emissions
alone. We address this gap by introducing ForgetMeNot, a modeling tool that
quantifies fluorinated compound emissions by integrating fabrication
facility-specific practices and hardware specifications, and validate its
accuracy using real-world emission data from fabrication facilities. We show
how ForgetMeNot can enable fabrication facilities to optimize design and
material usage decisions for emission reduction and provide researchers with a
methodology to calibrate emission estimates for hardware designs. When
ForgetMeNot is applied to analyze emissions for manufacturing CPUs, DRAM, and
storage, it illustrates how hardware generations, lithography techniques, and
capacities impact fluorinated compound emissions. Finally, we demonstrate how
datacenter operators can assemble low-emission servers while balancing
performance demands. By factoring fluorinated emissions into manufacturing
decisions, ForgetMeNot paves the way for building more sustainable systems.
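To give a sense of the accounting involved, the sketch below follows a generic tiered fab-gas method (an assumption for illustration, not ForgetMeNot's exact model): per-gas emissions combine purchased gas, the fraction consumed by the process, abatement, and the gas's global warming potential.

```python
# Illustrative per-gas accounting; GWP values are rounded placeholders, not real data.
GWP_100YR = {"CF4": 7000, "C2F6": 12000, "NF3": 17000, "SF6": 24000}

def fluorinated_emissions_kg_co2e(gas: str,
                                  purchased_kg: float,
                                  process_utilization: float,  # fraction consumed by the process
                                  abated_fraction: float,      # fraction routed to abatement
                                  abatement_efficiency: float) -> float:
    """Hedged sketch of tiered fab-gas accounting, not ForgetMeNot's exact model."""
    released = purchased_kg * (1 - process_utilization)          # gas surviving the process
    released *= (1 - abated_fraction * abatement_efficiency)     # gas surviving abatement
    return released * GWP_100YR[gas]                             # convert to CO2-equivalent

# Example with made-up numbers: 10 kg of CF4, 60% consumed, 70% abated at 90% efficiency
print(fluorinated_emissions_kg_co2e("CF4", 10, 0.6, 0.7, 0.9))  # ~1.0e4 kg CO2e
```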
Link prediction is one of the central problems in graph mining. However,
recent studies highlight the importance of higher-order network analysis, where
complex structures called motifs are first-class citizens. We first show
that existing link prediction schemes fail to effectively predict motifs. To
alleviate this, we establish a general motif prediction problem and we propose
several heuristics that assess the chances for a specified motif to appear. To
make the scores realistic, our heuristics consider - among others -
correlations between links, i.e., the potential impact of some arriving links
on the appearance of other links in a given motif. Finally, for the highest
accuracy, we develop a graph neural network (GNN) architecture for motif
prediction. Our architecture offers vertex features and sampling schemes that
capture the rich structural properties of motifs. While our heuristics are fast
and do not need any training, GNNs ensure the highest accuracy in predicting
motifs, both for dense motifs (e.g., k-cliques) and for sparse ones (e.g., k-stars).
We consistently outperform the best available competitor by more than 10% on
average and up to 32% in area under the curve. Importantly, the advantages of
our approach over schemes based on uncorrelated link prediction increase with
motif size and complexity. We also successfully apply our architecture to
predicting arbitrary clusters and communities,
illustrating its potential for graph mining beyond motif analysis.
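To make the link-correlation idea concrete, here is an illustrative heuristic (not one of the paper's specific heuristics): score a motif's missing links sequentially and tentatively insert each one before scoring the next, so that earlier links can raise the scores of later ones.

```python
import networkx as nx

def correlated_motif_score(G: nx.Graph, motif_nodes) -> float:
    """Illustrative correlation-aware motif score (not the paper's exact heuristic).

    Scores the links missing inside `motif_nodes` one by one with a
    common-neighbors score, inserting each scored link into a working copy
    so it can boost the scores of the remaining links (link correlation).
    """
    H = G.copy()
    nodes = list(motif_nodes)
    missing = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]
               if not H.has_edge(u, v)]
    score = 1.0
    for u, v in missing:
        common = len(list(nx.common_neighbors(H, u, v)))
        score *= common / (common + 1.0)   # map the count into (0, 1)
        H.add_edge(u, v)                   # correlation: later links see this one
    return score

# Example: how likely is {0, 1, 2, 3} to close into a 4-clique?
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (0, 4), (1, 4), (3, 4)])
print(correlated_motif_score(G, [0, 1, 2, 3]))
```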
We propose a novel variant of the UCB algorithm (referred to as Efficient-UCB-Variance (EUCBV)) for minimizing cumulative regret in the stochastic multi-armed bandit (MAB) setting. EUCBV incorporates the arm elimination strategy proposed in UCB-Improved \citep{auer2010ucb}, while taking into account the variance estimates to compute the arms' confidence bounds, similar to UCBV \citep{audibert2009exploration}. Through a theoretical analysis we establish that EUCBV incurs a \emph{gap-dependent} regret bound of {\scriptsize $O\left(\frac{K\sigma^2_{\max}}{\Delta}\log\left(T\Delta^2/K\right)\right)$} after $T$ trials, where $\Delta$ is the minimal gap between optimal and sub-optimal arms; the above bound is an improvement over that of existing state-of-the-art UCB algorithms (such as UCB1, UCB-Improved, UCBV, MOSS). Further, EUCBV incurs a \emph{gap-independent} regret bound of {\scriptsize $O(\sqrt{KT})$} which is an improvement over that of UCB1, UCBV and UCB-Improved, while being comparable with that of MOSS and OCUCB. Through an extensive numerical study we show that EUCBV significantly outperforms the popular UCB variants (like MOSS, OCUCB, etc.) as well as Thompson sampling and Bayes-UCB algorithms.
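As a hedged sketch of the ingredients (constants, schedule, and elimination rule are illustrative, not the paper's exact pseudocode), a variance-aware elimination strategy maintains a set of surviving arms and discards any arm whose variance-based upper confidence bound falls below the best lower confidence bound:

```python
import math, random

def variance_aware_elimination(arms, horizon, explore_const=2.0):
    """Hedged sketch of variance-estimate UCB with arm elimination
    (in the spirit of EUCBV; constants and schedule are illustrative)."""
    stats = {a: {"n": 0, "mean": 0.0, "m2": 0.0} for a in range(len(arms))}
    active = set(stats)
    for t in range(1, horizon + 1):
        a = (t - 1) % len(arms)
        if a not in active:
            a = random.choice(tuple(active))
        r = arms[a]()                          # pull the arm: reward in [0, 1]
        s = stats[a]
        s["n"] += 1
        delta = r - s["mean"]
        s["mean"] += delta / s["n"]
        s["m2"] += delta * (r - s["mean"])     # Welford update for the variance
        if all(v["n"] > 1 for v in stats.values()) and len(active) > 1:
            def radius(v):
                var = v["m2"] / (v["n"] - 1)
                log_t = math.log(max(t, 2))
                return math.sqrt(explore_const * var * log_t / v["n"]) + explore_const * log_t / v["n"]
            best_lcb = max(stats[b]["mean"] - radius(stats[b]) for b in active)
            active = {b for b in active if stats[b]["mean"] + radius(stats[b]) >= best_lcb}
    return max(active, key=lambda b: stats[b]["mean"])

# toy usage: three Bernoulli arms with means 0.3, 0.5, 0.6
arms = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.6)]
print(variance_aware_elimination(arms, 5000))
```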
The dynamics of quantum many-body systems in the chaotic regime are of particular interest due to the associated phenomena of information scrambling and entanglement generation within the system. While these systems are typically intractable using traditional numerical methods, an effective framework can be implemented based on dual-unitary circuits, which have emerged as a minimal model for maximally chaotic dynamics. In this work, we investigate how individual two-body operators influence the global dynamics of circuits composed of dual-unitaries. We study their effect on entanglement generation while examining it from both bipartite and multipartite perspectives. Here we also highlight the significant role of local unitaries in the dynamics when paired with operators from the dual-unitary class, showing that systems with identical entangling power can exhibit a range of differing entanglement growth rates. Furthermore, we present calculations establishing time-step-dependent lower bounds, which depend on both the initial state and the entangling power of the constituent operators. Finally, we find that time-evolving an initial state composed of pair products generates a state with nearly maximal multipartite entanglement content, approaching the bounds established by Absolutely Maximally Entangled (AME) states.
The increasing vulnerability of electrical distribution systems to extreme weather events and cyber threats necessitates the development of economically viable frameworks for resilience enhancement. While existing approaches focus primarily on technical resilience metrics and enhancement strategies, there remains a significant gap in establishing market-driven mechanisms that can effectively commercialize resilience features while optimizing their deployment through intelligent decision-making. Moreover, traditional optimization approaches for distribution network reconfiguration often fail to adapt dynamically to both normal and emergency conditions. This paper introduces a novel framework integrating dual-agent Proximal Policy Optimization (PPO) with market-based mechanisms, achieving an average resilience score of 0.85 ± 0.08 over 10 test episodes. The proposed architecture leverages a dual-agent PPO scheme, where a strategic agent selects optimal DER-driven switching configurations, while a tactical agent fine-tunes individual switch states and grid preferences under budget and weather constraints. These agents interact within a custom-built dynamic simulation environment that models stochastic calamity events, budget limits, and resilience-cost trade-offs. A comprehensive reward function is designed that balances resilience enhancement objectives with market profitability (with up to 200x reward incentives, resulting in 85% of actions during calamity steps selecting configurations with 4 DERs), incorporating factors such as load recovery speed, system robustness, and customer satisfaction. Over 10 test episodes, the framework achieved a benefit-cost ratio of 0.12 ± 0.01, demonstrating sustainable market incentives for resilience investment.
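A minimal sketch of the kind of composite reward described (terms and weights below are illustrative assumptions, not the paper's exact function) balances resilience and load recovery against action cost and budget, with extra emphasis during calamity steps:

```python
def resilience_reward(resilience_score: float,     # 0..1 system resilience after the action
                      load_recovered_frac: float,  # fraction of lost load restored this step
                      market_profit: float,        # revenue from commercialized resilience features
                      action_cost: float,          # switching + DER dispatch cost
                      budget_left: float,
                      calamity: bool,
                      w=(1.0, 0.5, 0.3, 0.4)) -> float:
    """Illustrative composite reward, not the paper's exact formulation."""
    w_res, w_rec, w_profit, w_cost = w
    reward = (w_res * resilience_score
              + w_rec * load_recovered_frac
              + w_profit * market_profit
              - w_cost * action_cost)
    if budget_left < 0:
        reward -= 10.0     # hard penalty for overspending the budget
    if calamity:
        reward *= 5.0      # illustrative multiplier emphasizing calamity steps
    return reward
```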
Code review is a crucial process before deploying code to production, as it validates the code, provides suggestions for improvements, and identifies errors such as missed edge cases. In projects with regular production releases, the effort required for peer code reviews remains high. Consequently, there has been significant interest from software engineering (SE) researchers in automating the code review process. Previous research on code review automation has typically approached the task as three independent sub-tasks: review necessity prediction, review comment generation, and code refinement. Our study attempts to (i) leverage the relationships between the sub-tasks of code review automation by developing a multi-task model that addresses all tasks in an integrated manner, and (ii) increase model robustness on unseen data via collaborative large language model (LLM) modeling, while retaining the proprietary nature of code, by using federated learning (FL). The study explores five simple techniques for multi-task training, including two sequential methods, one parallel method, and two cumulative methods. The results indicate that sequentially training a federated LLM (FedLLM) for our code review multi-task use case is less efficient in terms of time, computation, and performance metrics than training separate models for each task. Since sequential training exhibits catastrophic forgetting, cumulative fine-tuning is the better alternative for multi-task training and performs better than training models for individual tasks. This study highlights the need for research focused on effective fine-tuning of multi-task FedLLMs for SE tasks.
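For context, the federated side of such a setup can be as simple as FedAvg-style weighted averaging of client-fine-tuned weights between rounds; the sketch below is a generic FedAvg aggregation step (a standard technique, not the paper's specific multi-task training schedule).

```python
from collections import OrderedDict
import torch

def fedavg(client_state_dicts, client_num_examples):
    """Weighted FedAvg aggregation of client model weights (generic sketch)."""
    total = float(sum(client_num_examples))
    global_state = OrderedDict()
    for key in client_state_dicts[0]:
        # weight each client's tensor by its share of the training examples
        global_state[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_num_examples)
        )
    return global_state

# usage (assumed names): each client fine-tunes the shared model on its private
# code-review data, then the server aggregates and redistributes:
# global_weights = fedavg([m.state_dict() for m in client_models], samples_per_client)
# for m in client_models: m.load_state_dict(global_weights)
```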