Centre for Biotechnology and Bioengineering (CeBiB)
Protein engineering is experiencing a paradigmatic shift through the integration of geometric deep learning (GDL) into computational design workflows. While traditional strategies, such as rational design and directed evolution, have enabled important advances, they remain limited by the complexity of sequence space and the cost of experimental validation. GDL addresses these limitations by operating on non-Euclidean domains, capturing spatial, topological, and physicochemical features essential to protein function. This perspective outlines the current applications of GDL across stability prediction, functional annotation, molecular interaction modeling, and de novo protein design. We highlight recent methodological advances in model generalization, interpretability, and robustness, particularly under data-scarce conditions. A unified framework is proposed that integrates GDL with explainable AI and structure-based validation to support transparent, autonomous design. As GDL converges with generative modeling and high-throughput experimentation, it is emerging as a central technology in next-generation protein engineering and synthetic biology.
We study how the application of injective morphisms affects the number $r$ of equal-letter runs in the Burrows-Wheeler Transform (BWT). This parameter has emerged as a key repetitiveness measure in compressed indexing. We focus on the notion of BWT-run sensitivity after application of an injective morphism. For binary alphabets, we characterize the class of morphisms that preserve the number of BWT-runs up to a bounded additive increase, by showing that it coincides with the known class of primitivity-preserving morphisms, which are those that map primitive words to primitive words. We further prove that deciding whether a given binary morphism has bounded BWT-run sensitivity is possible in polynomial time with respect to the total length of the images of the two letters. Additionally, we explore new structural and combinatorial properties of synchronizing and recognizable morphisms. These results establish new connections between BWT-based compressibility, code theory, and symbolic dynamics.
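As a rough illustration of the quantities in the abstract above, the following sketch computes the number of equal-letter runs in the BWT of a word before and after applying a morphism. It assumes the rotation-based BWT without an end-of-string sentinel, which is common in combinatorics on words but may differ from the paper's exact setting. The function names and the example morphism are hypothetical choices made for illustration, not taken from the paper.

```python
def bwt(word: str) -> str:
    """Rotation-based BWT (assumption: no end-of-string sentinel).

    Sort all cyclic rotations of the word and read off the last column.
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(text: str) -> int:
    """Number of maximal equal-letter runs in text."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


def apply_morphism(word: str, images: dict) -> str:
    """Replace every letter of word by its image under the morphism."""
    return "".join(images[ch] for ch in word)


# Hypothetical binary morphism used only as an example: a -> ab, b -> ba.
example_morphism = {"a": "ab", "b": "ba"}

word = "ab"
image = apply_morphism(word, example_morphism)
print(word, count_runs(bwt(word)))
print(image, count_runs(bwt(image)))
```

Comparing the two printed run counts over many words gives an empirical feel for how a particular morphism changes the number of BWT-runs. It does not, of course, decide whether the increase is bounded.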
The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction of the notion of string attractor [Kempa and Prezza, STOC 2018] and by the results showing the relationship between attractors and other measures of compressibility. When the input data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy often offers an even richer source for compression. However, systematic studies on repetitiveness measures for two-dimensional strings are still scarce. In this paper we extend to two or more dimensions the main measures of complexity introduced for one-dimensional strings. We distinguish between the measures $\delta$ and $\gamma$, defined in terms of the substrings of the input, and the measures $g$, $g_{rl}$, and $b$, which are based on copy-paste mechanisms. We study the properties and mutual relationships between these two classes and we show that the two classes become incomparable for $d$-dimensional inputs as soon as $d \geq 2$. Moreover, we show that our grammar-based representation of a $d$-dimensional string of size $N$ enables direct access to any symbol in $O(\log N)$ time. We also compare our measures for two-dimensional strings with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024] and provide some insights for the design of future effective two-dimensional compressors.
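For orientation, here is a minimal sketch of the substring-based measure in its standard one-dimensional form, where the substring complexity of a string is the maximum over $k$ of (number of distinct length-$k$ substrings) divided by $k$. This is the 1D definition only. It does not reproduce the multidimensional extension proposed in the paper, and the function name is a hypothetical choice.

```python
from fractions import Fraction


def substring_complexity(word: str) -> Fraction:
    """Standard 1D substring complexity: max over k of d_k / k,
    where d_k is the number of distinct substrings of length k.

    Brute-force and quadratic in space; meant for small examples only.
    """
    best = Fraction(0)
    for k in range(1, len(word) + 1):
        distinct = {word[i:i + k] for i in range(len(word) - k + 1)}
        best = max(best, Fraction(len(distinct), k))
    return best


print(substring_complexity("abab"))
print(substring_complexity("abcabc"))
```

Exact fractions are used instead of floats so that values for different strings can be compared without rounding issues.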
Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. However, when the data are structured in a non-linear way, as in the context of two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In this paper we introduce extensions of repetitiveness measures to general two-dimensional strings. In particular, we propose a new extension of the measures $\delta$ and $\gamma$, diverging from previous square-based definitions proposed in [Carfagna and Manzini, SPIRE 2023]. We further consider generalizations of macro schemes and straight-line programs (SLPs) for the 2D setting and show that, in contrast to what happens on strings, 2D macro schemes and 2D SLPs can both be asymptotically smaller than $\delta$ and $\gamma$. The results of the paper can be easily extended to $d$-dimensional strings with $d > 2$.