alphaXiv

Centre for Biotechnology and Bioengineering (CeBiB)

19 Jun 2025

quantitative-methods quantitative-biology

Geometric deep learning assists protein engineering. Opportunities and Challenges

Universidad de Chile Leibniz Institute of Plant Biochemistry Centre for Biotechnology and Bioengineering (CeBiB) Universidad de Magallanes Center for Mathematical Modeling, CMM Centro Asistencial de Docencia e Investigación, CADI

Protein engineering is experiencing a paradigmatic shift through the integration of geometric deep learning into computational design workflows. While traditional strategies, such as rational design and directed evolution, have enabled relevant advances, they remain limited by the complexity of sequence space and the cost of experimental validation. Geometric deep learning addresses these limitations by operating on non-Euclidean domains, capturing spatial, topological, and physicochemical features essential to protein function. This perspective outlines the current applications of GDL across stability prediction, functional annotation, molecular interaction modeling, and de novo protein design. We highlight recent methodological advances in model generalization, interpretability, and robustness, particularly under data-scarce conditions. A unified framework is proposed that integrates GDL with explainable AI and structure-based validation to support transparent, autonomous design. As GDL converges with generative modeling and high-throughput experimentation, it is emerging as a central technology in next-generation protein engineering and synthetic biology.

24 Apr 2025

computer-science discrete-mathematics data-structures-and-algorithms

Morphisms and BWT-run Sensitivity

University of Palermo University of Chile Centre for Biotechnology and Bioengineering (CeBiB)

We study how the application of injective morphisms affects the number $r$ of equal-letter runs in the Burrows-Wheeler Transform (BWT). This parameter has emerged as a key repetitiveness measure in compressed indexing. We focus on the notion of BWT-run sensitivity after application of an injective morphism. For binary alphabets, we characterize the class of morphisms that preserve the number of BWT-runs up to a bounded additive increase, by showing that it coincides with the known class of primitivity-preserving morphisms, which are those that map primitive words to primitive words. We further prove that deciding whether a given binary morphism has bounded BWT-run sensitivity is possible in polynomial time with respect to the total length of the images of the two letters. Additionally, we explore new structural and combinatorial properties of synchronizing and recognizable morphisms. These results establish new connections between BWT-based compressibility, code theory, and symbolic dynamics.
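As a rough illustration of the quantity studied in this abstract, the following sketch counts the equal-letter runs of the BWT of a word before and after applying a morphism. It assumes the BWT is defined through sorted cyclic rotations with no end-of-string marker, which is an assumption made here and may differ from the paper's exact convention. The function names and the example morphism (a to ab, b to ba) are hypothetical choices for illustration, not taken from the paper.

```python
def bwt(word: str) -> str:
    """BWT as the last column of the sorted cyclic rotations.

    Assumption: no end marker is appended (the convention is a guess,
    not something the abstract states).
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(text: str) -> int:
    """Number of maximal blocks of equal consecutive letters in text."""
    return sum(1 for i, ch in enumerate(text) if i == 0 or ch != text[i - 1])


def apply_morphism(word: str, images: dict[str, str]) -> str:
    """Replace every letter of word by its image under the morphism."""
    return "".join(images[ch] for ch in word)


def bwt_runs(word: str) -> int:
    """The parameter r: number of equal-letter runs in BWT(word)."""
    return count_runs(bwt(word))


if __name__ == "__main__":
    # Hypothetical injective binary morphism, chosen only as an example.
    morphism = {"a": "ab", "b": "ba"}
    word = "aabab"
    image = apply_morphism(word, morphism)
    print(word, bwt_runs(word))
    print(image, bwt_runs(image))
```

Comparing `bwt_runs(word)` with `bwt_runs(apply_morphism(word, morphism))` over many words gives an empirical feel for how much a given morphism can increase $r$; it says nothing about the bounded-sensitivity characterization itself, which the paper proves.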

15 May 2025

computer-science data-structures-and-algorithms

Generalization of Repetitiveness Measures for Two-Dimensional Strings

University of Pisa University of Palermo University of Chile Centre for Biotechnology and Bioengineering (CeBiB)

The problem of detecting and measuring the repetitiveness of one-dimensional strings has been extensively studied in data compression and text indexing. Our understanding of these issues has been significantly improved by the introduction of the notion of string attractor [Kempa and Prezza, STOC 2018] and by the results showing the relationship between attractors and other measures of compressibility. When the input data are structured in a non-linear way, as in two-dimensional strings, inherent redundancy often offers an even richer source for compression. However, systematic studies on repetitiveness measures for two-dimensional strings are still scarce. In this paper we extend to two or more dimensions the main measures of complexity introduced for one-dimensional strings. We distinguish between the measures $\delta$ and $\gamma$, defined in terms of the substrings of the input, and the measures $g$, $g_{rl}$, and $b$, which are based on copy-paste mechanisms. We study the properties and mutual relationships between these two classes and we show that the two classes become incomparable for $d$-dimensional inputs as soon as $d\geq 2$. Moreover, we show that our grammar-based representation of a $d$-dimensional string of size $N$ enables direct access to any symbol in $O(\log N)$ time. We also compare our measures for two-dimensional strings with the 2D Block Tree data structure [Brisaboa et al., Computer J., 2024] and provide some insights for the design of future effective two-dimensional compressors.
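For readers unfamiliar with substring-based measures, the sketch below computes the standard one-dimensional substring complexity $\delta(w) = \max_k d_k(w)/k$, where $d_k(w)$ is the number of distinct substrings of length $k$. This is only the classical 1D definition; the multi-dimensional extension proposed in the paper is not reproduced here, and the function name is a hypothetical choice. The implementation is a brute-force illustration, not an efficient algorithm.

```python
def substring_complexity(word: str) -> float:
    """Classical 1D measure delta(w) = max_k d_k(w) / k.

    d_k(w) is the number of distinct length-k substrings of w.
    Brute force, intended only for short inputs.
    """
    n = len(word)
    best = 0.0
    for k in range(1, n + 1):
        distinct = {word[i:i + k] for i in range(n - k + 1)}
        best = max(best, len(distinct) / k)
    return best


if __name__ == "__main__":
    for sample in ("aaaa", "abab", "abcd"):
        print(sample, substring_complexity(sample))
```

Highly repetitive inputs such as `"ab" * 50` keep $\delta$ small regardless of length, which is the behavior a repetitiveness measure is meant to capture.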

10 Apr 2024

computer-science discrete-mathematics data-structures-and-algorithms

Exploring Repetitiveness Measures for Two-Dimensional Strings

Università di Palermo University of Chile Centre for Biotechnology and Bioengineering (CeBiB)

Detecting and measuring repetitiveness of strings is a problem that has been extensively studied in data compression and text indexing. However, when the data are structured in a non-linear way, like in the context of two-dimensional strings, inherent redundancy offers a rich source for compression, yet systematic studies on repetitiveness measures are still lacking. In the paper we introduce extensions of repetitiveness measures to general two-dimensional strings. In particular, we propose a new extension of the measures $\delta$ and $\gamma$, diverging from previous square-based definitions proposed in [Carfagna and Manzini, SPIRE 2023]. We further consider generalizations of macro schemes and straight line programs for the 2D setting and show that, in contrast to what happens on strings, 2D macro schemes and 2D SLPs can be both asymptotically smaller than $\delta$ and $\gamma$. The results of the paper can be easily extended to $d$-dimensional strings with $d > 2$.
