Nostrum Biodiscovery S.L.
GeoDirDock: Guiding Docking Along Geodesic Paths
This work introduces GeoDirDock (GDD), a novel approach to molecular docking that enhances the accuracy and physical plausibility of ligand docking predictions. GDD guides the denoising process of a diffusion model along geodesic paths within multiple spaces representing translational, rotational, and torsional degrees of freedom. Our method leverages expert knowledge to direct the generative modeling process, specifically targeting desired protein-ligand interaction regions. We demonstrate that GDD significantly outperforms existing blind docking methods in terms of RMSD accuracy and physicochemical pose realism. Our results indicate that incorporating domain expertise into the diffusion process leads to more biologically relevant docking predictions. Additionally, we explore the potential of GDD for lead optimization in drug discovery through angle transfer in maximal common substructure (MCS) docking, showcasing its capability to predict ligand orientations for chemically similar compounds accurately.
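As a rough illustration of what "geodesic paths" mean in the three spaces named above, the sketch below interpolates along the shortest path in each one: a straight line for translations, a spherical linear interpolation (slerp) of unit quaternions for rotations, and the shortest arc on the circle for a torsion angle. This is not GDD's implementation; the function names, the quaternion convention (w, x, y, z), and the interpolation parameter `t` are assumptions made for the example.

```python
import numpy as np

def wrap_angle(a):
    """Map an angle (radians) to the interval [-pi, pi)."""
    return (a + np.pi) % (2 * np.pi) - np.pi

def translation_geodesic(current, target, t):
    """Straight-line interpolation in Euclidean space (t in [0, 1])."""
    current, target = np.asarray(current, float), np.asarray(target, float)
    return (1 - t) * current + t * target

def torsion_geodesic(current, target, t):
    """Move a fraction t along the shortest arc between two torsion angles."""
    delta = wrap_angle(target - current)
    return wrap_angle(current + t * delta)

def rotation_geodesic(q0, q1, t):
    """Slerp between unit quaternions (w, x, y, z): the geodesic on rotations."""
    q0 = np.asarray(q0, float) / np.linalg.norm(q0)
    q1 = np.asarray(q1, float) / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0:  # q and -q encode the same rotation; take the shorter path
        q1, dot = -q1, -dot
    theta = np.arccos(min(dot, 1.0))
    if theta < 1e-8:  # rotations are (nearly) identical
        return q0
    return (np.sin((1 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)
```

For example, `torsion_geodesic(np.deg2rad(170), np.deg2rad(-170), 0.5)` passes through 180 degrees rather than through 0, which is the point of working on the circle instead of the real line.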
sHGCN: Simplified hyperbolic graph convolutional neural networks
Hyperbolic geometry has emerged as a powerful tool for modeling complex, structured data, particularly where hierarchical or tree-like relationships are present. By enabling embeddings with lower distortion, hyperbolic neural networks offer promising alternatives to Euclidean-based models for capturing intricate data structures. Despite these advantages, they often face performance challenges, particularly in computational efficiency and tasks requiring high precision. In this work, we address these limitations by simplifying key operations within hyperbolic neural networks, achieving notable improvements in both runtime and performance. Our findings demonstrate that streamlined hyperbolic operations can lead to substantial gains in computational speed and predictive accuracy, making hyperbolic neural networks a more viable choice for a broader range of applications.
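To make the "lower distortion" claim concrete, the sketch below computes the standard distance in the Poincaré ball model of hyperbolic space. Distances grow rapidly near the boundary of the ball, which is what gives tree-like data room to spread out. The abstract does not say which operations sHGCN simplifies, so this is only the textbook distance formula, not the paper's method; the function name is an assumption.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincaré ball."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return np.arccosh(1.0 + 2.0 * sq_diff / denom)
```

Two pairs of points with the same Euclidean separation can have very different hyperbolic distances: a pair near the origin is close, while a pair near the boundary is far apart.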
Are Protein Language Models Compute Optimal?
While protein language models (pLMs) have transformed biological research, the scaling laws governing their improvement remain underexplored. By adapting methodologies from NLP scaling laws, we investigated the optimal ratio between model parameters and training tokens within a fixed compute budget. Our study reveals that pLM sizes scale sublinearly with compute budget, showing diminishing returns in performance as model size increases, and we identify a performance plateau in training loss comparable to the one reported in related work. Our findings suggest that widely used pLMs might not be compute-optimal, indicating that larger models could achieve convergence more efficiently. Training a 35M-parameter model on a reduced token set, we attained perplexity results comparable to those of larger models such as ESM-2 (15B) and xTrimoPGLM (100B) with a single dataset pass. This work paves the way toward more compute-efficient pLMs, democratizing their training and practical application in computational biology.
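A minimal sketch of how "sublinear scaling" can be read off from data: fit a power law N_opt = k * C^a to (compute budget, best model size) pairs in log-log space and check whether the exponent a is below 1. The data points below are synthetic and exist only to exercise the fit; they are not results from the paper. The helper `tokens_for_budget` uses the C ≈ 6ND rule of thumb common in the NLP scaling-law literature, which is an assumption here rather than something the abstract states.

```python
import numpy as np

def fit_power_law(compute, optimal_params):
    """Fit optimal_params = coeff * compute**exponent by log-log regression."""
    slope, intercept = np.polyfit(np.log(compute), np.log(optimal_params), 1)
    return slope, np.exp(intercept)

def tokens_for_budget(compute, n_params):
    """Training tokens affordable under the approximation C = 6 * N * D."""
    return compute / (6.0 * n_params)

# Synthetic example data (NOT from the paper): budgets in FLOPs and the
# model size that would be optimal if the true exponent were 0.5.
compute = np.array([1e17, 1e18, 1e19, 1e20])
optimal_params = 0.1 * compute ** 0.5

exponent, coeff = fit_power_law(compute, optimal_params)
is_sublinear = exponent < 1.0
```

An exponent below 1 means that each tenfold increase in compute calls for less than a tenfold increase in parameters, with the rest of the budget going to training tokens.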
De Novo Design of SIK3 Inhibitors via Feedback-Driven Fine-Tuning of Seq2Seq-VAE
Alzheimer's disease (AD), a progressive neurodegenerative disorder, currently lacks effective therapeutic strategies that can modify disease progression. Recent studies have highlighted the critical role of the circadian rhythm in AD pathophysiology, implicating circadian clock kinases, such as Salt-Inducible Kinase 3 (SIK3), as promising therapeutic targets. Generative AI models have surpassed traditional methods of drug discovery, tapping into the vast unexplored chemical space of drug-like molecules. We present a sequence-to-sequence Variational Autoencoder (Seq2Seq-VAE) model guided by an Active Learning (AL) approach to optimize molecular generation. Our pipeline iteratively guides a pre-trained Seq2Seq-VAE model toward the pharmacological landscape relevant to SIK3 using a two-step framework: an inner loop that iteratively improves the physicochemical property profile, drug-likeness, and synthesizability, followed by an outer loop that steers the latent space toward high-affinity ligands for SIK3. Our approach introduces feedback-driven optimization without requiring large labeled datasets, making it particularly suited for early-stage drug discovery on under-explored therapeutic targets. Our results demonstrate the model's convergence toward SIK3-specific small molecules with the desired properties and high binding affinity. This work highlights the use of generative AI combined with AL for rational drug discovery that can be extended to other protein targets with minimal modifications, offering a scalable solution to the molecular design bottleneck in drug design.
ScoreFormer: A Surrogate Model For Large-Scale Prediction of Docking Scores
In this study, we present ScoreFormer, a novel graph transformer model designed to accurately predict molecular docking scores, thereby optimizing high-throughput virtual screening (HTVS) in drug discovery. The architecture integrates Principal Neighborhood Aggregation (PNA) and Learnable Random Walk Positional Encodings (LRWPE), enhancing the model's ability to understand complex molecular structures and their relationship with their respective docking scores. This approach significantly surpasses traditional HTVS methods and recent Graph Neural Network (GNN) models in both recovery and efficiency due to a wider coverage of the chemical space and enhanced performance. Our results demonstrate that ScoreFormer achieves competitive performance in docking score prediction and offers a substantial 1.65-fold reduction in inference time compared to existing models. We evaluated ScoreFormer across multiple datasets under various conditions, confirming its robustness and reliability in identifying potential drug candidates rapidly.
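For readers unfamiliar with random walk positional encodings, the sketch below computes the plain (non-learnable) version for a small graph: for each node, the probability that a random walk returns to it after 1, 2, ..., k steps. In a learnable variant these raw features would typically be passed through a trainable layer before being added to the node features; that step, and how ScoreFormer combines the encodings with PNA, are not shown and are not specified in the abstract. The function name and the dense adjacency-matrix input are assumptions.

```python
import numpy as np

def random_walk_pe(adj, k):
    """Return an (n_nodes, k) array of i-step return probabilities, i = 1..k."""
    adj = np.asarray(adj, float)
    deg = adj.sum(axis=1)
    # Guard against isolated nodes (degree 0) to avoid division by zero.
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    rw = adj * inv_deg[:, None]  # row-normalized transition matrix D^-1 A
    power = np.eye(adj.shape[0])
    columns = []
    for _ in range(k):
        power = power @ rw
        columns.append(np.diag(power))  # probability of being back at the start
    return np.stack(columns, axis=1)

# Toy molecule-like graph: a three-membered ring (every atom bonded to the other two).
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])
pe = random_walk_pe(triangle, k=3)
```

Because all three nodes of the ring are equivalent, they receive identical encodings; nodes in different structural environments (ring atoms versus chain ends, for instance) would receive different ones.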
Optimizing Drug Design by Merging Generative AI With Active Learning Frameworks
Traditional drug discovery programs are being transformed by the advent of machine learning methods. Among these, generative AI models (GMs) have gained attention due to their ability to design new molecules and enhance specific properties of existing ones. However, current GMs have limitations, such as low affinity of the generated molecules for the target, unknown ADME/PK properties, or a lack of synthetic tractability. To improve the applicability domain of GMs, we have developed a workflow based on a variational autoencoder coupled with active learning steps. The designed GM workflow iteratively learns from molecular metrics, including drug-likeness, synthesizability, similarity, and docking scores. In addition, we included a hierarchical set of criteria based on advanced molecular modeling simulations during a final selection step. We tested our GM workflow on two model systems, CDK2 and KRAS. In both cases, our model generated chemically viable molecules with a high predicted affinity toward the targets. In particular, the proportion of high-affinity molecules inferred by our GM workflow was significantly greater than that in the training data. Notably, we also uncovered novel scaffolds significantly dissimilar to those known for each target. These results highlight the potential of our GM workflow to explore novel chemical space for specific targets, thereby opening up new possibilities for drug discovery endeavors.