TopoBench introduces an open-source, modular framework designed to standardize benchmarking for Topological Deep Learning (TDL) and accelerate research in the field. Empirical evaluations using the framework show that higher-order neural networks frequently outperform traditional Graph Neural Networks across diverse datasets, particularly on tasks that benefit from complex multi-way interactions.
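As a rough illustration of what "multi-way interactions" means here, the toy sketch below contrasts ordinary pairwise message passing with an update that also aggregates a joint message over triangles. This is not TopoBench's API; the function names and the product-form joint message are assumptions made purely for illustration.

```python
# Toy contrast between pairwise and higher-order aggregation (illustrative
# only; not TopoBench code).
import numpy as np

def gnn_step(x, edges):
    """Pairwise message passing: each node adds in its edge neighbors."""
    out = x.copy()
    for u, v in edges:
        out[u] += x[v]
        out[v] += x[u]
    return out

def higher_order_step(x, triangles):
    """Multi-way message passing: each node also receives a message computed
    jointly from every 3-way interaction (triangle) it belongs to."""
    out = x.copy()
    for i, j, k in triangles:
        msg = x[i] * x[j] * x[k]   # joint term, not decomposable into pairs
        out[i] += msg
        out[j] += msg
        out[k] += msg
    return out

x = np.random.randn(4, 8)          # 4 nodes, 8 features each
edges = [(0, 1), (1, 2), (2, 3)]
triangles = [(0, 1, 2)]
h = higher_order_step(gnn_step(x, edges), triangles)
```

The point is that the triangle message depends on all three endpoints at once, which is exactly the kind of structure a pairwise GNN cannot represent in a single step.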
Gonçalo M. Quinta's Spacetime Grand Unified Theory proposes a first-principles derivation of the entire Standard Model from the free Dirac Lagrangian, using the Cℓ(8,0) Clifford algebra to explain the existence of three particle families via triality and to derive the Standard Model gauge group, predicting a Weinberg angle of sin²(θ_W) = 3/8.
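For context, the quantity being predicted is defined in terms of the SU(2)_L and U(1)_Y gauge couplings g and g' (standard textbook background, not the paper's derivation):

```latex
\[
  \sin^2\theta_W \;=\; \frac{g'^{\,2}}{g^{2} + g'^{\,2}}
\]
```

The value 3/8 matches the classic tree-level grand-unification result at the unification scale; renormalization-group running is what brings it down toward the measured low-energy value of roughly 0.23.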
LumberChunker introduces a novel LLM-driven method for segmenting long-form narrative documents by identifying natural semantic boundaries in the text. The approach consistently outperforms traditional chunking methods in retrieval performance and achieves 88.89% accuracy in question-answering tasks within RAG pipelines, supported by the new GutenQA benchmark.
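A minimal sketch of the underlying idea, iteratively asking an LLM where the content of consecutive paragraphs shifts, is shown below. This is not the authors' code; `ask_llm`, the prompt wording, and the window size are placeholders for illustration.

```python
# Sketch of LLM-driven semantic chunking in the spirit of LumberChunker
# (illustrative, not the authors' implementation).
def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def chunk_document(paragraphs, window=10):
    """Greedily scan paragraphs; ask the LLM where the content shifts."""
    chunks, start = [], 0
    while start < len(paragraphs):
        window_items = paragraphs[start:start + window]
        numbered = "\n".join(f"[{i}] {p}" for i, p in enumerate(window_items))
        prompt = (
            "Below are consecutive paragraphs of a document. Answer with the "
            "index of the FIRST paragraph whose content clearly shifts away "
            "from the preceding ones, or -1 if none does.\n\n" + numbered
        )
        answer = ask_llm(prompt).strip()
        cut = int(answer) if answer.lstrip("-").isdigit() else -1
        if cut <= 0:                    # no usable boundary in this window
            cut = len(window_items)
        chunks.append(" ".join(window_items[:cut]))
        start += cut                    # resume at the detected boundary
    return chunks
```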
This paper presents a controlled study comparing Masked Language Modeling (MLM) and Causal Language Modeling (CLM) as pretraining objectives for text encoders across various model sizes under a fixed data budget. The research finds that while MLM generally yields stronger representations for downstream tasks, a two-stage CLM-then-MLM recipe, or continuing pretraining with MLM from an existing CLM checkpoint, delivers optimal performance and improved fine-tuning stability.
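The recipe itself is easy to state in code. The sketch below shows the two objectives and the checkpoint handoff on a deliberately tiny stand-in model; it assumes PyTorch, and the model, vocabulary size, masking rate, and step counts are invented for illustration.

```python
# Two-stage sketch: CLM pretraining, then continued MLM pretraining from the
# same weights. The tiny model is a stand-in for a real transformer encoder.
import torch
import torch.nn.functional as F

VOCAB, MASK_ID = 1000, 999

class TinyLM(torch.nn.Module):
    def __init__(self, d=64):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, d)
        self.head = torch.nn.Linear(d, VOCAB)
    def forward(self, ids):
        return self.head(self.emb(ids))           # (B, T, VOCAB) logits

def clm_loss(model, ids):
    """Causal LM: predict token t+1 from the prefix up to t."""
    logits = model(ids[:, :-1])
    return F.cross_entropy(logits.reshape(-1, VOCAB), ids[:, 1:].reshape(-1))

def mlm_loss(model, ids, mask_prob=0.15):
    """Masked LM: corrupt a random subset of positions, predict originals."""
    mask = torch.rand_like(ids, dtype=torch.float) < mask_prob
    corrupted = ids.masked_fill(mask, MASK_ID)
    targets = ids.masked_fill(~mask, -100)        # loss only on masked slots
    return F.cross_entropy(model(corrupted).reshape(-1, VOCAB),
                           targets.reshape(-1), ignore_index=-100)

model = TinyLM()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
batch = torch.randint(0, VOCAB - 1, (8, 32))
for _ in range(10):                               # stage 1: CLM
    opt.zero_grad(); clm_loss(model, batch).backward(); opt.step()
for _ in range(10):                               # stage 2: continue with MLM
    opt.zero_grad(); mlm_loss(model, batch).backward(); opt.step()
```

The essential design choice is that stage 2 continues from the stage-1 weights rather than training the MLM objective from scratch.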
Researchers from Flashbots and the Technical University of Munich conducted the first large-scale empirical study of executed cross-chain arbitrage, quantifying its $868.64 million annual volume and $8.65 million net profit across nine blockchains. Their analysis revealed a highly concentrated market where inventory-based strategies dominate due to bridging latency, raising concerns about vertical integration and systemic risks in decentralized finance.
SaulLM-7B is presented as the first publicly available large language model specifically designed for the legal domain. It achieves state-of-the-art performance among 7-billion-parameter models on legal benchmarks by leveraging extensive legal data and a two-step training process, and it contributes an improved evaluation protocol, LegalBench-Instruct.
TOWER is an open multilingual large language model family developed by Unbabel and Instituto de Telecomunicações, designed to excel across various translation-related tasks. It was created using a multi-stage training recipe, achieving high translation quality that surpasses other open models and often rivals closed systems like GPT-3.5 and GPT-4 on benchmarks.
This research rigorously demonstrates that General Relativistic effects, including gravitomagnetism and non-linear contributions, cannot explain observed galactic rotation curves or gravitational lensing without dark matter. The analysis shows that these relativistic effects are either geometrically inconsistent with observations or counteract the gravitational attraction required to solve the missing mass problem.
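A back-of-the-envelope estimate (ours, not the paper's full analysis) makes the conclusion plausible at the order-of-magnitude level: in the weak-field limit, gravitomagnetic accelerations are suppressed by a factor of order (v/c)² relative to the Newtonian term, so for galactic rotation speeds of roughly 200 km/s,

```latex
\[
  \frac{a_{\mathrm{gm}}}{a_{\mathrm{N}}} \sim \left(\frac{v}{c}\right)^{2}
  \approx \left(\frac{2\times10^{5}\ \mathrm{m/s}}{3\times10^{8}\ \mathrm{m/s}}\right)^{2}
  \approx 4\times10^{-7},
\]
```

many orders of magnitude too small to mimic a dark-matter halo unless some non-perturbative enhancement intervenes, which is the kind of effect the paper examines and rules out.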
This survey details the growing application of Conformal Prediction (CP) for uncertainty quantification in Natural Language Processing (NLP), demonstrating how it provides reliable uncertainty estimates with statistical guarantees. It shows CP successfully applied across diverse NLP tasks, including text classification and natural language generation, enhancing model reliability and supporting human-AI collaboration.
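As context for how simple the core recipe is, here is a minimal sketch of split conformal prediction for classification, the standard construction such surveys cover. The probabilities below are synthetic, and the score choice (one minus the true-class probability) is just the most common baseline.

```python
# Split conformal prediction for classification (standard recipe; synthetic
# probabilities stand in for a real NLP classifier's softmax outputs).
import numpy as np

def conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Return prediction sets with ~(1 - alpha) marginal coverage."""
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true label.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration scores.
    q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    # Keep every label whose score falls within the threshold.
    return [np.where(1.0 - p <= q)[0] for p in test_probs]

rng = np.random.default_rng(0)
cal_probs = rng.dirichlet(np.ones(5), size=200)    # 200 calibration examples
cal_labels = np.array([rng.choice(5, p=p) for p in cal_probs])
test_probs = rng.dirichlet(np.ones(5), size=3)
print(conformal_sets(cal_probs, cal_labels, test_probs, alpha=0.1))
```

Under exchangeability of calibration and test data, the returned sets contain the true label with probability at least 1 − α on average, which is the distribution-free guarantee the survey builds on.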