University of Turku
A logical modeling framework, Strong Regulatory Graphs (SRGs), addresses the scalability challenge in analyzing large biological networks by incorporating an 'ambiguous' state for nodes with conflicting or uncertain regulatory inputs. This approach, developed at Åbo Akademi University and the University of Turku, makes phenotype attractors decidable in polynomial time and provides a robust method for modeling complex biological dynamics, as demonstrated on a cancer signaling network.
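To make the 'ambiguous' third state concrete, here is a minimal Python sketch of one plausible three-valued update rule for a single node; the State enum, the update function, and the conflict policy are illustrative assumptions rather than the exact SRG semantics defined in the paper.

```python
from enum import Enum

class State(Enum):
    OFF = 0
    ON = 1
    AMBIGUOUS = 2  # conflicting or uncertain regulatory input

def update(node, states, activators, inhibitors):
    """Illustrative three-valued update for one node.

    `activators` and `inhibitors` map each node to the lists of its regulators;
    `states` maps each node to its current State. Any ambiguous regulator
    propagates ambiguity downstream.
    """
    act = [states[a] for a in activators.get(node, [])]
    inh = [states[i] for i in inhibitors.get(node, [])]
    if State.AMBIGUOUS in act + inh:
        return State.AMBIGUOUS
    activated = State.ON in act
    inhibited = State.ON in inh
    if activated and inhibited:
        return State.AMBIGUOUS      # conflicting inputs
    if activated:
        return State.ON
    if inhibited:
        return State.OFF
    return states[node]             # no active regulators: keep current state
```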
Researchers from the University of Jyväskylä and collaborators developed a hybrid deep transfer learning and ensemble machine learning model that achieves 96.74% accuracy on an external test set for colorectal cancer histology decomposition. This approach provides a more precise foundation for imaging-based prognostic biomarkers by enhancing tissue classification compared to previous methods.
Researchers from Hong Kong University of Science and Technology and HSBC developed Temporal GraphRAG (TG-RAG), a Retrieval-Augmented Generation system that explicitly models time-sensitive, evolving knowledge using a bi-level temporal graph. This framework enables efficient incremental updates and achieves superior factual accuracy and temporal coverage on a new financial earnings call dataset compared to existing graph-RAG methods.
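As a rough illustration of what a bi-level temporal store with append-only (incremental) updates and time-aware retrieval might look like, consider the sketch below; the class and method names are assumptions for illustration, not TG-RAG's actual API.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TemporalFact:
    subject: str
    relation: str
    obj: str
    valid_at: date          # date of the report the fact was extracted from

@dataclass
class BiLevelTemporalGraph:
    """Illustrative bi-level store: an entity graph plus time-stamped facts."""
    entity_edges: set[tuple[str, str]] = field(default_factory=set)
    facts: list[TemporalFact] = field(default_factory=list)

    def add_fact(self, fact: TemporalFact) -> None:
        # Incremental update: new earnings-call facts only append;
        # previously stored entries are never rewritten.
        self.entity_edges.add((fact.subject, fact.obj))
        self.facts.append(fact)

    def retrieve(self, entity: str, as_of: date) -> list[TemporalFact]:
        # Time-aware retrieval: return facts about an entity known by `as_of`.
        return [f for f in self.facts
                if entity in (f.subject, f.obj) and f.valid_at <= as_of]
```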
Researchers from Hugging Face, Harvard University, and the University of Turku demonstrated that large language models trained with up to four epochs of repeated data show negligible performance degradation, challenging the single-epoch training paradigm in data-constrained scenarios. Their new scaling laws suggest optimizing compute by prioritizing more training epochs over increased model parameters when unique data is scarce, and they show that up to 50% code augmentation can maintain natural language task performance.
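One common way to express the diminishing value of repeated epochs, in the spirit of these data-constrained scaling laws (the exact parameterization and the fitted constant are stated here only as an illustrative assumption), is to replace the raw token count with an effective count D' that saturates as data is repeated:

D' \;=\; U + U\,R^{*}\left(1 - e^{-R/R^{*}}\right),

where U is the number of unique tokens, R the number of additional epochs beyond the first, and R^{*} a fitted constant. For small R this is approximately U(1 + R), matching the observation that a few repetitions are nearly as valuable as fresh data, while for large R the gain flattens out.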
A collaborative white paper coordinated by the Quantum Community Network comprehensively analyzes the current status and future perspectives of Quantum Artificial Intelligence, categorizing its potential into "Quantum for AI" and "AI for Quantum" applications. It proposes a strategic research and development agenda to bolster Europe's competitive position in this rapidly converging technological domain.
This survey paper from the Turku Intelligent Embedded and Robotic Systems Lab offers the first comprehensive review of sim-to-real transfer methods in deep reinforcement learning for robotics. It synthesizes existing approaches, identifies key challenges, and outlines future research directions to enable the reliable deployment of simulation-trained policies on real-world robots.
Test-time scaling (TTS) has enhanced the performance of Reasoning Models (RMs) on various tasks such as math and coding, yet its efficacy in machine translation (MT) remains underexplored. This paper investigates whether increased inference-time computation improves translation quality. We evaluate 12 RMs across a diverse suite of MT benchmarks spanning multiple domains, examining three scenarios: direct translation, forced-reasoning extrapolation, and post-editing. Our findings show that for general-purpose RMs, TTS provides limited and inconsistent benefits for direct translation, with performance quickly plateauing. However, the effectiveness of TTS is unlocked by domain-specific fine-tuning, which aligns a model's reasoning process with task requirements, leading to consistent improvements up to an optimal, self-determined reasoning depth. We also find that forcing a model to reason beyond its natural stopping point consistently degrades translation quality. In contrast, TTS proves highly effective in a post-editing context, reliably turning self-correction into a beneficial process. These results indicate that the value of inference-time computation in MT lies not in enhancing single-pass translation with general models, but in targeted applications like multi-step, self-correction workflows and in conjunction with task-specialized models.
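The post-editing scenario described above can be illustrated with a minimal two-pass sketch in which inference-time compute is spent on a second, self-correcting pass rather than on longer single-pass reasoning; the `generate` callable and the prompts are placeholders (assumptions), not an API from the paper.

```python
def translate_with_post_edit(source: str, generate, src_lang="English", tgt_lang="Finnish") -> str:
    """Two-pass translate-then-self-correct workflow.

    `generate` is a placeholder for any text-generation call (an LLM API or a
    local model); it is not a real library function.
    """
    draft = generate(
        f"Translate the following {src_lang} text into {tgt_lang}:\n\n{source}"
    )
    revised = generate(
        "You are post-editing a machine translation.\n"
        f"Source ({src_lang}): {source}\n"
        f"Draft ({tgt_lang}): {draft}\n"
        "Fix mistranslations, omissions, and unnatural phrasing. "
        "Return only the corrected translation."
    )
    return revised
```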
This research from the University of Turku and ETH Zurich presents a comprehensive framework for training reinforcement learning policies for mobile robot navigation using NVIDIA Isaac Sim, demonstrating successful zero-shot transfer to real-world robots. The developed policies achieve 70-100% success rates in real-world static environments and exhibit superior dynamic obstacle avoidance compared to traditional methods in simulation.
The 136-year-long optical light curve of OJ 287 is explained by a binary black hole model in which the secondary is in a 12-year orbit around the primary. Impacts of the secondary on the accretion disk of the primary generate a series of optical flares that follow a quasi-Keplerian relativistic model. The orientation of the binary in space is determined from the behavior of the primary jet. Here we ask how the jet of the secondary black hole projects onto the sky plane. Assuming that the jet is initially perpendicular to the disk, and that it is ballistic, we follow its evolution after the Lorentz transformation to the observer's frame. Since the orbital speed of the secondary is of the order of one-tenth of the speed of light, the result is a change in the jet direction by more than a radian during an orbital cycle. We match the theoretical jet line with the recent 12 μas resolution RadioAstron map of OJ 287, and determine the only free parameter of the problem, the apparent speed of the jet relative to the speed of light. It turns out that the Doppler factor of the jet, δ ∼ 5, is much lower than in the primary jet. Besides following a unique shape of the jet path, the secondary jet is also distinguished by a different spectral shape from that of the primary jet. The present result on the spectral shape agrees with the huge optical flare of 2021 November 12, which also arose from the secondary jet.
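For reference, the Doppler factor quoted here follows the standard relativistic beaming relation (a textbook identity, not a result specific to this work):

\delta = \frac{1}{\Gamma\,(1 - \beta\cos\theta)}, \qquad \Gamma = \frac{1}{\sqrt{1 - \beta^{2}}},

where β is the jet speed in units of c and θ is the angle between the jet and the line of sight, so orbit-induced swings in the jet direction of order a radian translate directly into large changes in δ and hence in apparent brightness and apparent speed.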
Researchers from the University of Turku developed a GNSS-free, vision-based localization algorithm for UAVs operating in natural environments at high altitudes, using only a monocular camera. The system localizes drones by matching real-time camera images against pre-built georeferenced satellite maps, achieving a mean absolute error (MAE) of 15.82 meters, comparable to standard GNSS accuracy.
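A minimal sketch of the underlying map-matching idea, using plain template matching in OpenCV, is shown below; the actual system must also handle scale, rotation, and seasonal appearance changes, so treat the function, its arguments, and the alignment assumptions as illustrative only.

```python
import cv2
import numpy as np

def locate_in_map(frame_gray: np.ndarray,
                  map_gray: np.ndarray,
                  metres_per_pixel: float) -> tuple[float, float]:
    """Minimal map-matching sketch: slide the downward-looking camera frame
    over a georeferenced satellite map and return the best-match position in
    metres within the map frame.

    Assumes the frame is already scale- and orientation-aligned with the map.
    """
    scores = cv2.matchTemplate(map_gray, frame_gray, cv2.TM_CCOEFF_NORMED)
    _, _, _, (px, py) = cv2.minMaxLoc(scores)   # top-left pixel of best match
    cx = px + frame_gray.shape[1] / 2           # centre of the matched window
    cy = py + frame_gray.shape[0] / 2
    return cx * metres_per_pixel, cy * metres_per_pixel
```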
The High-Performance Language Technologies (HPLT) project released HPLT v2, an expanded collection of monolingual and parallel corpora featuring 8 trillion tokens across 193 languages and over 380 million parallel sentence pairs. This resource, built from diverse web sources, enabled the training of language models that showed improved performance on linguistic tasks and in generative modeling compared to models trained on previous datasets.
Data quality is crucial for training Large Language Models (LLMs). Traditional heuristic filters often miss low-quality text or mistakenly remove valuable content. In this paper, we introduce an LLM-based line-level filtering method to enhance training data quality. We use GPT-4o mini to label a 20,000-document sample from FineWeb at the line level, allowing the model to create descriptive labels for low-quality lines. These labels are grouped into nine main categories, and we train a DeBERTa-v3 classifier to scale the filtering to a 10B-token subset of FineWeb. To test the impact of our filtering, we train GPT-2 models on both the original and the filtered datasets. The results show that models trained on the filtered data achieve higher accuracy on the HellaSwag benchmark and reach their performance targets faster, even with up to 25% less data. This demonstrates that LLM-based line-level filtering can significantly improve data quality and training efficiency for LLMs. We release our quality-annotated dataset, FinerWeb-10BT, and the codebase to support further work in this area.
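A minimal sketch of how such a line-level classifier could be applied at scale is shown below; the model identifier, the label names, and the score threshold are placeholders (assumptions), not the released artifacts.

```python
from transformers import pipeline

# Placeholder model id: substitute the actual line-quality classifier. The
# label names ("Clean" vs. the low-quality categories) are assumptions.
line_classifier = pipeline("text-classification", model="path/to/line-quality-deberta")

def filter_document(text: str, keep_label: str = "Clean", min_score: float = 0.5) -> str:
    """Drop low-quality lines, keep the rest of the document intact."""
    kept = []
    for line in text.splitlines():
        if not line.strip():
            kept.append(line)                          # preserve blank lines / structure
            continue
        pred = line_classifier(line, truncation=True)[0]
        if pred["label"] == keep_label and pred["score"] >= min_score:
            kept.append(line)
    return "\n".join(kept)
```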
Pretraining data curation is a cornerstone in Large Language Model (LLM) development, leading to growing research on quality filtering of large web corpora. From statistical quality flags to LLM-based labelling systems, datasets are divided into categories, frequently reducing to a binary: those passing the filters are deemed as valuable examples, others are discarded as useless or detrimental. However, a more detailed understanding of the contribution of different kinds of texts to model performance is still largely lacking. In this article, we present the first study utilising registers or genres - a widely used standard in corpus linguistics to model linguistic variation - to curate pretraining datasets and investigate the effect of register on the performance of LLMs. We train small generative models on register-classified data, evaluate them on standard benchmarks, and show that the register of the pretraining data substantially affects model performance. We uncover surprising relationships between the pretraining material and the resulting models: using the News register results in subpar performance, and on the contrary, including the Opinion class, covering texts such as reviews and opinion blogs, is highly beneficial. While a model trained on the entire unfiltered dataset outperforms those trained on datasets limited to a single register, combining well-performing registers like How-to-Instructions, Informational Description, and Opinion leads to major improvements. Furthermore, analysis of individual benchmark results reveals key differences in the strengths and drawbacks of specific register classes as pretraining data. These findings show that register is an important explainer of model variation and can facilitate more deliberate future data selection practices.
Sleep is an essential component of human physiology, contributing significantly to overall health and quality of life. Accurate sleep staging and disorder detection are crucial for assessing sleep quality. Studies in the literature have proposed PSG-based approaches and machine-learning methods utilizing single-modality signals. However, existing methods often lack multimodal, multilabel frameworks and address sleep stage and sleep disorder classification separately. In this paper, we propose a 1D-Vision Transformer for simultaneous classification of sleep stages and sleep disorders. Our method exploits the correlation of sleep disorders with specific sleep stage patterns and identifies a sleep stage and a sleep disorder simultaneously. The model is trained and tested using multimodal, multilabel sensory data (including photoplethysmogram, respiratory flow, and respiratory effort signals). The proposed method achieves an overall accuracy (Cohen's kappa) of 78% (0.66) for five-stage sleep classification and 74% (0.58) for sleep apnea classification. Moreover, we analyzed the encoder attention weights to clarify the model's predictions and to investigate the influence different features have on its outputs. The results show that identified patterns, such as respiratory troughs and peaks, contribute more strongly to the final classification.
University of Cambridge; University of Bern; University of Edinburgh; ETH Zürich; Technische Universität Dresden; University of Pisa; Stockholm University; Sorbonne Université; University of Turku; Leiden University; University of Geneva; University of Belgrade; University of Vienna; University of Leicester; University of Vigo; Observatoire de Paris; Université de Liège; INAF - Osservatorio Astrofisico di Torino; University of Groningen; University of Bath; Lund University; University of Lausanne; Instituto de Astrofísica de Canarias; University of Antioquia; European Space Agency; Universidad de Valparaíso; Université de Mons; ELTE Eötvös Loránd University; University of Bordeaux; Observatoire de la Côte d'Azur; Faculdade de Ciências da Universidade de Lisboa; University of Barcelona; Max Planck Institute for Astronomy; National Observatory of Athens; Université de Paris-Saclay; Instituto de Astrofísica de Andalucía; Université de Franche-Comté; INAF - Osservatorio Astronomico di Roma; Katholieke Universiteit Leuven; Royal Observatory of Belgium; Space Research Institute; Université de Rennes; University of Aarhus; Konkoly Observatory; Tartu Observatory; Hellenic Open University; ARI, Zentrum für Astronomie der Universität Heidelberg; Copernicus Astronomical Center; ESAC, Villanueva de la Cañada; Astronomical Observatory of Turin; Université de Besançon; CENTRA, Universidade de Lisboa; Université de Nice; Observatoire de la Côte d'Azur, CNRS; INAF - Osservatorio Astronomico di Catania; Université catholique de Louvain; Université de Toulouse; Université Libre de Bruxelles; INAF - Osservatorio Astronomico di Capodimonte; Université de Lorraine; Aix-Marseille Université; Université de Strasbourg; Université de Lille; INAF - Osservatorio Astrofisico di Arcetri; INAF - Osservatorio Astronomico di Padova; Université de Montpellier; INAF - Osservatorio di Astrofisica e Scienza dello Spazio di Bologna
The Gaia Galactic survey mission is designed and optimized to obtain astrometry, photometry, and spectroscopy of nearly two billion stars in our Galaxy. Yet as an all-sky multi-epoch survey, Gaia also observes several million extragalactic objects down to a magnitude of G ≈ 21. Due to the nature of the Gaia onboard selection algorithms, these are mostly point-source-like objects. Using data provided by the satellite, we have identified quasar and galaxy candidates via supervised machine learning methods, and estimate their redshifts using the low resolution BP/RP spectra. We further characterise the surface brightness profiles of host galaxies of quasars and of galaxies from pre-defined input lists. Here we give an overview of the processing of extragalactic objects, describe the data products in Gaia DR3, and analyse their properties. Two integrated tables contain the main results for a high completeness, but low purity (50-70%), set of 6.6 million candidate quasars and 4.8 million candidate galaxies. We provide queries that select purer sub-samples of these containing 1.9 million probable quasars and 2.9 million probable galaxies (both 95% purity). We also use high quality BP/RP spectra of 43 thousand high probability quasars over the redshift range 0.05-4.36 to construct a composite quasar spectrum spanning restframe wavelengths from 72-100 nm.
We obtain comorbidity networks starting from medical information stored in electronic health records collected by the Wellbeing Services County of Southwest Finland (Varha). Based on the data, we associate each patient to one or more diseases and construct complex comorbidity networks associated with large patient cohorts characterized by an age interval and sex. The information about diseases in electronic health records is coded using the highest granularity present in the international classification of diseases (ICD codes) provided by the World Health Organization. We statistically validate links in each cohort comorbidity network and furthermore partition the networks into communities of diseases. These are characterized by the over-expression of a few disease categories, and communities from different age or sex cohorts show various similarities in terms of these disease classes. Moreover, all the detected communities for all the cohorts can be organized into a hierarchical tree. This allows us to observe a number of clusters of communities, originating from diverse age and sex cohorts, that group together communities characterized by the same disease classes. We also apply a dismantling procedure to the statistically validated comorbidity networks to highlight the categories of diseases most responsible for the compactness of the comorbidity network of a given patient cohort.
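The basic pipeline of building a disease co-occurrence network and partitioning it into communities can be sketched as follows; note that the simple count threshold stands in for the paper's statistical link validation, and modularity-based community detection is only one of several possible choices.

```python
from collections import Counter
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

def comorbidity_network(patient_diagnoses: dict[str, set[str]], min_cooccurrence: int = 5):
    """Illustrative pipeline: patient -> {ICD codes} mappings are turned into a
    disease co-occurrence network, which is then partitioned into communities."""
    counts = Counter()
    for codes in patient_diagnoses.values():
        counts.update(combinations(sorted(codes), 2))   # disease pairs per patient
    g = nx.Graph()
    for (a, b), n in counts.items():
        if n >= min_cooccurrence:                        # stand-in for link validation
            g.add_edge(a, b, weight=n)
    communities = greedy_modularity_communities(g, weight="weight")
    return g, communities
```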
This article investigates how well deep learning models can identify web registers -- text varieties such as news reports and discussion forums -- across 16 languages. We introduce the Multilingual CORE corpora, which contain 72,504 documents annotated with a hierarchical taxonomy of 25 registers designed to cover the entire open web. Our multilingual models achieve state-of-the-art results (79% F1 score) using multi-label classification. This performance matches or exceeds previous studies that used simpler classification schemes, showing that models can perform well even with a complex register scheme at a massively multilingual scale. However, we observe a consistent performance ceiling around 77-80% F1 score across all models and configurations. When we remove documents with uncertain labels through data pruning, performance increases to over 90% F1, suggesting that this ceiling stems from inherent ambiguity in web registers rather than model limitations. Analysis of hybrid documents -- texts combining multiple registers -- reveals that the main challenge is not in classifying hybrids themselves, but in distinguishing between hybrid and non-hybrid documents. Multilingual models consistently outperform monolingual ones, particularly helping languages with limited training data. While zero-shot performance drops by an average of 7% on unseen languages, this decrease varies substantially between languages (from 3% to 20%), indicating that while registers share many features across languages, they also maintain language-specific characteristics.
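Since documents may carry several registers at once (hybrids), evaluation relies on multi-label F1; a minimal sketch of the thresholding and scoring step, with an assumed 0.5 decision threshold, is shown below.

```python
import numpy as np
from sklearn.metrics import f1_score

def multilabel_f1(probs: np.ndarray, gold: np.ndarray, threshold: float = 0.5) -> float:
    """Micro-averaged F1 for multi-label register prediction.

    `probs` holds per-register sigmoid scores with shape (n_docs, n_registers)
    and `gold` the binary label matrix; each register is thresholded
    independently, so a document can receive several labels (a hybrid).
    """
    preds = (probs >= threshold).astype(int)
    return f1_score(gold, preds, average="micro", zero_division=0)
```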
This work by Heinosaari, Miyadera, and Ziman offers a comprehensive overview of quantum incompatibility, defining and quantifying it within general operational theories and quantum mechanics. It positions incompatibility as a fundamental resource that unifies various impossibility statements and non-classical phenomena in quantum information science.
Pretrained language models are an integral part of AI applications, but their high computational cost for training limits accessibility. Initiatives such as Bloom and StarCoder aim to democratize access to pretrained models for collaborative community development. Despite these efforts, such models encounter challenges such as limited multilingual capabilities, risks of catastrophic forgetting during continual pretraining, and the high costs of training models from scratch, alongside the need to align with AI safety standards and regulatory frameworks. This paper presents Aurora-M, a 15B parameter multilingual open-source model trained on English, Finnish, Hindi, Japanese, Vietnamese, and code. Continually pretrained from StarCoderPlus on 435B additional tokens, Aurora-M surpasses 2T tokens in total training token count. It is the first open-source multilingual model fine-tuned on human-reviewed safety instructions, thus aligning its development not only with conventional red-teaming considerations, but also with the specific concerns articulated in the Biden-Harris Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. We evaluate Aurora-M across a wide range of tasks and languages, showcasing its robustness against catastrophic forgetting and its superior performance in multilingual settings, particularly in safety evaluations. We open-source Aurora-M and its variants to encourage responsible open-source development of large language models at this https URL
Machine-learned interatomic potentials (MLIPs) have shown significant promise in predicting infrared spectra with high fidelity. However, the absence of general-purpose MLIPs capable of handling a wide range of elements and their combinations has limited their broader applicability. In this work, we introduce MACE4IR, a machine learning foundation model built on the MACE architecture and trained on 10 million geometries and corresponding density-functional theory (DFT) energies, forces and dipole moments from the QCML dataset. The training data encompasses approximately 80 elements and a diverse set of molecules, including organic compounds, inorganic species, and metal complexes. MACE4IR accurately predicts energies, forces, dipole moments, and infrared spectra at significantly reduced computational cost compared to DFT. By combining generality, accuracy, and efficiency, MACE4IR opens the door to rapid and reliable infrared spectra prediction for complex systems across chemistry, biology, and materials science.
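For context, a standard route from predicted dipole moments to an infrared spectrum is the Fourier transform of the dipole autocorrelation function along a molecular-dynamics trajectory (prefactor conventions vary between implementations, and this is not necessarily the exact post-processing used for MACE4IR):

I(\omega) \;\propto\; \omega^{2}\int_{-\infty}^{\infty} \left\langle \boldsymbol{\mu}(0)\cdot\boldsymbol{\mu}(t) \right\rangle\, e^{-i\omega t}\,\mathrm{d}t,

which is why a model that predicts forces (to drive the dynamics) and dipole moments (to build μ(t)) suffices to generate spectra without further DFT calls.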