Astrophysikalisches Institut und Universitäts-Sternwarte, Friedrich-Schiller-Universität Jena
Here, we present the outcomes from the second Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry, which engaged participants across global hybrid locations, resulting in 34 team submissions. The submissions spanned seven key application areas and demonstrated the diverse utility of LLMs for applications in (1) molecular and material property prediction; (2) molecular and material design; (3) automation and novel interfaces; (4) scientific communication and education; (5) research data management and automation; (6) hypothesis generation and evaluation; and (7) knowledge extraction and reasoning from scientific literature. Each team submission is presented in a summary table with links to the code and as brief papers in the appendix. Beyond team results, we discuss the hackathon event and its hybrid format, which included physical hubs in Toronto, Montreal, San Francisco, Berlin, Lausanne, and Tokyo, alongside a global online hub to enable local and virtual collaboration. Overall, the event highlighted significant improvements in LLM capabilities since the previous year's hackathon, suggesting continued expansion of LLMs for applications in materials science and chemistry research. These outcomes demonstrate the dual utility of LLMs as both multipurpose models for diverse machine learning tasks and platforms for rapidly prototyping custom applications in scientific research.
The German Commons provides the largest collection of verifiably openly licensed German text to date, comprising 154.56 billion tokens from 35.78 million documents, processed to ensure high quality and legal compliance for training German language models.
05 Oct 2025
High harmonic generation (HHG) is a crucial technology for compact, high-brightness extreme ultraviolet (XUV) and soft X-ray sources, which are key to advancing both fundamental and applied sciences. The availability of advanced driving lasers, with tunable wavelength, power, and pulse duration, opens new opportunities for optimizing HHG-based sources. While scaling laws for wavelength are well understood, this work focuses on how pulse duration impacts HHG efficiency and introduces a unified framework that links microscopic dynamics to macroscopic performance. We establish a practical scaling law for the single-atom dipole moment under phase-matching conditions, demonstrating a 1/t dependence at 515 nm wavelength. By connecting this microscopic scaling to macroscopic conversion efficiency, we provide clear guidelines for optimizing HHG output across different gases and driving wavelengths. Furthermore, we identify fundamental constraints, including the carrier-envelope-phase (CEP) walk-off, which limits efficiency at longer driver wavelengths and becomes especially significant for very short pulses. All predictions are based on simple, accessible formulas, eliminating the need for complex numerical simulations. Experiments confirm these predictions and highlight when short pulses are advantageous, particularly in scenarios where CEP walk-off and absorption effects are minimized. These findings offer practical principles for designing next-generation HHG sources, capable of Watt-level average power and extended spectral reach, enabling more versatile and powerful HHG-based XUV and soft X-ray sources.
Cross-encoders distilled from large language models (LLMs) are often more effective re-rankers than cross-encoders fine-tuned on manually labeled data. However, distilled models do not match the effectiveness of their teacher LLMs. We hypothesize that this effectiveness gap is due to the fact that previous work has not applied the best-suited methods for fine-tuning cross-encoders on manually labeled data (e.g., hard-negative sampling, deep sampling, and listwise loss functions). To close this gap, we create a new dataset, Rank-DistiLLM. Cross-encoders trained on Rank-DistiLLM achieve the effectiveness of LLMs while being up to 173 times faster and 24 times more memory efficient. Our code and data are available at this https URL
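The listwise distillation idea above can be sketched with a ListNet-style loss: the student cross-encoder is trained to match the teacher LLM's score distribution over a ranked list of passages. The function names and toy scores below are illustrative, not the paper's implementation:

```python
import math

def listnet_loss(teacher_scores, student_scores):
    """Cross-entropy between teacher and student top-1 permutation
    probabilities (a ListNet-style listwise loss)."""
    def softmax(xs):
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    p_teacher = softmax(teacher_scores)
    p_student = softmax(student_scores)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))

# A student matching the teacher's scores incurs the minimal loss
# (the entropy of the teacher distribution); a reversed ranking costs more.
teacher = [3.2, 1.1, 0.4]  # hypothetical LLM relevance scores for 3 passages
aligned = listnet_loss(teacher, teacher)
shuffled = listnet_loss(teacher, [0.4, 1.1, 3.2])
```

Minimizing this loss over many teacher-scored lists is one way such a distillation dataset can be consumed by a student re-ranker.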
Plasmonic metasurfaces play a crucial role in resonance-driven photocatalytic reactions by effectively enhancing reactivity via localized surface plasmon resonances. Catalytic activity can be selectively modulated by tuning the strength of plasmonic resonances through two primary non-thermal mechanisms: near-field enhancement and hot carrier injection, which govern the population of energetic carriers excited or injected into unoccupied molecular orbitals. We developed a set of polarization-sensitive metasurfaces consisting of elliptical Au-TiO2 nanopillars, specifically designed to plasmonically modulate the reactivity of a model reaction: the photocatalytic degradation of methylene blue. Surface-enhanced Raman spectroscopy reveals a polarization-dependent reaction yield in real time, modulating from 4.7 (transverse electric polarization) to 9.98 (transverse magnetic polarization) over a 10 s period, as quantified by the integrated area of the 480 cm⁻¹ Raman peak and correlated with enhanced absorption at 633 nm. The single metasurface configuration enables continuous tuning of photocatalytic reactivity via active control of plasmonic resonance strength, as evidenced by the positive correlation between measured absorption and product yield. This dynamic approach provides a route to selectively enhance or suppress resonance-driven reactions, which can be further leveraged to achieve selectivity in multibranch reactions, guiding product yields toward desired outcomes.
We uncover late-time gravitational-wave tails in fully nonlinear 3+1 dimensional numerical relativity simulations of merging black holes, using the highly accurate SpEC code. We achieve this result by exploiting the strong magnification of late-time tails due to binary eccentricity, recently observed in perturbative evolutions, and showcase here the tail presence in head-on configurations for several mass ratios close to unity. We validate the result through a large battery of numerical tests and detailed comparison with perturbative evolutions, which display striking agreement with fully nonlinear ones. Our results offer yet another confirmation of the highly predictive power of black hole perturbation theory in the presence of a source, even when applied to nonlinear solutions. The late-time tail signal is much more prominent than anticipated until recently, and possibly within reach of gravitational-wave detector measurements, unlocking observational investigations of an additional set of general relativistic predictions on the long-range gravitational dynamics.
Existing cross-encoder models can be categorized as pointwise, pairwise, or listwise. Pairwise and listwise models allow passage interactions, which typically makes them more effective than pointwise models but less efficient and less robust to input passage order permutations. To enable efficient permutation-invariant passage interactions during re-ranking, we propose a new cross-encoder architecture with inter-passage attention: the Set-Encoder. In experiments on TREC Deep Learning and TIREx, the Set-Encoder is as effective as state-of-the-art listwise models while being more efficient and invariant to input passage order permutations. Compared to pointwise models, the Set-Encoder is particularly more effective when considering inter-passage information, such as novelty, and retains its advantageous properties compared to other listwise models. Our code is publicly available at this https URL
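The permutation invariance claimed for the Set-Encoder can be illustrated with a toy scorer in which each passage's score combines its own query overlap with a symmetric (mean-pooled) summary of all passages, standing in for inter-passage attention. The function and weighting below are illustrative assumptions, not the Set-Encoder architecture itself:

```python
def set_scores(query, passages):
    """Toy permutation-invariant re-ranker: each passage is scored from
    its own query overlap plus a mean-pooled summary of all passages.
    Because pooling is symmetric, input order cannot affect the scores."""
    def overlap(a, b):
        return len(set(a.split()) & set(b.split()))

    pooled = sum(overlap(query, p) for p in passages) / len(passages)
    # Per-passage score relative to the pooled context (weight is arbitrary).
    return {p: overlap(query, p) - 0.1 * pooled for p in passages}

q = "set encoder ranking"
docs = ["encoder for ranking", "unrelated text", "set encoder"]
s_forward = set_scores(q, docs)
s_reversed = set_scores(q, list(reversed(docs)))
```

A listwise model with order-sensitive attention would generally fail the equality between `s_forward` and `s_reversed`; symmetric inter-passage aggregation is what buys the invariance.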
Heavy quarks are powerful tools to characterize the quark-gluon plasma (QGP) produced in relativistic nuclear collisions. By exploiting a mapping between transport theory and hydrodynamics, we developed a fluid-dynamic description of heavy-quark diffusion in the QCD plasma. We present results for the transverse momentum distributions of charm hadrons and evolution of charm density and diffusion fields obtained using a fluid-dynamic code coupled with the conservation of a heavy-quark current in the QGP in various collision systems.
The Photon-Ion Spectrometer at PETRA III (PIPE) has provided high-precision experimental data on photon interactions with ionized quantum systems during its initial five years of operation. Utilizing a merged-beams technique with record-high photon flux and an extended energy range, PIPE delivered critical astrophysical data, established new soft X-ray photon-energy calibration standards, and advanced the understanding of complex multi-electron processes in atomic and molecular ions, including endohedral fullerenes.
Photon pairs generated from spontaneous parametric down-conversion are a well-established method to realize entangled bipartite photonic systems. Laguerre-Gaussian modes, which carry orbital angular momentum (OAM), are commonly exploited to engineer high-dimensional entangled quantum states. For Hilbert spaces with dimension d>2, maximally entangled states (MESs) help to improve the capacity and security of quantum communication protocols, among several other promising features. However, the direct generation of MESs in well-defined high-dimensional subspaces of the infinite OAM basis has remained a challenge. Here, we formalize how the spatial distribution of the pump beam and the nonlinear profile of the crystal can be simultaneously utilized to generate MESs without additional spatial filtering of OAM modes within a subspace. We illustrate our approach with maximally entangled qutrits (d=3) and ququints (d=5).
We use a novel real-time formulation of the functional renormalization group (FRG) for dynamical systems with reversible mode couplings to study Model G and H, which are the conjectured dynamic universality classes of the two-flavor chiral phase transition and the QCD critical point, respectively. We compute the dynamic critical exponent in both models in spatial dimensions $2
Empathy is a cognitive and emotional reaction to an observed situation of others. Empathy has recently attracted interest because it has numerous applications in psychology and AI, but it is unclear how different forms of empathy (e.g., self-report vs. counterpart other-report, concern vs. distress) interact with other affective phenomena or demographics like gender and age. To better understand this, we created the Empathic Conversations dataset of annotated negative, empathy-eliciting dialogues in which pairs of participants converse about news articles. People differ in their perception of the empathy of others. These differences are associated with certain characteristics such as personality and demographics. Hence, we collected detailed characterization of the participants' traits, their self-reported empathetic response to news articles, their conversational partner's other-report, and turn-by-turn third-party assessments of the level of self-disclosure, emotion, and empathy expressed. This dataset is the first to present empathy in multiple forms along with personal distress, emotion, personality characteristics, and person-level demographic information. We present baseline models for predicting some of these features from conversations.
How will generative AI pay for itself? Unless charging users for access, selling advertising is the only alternative. Especially in the multi-billion dollar web search market with ads as the main source of revenue, the introduction of a subscription model seems unlikely. The recent disruption of search by generative large language models could thus ultimately be accompanied by generated ads. Our concern is that the commercialization of generative AI in general and large language models in particular could lead to native advertising in the form of quite subtle brand or product placements. In web search, the evolution of search engine results pages (SERPs) from traditional lists of "ten blue links" (list SERPs) to generated text with web page references (text SERPs) may further blur the line between advertising-based and organic search results, making it difficult for users to distinguish between the two, depending on how advertising is integrated and disclosed. To raise awareness of this potential development, we conduct a pilot study analyzing the capabilities of current large language models to blend ads with organic search results. Although the models still struggle to subtly frame ads in an unrelated context, their potential is evident when integrating ads into related topics, which calls for further investigation.
Large-scale structure formation is studied in a kinetic theory approach, extending the standard perfect pressureless fluid description for dark matter by including the velocity dispersion tensor as a dynamical degree of freedom. The evolution of power spectra for density, velocity and velocity dispersion degrees of freedom is investigated in a non-perturbative approximation scheme based on the Dyson–Schwinger equation. In particular, the generation of vorticity and velocity dispersion is studied and predictions for the corresponding power spectra are made, which qualitatively agree well with results obtained from N-body simulations. It is found that velocity dispersion grows strongly due to non-linear effects and at late times its mean value seems to be largely independent of the initial conditions. By taking this into account, a rather realistic picture of non-linear large-scale structure formation can be obtained, albeit the numerical treatment remains challenging, especially for very cold dark matter models.
A study from the Leibniz-Institut für Astrophysik Potsdam reconstructs the Milky Way's spatially resolved star formation history and disc growth by combining an orbit superposition method with APOGEE data and stellar birth radii estimation, revealing inside-out disc growth, a significant secondary star formation peak 4 Gyr ago that built the outer disc, and suggesting the [α/Fe] bimodality arises from spatially varying star formation.
In this study, we use simple performance metrics to assess the science capabilities of future ground-based gravitational-wave detector networks -- composed of A+ or Voyager upgrades to the LIGO, Virgo, and KAGRA observatories and proposed next generation observatories such as Cosmic Explorer and Einstein Telescope. These metrics refer to coalescences of binary neutron stars (BNSs) and binary black holes (BBHs) and include: (i) network detection efficiency and detection rate of cosmological sources as a function of redshift, (ii) signal-to-noise ratios and the accuracy with which intrinsic and extrinsic parameters would be measured, and (iii) enabling multimessenger astronomy with gravitational waves by accurate 3D localization and early warning alerts. We further discuss the science enabled by the small population of rare and extremely loud events. While imminent upgrades will provide impressive advances in all these metrics, next generation observatories will deliver an improvement of an order-of-magnitude or more in most metrics. In fact, a network containing two or three such facilities will detect half of all the BNS and BBH mergers up to a redshift of z = 1 and z = 20, respectively, give access to hundreds of BNSs and ten thousand BBHs with signal-to-noise ratios exceeding 100, readily localize hundreds to thousands of mergers to within 1 deg² on the sky and better than 10% in luminosity distance, respectively, and consequently, enable multimessenger astronomy through follow-up surveys in the electromagnetic spectrum several times a week. Such networks will further shed light on potential cosmological merger populations and detect an abundance of high-fidelity BNS and BBH signals which will allow investigations of the high-density regime of matter at an unprecedented level and enable precision tests of general relativity in the strong-field regime, respectively.
Artificial intelligence (AI) and machine learning (ML) are expanding in popularity for broad applications to challenging tasks in chemistry and materials science. Examples include the prediction of properties, the discovery of new reaction pathways, or the design of new molecules. The machine needs to read and write fluently in a chemical language for each of these tasks. Strings are a common tool to represent molecular graphs, and the most popular molecular string representation, SMILES, has powered cheminformatics since the late 1980s. However, in the context of AI and ML in chemistry, SMILES has several shortcomings -- most pertinently, most combinations of symbols lead to invalid results with no valid chemical interpretation. To overcome this issue, a new language for molecules was introduced in 2020 that guarantees 100% robustness: SELFIES (SELF-referencIng Embedded Strings). SELFIES has since simplified and enabled numerous new applications in chemistry. In this manuscript, we look to the future and discuss molecular string representations, along with their respective opportunities and challenges. We propose 16 concrete Future Projects for robust molecular representations. These involve the extension toward new chemical domains, exciting questions at the interface of AI and robust languages and interpretability for both humans and machines. We hope that these proposals will inspire several follow-up works exploiting the full potential of molecular string representations for the future of AI in chemistry and materials science.
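The robustness property described above -- every symbol string decodes to a valid molecule -- can be illustrated with a deliberately simplified, SELFIES-flavored toy decoder. The token alphabet, valence table, and chain-only structure below are assumptions for illustration, not the actual SELFIES grammar:

```python
import random

# Maximum bond capacity of each toy atom type.
VALENCE = {"C": 4, "N": 3, "O": 2, "F": 1}

def decode(tokens):
    """Toy robust decoder: build a chain of atoms, capping every requested
    bond order at the remaining valence of both endpoints. Impossible
    requests are reinterpreted instead of raising an error, so *any*
    token sequence decodes to a chemically consistent (toy) chain."""
    atoms, bonds, remaining = [], [], []
    pending_bond = 1
    for tok in tokens:
        if tok.startswith("Bond"):
            pending_bond = int(tok[4:])  # requested order for the next bond
        else:
            atoms.append(tok)
            remaining.append(VALENCE[tok])
            if len(atoms) > 1:
                order = min(pending_bond, remaining[-1], remaining[-2])
                if order > 0:
                    bonds.append((len(atoms) - 2, len(atoms) - 1, order))
                    remaining[-1] -= order
                    remaining[-2] -= order
            pending_bond = 1
    return atoms, bonds

# Every random token string is decodable by construction.
alphabet = ["C", "N", "O", "F", "Bond1", "Bond2", "Bond3"]
random.seed(0)
all_decodable = all(
    decode(random.choices(alphabet, k=8)) is not None for _ in range(100)
)
```

The real SELFIES language achieves the same guarantee with a far richer grammar (rings, branches, charges); the point here is only the design principle of reinterpreting, rather than rejecting, locally invalid instructions.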
Algorithms for learning decision trees often include heuristic local-search operations such as (1) adjusting the threshold of a cut or (2) also exchanging the feature of that cut. We study minimizing the number of classification errors by performing a fixed number of a single type of these operations. Although we discover that the corresponding problems are NP-complete in general, we provide a comprehensive parameterized-complexity analysis with the aim of determining those properties of the problems that explain the hardness and those that make the problems tractable. For instance, we show that the problems remain hard for a small number d of features or small domain size D but the combination of both yields fixed-parameter tractability. That is, the problems are solvable in (D + 1)^{2d} · |I|^{O(1)} time, where |I| is the size of the input. We also provide a proof-of-concept implementation of this algorithm and report on empirical results.
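Operation (1), adjusting the threshold of a single cut, can be sketched for one feature: scan candidate thresholds between consecutive feature values and keep the error-minimizing one. The function names and toy data are illustrative, not the paper's implementation:

```python
def errors(threshold, values, labels):
    """Classification errors of a 1-D cut that predicts 1 iff value > threshold."""
    return sum((v > threshold) != bool(y) for v, y in zip(values, labels))

def adjust_threshold(values, labels):
    """Local-search step: try the midpoints between consecutive distinct
    feature values (plus one below the minimum) and return the
    error-minimizing threshold."""
    xs = sorted(set(values))
    candidates = [xs[0] - 1] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return min(candidates, key=lambda t: errors(t, values, labels))

vals = [1.0, 2.0, 3.0, 4.0]
labs = [0, 0, 1, 1]
best = adjust_threshold(vals, labs)
```

Restricting attention to midpoints is what makes the candidate set finite; the hardness studied in the paper comes from performing a bounded number of such operations across a whole tree, not from a single cut.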
Transferring quantum states efficiently between distant nodes of an information processing circuit is of paramount importance for scalable quantum computing. We report on the first observation of a perfect state transfer protocol on a lattice, thereby demonstrating the general concept of transporting arbitrary quantum information with high fidelity. Coherent transfer over 19 sites is realized by utilizing judiciously designed optical structures consisting of evanescently coupled waveguide elements. We provide unequivocal evidence that such an approach is applicable in the quantum regime, for both bosons and fermions, as well as in the classical limit. Our results illustrate the potential of the perfect state transfer protocol as a promising route towards integrated quantum computing on a chip.
This collection of perspective pieces captures recent advancements and reflections from a dynamic research community dedicated to bridging quantum gravity, hydrodynamics, and emergent cosmology. It explores four key research areas: (a) the interplay between hydrodynamics and cosmology, including analog gravity systems; (b) phase transitions, continuum limits and emergent geometry in quantum gravity; (c) relational perspectives in gravity and quantum gravity; and (d) the emergence of cosmological models rooted in quantum gravity frameworks. Each contribution presents the distinct perspectives of its respective authors. Additionally, the introduction by the editors proposes an integrative view, suggesting how these thematic units could serve as foundational pillars for a novel theoretical cosmology framework termed "hydrodynamics on superspace".