digital-libraries
The Sustainable Development Goals (SDGs) offer a lens for tracking societal change, yet contributions from the social and behavioral sciences have rarely been integrated into policy agendas. To take stock and create a baseline and benchmark for the future, we assemble 233,061 psychology publications (1894–2022) and tag them to the 17 SDGs using a query-based classifier. Health, education, work, inequality, and gender dominate the study of SDGs in psychology, with emphasis shifting from an early focus on work to education and inequality and, since the 1960s, to health. United States-based research leads across most goals. Other countries set distinct priorities (e.g., China: education and work; Australia: health). Women comprise about one-third of authors, are concentrated in social and health goals, and have been underrepresented in STEM-oriented goals. The 2015 launch of the SDGs marked a turning point: SDG-tagged publications have been receiving more citations than comparable non-SDG work, reversing a pre-2015 deficit. Tracking the SDGs through psychology clarifies long-run engagement with social priorities, identifies evidence gaps, and guides priorities to accelerate the field's contribution to the SDG agenda.
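The abstract does not detail how the query-based classifier works; a minimal sketch of keyword-query tagging, with hypothetical query terms standing in for the study's actual SDG queries, might look like this:

```python
import re

# Hypothetical keyword queries per SDG; the study's actual query set is not given in the abstract.
SDG_QUERIES = {
    "SDG 3: Good Health and Well-being": [r"\bmental health\b", r"\bwell[- ]being\b", r"\bdisease\b"],
    "SDG 4: Quality Education": [r"\beducation\b", r"\blearning outcomes?\b", r"\bschool\b"],
    "SDG 5: Gender Equality": [r"\bgender (equality|gap)\b", r"\bwomen\b"],
}

def tag_publication(title: str, abstract: str) -> list[str]:
    """Return the SDGs whose query terms match the title or abstract."""
    text = f"{title} {abstract}".lower()
    return [sdg for sdg, patterns in SDG_QUERIES.items()
            if any(re.search(pattern, text) for pattern in patterns)]

print(tag_publication("Teacher support and adolescent well-being",
                      "We study how classroom climate shapes students' mental health."))
```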
As Open Access continues to gain importance in science policy, understanding the proportion of Open Access publications relative to the total research output of research-performing organizations, individual countries, or even globally has become increasingly relevant. In response, dashboards are being developed to capture and communicate progress in this area. To provide an overview of these dashboards and their characteristics, an extensive survey was conducted, resulting in the identification of nearly 60 dashboards. To support a detailed and structured description, a dedicated metadata schema was developed, and the identified dashboards were systematically indexed accordingly. To foster community engagement and ensure ongoing development, a participatory process was launched, allowing interested stakeholders to contribute to the dataset. The dataset is particularly relevant for researchers in Library and Information Science (LIS) and Science and Technology Studies (STS), supporting both empirical analyses of Open Access and the methodological refinement of indicators and policy instruments in the context of Open Science.
The SciRAG framework integrates adaptive retrieval, citation-aware symbolic reasoning, and outline-guided synthesis to enhance scientific knowledge aggregation from vast literature. It achieved top-tier factual accuracy, ranking first in correctness on 4 out of 5 evaluated scientific QA benchmarks, while maintaining a low hallucination rate of approximately 6% for uncited sentences.
The domination of scientific publishing in the Global North by major commercial publishers is harmful to science. We need the most powerful members of the research community (funders, governments, and universities) to lead the drive to re-communalise publishing so that it serves science, not the market.
LeMat-Synth introduces an open-source, multi-modal toolbox that transforms unstructured scientific literature into machine-readable data for materials science, extracting structured synthesis procedures and performance data from text and figures. This effort culminated in LeMat-Synth (v1.0), a dataset curated from 81,000 papers, demonstrating high extraction quality with a human-LLM evaluation correlation of 0.72 and figure data digitization accuracy achieving a relative RMSE of 0.09.
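The summary above does not spell out the toolbox's data model; a sketch of how an extracted synthesis procedure could be represented as structured, machine-readable records is shown below. All field names and values are hypothetical illustrations, not LeMat-Synth's actual schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

# Hypothetical schema for a structured synthesis procedure extracted from a paper;
# the real LeMat-Synth schema may differ.
@dataclass
class SynthesisStep:
    action: str                          # e.g. "ball-mill", "calcine"
    materials: list = field(default_factory=list)
    temperature_c: Optional[float] = None
    duration_h: Optional[float] = None

@dataclass
class SynthesisRecord:
    target_material: str
    source_doi: str                      # placeholder identifier below
    steps: list = field(default_factory=list)

record = SynthesisRecord(
    target_material="LiFePO4",
    source_doi="10.0000/example",
    steps=[
        SynthesisStep(action="ball-mill", materials=["Li2CO3", "FePO4"], duration_h=4.0),
        SynthesisStep(action="calcine", temperature_c=700.0, duration_h=10.0),
    ],
)
print(asdict(record))
```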
Researchers from the University of Pennsylvania and collaborators empirically demonstrate that authors' self-rankings of their multiple submissions to a major AI conference are powerful predictors of future scientific impact. Papers ranked highest by their authors accumulated approximately twice as many citations as their lowest-ranked counterparts, a predictive capability that often surpassed that of traditional peer review scores.
In a context where the social sciences and humanities are experimenting with non-anthropocentric analytical frames, this article proposes a semiotic (structural) reading of the hybridization between symbolic AI and neural (or sub-symbolic) AI based on a field of application: the design and use of a knowledge base for area studies. We describe the LaCAS ecosystem -- Open Archives in Linguistic and Cultural Studies (thesaurus; RDF/OWL ontology; LOD services; harvesting; expertise; publication), deployed at Inalco (National Institute for Oriental Languages and Civilizations) in Paris with the Okapi (Open Knowledge and Annotation Interface) software environment from Ina (National Audiovisual Institute), which now has around 160,000 documentary resources and ten knowledge macro-domains grouping together several thousand knowledge objects. We illustrate this approach using the knowledge domain "Languages of the world" (~540 languages) and the knowledge object "Quechua (language)". On this basis, we discuss the controlled integration of neural tools, more specifically generative tools, into the life cycle of a knowledge base: assistance with data localization/qualification, index extraction and aggregation, property suggestion and testing, dynamic file generation, and engineering of contextualized prompts (generic, contextual, explanatory, adjustment, procedural) aligned with a domain ontology. We outline an ecosystem of specialized agents capable of animating the database while respecting its symbolic constraints, by articulating model-driven and data-driven methods.
Despite the growing availability of tools designed to support scholarly knowledge extraction and organization, many researchers still rely on manual methods, sometimes due to unfamiliarity with existing technologies or limited access to domain-adapted solutions. Meanwhile, the rapid increase in scholarly publications across disciplines has made it increasingly difficult to stay current, further underscoring the need for scalable, AI-enabled approaches to structuring and synthesizing scholarly knowledge. Various research communities have begun addressing this challenge independently, developing tools and frameworks aimed at building reliable, dynamic, and queryable scholarly knowledge bases. However, limited interaction across these communities has hindered the exchange of methods, models, and best practices, slowing progress toward more integrated solutions. This manuscript identifies ways to foster cross-disciplinary dialogue, surface shared challenges, catalyze new collaborations, and shape future research directions in scholarly knowledge organization.
A comprehensive survey categorizes the evolving roles of Large Language Models in scientific innovation using a novel pyramidal framework, classifying them as Evaluators, Collaborators, or Scientists. The work details LLM capabilities in scientific knowledge synthesis, hypothesis generation, and autonomous discovery, analyzing associated benchmarks, algorithms, and ethical considerations across these roles.
Researchers at Nanjing University of Science and Technology developed a Knowledge Interaction Model that integrates human expert reviews with LLM-generated method summaries to predict academic paper method novelty. This collaborative approach achieved an F1-score of 0.83, outperforming methods relying on individual knowledge sources and zero-shot LLM baselines, demonstrating the value of fusing human judgment with AI's processing capabilities.
The KnoVo framework quantifies multi-dimensional research novelty and tracks knowledge evolution by using open-source LLMs to dynamically extract specific contribution dimensions from papers and perform fine-grained comparisons. It generates overall and temporal novelty scores, and reconstructs evolutionary pathways of ideas within specific dimensions, offering a content-aware, automated approach to scientific literature analysis.
Researchers analyzed 16,193 LLM-related papers from 77 top-tier computer science conferences (2019-2024) to map the evolving research landscape, revealing LLM dominance in AI fields, significant growth in systems and interdisciplinary areas, shifts in institutional contributions, and distinct national specializations.
Purpose: This study aims to evaluate the accuracy of authorship attributions in scientific publications, focusing on the fairness and precision of individual contributions within academic works. Design/methodology/approach: The study analyzes 81,823 publications from the journal PLOS ONE, covering the period from January 2018 to June 2023. It examines the authorship attributions within these publications to determine the prevalence of inappropriate authorship. It also investigates the demographic and professional profiles of affected authors, exploring trends and potential factors contributing to inaccuracies in authorship. Findings: Surprisingly, 9.14% of articles feature at least one author with inappropriate authorship, affecting over 14,000 individuals (2.56% of the sample). Inappropriate authorship is more concentrated in Asia, Africa, and specific European countries such as Italy. Established researchers with significant publication records and those affiliated with companies or nonprofits show higher instances of potential monetary authorship. Research limitations: Our findings are based on contributions as declared by the authors, which implies a degree of trust in their transparency. However, this reliance on self-reporting may introduce biases or inaccuracies into the dataset. Further research could employ additional verification methods to enhance the reliability of the findings. Practical implications: These findings have significant implications for journal publishers, highlighting the necessity for robust control mechanisms to ensure the integrity of authorship attributions. Moreover, researchers must exercise discernment in determining when to acknowledge a contributor and when to include them in the author list. Addressing these issues is crucial for maintaining the credibility and fairness of academic publications. Originality/value: This study contributes to an understanding of critical issues within academic authorship, shedding light on the prevalence and impact of inappropriate authorship attributions. By calling for a nuanced approach to ensure accurate credit is given where it is due, the study underscores the importance of upholding ethical standards in scholarly publishing.
SCI-IDEA introduces an iterative, human-in-the-loop framework that leverages large language models and semantic embeddings to generate context-aware and innovative scientific ideas. The system identifies potential "Aha Moments" and refines research concepts through a multi-dimensional evaluation across novelty, excitement, feasibility, and effectiveness.
A breakthrough multi-agent framework from Peking University researchers automates biomedical knowledge graph construction using nine specialized LLM-based agents, successfully identifying over 38,000 new entities while achieving 83.1% verification accuracy through novel domain-adaptive prompting and multi-layer verification strategies.
This study comprehensively analyzes how academic tenure influences faculty research across diverse U.S. disciplines, integrating seven large datasets to uncover distinct shifts in productivity, impact, and novelty. It finds that tenure acts as a critical career inflection point, driving intense pre-tenure output and impact, followed by discipline-specific productivity patterns and a general shift towards more novel, exploratory research post-tenure.
This research introduces "Chronicling Germany," a large, manually annotated dataset of 801 historical German newspaper pages (1617-1933) with detailed layout and text line annotations, created through a collaboration between Universität Bonn's HPCA-Lab and Institute of Historical Studies. It establishes a three-stage deep learning pipeline for layout segmentation, baseline detection, and OCR, achieving a 97.8% character accuracy for OCR-only and 93% end-to-end accuracy on complex Fraktur texts, demonstrating robust performance on in-distribution and out-of-distribution data.
Reinforcement Learning (RL) is a continuously growing field that has the potential to revolutionize many areas of artificial intelligence. However, despite its promise, RL research is often hindered by the lack of standardization in environment and algorithm implementations. This makes it difficult for researchers to compare and build upon each other's work, slowing down progress in the field. Gymnasium is an open-source library that provides a standard API for RL environments, aiming to tackle this issue. Gymnasium's main feature is a set of abstractions that allow for wide interoperability between environments and training algorithms, making it easier for researchers to develop and test RL algorithms. In addition, Gymnasium provides a collection of easy-to-use environments, tools for easily customizing environments, and tools to ensure the reproducibility and robustness of RL research. Through this unified framework, Gymnasium significantly streamlines the process of developing and testing RL algorithms, enabling researchers to focus more on innovation and less on implementation details. By providing a standardized platform for RL research, Gymnasium helps to drive forward the field of reinforcement learning and unlock its full potential. Gymnasium is available online at this https URL
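The standard environment API the abstract describes can be illustrated with one of the bundled environments; the random action below is only a stand-in for a learned policy.

```python
import gymnasium as gym

# Create a bundled environment through the standard API.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

episode_return = 0.0
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward

env.close()
print(f"Episode return under a random policy: {episode_return}")
```

The same reset/step loop works for any environment that follows the Gymnasium API, which is what makes training code reusable across environments.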
The landscape of science and technology is characterized by its dynamic and evolving nature, constantly reshaped by new discoveries, innovations, and paradigm shifts. Moreover, science is undergoing a remarkable shift towards increasing interdisciplinary collaboration, where the convergence of diverse fields fosters innovative solutions to complex problems. Detecting emerging scientific topics is paramount as it enables industries, policymakers, and innovators to adapt their strategies, investments, and regulations proactively. Bibliometric analysis, the common approach for detecting emerging technologies, is useful but may oversimplify or misinterpret complex interdisciplinary trends. In addition, relying solely on domain experts to pinpoint emerging technologies from science and technology trends might restrict the ability to systematically analyze extensive information and introduce subjective judgments into the interpretations. To overcome these drawbacks, in this work we present an automated, artificial intelligence-enabled framework, called WISDOM, for detecting emerging research themes using advanced topic modeling and weak signal analysis. The proposed approach can assist strategic planners and domain experts in more effectively recognizing and tracking trends related to emerging topics by swiftly processing and analyzing vast volumes of data, uncovering hidden cross-disciplinary patterns, and offering unbiased insights, thereby enhancing the efficiency and objectivity of the detection process. As a case study, we assess WISDOM's performance in identifying emerging research themes and their trends in the field of underwater sensing technologies, using scientific papers published between 2004 and 2021.
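WISDOM's exact pipeline is not reproduced here; a toy sketch of the general idea (topic modeling over abstracts, then tracking per-year topic weight so that small-but-growing topics can be flagged as weak signals) might look like the following, using scikit-learn components rather than the paper's own method:

```python
from collections import defaultdict
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Toy corpus of (year, abstract) pairs standing in for a large publication set.
corpus = [
    (2004, "acoustic modem design for shallow water communication"),
    (2012, "sensor network deployment for long term ocean monitoring"),
    (2019, "machine learning for underwater acoustic signal classification"),
    (2021, "deep learning based sonar image segmentation for seabed mapping"),
]

texts = [abstract for _, abstract in corpus]
X = TfidfVectorizer(stop_words="english").fit_transform(texts)

# Factorize documents into a small number of latent topics.
nmf = NMF(n_components=2, init="nndsvd", random_state=0)
doc_topics = nmf.fit_transform(X)

# Weak-signal heuristic: aggregate topic weight per year; a topic whose weight is
# small but rising in recent years would be flagged as an emerging theme.
yearly = defaultdict(lambda: [0.0] * nmf.n_components)
for (year, _), weights in zip(corpus, doc_topics):
    for topic_id, weight in enumerate(weights):
        yearly[year][topic_id] += float(weight)

for year in sorted(yearly):
    print(year, [round(w, 3) for w in yearly[year]])
```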
This paper introduces LLAssist, an open-source tool designed to streamline literature reviews in academic research. In an era of exponential growth in scientific publications, researchers face mounting challenges in efficiently processing vast volumes of literature. LLAssist addresses this issue by leveraging Large Language Models (LLMs) and Natural Language Processing (NLP) techniques to automate key aspects of the review process. Specifically, it extracts important information from research articles and evaluates their relevance to user-defined research questions. The goal of LLAssist is to significantly reduce the time and effort required for comprehensive literature reviews, allowing researchers to focus more on analyzing and synthesizing information rather than on initial screening tasks. By automating parts of the literature review workflow, LLAssist aims to help researchers manage the growing volume of academic publications more efficiently.
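LLAssist's internal prompts and models are not described in this summary; a minimal sketch of the screening step it automates (extract key findings and judge relevance to a user-defined research question via an LLM) could look like this, where `ask_llm` is a hypothetical stand-in for whichever backend is configured:

```python
import json

def ask_llm(prompt: str) -> str:
    # Hypothetical LLM hook; returns a canned response here so the sketch runs end to end.
    return json.dumps({
        "key_findings": "example finding extracted from the abstract",
        "relevant": True,
        "reason": "the abstract directly addresses the research question",
    })

SCREENING_PROMPT = """You are screening literature for a review.
Research question: {question}
Title: {title}
Abstract: {abstract}
Answer with JSON: {{"key_findings": "...", "relevant": true or false, "reason": "..."}}"""

def screen_article(question: str, title: str, abstract: str) -> dict:
    """Extract key information and judge relevance to the research question."""
    raw = ask_llm(SCREENING_PROMPT.format(question=question, title=title, abstract=abstract))
    return json.loads(raw)

result = screen_article(
    question="Do LLM-based tools reduce screening time in literature reviews?",
    title="LLAssist: an open-source tool for literature review screening",
    abstract="We present a tool that uses LLMs to extract key information from articles...",
)
print(result)
```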