Technological University of Peru
Scientific charts are essential for communicating research findings, conveying information and revealing patterns in data. As science and technology advance and research data grow in volume and diversity, the number and variety of charts have increased accordingly. This trend creates a new challenge for researchers: efficiently and accurately generating appropriate titles that convey a chart's information and results. Automatically generated titles can also support information retrieval systems by supplying precise metadata for fine-grained chart classification. As research on image captioning and text summarization matures, the automatic generation of scientific chart titles has attracted significant attention. By combining natural language processing, machine learning, and multimodal techniques, key information can be extracted from charts automatically and turned into accurate, concise titles that better serve researchers' needs. This paper presents a novel approach to scientific chart title generation and demonstrates its effectiveness in improving the clarity and accessibility of research data.
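As an illustration of the general chart-to-title formulation the abstract refers to (not this paper's actual model), the following minimal PyTorch sketch pairs a small vision encoder with a text decoder; the module names, dimensions, and the toy CNN backbone are all assumptions for demonstration.

```python
# Minimal sketch (an illustration of the chart-to-title formulation,
# not this paper's model): a vision encoder pools chart-image features
# and a text decoder generates the title token by token.
import torch
import torch.nn as nn

class ChartTitleGenerator(nn.Module):
    def __init__(self, vocab_size=10000, d_model=256):
        super().__init__()
        # Tiny CNN encoder standing in for a real chart-image backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, d_model),
        )
        self.embed = nn.Embedding(vocab_size, d_model)
        self.decoder = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, chart_image, title_ids):
        # Use the pooled chart representation as the decoder's initial state.
        h0 = self.encoder(chart_image).unsqueeze(0)           # (1, B, d)
        dec_out, _ = self.decoder(self.embed(title_ids), h0)  # (B, T, d)
        return self.out(dec_out)                              # (B, T, vocab)

model = ChartTitleGenerator()
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 8)))
print(logits.shape)  # torch.Size([2, 8, 10000])
```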
Researchers at the Technological University of Peru developed Natural Language Dialogue State Tracking (NL-DST), a framework that trains Large Language Models to generate human-readable summaries of dialogue states instead of traditional slot-value pairs. This approach achieved a Joint Goal Accuracy of 65.9% on MultiWOZ 2.1, outperforming baselines by 7.8%, and demonstrated enhanced robustness to noise.
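A minimal sketch of the NL-DST idea as described above: prompt an LLM to describe the dialogue state in prose rather than as slot-value pairs. The prompt wording, function names, and the stub `llm` callable are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): framing dialogue state tracking
# as natural-language generation. An LLM is prompted to describe the user's
# goal in prose instead of emitting slot-value pairs; `llm` is a placeholder
# for any text-generation backend.
from typing import Callable, List, Tuple

def build_nl_dst_prompt(history: List[Tuple[str, str]]) -> str:
    """Format the dialogue history and ask for a prose state description."""
    turns = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in history)
    return (
        "Dialogue so far:\n"
        f"{turns}\n\n"
        "Describe the user's current goal in one plain-English sentence, "
        "covering every constraint mentioned so far."
    )

def track_state(history: List[Tuple[str, str]], llm: Callable[[str], str]) -> str:
    """Return a human-readable dialogue state summary for the given history."""
    return llm(build_nl_dst_prompt(history)).strip()

if __name__ == "__main__":
    dialogue = [
        ("user", "I need a cheap Italian restaurant in the city centre."),
        ("system", "Sure, for how many people?"),
        ("user", "Four of us, on Friday at 7pm."),
    ]
    # Stub LLM so the sketch runs without a real model.
    fake_llm = lambda prompt: (
        "The user wants a cheap Italian restaurant in the centre, "
        "booked for four people on Friday at 7pm."
    )
    print(track_state(dialogue, fake_llm))
```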
MV-CoRe integrates diverse visual and linguistic features, including global context, object details, and scene graphs, through a multimodal fusion transformer to enhance complex visual question answering. It achieves improved accuracy on challenging datasets like GQA, A-OKVQA, and OKVQA by facilitating deeper visual-conceptual reasoning.
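A minimal sketch of the fusion idea described above, assuming standard feature extractors: global image features, per-object features, scene-graph node embeddings, and question tokens are projected into a shared space and fused by a transformer encoder. The dimensions, layer counts, and answer-classification head are illustrative assumptions, not the MV-CoRe release.

```python
# Minimal sketch (assumptions, not the MV-CoRe release): fusing global image
# features, per-object features, scene-graph embeddings, and question tokens
# with a shared transformer encoder, then classifying over an answer vocabulary.
import torch
import torch.nn as nn

class FusionVQA(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_layers=4, n_answers=3000):
        super().__init__()
        # Project each modality into the shared embedding space.
        self.global_proj = nn.Linear(2048, d_model)   # e.g. pooled CNN features
        self.object_proj = nn.Linear(2048, d_model)   # e.g. detector region features
        self.graph_proj = nn.Linear(300, d_model)     # e.g. scene-graph node embeddings
        self.text_embed = nn.Embedding(30000, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_answers)

    def forward(self, global_feat, object_feats, graph_feats, question_ids):
        tokens = torch.cat(
            [
                self.global_proj(global_feat).unsqueeze(1),  # (B, 1, d)
                self.object_proj(object_feats),              # (B, n_obj, d)
                self.graph_proj(graph_feats),                # (B, n_nodes, d)
                self.text_embed(question_ids),               # (B, n_words, d)
            ],
            dim=1,
        )
        fused = self.fusion(tokens)
        # Pool the fused sequence and predict an answer.
        return self.classifier(fused.mean(dim=1))

# Example with random inputs just to show the expected shapes.
model = FusionVQA()
logits = model(
    torch.randn(2, 2048),              # global image feature
    torch.randn(2, 36, 2048),          # 36 detected objects
    torch.randn(2, 20, 300),           # 20 scene-graph nodes
    torch.randint(0, 30000, (2, 12)),  # tokenized question
)
print(logits.shape)  # torch.Size([2, 3000])
```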
Cross-lingual in-context learning (XICL) has emerged as a transformative paradigm for leveraging large language models (LLMs) to tackle multilingual tasks, especially for low-resource languages. However, existing approaches often rely on external retrievers or task-specific fine-tuning, limiting their scalability and generalizability. In this paper, we propose a novel self-supervised framework that harnesses the generative capabilities of LLMs to internally select and utilize task-relevant examples. Our method introduces two key objectives: a retrieval-generation alignment loss to optimize the quality of selected examples and a semantic coherence loss to ensure cross-lingual consistency. Through extensive experiments on multilingual benchmarks, our approach achieves state-of-the-art performance, significantly outperforming existing baselines. Further analysis highlights its robustness across diverse language families and its ability to generalize to unseen tasks. Human evaluations confirm the superior fluency, relevance, and semantic correctness of outputs generated by our method. This work provides a scalable, effective, and generalizable solution for cross-lingual in-context learning.
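One plausible reading of the two objectives named in the abstract, sketched in PyTorch; the exact loss forms, weighting, and tensor shapes are assumptions rather than the authors' definitions. An alignment term matches the model's scores over candidate in-context examples to the generation gains they produce, and a coherence term pulls cross-lingual output embeddings together.

```python
# Minimal sketch (one plausible reading of the abstract, not the authors'
# exact losses): a training objective combining a retrieval-generation
# alignment term with a cross-lingual semantic coherence term.
import torch
import torch.nn.functional as F

def alignment_loss(example_scores: torch.Tensor, generation_gains: torch.Tensor) -> torch.Tensor:
    """Encourage the model's own scores over candidate in-context examples to
    match how much each example actually improves generation likelihood."""
    return F.kl_div(
        F.log_softmax(example_scores, dim=-1),
        F.softmax(generation_gains, dim=-1),
        reduction="batchmean",
    )

def coherence_loss(src_emb: torch.Tensor, tgt_emb: torch.Tensor) -> torch.Tensor:
    """Pull sentence embeddings of outputs in different languages together."""
    return 1.0 - F.cosine_similarity(src_emb, tgt_emb, dim=-1).mean()

def total_loss(example_scores, generation_gains, src_emb, tgt_emb, alpha=1.0, beta=0.5):
    return (
        alpha * alignment_loss(example_scores, generation_gains)
        + beta * coherence_loss(src_emb, tgt_emb)
    )

# Toy tensors just to show the shapes the losses expect.
scores = torch.randn(4, 8)   # model's scores over 8 candidate examples
gains = torch.randn(4, 8)    # observed log-likelihood gains per example
src = torch.randn(4, 768)    # embedding of the source-language output
tgt = torch.randn(4, 768)    # embedding of the target-language output
print(total_loss(scores, gains, src, tgt).item())
```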