National Institute for Japanese Language and Linguistics
Nakaishi et al. empirically re-evaluate the prevailing theory linking power-law correlation decay in language to underlying hierarchical structures, demonstrating that key assumptions of this theory do not hold for natural language syntax. Their analysis reveals that, contrary to the theory's assumptions, correlation decays by a power law within hierarchical structures, sequential distance grows only sublinearly with hierarchical distance, and Probabilistic Context-Free Grammars poorly model real syntactic hierarchies.
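The kind of measurement at issue can be illustrated with a minimal sketch (not the authors' code): a plug-in estimate of the two-point mutual information between tokens separated by a sequential distance d. A power-law decay then shows up as a roughly straight line when the estimates are plotted against d on log-log axes.

```python
# Minimal sketch, not the authors' implementation: plug-in estimate of the
# two-point mutual information I(X_t; X_{t+d}) as a function of distance d.
import math
from collections import Counter

def mutual_information_at_distance(tokens, d):
    """Estimate I(X_t; X_{t+d}) over all positions t in the sequence."""
    pairs = list(zip(tokens, tokens[d:]))
    n = len(pairs)
    joint = Counter(pairs)                  # joint counts of (x, y) at lag d
    left = Counter(x for x, _ in pairs)     # marginal counts of x
    right = Counter(y for _, y in pairs)    # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with all plug-in estimates
        mi += (c / n) * math.log2(c * n / (left[x] * right[y]))
    return mi

tokens = "the cat sat on the mat and the dog sat on the rug".split()
for d in (1, 2, 4, 8):
    print(d, round(mutual_information_at_distance(tokens, d), 3))
```

On a toy sentence like the one above the estimates are dominated by sampling noise; the analysis the summary describes requires corpus-scale sequences (and, for the structural claims, distances measured over syntactic trees rather than the raw token stream).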
Recent advances in large language models (LLMs) have enabled impressive performance across various tasks. However, standard prompting often struggles to produce structurally valid and accurate outputs, especially in dependency parsing. We propose a novel step-by-step instruction strategy, in which universal part-of-speech tagging precedes the prediction of syntactic heads and dependency labels, combined with a simplified CoNLL-U-like output format. With this approach, our method achieves state-of-the-art accuracy on Universal Dependencies datasets across 17 languages without hallucination or contamination. We further show that multilingual fine-tuning simultaneously improves generalization performance across languages. Our results highlight the effectiveness of explicit reasoning steps in LLM-based parsing and offer a scalable, format-consistent alternative to bracket-based approaches.
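As an illustration only, a two-stage pipeline of this shape might look like the sketch below. The prompt wording, the four-column simplified output format, and the `call_llm` callable are all assumptions standing in for the paper's actual prompts and model client, which the abstract does not specify.

```python
# Hypothetical sketch of a two-step instruction strategy: POS tagging first,
# then head/deprel prediction in a simplified, tab-separated CoNLL-U-like
# format. `call_llm` is a placeholder for any chat-completion client.

STEP1 = "Assign a Universal POS tag to each token:\n{tokens}"
STEP2 = (
    "Using the POS-tagged tokens below, output one line per token as\n"
    "ID<TAB>FORM<TAB>HEAD<TAB>DEPREL (a simplified CoNLL-U format):\n{tagged}"
)

def parse_sentence(call_llm, tokens):
    # Step 1: universal POS tagging as an explicit intermediate reasoning step.
    tagged = call_llm(STEP1.format(tokens=" ".join(tokens)))
    # Step 2: predict syntactic heads and dependency labels given the tags.
    raw = call_llm(STEP2.format(tagged=tagged))
    rows = []
    for line in raw.strip().splitlines():
        idx, form, head, deprel = line.split("\t")
        rows.append({"id": int(idx), "form": form,
                     "head": int(head), "deprel": deprel})
    return rows
```

One appeal of a line-per-token tabular format over bracketed trees is that each output line can be validated independently (integer ID, integer HEAD in range), making malformed generations easy to detect.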
The meanings and relationships of words shift over time, a phenomenon referred to as semantic shift. Understanding how semantic shifts unfold over multiple time periods is essential for analyzing them in detail. However, detecting change points only between adjacent time periods is insufficient for such fine-grained analysis, and BERT-based methods for examining word sense proportions incur a high computational cost. To address these issues, we propose a simple yet intuitive framework for analyzing how semantic shifts occur over multiple time periods by leveraging a similarity matrix between embeddings of the same word across time. We compute a diachronic word similarity matrix using fast and lightweight word embeddings over arbitrary time periods, enabling a deeper analysis of continuous semantic shifts. Additionally, by clustering the similarity matrices of different words, we can categorize words that exhibit similar patterns of semantic shift in an unsupervised manner.
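A minimal sketch of this pipeline, under stated assumptions: `vectors[t]` maps words to embeddings trained on period t and already aligned to a shared space (for example, via orthogonal Procrustes, a common choice the abstract does not specify). Each word then gets a T x T cosine-similarity matrix across periods, and the flattened matrices are clustered.

```python
# Minimal sketch, assuming per-period embeddings already aligned to a
# common space. Not the authors' implementation.
import numpy as np
from sklearn.cluster import KMeans

def diachronic_similarity_matrix(vectors, word):
    """T x T cosine similarities of `word` between all pairs of periods."""
    embs = np.stack([vectors[t][word] for t in range(len(vectors))])
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    return embs @ embs.T  # entry (i, j): similarity between periods i and j

def cluster_shift_patterns(vectors, words, n_clusters=3):
    """Group words whose similarity matrices (i.e., shift patterns) look alike."""
    feats = np.stack([diachronic_similarity_matrix(vectors, w).ravel()
                      for w in words])
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
    return dict(zip(words, labels))
```

A gradual drift would appear as similarity decaying smoothly away from the diagonal of the matrix, while an abrupt change would produce a block structure; clustering the matrices separates such patterns without supervision.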