alphaXiv


Johns Hopkins

23 Oct 2025

computer-science disordered-systems-and-neural-networks computation-and-language

On the Emergence of Linear Analogies in Word Embeddings

Google DeepMind; UC Berkeley; École Polytechnique Fédérale de Lausanne; Johns Hopkins

Models such as Word2Vec and GloVe construct word embeddings based on the co-occurrence probability $P(i,j)$ of words $i$ and $j$ in text corpora. The resulting vectors $W_i$ not only group semantically similar words but also exhibit a striking linear analogy structure -- for example, $W_{\text{king}} - W_{\text{man}} + W_{\text{woman}} \approx W_{\text{queen}}$ -- whose theoretical origin remains unclear. Previous observations indicate that this analogy structure: (i) already emerges in the top eigenvectors of the matrix $M(i,j) = P(i,j)/(P(i)P(j))$, (ii) strengthens and then saturates as more eigenvectors of $M(i,j)$, which controls the dimension of the embeddings, are included, (iii) is enhanced when using $\log M(i,j)$ rather than $M(i,j)$, and (iv) persists even when all word pairs involved in a specific analogy relation (e.g., king-queen, man-woman) are removed from the corpus. To explain these phenomena, we introduce a theoretical generative model in which words are defined by binary semantic attributes, and co-occurrence probabilities are derived from attribute-based interactions. This model analytically reproduces the emergence of linear analogy structure and naturally accounts for properties (i)-(iv). It can be viewed as giving fine-grained resolution into the role of each additional embedding dimension. It is robust to various forms of noise and agrees well with co-occurrence statistics measured on Wikipedia and the analogy benchmark introduced by Mikolov et al.
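The spectral construction described in the abstract can be illustrated with a small numerical sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the product form $M(i,j) = \prod_k (1 + s_k a_i^k a_j^k)$ with attributes $a_i^k = \pm 1$, the strengths `[0.5, 0.3, 0.2]`, the attribute labels, and the helper names `toy_ratio_matrix`, `spectral_embedding`, `word_index`, and `analogy_error`. The sketch builds embeddings from the top eigenvectors of either $\log M$ or $M$ and measures how far $W_{\text{king}} - W_{\text{man}} + W_{\text{woman}}$ lands from $W_{\text{queen}}$.

```python
import itertools
import numpy as np

def toy_ratio_matrix(strengths):
    """Toy M(i,j) = prod_k (1 + s_k * a_i[k] * a_j[k]) over all +/-1 attribute words.

    The product form is an assumption for illustration, not the paper's model.
    """
    attrs = np.array(list(itertools.product([-1, 1], repeat=len(strengths))))
    M = np.ones((len(attrs), len(attrs)))
    for k, s in enumerate(strengths):
        M *= 1.0 + s * np.outer(attrs[:, k], attrs[:, k])
    return attrs, M

def spectral_embedding(matrix, dim):
    """Embed words with the `dim` largest eigenpairs: W = V * sqrt(lambda)."""
    vals, vecs = np.linalg.eigh(matrix)          # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:dim]           # indices of the largest ones
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

def word_index(attrs, pattern):
    """Row index of the word whose attribute vector equals `pattern`."""
    return int(np.flatnonzero((attrs == np.array(pattern)).all(axis=1))[0])

def analogy_error(W, attrs):
    """Norm of W_king - W_man + W_woman - W_queen.

    Hypothetical labels: attribute 0 = royal, attribute 1 = male.
    """
    patterns = [(1, 1, 1), (-1, 1, 1), (-1, -1, 1), (1, -1, 1)]
    king, man, woman, queen = (word_index(attrs, p) for p in patterns)
    return float(np.linalg.norm(W[king] - W[man] + W[woman] - W[queen]))

attrs, M = toy_ratio_matrix([0.5, 0.3, 0.2])

# Top 3 eigenvectors of log M: one embedding dimension per attribute.
err_log = analogy_error(spectral_embedding(np.log(M), 3), attrs)

# All 8 eigenvectors of M itself: includes attribute-product directions.
err_raw = analogy_error(spectral_embedding(M, 8), attrs)

print(f"analogy error with log M, 3 dims: {err_log:.2e}")
print(f"analogy error with M, 8 dims:     {err_raw:.2e}")
```

In this toy setting, $\log M$ is a constant plus a sum of single-attribute terms, so its leading eigenvectors are the attribute vectors and the analogy holds up to floating-point error. The full spectrum of $M$ also contains eigenvectors built from products of attributes, which are not linear in the attributes and break the parallelogram. Real corpora are noisier, so this should be read only as a sanity check of the mechanism, not as a reproduction of the paper's results.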

13 Sep 2024

computer-science computer-vision-security computer-vision-and-pattern-recognition

The State of Computer Vision Research in Africa

University College London; University of British Columbia; University of California, Davis; Dublin City University; Ontario Tech University; RIKEN Center for Advanced Intelligence Project; University of Minnesota Twin Cities; Al-Azhar University; German University in Cairo; Nile University; Ashesi University; University of Sfax; African Masters of Machine Intelligence/AIMS; Johns Hopkins; New Mexico State University; Queen's University

Despite significant efforts to democratize artificial intelligence (AI), computer vision, a sub-field of AI, still lags in Africa. A significant contributing factor is limited access to computing resources, datasets, and collaborations. As a result, Africa's contribution to top-tier publications in this field has been only 0.06% over the past decade. To improve the computer vision field and make it more accessible and inclusive, this study analyzes 63,000 Scopus-indexed computer vision publications from Africa. We use large language models to automatically parse their abstracts and to identify and categorize topics and datasets. This yielded a list of more than 100 African datasets. Our objective is to provide a comprehensive taxonomy of dataset categories to facilitate better understanding and utilization of these resources. We also analyze collaboration trends among researchers within and outside the continent. Additionally, we conduct a large-scale questionnaire among African computer vision researchers to identify the structural barriers they believe require urgent attention. In conclusion, our study offers a comprehensive overview of the current state of computer vision research in Africa, to empower marginalized communities to participate in the design and development of computer vision systems.
