BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

BibTeX
@misc{jumelet2025babybabellmmultilingualbenchmark,
      title={BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data},
      author={Jaap Jumelet and Abdellah Fourtassi and Akari Haga and Bastian Bunzeck and Bhargav Shandilya and Diana Galvan-Sosa and Faiz Ghifari Haznitrama and Francesca Padovani and Francois Meyer and Hai Hu and Julen Etxaniz and Laurent Prévot and Linyang He and María Grandury and Mila Marcheva and Negar Foroutan and Nikitas Theodoropoulos and Pouya Sadeghi and Siyuan Song and Suchir Salhan and Susana Zhou and Yurii Paniv and Ziyin Zhang and Arianna Bisazza and Alex Warstadt and Leshem Choshen},
      year={2025},
      eprint={2510.10159},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.10159},
}
GitHub: babylm-org/multilingual-evaluation
HTTPS: https://github.com/babylm-org/multilingual-evaluation
SSH: git@github.com:babylm-org/multilingual-evaluation.git
CLI: gh repo clone babylm-org/multilingual-evaluation