alphaXiv


14 Jul 2025

agents ai-for-health computer-science

Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab

University of Toronto, Université de Montréal, Vector Institute, Axiom, SickKids

SCIGYM, a new "dry lab" benchmark, evaluates Large Language Models (LLMs) on their ability to perform iterative scientific discovery in systems biology, using formal biological models. The evaluations show that LLMs can learn from simulated experiments. However, their performance degrades significantly as system complexity increases, and they struggle to infer regulatory modifier relationships.