SCIGYM, a new 'dry lab' benchmark built on formal biological models, evaluates Large Language Models (LLMs) on their ability to perform iterative scientific discovery in systems biology. Evaluations revealed that LLMs can learn from simulated experiments, but their performance degrades significantly as system complexity increases, and they struggle to infer regulatory modifier relationships.