OpenLocus
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Researchers from AI2, OpenLocus, and UMass Amherst introduce DISCOVERYBENCH, a new benchmark designed to evaluate large language models' ability to perform multi-step data-driven scientific discovery. The benchmark, comprising 264 real-world tasks and 903 synthetic tasks, reveals that current state-of-the-art LLMs achieve a maximum Hypothesis Matching Score of 25%, indicating significant limitations in autonomous discovery.

Data-driven Discovery with Large Generative Models

This position paper explores the use of Large Generative Models (LGMs) for end-to-end data-driven scientific discovery, proposing a hybrid system that combines LGM capabilities with robust external tools and active human feedback. The authors' proof-of-concept, DATAVOYAGER, demonstrates the potential for automated hypothesis generation and verification from existing datasets, while also highlighting the need for human oversight and tool integration to mitigate LGM limitations.
