The AGI Company
This research introduces 'interaction scaling,' in which AI agents improve their performance in interactive environments by taking more actions over longer sequences of interaction. It presents `TTI` (Test-Time Interaction), a training method that uses curriculum learning to teach agents to exploit these extended interaction horizons, and reports state-of-the-art results among open-source web agents on the WebVoyager and WebArena benchmarks with a Gemma 3 12B model.
Researchers developed REAL, a benchmark featuring 11 deterministic, high-fidelity simulations of real-world websites and 112 multi-turn tasks to evaluate autonomous web agents. The evaluation of frontier language models on REAL revealed that no agent achieved higher than a 41.07% success rate, indicating significant limitations in current capabilities.