This research introduces TOOLMAKER, an agentic framework developed by a team including KatherLab that autonomously transforms existing scientific code repositories into LLM-compatible tools, enabling agents to dynamically expand their capabilities beyond human-predefined functions. On the new TM-BENCH benchmark of complex scientific tasks, it achieves an 80% success rate, significantly outperforming other state-of-the-art software engineering agents.
Researchers developed MedAlpaca, an open-source suite of large language models specifically fine-tuned for medical conversations, alongside the Medical Meadow dataset of over 160,000 medical tasks. These models demonstrate improved medical competency, with the 13-billion parameter variant scoring 0.602 accuracy on USMLE Step 3, offering a privacy-conscious approach for AI in healthcare.
A study by the Kather Group at Technical University Dresden demonstrated that multimodal large language models like GPT-4V can classify cancer pathology images using in-context learning. This approach achieved up to 90% accuracy on binary tasks, often matching or exceeding specialized models trained with identical, small datasets, and provided transparent text-based reasoning.