This paper introduces TITAN, a multimodal foundation model that processes whole-slide pathology images and text through a three-stage pretraining approach.
HEST-1k introduces a large, meticulously curated dataset of paired spatial transcriptomics and histology data, alongside a supporting library and benchmark, enabling the evaluation of deep learning models for predicting gene expression from tissue morphology. This resource helps advance the capabilities of foundation models in pathology and facilitates the exploration of morphomolecular relationships.
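A common baseline for the kind of task HEST-1k benchmarks is regressing spot-level gene expression from patch embeddings produced by a pretrained pathology encoder. The sketch below is illustrative only and does not use the HEST library; the feature and expression arrays are placeholder assumptions standing in for real data.

```python
# Minimal sketch of a gene-expression-from-morphology baseline (not the HEST API).
# Assumes patch embeddings from a pretrained encoder and matched expression vectors.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
patch_embeddings = rng.normal(size=(2000, 768))   # placeholder for real encoder features
gene_expression = rng.normal(size=(2000, 50))     # placeholder for log-normalized counts

X_train, X_test, y_train, y_test = train_test_split(
    patch_embeddings, gene_expression, test_size=0.2, random_state=0
)

model = Ridge(alpha=1.0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

# Per-gene Pearson correlation, a typical metric for this kind of benchmark.
corrs = [np.corrcoef(pred[:, g], y_test[:, g])[0, 1] for g in range(pred.shape[1])]
print(f"mean per-gene Pearson r: {np.mean(corrs):.3f}")
```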
Researchers developed VORTEX, an artificial intelligence framework that predicts 3D spatial gene expression patterns across entire tissue volumes using 3D morphological imaging data and limited 2D spatial transcriptomics. This method successfully mapped complex expression landscapes and tumor microenvironments in various cancer types, overcoming the limitations of traditional 2D spatial transcriptomics.
Researchers from Harvard, Mass General Brigham, MIT, and collaborating institutions introduce MedBrowseComp, a benchmark that evaluates AI agents' ability to navigate live medical knowledge bases and synthesize multi-hop evidence from sources such as HemOnc.org, PubMed, and ClinicalTrials.gov. On these complex medical information retrieval tasks, frontier agentic systems achieve as little as 10% accuracy, with performance degrading monotonically as navigation depth increases, while Computer Use Agents perform better when starting from domain-specific pages rather than general search engines.
Harvard University and affiliated researchers develop Generative Distribution Embeddings (GDEs), a framework that combines distribution-invariant encoders with conditional generative models to learn representations of probability distributions. Systematic benchmarking shows that GDEs outperform Kernel Mean Embeddings and Wasserstein Wormhole on synthetic datasets, and the framework is applied successfully to six computational biology problems, including cell population modeling, perturbation effect prediction, DNA methylation pattern analysis, synthetic promoter design, and viral protein sequence modeling.
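To make the encoder-plus-generator pairing concrete, here is a minimal sketch of the GDE idea: a permutation-invariant set encoder maps samples drawn from a distribution to a single embedding, and a conditional generator maps that embedding back to new samples. The DeepSets-style mean pooling, layer sizes, and noise-conditioned decoder are illustrative assumptions, not the paper's actual architecture or training objective.

```python
# Sketch of the GDE idea: distribution-invariant encoding + conditional generation.
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    """Distribution-invariant encoder: per-sample MLP followed by mean pooling."""
    def __init__(self, x_dim=2, z_dim=16):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(x_dim, 64), nn.ReLU(), nn.Linear(64, 64))
        self.rho = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, z_dim))

    def forward(self, x):                              # x: (n_samples, x_dim)
        return self.rho(self.phi(x).mean(dim=0))       # (z_dim,) distribution embedding

class ConditionalGenerator(nn.Module):
    """Conditional generative head: samples new points given the distribution embedding."""
    def __init__(self, x_dim=2, z_dim=16, noise_dim=8):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(nn.Linear(z_dim + noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, x_dim))

    def forward(self, z, n_samples):
        eps = torch.randn(n_samples, self.noise_dim)
        z_rep = z.unsqueeze(0).expand(n_samples, -1)
        return self.net(torch.cat([z_rep, eps], dim=-1))

encoder, generator = SetEncoder(), ConditionalGenerator()
population = torch.randn(500, 2) * 0.5 + 3.0           # toy "cell population"
z = encoder(population)                                # embedding of the whole distribution
generated = generator(z, n_samples=100)                # new samples conditioned on z
print(z.shape, generated.shape)
```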
PANTHER introduces an unsupervised framework for learning whole-slide image representations in computational pathology, leveraging Gaussian Mixture Models to capture morphological prototypes and their prevalence. This approach generates task-agnostic slide embeddings that perform comparably to or better than supervised methods across 13 diverse diagnostic and prognostic tasks, offering inherent interpretability.
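The core mechanism can be sketched in a few lines: fit a Gaussian mixture over a slide's patch embeddings and concatenate the mixture weights (prototype prevalence) with the component means into a fixed-length, unsupervised slide embedding. The component count, feature dimensions, and use of scikit-learn here are illustrative assumptions rather than the PANTHER implementation.

```python
# Sketch of a GMM-based, task-agnostic slide embedding in the spirit of PANTHER.
import numpy as np
from sklearn.mixture import GaussianMixture

def slide_embedding(patch_feats: np.ndarray, n_prototypes: int = 8) -> np.ndarray:
    """patch_feats: (n_patches, d) features from a pretrained patch encoder."""
    gmm = GaussianMixture(n_components=n_prototypes, covariance_type="diag",
                          random_state=0).fit(patch_feats)
    # Prevalence of each morphological prototype plus its mean appearance.
    return np.concatenate([gmm.weights_, gmm.means_.ravel()])

rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(4000, 128))   # placeholder patch embeddings for one slide
emb = slide_embedding(patch_feats)
print(emb.shape)                             # (8 + 8 * 128,) fixed-length slide vector
```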
Researchers discovered that while Large Reasoning Models (LRMs) can be prompted to generate thinking steps in non-English languages, doing so consistently reduces their accuracy on complex math and science problems. They introduced XReasoning, a new challenging multilingual benchmark for evaluating this trade-off, showing a clear performance degradation despite improved language alignment from prompt modifications or targeted fine-tuning.
This research systematically investigates Sparse Autoencoders (SAEs) for extracting interpretable features from Gemma 2 models, establishing benchmarks for SAE-based classification and evaluating their transferability across languages and modalities. It demonstrates that SAE-derived features consistently outperform baselines for various classification tasks.
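For readers unfamiliar with the building block being evaluated, below is a minimal sketch of a sparse autoencoder of the kind typically trained on LLM residual-stream activations: one hidden layer with a ReLU and an L1 sparsity penalty. The dimensions, penalty weight, and random placeholder activations are assumptions for illustration, not Gemma 2 specifics or the paper's training setup.

```python
# Sketch of a sparse autoencoder trained to reconstruct model activations.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=256, d_features=1024):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, x):
        feats = torch.relu(self.encoder(x))      # sparse, interpretable feature activations
        return self.decoder(feats), feats

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
activations = torch.randn(4096, 256)             # placeholder for LLM activations

for step in range(100):
    recon, feats = sae(activations)
    loss = ((recon - activations) ** 2).mean() + 1e-3 * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The sparse feature activations can then feed a simple linear classifier,
# which is the basic setup behind SAE-based classification benchmarks.
```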
This research introduces TANGLE, a self-supervised learning framework that guides the creation of high-quality, generalized whole-slide image embeddings using paired transcriptomics data. Applied to both human cancer and rat toxicological pathology, the framework achieved superior performance in few-shot classification and slide retrieval, and produced biologically interpretable representations, including the development of iBOT-Tox, the first large-scale SSL model for rodent histology.
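A common way to use paired transcriptomics to guide slide embeddings is a symmetric contrastive (CLIP-style) objective that pulls each slide embedding toward its matched expression-profile embedding and away from other pairs in the batch. The sketch below shows that generic objective; the projection heads, temperature, and placeholder tensors are assumptions and not TANGLE's exact formulation.

```python
# Sketch of a symmetric contrastive loss aligning slide and expression embeddings.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(slide_emb, expr_emb, temperature=0.07):
    """slide_emb, expr_emb: (batch, d) embeddings from the two modality encoders."""
    slide_emb = F.normalize(slide_emb, dim=-1)
    expr_emb = F.normalize(expr_emb, dim=-1)
    logits = slide_emb @ expr_emb.t() / temperature   # pairwise cosine similarities
    targets = torch.arange(slide_emb.size(0))         # matched pairs sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

slide_emb = torch.randn(32, 512, requires_grad=True)  # placeholder slide embeddings
expr_emb = torch.randn(32, 512, requires_grad=True)   # placeholder expression embeddings
loss = contrastive_alignment_loss(slide_emb, expr_emb)
loss.backward()
print(float(loss))
```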