IHC-driven computational pathology benchmark

An Open Benchmark for IHC-driven Computational Pathology

A comprehensive benchmarking framework for evaluating foundation models across diverse immunohistochemistry (IHC) tasks in precision oncology.

Scale: 26,058 WSIs

Patients: 13,657 Cases

Patches: 464K+

Multi-Center: 13 Institutions

Pan-Cancer: 27 Organs

Multi-Task: 71 Tasks

Foundation Models

Clinical Stains

Proteins

14.7

IHC Images

9.6

M+

Patients

13.6

Evaluation Scope · Clinical Tasks

IHC Staining Assessment

Evaluating staining characteristics: spatial pattern, intensity, subcellular localization, and quantity.

IHC

3 tasks

Biomarker Expression

Predicting marker-specific protein signals across heterogeneous tissue contexts.

IHC

33 in-domain / 12 OOD tasks

Diagnosis & Grading

Robust recognition of histologic patterns across tumor subtypes and progression states.

H&E, IHC

6 tasks

Microenvironment

Fine-grained classification of tissue composition and spatial context from IHC inputs.

IHC

10 tasks

Progression & Prognosis

Clinical risk stratification, survival analysis, and time-to-event outcome prediction.

H&E, IHC

9 tasks

Therapeutic Response

Predicting treatment efficacy across multiple settings, including chemotherapy, targeted therapy, and immunotherapy.

H&E, IHC

10 tasks

Benchmark Snapshot · Overall Model Rankings

Overall Mean Rank by model (lower is better)

Rank  Model          Overall Mean Rank
1     Virchow2       4.16
2     UNI            5.98
3     GigaPath       6.00
4     GPFM           6.19
5     CONCH          6.36
6     MADELEINE      6.48
7     TITAN          6.71
8     CONCH v1.5     6.80
9     H-optimus-0    7.39
10    Phikon         7.71
11    Virchow        8.28
12    CTransPath     8.36
13    CHIEF          9.48
14    Prov-GigaPath  11.66

[Bubble chart: x-axis Overall Mean Rank; bubble size = Wins (#1 finishes); bubble color = Top-3 count]
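
The three leaderboard statistics named here (Overall Mean Rank, Wins, Top-3 count) can be illustrated with a small sketch. It assumes each task produces one score per model with higher being better, that ranks are averaged across tasks with equal weight, and that tied models share the average of their positions. None of these details are stated on this page, and the model names and scores below are hypothetical toy values, not benchmark results.

```python
from collections import defaultdict


def summarize_ranks(scores):
    """Summarize per-task scores into mean rank, wins, and top-3 counts.

    scores: {task_name: {model_name: score}}, higher score = better (assumption).
    Tied models receive the average of the positions they occupy, so a tie
    for first place gives rank 1.5 and counts as a win for neither model.
    """
    ranks = defaultdict(list)
    for by_model in scores.values():
        ordered = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
        i = 0
        while i < len(ordered):
            # Extend j over the run of models tied with position i.
            j = i
            while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
                j += 1
            avg_rank = ((i + 1) + (j + 1)) / 2
            for k in range(i, j + 1):
                ranks[ordered[k][0]].append(avg_rank)
            i = j + 1

    return {
        model: {
            "mean_rank": sum(rs) / len(rs),
            "wins": sum(1 for r in rs if r == 1),
            "top3": sum(1 for r in rs if r <= 3),
        }
        for model, rs in ranks.items()
    }


# Hypothetical toy scores for four placeholder models on three placeholder tasks.
toy_scores = {
    "task_1": {"model_a": 0.90, "model_b": 0.80, "model_c": 0.70, "model_d": 0.60},
    "task_2": {"model_a": 0.70, "model_b": 0.90, "model_c": 0.80, "model_d": 0.60},
    "task_3": {"model_a": 0.85, "model_b": 0.80, "model_c": 0.60, "model_d": 0.70},
}

summary = summarize_ranks(toy_scores)
for model, stats in sorted(summary.items(), key=lambda kv: kv[1]["mean_rank"]):
    print(model, round(stats["mean_rank"], 2), stats["wins"], stats["top3"])
```

In this toy example two models end up with the same mean rank but different win counts, which is one reason a leaderboard might report Wins and Top-3 alongside the mean.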

Motivation · Why ImmunoBench?

A Critical Blind Spot

Current models focus on H&E and miss the IHC signals that real-world clinical decisions depend on. ImmunoBench establishes IHC-centered evaluation as a necessary dimension of model assessment.

Task-Dependent Landscape

No single model dominates. While models excel at recognizing local IHC signals, complex clinical endpoints remain challenging and require better integration of spatial and molecular context.

An Open Ecosystem

More than a benchmark, ImmunoBench is a durable resource offering curated datasets, standardized protocols, and a dynamic leaderboard to drive future model development.

Key Findings · Benchmark Insights

Performance Boundaries

Models excel at spatial patterns but struggle with sparse signals, outcome prediction, and cross-center generalization.

Architectural Impact

Patch-level models dominate local recognition. IHC-aware and multimodal pretraining provide selective, not universal, benefits.

Multi-Stain Integration

Simply combining H&E and IHC is insufficient. Ensembles reveal that current models encode complementary but incomplete information.
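
As one concrete picture of what "simply combining" two stains could mean, the sketch below averages class probabilities from an H&E-based predictor and an IHC-based predictor (late fusion). This is an illustrative assumption: the page does not say how the benchmark builds its multi-stain combinations or ensembles, and the function name, weight parameter, and probabilities are hypothetical.

```python
def late_fusion(p_he, p_ihc, w_ihc=0.5):
    """Weighted average of per-class probabilities from two predictors.

    p_he, p_ihc: probability lists over the same classes, in the same order.
    w_ihc: weight given to the IHC predictor; the H&E predictor gets 1 - w_ihc.
    """
    if len(p_he) != len(p_ihc):
        raise ValueError("both predictors must score the same classes")
    if not 0.0 <= w_ihc <= 1.0:
        raise ValueError("w_ihc must lie in [0, 1]")
    return [(1.0 - w_ihc) * a + w_ihc * b for a, b in zip(p_he, p_ihc)]


# Hypothetical two-class example in which the two stains disagree.
p_he = [0.6, 0.4]
p_ihc = [0.2, 0.8]
fused = late_fusion(p_he, p_ihc)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

A fixed-weight average like this can only help when the two predictors make different errors, which is consistent with the finding that the encoded information is complementary but incomplete.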