---
license: mit
---
# Model Card for KEEP
<!-- Provide a quick summary of what the model is/does. -->
[Preprint](https://arxiv.org/abs/2412.13126) | [Github](https://github.com/MAGIC-AI4Med/KEEP) | [Webpage](https://loiesun.github.io/keep/) | [Cite](#citation)
**KEEP** (**K**nowledg**E**-**E**nhanced **P**athology) is a foundation model designed for cancer diagnosis that integrates disease knowledge into vision-language pre-training. It utilizes a comprehensive disease knowledge graph (KG) containing 11,454 human diseases and 139,143 disease attributes, such as synonyms, definitions, and hierarchical relationships. KEEP reorganizes millions of publicly available noisy pathology image-text pairs into 143K well-structured semantic groups based on the hierarchical relations of the disease KG. By incorporating disease knowledge into the alignment process, KEEP achieves more nuanced image and text representations. The model is validated on 18 diverse benchmarks with over 14,000 whole-slide images (WSIs), demonstrating state-of-the-art performance in zero-shot cancer diagnosis, including an average sensitivity of 89.8% for cancer detection across 7 cancer types. KEEP also excels in subtyping rare cancers, achieving strong generalizability in diagnosing rare tumor subtypes.
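To make the grouping idea concrete, the toy sketch below illustrates how noisy captions could be mapped to canonical disease nodes via synonym lookup so that all image-text pairs describing the same disease land in one semantic group. The `DISEASE_KG` dictionary and `group_pairs` helper are hypothetical illustrations, not the released curation pipeline.

```python
from collections import defaultdict

# Toy excerpt of a disease knowledge graph: canonical disease -> known synonyms.
DISEASE_KG = {
    "lung adenocarcinoma": {"lung adenocarcinoma", "adenocarcinoma of the lung", "luad"},
    "breast invasive carcinoma": {"breast invasive carcinoma", "invasive ductal carcinoma", "brca"},
}

def group_pairs(pairs):
    """Group (image_path, caption) pairs by the disease mentioned in the caption."""
    groups = defaultdict(list)
    for image_path, caption in pairs:
        text = caption.lower()
        for disease, synonyms in DISEASE_KG.items():
            if any(syn in text for syn in synonyms):
                groups[disease].append((image_path, caption))
                break
    return groups

pairs = [
    ("img_001.png", "H&E slide showing invasive ductal carcinoma of the breast"),
    ("img_002.png", "Adenocarcinoma of the lung, acinar pattern"),
]
print({disease: len(items) for disease, items in group_pairs(pairs).items()})
```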
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** MAGIC-AI4Med team from Shanghai Jiao Tong University and Shanghai AI Lab.
- **Model type:** Vision-language model (vision encoder: ViT-L/16; text encoder: BERT)
- **Pretraining data:** 143K pathology semantic groups, each with a single caption and multiple images.
- **License:** MIT
### Model Sources
<!-- Provide the basic links for the model. -->
- **Repository:** https://github.com/MAGIC-AI4Med/KEEP
- **Paper:** https://arxiv.org/abs/2412.13126
- **Webpage:** https://loiesun.github.io/keep/
## Direct Use
<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
```python
import torch
from PIL import Image
from torchvision import transforms
from transformers import AutoModel, AutoTokenizer

# Load the KEEP vision-language model and its tokenizer from the Hugging Face Hub.
model = AutoModel.from_pretrained("Astaxanthin/KEEP", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("Astaxanthin/KEEP", trust_remote_code=True)
model.eval()

# Standard ImageNet-style preprocessing for 224x224 input patches.
transform = transforms.Compose([
    transforms.Resize(size=224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(size=(224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

example_image_path = './example.tif'
example_text = [
    'an H&E image of breast invasive carcinoma.',
    'an H&E image of normal tissue.',
    'an H&E image of lung adenocarcinoma.',
]

# Preprocess the image and tokenize the candidate text prompts.
img_input = transform(Image.open(example_image_path).convert('RGB')).unsqueeze(0)
token_input = tokenizer(example_text, max_length=256, padding='max_length',
                        truncation=True, return_tensors='pt')

# Encode both modalities into the shared embedding space.
with torch.no_grad():
    img_feature = model.encode_image(img_input)
    text_feature = model.encode_text(token_input)
```
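Continuing from the snippet above, a minimal sketch of turning the embeddings into zero-shot predictions is shown below. The explicit L2 normalization and temperature-free softmax are illustrative assumptions, not necessarily the exact inference pipeline used in the paper.

```python
import torch.nn.functional as F

# L2-normalize both modalities, then rank the candidate prompts by cosine similarity.
img_feature = F.normalize(img_feature, dim=-1)
text_feature = F.normalize(text_feature, dim=-1)

similarity = img_feature @ text_feature.T          # shape: (1, num_prompts)
# A learned temperature / logit scale would sharpen these probabilities; omitted here.
probs = similarity.softmax(dim=-1).squeeze(0)

for prompt, p in zip(example_text, probs.tolist()):
    print(f"{p:.3f}  {prompt}")
```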
## Evaluation
<!-- This section describes the evaluation protocols and provides the results. -->
### Testing Data
<!-- This should link to a Dataset Card if possible. -->
We present benchmark results for a range of representative tasks below. The complete set of benchmarks can be found in the [paper](https://arxiv.org/abs/2412.13126). These results will be updated with each new iteration of KEEP.
### Results
#### Zero-shot Cancer Region Segmentation (DICE)
| Models | PLIP [[1]](https://www.nature.com/articles/s41591-023-02504-3) | QuiltNet [[2]](https://proceedings.neurips.cc/paper_files/paper/2023/hash/775ec578876fa6812c062644964b9870-Abstract-Datasets_and_Benchmarks.html) | MI-Zero (Pub) [[3]](https://openaccess.thecvf.com/content/CVPR2023/html/Lu_Visual_Language_Pretrained_Multiple_Instance_Zero-Shot_Transfer_for_Histopathology_Images_CVPR_2023_paper.html) | CONCH [[4]](https://www.nature.com/articles/s41591-024-02856-4) | **KEEP(Ours)** |
|:---------------|--------------:|---------------------:|-------------------------:|-----------------:|------------------:|
| CAMELYON16 | 0.253 | 0.157 | 0.186 | 0.292 | **0.361** |
| PANDA | 0.295 | 0.309 | 0.276 | 0.315 | **0.334** |
| AGGC22 | 0.284 | 0.282 | 0.324 | 0.449 | **0.530** |
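The DICE score above measures the pixel-level overlap between the predicted cancer region and the expert annotation. A minimal NumPy sketch of the metric follows; the binary mask arrays are hypothetical placeholders, not the paper's evaluation code.

```python
import numpy as np

def dice_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-8) -> float:
    """DICE = 2 * |P ∩ G| / (|P| + |G|) for binary masks."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / (pred.sum() + gt.sum() + eps))

# Example with two small binary masks.
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
print(dice_score(pred, gt))  # 2*1 / (2+1) ≈ 0.667
```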
#### Zero-shot Cancer Detection (AUROC)
| Models | CHIEF [[1]](https://www.nature.com/articles/s41586-024-07894-z) | PLIP [[2]](https://www.nature.com/articles/s41591-023-02504-3) | QuiltNet [[3]](https://proceedings.neurips.cc/paper_files/paper/2023/hash/775ec578876fa6812c062644964b9870-Abstract-Datasets_and_Benchmarks.html) | MI-Zero (Pub) [[4]](https://openaccess.thecvf.com/content/CVPR2023/html/Lu_Visual_Language_Pretrained_Multiple_Instance_Zero-Shot_Transfer_for_Histopathology_Images_CVPR_2023_paper.html) | CONCH [[5]](https://www.nature.com/articles/s41591-024-02856-4) | **KEEP(Ours)** |
|:---------------|--------------:|--------------------:|-----------------:|-----------------:|------------------:|-----------------:|
| CPTAC-CM | 0.915 | 0.970 | 0.972 | 0.985 | **0.994** | **0.994** |
| CPTAC-CCRCC | 0.723 | 0.330 | 0.755 | 0.886 | 0.871 | **0.999** |
| CPTAC-PDA | 0.825 | 0.391 | 0.464 | 0.796 | 0.920 | **0.929** |
| CPTAC-UCEC | 0.955 | 0.945 | 0.973 | 0.979 | 0.996 | **0.998** |
| CPTAC-LSCC | 0.901 | 0.965 | 0.966 | 0.910 | **0.987** | 0.983 |
| CPTAC-HNSCC | 0.946 | 0.898 | 0.874 | 0.918 | **0.982** | 0.976 |
| CPTAC-LUAD | 0.891 | 0.988 | 0.991 | 0.981 | 0.999 | **1.000** |
#### Zero-shot Cancer Subtyping (BACC)
| Models | PLIP [[1]](https://www.nature.com/articles/s41591-023-02504-3) | QuiltNet [[2]](https://proceedings.neurips.cc/paper_files/paper/2023/hash/775ec578876fa6812c062644964b9870-Abstract-Datasets_and_Benchmarks.html) | MI-Zero (Pub) [[3]](https://openaccess.thecvf.com/content/CVPR2023/html/Lu_Visual_Language_Pretrained_Multiple_Instance_Zero-Shot_Transfer_for_Histopathology_Images_CVPR_2023_paper.html) | CONCH [[4]](https://www.nature.com/articles/s41591-024-02856-4) | **KEEP(Ours)** |
|:---------------|--------------:|---------------------------:|-------------------------:|-----------------:|------------------:|
| TCGA-BRCA | 0.519 | 0.500 | 0.633 | 0.727 | **0.774** |
| TCGA-NSCLC | 0.699 | 0.667 | 0.753 | 0.901 | **0.902** |
| TCGA-RCC | 0.735 | 0.755 | 0.908 | 0.921 | **0.926** |
| TCGA-ESCA | 0.614 | 0.746 | 0.954 | 0.923 | **0.977** |
| TCGA-BRAIN | 0.361 | 0.346 | 0.361 | 0.453 | **0.604** |
| UBC-OCEAN | 0.343 | 0.469 | 0.652 | **0.674** | 0.661 |
| CPTAC-NSCLC | 0.647 | 0.607 | 0.643 | 0.836 | **0.863** |
| EBRAINS | 0.096 | 0.093 | 0.325 | 0.371 | **0.456** |
### Summary
Validated on 18 diverse benchmarks with more than 14,000 whole slide images (WSIs), KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks. Notably, for cancer detection, KEEP demonstrates an average sensitivity of 89.8% at a specificity of 95.0% across 7 cancer types, significantly outperforming vision-only foundation models and highlighting its promising potential for clinical application. For cancer subtyping, KEEP achieves a median balanced accuracy of 0.456 in subtyping 30 rare brain cancers, indicating strong generalizability for diagnosing rare tumors.
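The sensitivity-at-fixed-specificity figure quoted above can be computed from slide-level scores as in the sketch below; the `sensitivity_at_specificity` helper and the placeholder labels and scores are illustrative assumptions, not the paper's evaluation code.

```python
import numpy as np
from sklearn.metrics import roc_curve

def sensitivity_at_specificity(labels, scores, target_specificity=0.95):
    """Sensitivity (TPR) at the operating point whose specificity meets the target."""
    fpr, tpr, _ = roc_curve(labels, scores)
    specificity = 1.0 - fpr
    valid = specificity >= target_specificity
    return float(tpr[valid].max()) if valid.any() else 0.0

# Placeholder slide-level labels (1 = cancer) and model scores.
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.1, 0.4, 0.8, 0.7, 0.9, 0.2, 0.6, 0.3])
print(sensitivity_at_specificity(labels, scores))
```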
## Citation
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
```bibtex
@article{zhou2024keep,
  title={A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis},
  author={Zhou, Xiao and Sun, Luoyi and He, Dexuan and Guan, Wenbin and Wang, Ruifen and Wang, Lifeng and Sun, Xin and Sun, Kun and Zhang, Ya and Wang, Yanfeng and Xie, Weidi},
  journal={arXiv preprint arXiv:2412.13126},
  year={2024}
}
```