INT: Instance-Specific Negative Mining for Task-Generic Promptable Segmentation
Abstract
Task-generic promptable image segmentation aims to segment diverse samples under a single task description using only one task-generic prompt. Current methods leverage the generalization capabilities of Vision-Language Models (VLMs) to infer instance-specific prompts from these task-generic prompts in order to guide the segmentation process. However, when VLMs struggle to generalise to some image instances, the predicted instance-specific prompts become poor. To solve this problem, we introduce Instance-specific Negative Mining for Task-Generic Promptable Segmentation (INT). The key idea of INT is to adaptively reduce the influence of irrelevant (negative) prior knowledge whilst increasing the use of the most plausible prior knowledge, selected by negative mining with higher contrast, in order to optimise instance-specific prompt generation. Specifically, INT consists of two components: (1) instance-specific prompt generation, which progressively filters out incorrect information in prompt generation; and (2) semantic mask generation, which ensures that the segmentation of each image instance correctly matches the semantics of the instance-specific prompts. INT is validated on six datasets, including camouflaged objects and medical images, demonstrating its effectiveness, robustness and scalability.
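The abstract does not specify how contrast is computed or how negatives are down-weighted, so the following is only a toy sketch of the general idea, not the authors' implementation. It assumes each candidate instance-specific prompt has a VLM confidence on the full image and a second confidence with the candidate region masked out; the function name `mine_negatives`, the `penalty` factor, and the example scores are all hypothetical.

```python
def mine_negatives(scores_full, scores_masked, negatives, penalty=0.5):
    """One illustrative negative-mining step (assumed update rule).

    scores_full[c]:   VLM confidence for candidate c on the full image.
    scores_masked[c]: confidence for c when its region is masked out.
    negatives[c]:     accumulated penalty for c from earlier steps.
    Returns the selected candidate and the penalised contrast per candidate.
    """
    # Contrast: how much the confidence drops when the region is hidden,
    # reduced by any penalty the candidate has already accumulated.
    contrast = {
        c: scores_full[c] - scores_masked[c] - negatives.get(c, 0.0)
        for c in scores_full
    }
    # The candidate with the highest contrast is treated as most plausible.
    best = max(contrast, key=contrast.get)
    # Every other candidate is treated as a negative and penalised in
    # proportion to how far it falls behind the best one.
    for c, value in contrast.items():
        if c != best:
            gap = contrast[best] - value
            negatives[c] = negatives.get(c, 0.0) + penalty * gap
    return best, contrast


# Hypothetical scores for three candidate prompts on one camouflage image.
full = {"frog": 0.9, "leaf": 0.8, "rock": 0.3}
masked = {"frog": 0.2, "leaf": 0.7, "rock": 0.25}
negatives = {}
best, contrast = mine_negatives(full, masked, negatives)
best_again, contrast_again = mine_negatives(full, masked, negatives)
```

Calling the function repeatedly with the same `negatives` dictionary mimics progressive filtering: candidates that lose in one round start the next round with a lower effective contrast, so their influence on the chosen prompt shrinks over iterations.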
Community
In this paper, we employ hard negative mining to enable promptable segmentation to overcome the limitations of manually annotated prompts in unlabeled scenarios.