Haoxuan Han* · Weijie Wang*† · Zeyu Zhang · Yefei He · Bohan Zhuang
*Equal Contribution †Project Lead
Paper | Project Page | Code
Recent advancements in Vision-Language Models (VLMs) have significantly pushed the boundaries of Visual Question Answering (VQA). However, high-resolution details can sometimes become noise that leads to hallucinations or reasoning errors. We propose Degradation-Driven Prompting (DDP), a novel framework that improves VQA performance by strategically reducing image fidelity and utilizing an agentic tool-use pipeline to force models to focus on essential structural information.
Overview of Degradation-Driven Prompting (DDP). Given an image and a question as input, our DDP framework introduces a "divide-and-conquer" workflow consisting of three stages:
- The Classifier categorizes the image type and visual task.
- The Tool-manager invokes specialized visual tools (e.g., draw rectangle, crop, blur masks, grid auxlines) to highlight suspicious regions and intentionally degrade distracting textures.
- The Critic synthesizes these visual cues, corrects initial misconceptions to bridge the perception-logic gap, and provides the final reasoned answer.

This active agentic perception approach allows VLMs to bypass deceptive high-frequency textures and achieve superior reasoning accuracy on challenging visual benchmarks.
DDP/
├── asset/ # Test data and images
│ ├── Data/ # Task datasets
│   │   ├── task1/  # Questions only (CSV)
│   │   └── task2/  # Questions only (JSON); download the image set from the workshop's official website
│ ├── TeT benchmark/
│ ├── V_star bench/
│
├── src/ # Source code
│ ├── task1.py # Task 1 solver (Optical illusions / Visual perception)
│ ├── task2.py # Task 2 solver (Counting, Color blindness, Geometry, Real scenes, etc.)
│ ├── helper.py # Base solver definitions
│ ├── mcp_server.py # MCP Server (Optional, use with an MCP Client)
│
├── .gitignore
├── requirements.txt
└── README.md
```bash
pip install -r requirements.txt
```
This project is part of the 5th DataCV Challenge, held in conjunction with the CVPR 2026 DataCV Workshop. The challenge focuses on the data-centric evaluation of Vision-Language Models (VLMs) under visual illusions and perceptual anomalies.
The core objective is to improve VLM robustness without any model training or fine-tuning. Participants must rely on prompting, in-context learning (ICL), and inference-time strategies using frozen, off-the-shelf models.
Task I: Classic Illusion Understanding
Goal: Design a strategy to enable a fixed VLM to answer binary (Yes/No) questions about classic optical illusions.
- Input: Illusion image + Binary question.
- Core Constraint: Perception-focused. Submissions must not use measurement- or computation-based pipelines (e.g., explicit length/angle estimation, ruler-based quantification, or pixel-level statistics).
- Allowed: Basic adjustments like resizing or standard normalization that do not produce quantitative measurements.
Task II: Real-world Visual Illusions and Anomalies
Goal: Design a strategy for a VLM to answer multiple-choice questions (A, B, C, D) based on real-world visual anomalies.
- Input: Image + Multiple-choice prompt.
- Strategy: Any form of prompting or inference-time strategy is allowed, provided the model remains frozen.
To ensure a valid entry, all participants must adhere to the following:
- No Training/Fine-tuning: Strictly no gradient updates or weight changes are permitted ($0$ parameters updated).
- Model Selection: Only off-the-shelf, publicly released models (e.g., GPT-4, Claude, Qwen, LLaVA) are allowed.
- Inference Only: Only zero-shot or few-shot inference-time methods (Prompting, ICL) are permitted.
task1.py and task2.py are complete scripts that can run independently without the MCP Server. They invoke the LLM API via an OpenAI-compatible interface to implement a three-stage pipeline: "Classification → Tool Usage → Final Reasoning".
At the bottom of src/task1.py and src/task2.py, in the `__main__` block, fill in the following information yourself:
```python
# Supports multiple keys for concurrency; fill in your own keys
my_api_keys = ["sk-your-key-1", "sk-your-key-2", ...]

# Must be an OpenAI-compatible interface address (ending with /v1);
# the code will automatically append /chat/completions
base_url = "https://your-api-provider.com/v1"
```

Note: The current code uses the OpenAI Chat Completions interface format (`/v1/chat/completions`), including the Vision multimodal message structure. If you are using official SDKs from other models such as Gemini or Claude, you need to call them through an OpenAI-compatible relay service, or modify the `call_openai_api()` function yourself.
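For orientation, the Vision message structure mentioned above looks roughly like the payload below. This is an illustrative sketch only: `build_vision_request` is a hypothetical helper, not a function from this repository (the actual request logic lives in `call_openai_api()`), and the model name is a placeholder.

```python
import base64


def build_vision_request(image_path: str, question: str, model: str) -> dict:
    """Build an OpenAI-style chat-completions payload with an inline image.

    Hypothetical helper for illustration; it mirrors the message shape an
    OpenAI-compatible /v1/chat/completions endpoint expects for vision input.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{b64}"},
                    },
                ],
            }
        ],
    }
```

The returned dict can be POSTed as JSON to `{base_url}/chat/completions` with an `Authorization: Bearer <key>` header.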
```python
# Input data path (CSV or JSON)
input_csv = "../asset/test.csv"        # task1
input_json_file = "path/to/test.json"  # task2

# Output result path
output_txt = "path/to/result.txt"

model_name = "gemini-3.1-pro-preview"  # Fill in according to your API provider's supported models
```

```bash
cd src
python task1.py  # Run Task 1
python task2.py  # Run Task 2
```

| | Task 1 | Task 2 |
|---|---|---|
| Task Categories | 4 types: color, size, line, other | 8 types: counting, spot-the-difference, color blindness, dynamic illusions, geometry, real scenes, size, other |
| Answer Format | Binary judgment 0/1 (Yes/No) | Multiple choice A/B/C/D |
| Image Processing Tools | whitemask, gridmask, crop, binary masks, edge/contrast enhance, equal-spacing lines | whitemask, gridmask, crop, draw_rectangle, reversed_blur_mask, enhance_contrast |
| Data Format | CSV (image_path, prompt) | JSON or Parquet (contains embedded images) |
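Because the two tasks expect different answer formats (0/1 vs. A/B/C/D), a model's free-form reply has to be normalized before scoring. The sketch below shows one plausible way to do that; both helpers are hypothetical and not part of the repository's code.

```python
import re


def parse_binary(reply: str) -> int:
    """Map a free-form yes/no reply to Task 1's 0/1 format (hypothetical helper)."""
    text = reply.strip().lower()
    if re.search(r"\b(yes|1)\b", text):
        return 1
    if re.search(r"\b(no|0)\b", text):
        return 0
    raise ValueError(f"unparseable reply: {reply!r}")


def parse_choice(reply: str) -> str:
    """Extract the first standalone A-D letter from a Task 2 reply (hypothetical helper)."""
    match = re.search(r"\b([A-D])\b", reply.upper())
    if match is None:
        raise ValueError(f"unparseable reply: {reply!r}")
    return match.group(1)
```

Raising on unparseable replies (rather than guessing) makes retry logic straightforward when running with multiple API keys.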
Both tasks share the same three-stage DDP architecture:

```
Input Image + Question
        │
        ▼
┌─────────────┐
│1. Classifier│ Categorizes the image type and visual task (e.g., real picture, optical illusion).
└──────┬──────┘
       ▼
┌─────────────┐
│2. Tool-     │ Invokes specialized visual tools based on the category (e.g., draw rectangle, crop)
│   manager   │ → Code executes the tool function to degrade textures and highlight explicit geometry.
└──────┬──────┘
       ▼
┌─────────────┐
│3. Critic    │ Synthesizes visual cues, detects mismatches, corrects misconceptions, and provides the
│             │ final reasoned answer, bridging the perception-logic gap.
└─────────────┘
```
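The control flow above can be sketched as a small skeleton. Note that `classify`, `tool_manager`, and `critic` here are stand-ins for the prompted VLM calls in task1.py / task2.py; this sketch only wires the three stages together and is not the repository's implementation.

```python
def run_ddp(image, question, classify, tool_manager, critic):
    """Minimal skeleton of the three-stage DDP flow (illustrative only)."""
    # Stage 1: decide the image type / visual task.
    category = classify(image, question)
    # Stage 2: apply category-specific degradation tools, producing extra views.
    views = tool_manager(image, category)
    # Stage 3: reason over the original image plus the degraded views.
    return critic(image, views, question)


# Usage with trivial stand-in stages:
answer = run_ddp(
    "illusion.png",
    "Are the two lines the same length?",
    classify=lambda img, q: "line",
    tool_manager=lambda img, cat: [f"{img}+gridmask"],
    critic=lambda img, views, q: "Yes",
)
```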
task1.py contains a proxy configuration. If your network environment does not require a proxy, please comment out or remove the proxies parameter in call_openai_api():
```python
# Remove or comment out this section
proxies = {
    "http": "http://YOUR_PROXY_IP:PORT",
    "https": "http://YOUR_PROXY_IP:PORT",
}
```

Prerequisite: You need a client that supports the MCP protocol (such as Claude Desktop, Claude Code, or another MCP Client).
src/mcp_server.py extracts all 12 image processing tools from task1 and task2 into independent MCP services, exposing them to the MCP Client through the stdio transport protocol.
The MCP Server allows you to call these image processing tools interactively within an MCP Client. When combined with the prompts defined in the code (such as toolusage_1, toolusage_2, toolusage_3, etc.), you can manually debug and analyze individual images without running the entire batch pipeline.
Typical Use Cases:
- Execute a toolchain step-by-step on a single image in Claude Desktop to observe the processing effect at each step
- Debug the tool selection strategy and parameters for a specific category
- Expose tool capabilities to other MCP-compatible AI Agents
| Tool | Description |
|---|---|
| `resize_image` | Scale the image to the specified maximum dimension |
| `whitemask` | Keep the specified circular area and turn the rest white |
| `gridmask` | Draw vertical/horizontal/polar-coordinate grid lines |
| `crop_image` | Crop to one or more bounding boxes |
| `near_white_to_binary` | Binarize near-white pixels |
| `near_red_to_binary` | Extract near-red pixels as a black-and-white image |
| `laplacian_edge_enhance` | Laplacian edge enhancement |
| `enhance_luminance_contrast` | Strong luminance and contrast enhancement (CLAHE + S-curve) |
| `draw_equal_spacing_lines` | Draw three equidistant vertical guide lines |
| `reversed_blur_mask` | Selective blur (inside/outside a circle) |
| `enhance_contrast` | CLAHE contrast enhancement |
| `draw_rectangle` | Draw a highlighted rectangular box |
```bash
python src/mcp_server.py
```

Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "image-tools": {
      "command": "python",
      "args": ["D:/DDP/src/mcp_server.py"]
    }
  }
}
```

Add to .mcp.json in the project root directory:
```json
{
  "mcpServers": {
    "image-tools": {
      "command": "python",
      "args": ["src/mcp_server.py"]
    }
  }
}
```

The MCP Server only provides the image processing tools themselves. To achieve the exact same results as the complete pipeline in task1/task2, you need to use these tools in your MCP Client alongside the classification prompts and tool invocation prompts defined in the code. These prompts are defined in the solve() methods of task1.py / task2.py (e.g., classifyprompt, toolusage_1, class1prompt, etc.).
If you find our work useful for your research, please consider citing us: