Add research and standalone tasks by ScuttleBot · Pull Request #327 · pinchbench/skill

ScuttleBot · 2026-04-14T14:11:05Z

Add 12 research and standalone tasks to PinchBench.

Tasks Added

task_codebase_navigation — Navigate an unfamiliar codebase to find where authentication is handled (Closes [task-proposal] Codebase Navigation #143)
task_deep_research — Research WebAssembly outside the browser with primary-source citations (Closes [task-proposal] Deep Research #145)
task_competitive_research — Compare GitHub Copilot, Cursor, and Kilo Code (Closes [task-proposal] Competitive Research #146)
task_oss_alternative_research — Find open-source alternatives to Notion for self-hosting (Closes [task-proposal] Open Source Alternative Research #147)
task_video_transcript_extraction — Extract a YouTube transcript and create a structured summary (Closes [task-proposal] Video Transcript Extraction #157)
task_browser_automation — Write a Playwright end-to-end test for a shopping-cart HTML page (Closes [task-proposal] Browser Automation #158)
task_pricing_research — Compare managed PostgreSQL pricing across 5 providers (Closes [task-proposal] Pricing Research #163)
task_it_procurement — Research developer laptops for a 50-person startup (Closes [task-proposal] IT Equipment Procurement #164)
task_eu_regulation_research — EU AI Act compliance briefing for AI developer tools (Closes [task-proposal] EU Regulation Research #165)
task_byok_best_practices — Best-practices guide for BYOK (bring your own key) in AI inference apps (Closes [task-proposal] BYOK Best Practices #166)
task_cron_organizer — Convert natural language to cron expressions (Closes [task-proposal] Cron Job Organizer #167)
task_subway_navigation — Plan an NYC subway route from a text-based map (Closes [task-proposal] Subway Navigation #168)

Notes

Research tasks use a 300-second timeout (timeout: 300s) to allow time for web search and report composition
browser_automation includes an embedded HTML shopping cart asset (shop.html)
subway_navigation uses a text-based subway map (subway_map.md) instead of an image for broad agent compatibility
cron_organizer has fully automated grading with exact cron expression matching
All tasks include detailed grading criteria and rubrics
Lint passes: python3 scripts/lint_manifest.py → OK (65 tasks)
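The PR states only that cron_organizer is graded by exact cron expression matching. A minimal sketch of what such a grader could look like, assuming standard five-field cron syntax and whitespace-insensitive comparison; `normalize_cron`, `grade_exact`, and the sample prompts are hypothetical and are not taken from tasks/task_cron_organizer.md.

```python
# Hypothetical exact-match cron grader; not the PinchBench implementation.


def normalize_cron(expr: str) -> tuple[str, ...]:
    """Split a five-field cron expression on whitespace; reject other field counts."""
    fields = tuple(expr.split())
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {expr!r}")
    return fields


def grade_exact(expected: dict[str, str], answers: dict[str, str]) -> float:
    """Fraction of prompts whose answer matches the expected expression field by field."""
    correct = 0
    for prompt, want in expected.items():
        want_fields = normalize_cron(want)  # reference answers must be well-formed
        got = answers.get(prompt)
        if got is None:
            continue  # missing answer counts as wrong
        try:
            got_fields = normalize_cron(got)
        except ValueError:
            continue  # malformed answer counts as wrong
        if got_fields == want_fields:
            correct += 1
    return correct / len(expected)


# Illustrative prompts and answers (invented for this sketch).
expected = {
    "every weekday at 9am": "0 9 * * 1-5",
    "every 15 minutes": "*/15 * * * *",
    "first day of every month at midnight": "0 0 1 * *",
}
answers = {
    "every weekday at 9am": "0 9 * * 1-5",
    "every 15 minutes": "*/15  * * * *",  # extra space still matches
    "first day of every month at midnight": "0 0 * * 1",  # wrong
}
print(grade_exact(expected, answers))  # 2 of 3 correct
```

The trade-off of strict matching is that semantically equivalent answers, such as 1-5 versus MON-FRI for weekdays, or */15 versus 0,15,30,45 in the minute field, score as wrong. The PR does not say whether the real grader accepts such variants.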

kilo-code-bot · 2026-04-14T14:11:55Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

This PR is well-structured — each task file has clear prompts, grading criteria, automated checks, and LLM judge rubrics. The manifest entries are consistent with the existing pattern.

Files Reviewed (13 files)

tasks/manifest.yaml
tasks/task_browser_automation.md
tasks/task_byok_best_practices.md
tasks/task_codebase_navigation.md
tasks/task_competitive_research.md
tasks/task_cron_organizer.md
tasks/task_deep_research.md
tasks/task_eu_regulation_research.md
tasks/task_it_procurement.md
tasks/task_oss_alternative_research.md
tasks/task_pricing_research.md
tasks/task_subway_navigation.md
tasks/task_video_transcript_extraction.md

Reviewed by claude-4.6-sonnet-20260217 · 146,378 tokens

ScuttleBot · 2026-04-15T14:03:55Z

🧪 Test Started

Branch: tasks/research-standalone
Triggered by: ScuttleBot (automated PR testing)
Time: 2026-04-15 14:03 UTC

Instances

Instance	IP	Model
pr327-test-1	`66.42.84.134`	`openrouter/anthropic/claude-opus-4.6`
pr327-test-2	`144.202.21.233`	`openrouter/openai/gpt-5.4`
pr327-test-3	`155.138.235.245`	`openrouter/google/gemini-3.1-pro-preview`

Tasks (12 new)

task_codebase_navigation,task_deep_research,task_competitive_research,task_oss_alternative_research,task_video_transcript_extraction,task_browser_automation,task_pricing_research,task_it_procurement,task_eu_regulation_research,task_byok_best_practices,task_cron_organizer,task_subway_navigation

Plan

Running all 3 models in parallel on separate Vultr instances (vc2-2c-4gb, ATL)
Using the --suite filter to run only the 12 new PR tasks
Using --no-upload (unofficial test run)
ETA: ~45-60 minutes (research tasks have 300s timeouts)
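As a rough check on the ETA, a back-of-envelope sketch of the worst-case time spent on research tasks alone. It assumes (not stated in the PR) that tasks on one instance run sequentially and that each of the 8 research-category tasks runs to its full 300 s timeout; the other 4 tasks are left out because their timeouts are not given. Because the three models run in parallel on separate instances, the per-instance figure also approximates the overall wall-clock bound.

```python
# Hypothetical back-of-envelope estimate; not PinchBench code.
# Assumes sequential execution per instance and every task hitting its timeout.


def worst_case_minutes(n_tasks: int, timeout_s: int) -> float:
    """Upper bound on wall-clock minutes if every task runs to its timeout."""
    return n_tasks * timeout_s / 60


RESEARCH_TASKS = 8  # research-category tasks in the PR's category table
TIMEOUT_S = 300     # per-task timeout stated for research tasks

print(worst_case_minutes(RESEARCH_TASKS, TIMEOUT_S))  # 40.0
```

Forty minutes for the research tasks, plus the four non-research tasks and instance setup, is consistent with the posted 45-60 minute estimate.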

Will post results summary when all runs complete.

Commit 531d939: Add research and standalone tasks

Categories

Category	Tasks	Grading
Research	deep_research, competitive_research, oss_alternative_research, pricing_research, it_procurement, eu_regulation_research, byok_best_practices, video_transcript_extraction	llm_judge
Developer	codebase_navigation, browser_automation	hybrid
Productivity	cron_organizer	automated
Navigation	subway_navigation	llm_judge
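The category table and the 12-task list in the test plan should name the same tasks. A small consistency check, assuming the table's short names map to task IDs by prepending "task_" and that the comma-separated list is the value passed to --suite; `check_suite` is a hypothetical helper, not part of scripts/lint_manifest.py.

```python
from collections import Counter

# Task IDs exactly as listed in the test plan.
SUITE = (
    "task_codebase_navigation,task_deep_research,task_competitive_research,"
    "task_oss_alternative_research,task_video_transcript_extraction,"
    "task_browser_automation,task_pricing_research,task_it_procurement,"
    "task_eu_regulation_research,task_byok_best_practices,"
    "task_cron_organizer,task_subway_navigation"
)

# Category table from the PR: category -> (short task names, grading mode).
CATEGORIES: dict[str, tuple[list[str], str]] = {
    "Research": (
        ["deep_research", "competitive_research", "oss_alternative_research",
         "pricing_research", "it_procurement", "eu_regulation_research",
         "byok_best_practices", "video_transcript_extraction"],
        "llm_judge",
    ),
    "Developer": (["codebase_navigation", "browser_automation"], "hybrid"),
    "Productivity": (["cron_organizer"], "automated"),
    "Navigation": (["subway_navigation"], "llm_judge"),
}


def check_suite(suite: str, categories: dict[str, tuple[list[str], str]]) -> Counter:
    """Verify the suite and the category table name the same tasks; count grading modes."""
    suite_ids = suite.split(",")
    if len(suite_ids) != len(set(suite_ids)):
        raise ValueError("duplicate task IDs in suite")
    # Assumption: table names become task IDs by prepending "task_".
    categorized = {f"task_{name}" for names, _ in categories.values() for name in names}
    mismatched = set(suite_ids) ^ categorized  # IDs present in only one of the two
    if mismatched:
        raise ValueError(f"suite and category table disagree on: {sorted(mismatched)}")
    # One count per task, keyed by that task's grading mode.
    return Counter(grading for names, grading in categories.values() for _ in names)


print(check_suite(SUITE, CATEGORIES))
# Counter({'llm_judge': 9, 'hybrid': 2, 'automated': 1})
```

The symmetric difference catches tasks that appear in only one of the two lists, so a task added to the suite but left out of the table (or vice versa) fails loudly.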

ScuttleBot wants to merge 1 commit into main from tasks/research-standalone
