Add native OCR to screenshot editor #1799
    let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
        &buffer,
        BitmapPixelFormat::Bgra8,
        width,
        height,
        BitmapAlphaMode::Premultiplied,
    )
The BGRA pixel data copied from the source uses straight (non-premultiplied) alpha, but `BitmapAlphaMode::Premultiplied` tells Windows the RGB channels have already been multiplied by alpha. If the screenshot contains any semi-transparent pixels, the OCR engine may internally un-premultiply the RGB values (dividing by alpha), producing inflated and incorrect colour values. `BitmapAlphaMode::Straight` (or `BitmapAlphaMode::Ignore`, since OCR doesn't need transparency) is the correct choice.
```suggestion
let bitmap = SoftwareBitmap::CreateCopyWithAlphaFromBuffer(
    &buffer,
    BitmapPixelFormat::Bgra8,
    width,
    height,
    BitmapAlphaMode::Straight,
)
```
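To make the alpha-mode concern concrete, here is a minimal sketch of the arithmetic only, not of anything Windows does internally. `premultiply` and `unpremultiply` are hypothetical helpers introduced for illustration, and whether the OCR engine actually un-premultiplies its input remains the reviewer's hypothesis. The sketch shows that a straight-alpha channel value pushed through an un-premultiply step saturates instead of round-tripping.

```rust
/// Hypothetical helper: convert one straight-alpha channel value to its
/// premultiplied form (channel * alpha / 255, rounded to nearest).
fn premultiply(channel: u8, alpha: u8) -> u8 {
    ((channel as u16 * alpha as u16 + 127) / 255) as u8
}

/// Hypothetical helper: recover a straight-alpha channel value from a
/// premultiplied one (channel * 255 / alpha, rounded), saturating at 255.
/// A fully transparent pixel carries no colour, so it maps to 0.
fn unpremultiply(channel: u8, alpha: u8) -> u8 {
    if alpha == 0 {
        return 0;
    }
    let value = (channel as u32 * 255 + alpha as u32 / 2) / alpha as u32;
    value.min(255) as u8
}
```

For a channel value of 200 at alpha 128, `premultiply(200, 128)` gives 100 and `unpremultiply(100, 128)` gives 199, within rounding of the original. Calling `unpremultiply(200, 128)` on the untouched straight value saturates at 255, which is the kind of inflated colour value the comment describes.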
please re-review the pr @greptileai. we re-did it so that the ocr is now fully selectable/clickable on the actual screenshot.
    createEffect(() => {
      const key = sourceRegionKey();
      const region = sourceRegion();
      requestId += 1;
      const currentRequestId = requestId;

      if (!key || !region) {
        setOcrResult(null);
        return;
      }
OCR currently runs every time sourceRegionKey() changes even when the user is in a drawing tool. If you only need selectable OCR text in "select" mode, gating the effect avoids a bunch of redundant native OCR work (especially while the crop is being dragged).
Suggested change:
    createEffect(() => {
      const tool = activeTool();
      const key = sourceRegionKey();
      const region = sourceRegion();
      requestId += 1;
      const currentRequestId = requestId;
      if (tool !== "select") {
        setOcrResult(null);
        return;
      }
      if (!key || !region) {
        setOcrResult(null);
        return;
      }
      void (async () => {
        try {
          const result = await invoke<ScreenshotOcrResult>(
Minor maintainability thing: since recognize_screenshot_text is a new specta command, it would be nice to avoid raw invoke + duplicated ScreenshotOcr* types here. If bindings can be regenerated, prefer going through the generated commands API (and generated types); otherwise consider extracting a small wrapper (and these types) into a non-generated module so signature/type changes don’t silently drift.
Adds native OCR to the screenshot editor using macOS Vision and Windows Media OCR.
Selected regions are cropped from the source image and processed off the UI path before copying recognized text.
Validated with Rust, Biome, diff checks, and a Windows OCR API compile check.
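The review summary further down describes the crop as a checked, row-by-row conversion from the RGBA source buffer to BGRA. A minimal sketch of that kind of loop might look like the following. The function name `crop_rgba_to_bgra`, its signature, and its `Option` return are assumptions for illustration, not the code in `screenshot_editor.rs`.

```rust
/// Hypothetical sketch: copy a rectangular region out of a tightly packed
/// RGBA buffer, swapping the R and B channels so the output is BGRA.
/// Returns `None` if the region is empty, falls outside the image, or the
/// source buffer is shorter than `src_width * src_height * 4` bytes.
fn crop_rgba_to_bgra(
    src: &[u8],
    src_width: usize,
    src_height: usize,
    x: usize,
    y: usize,
    width: usize,
    height: usize,
) -> Option<Vec<u8>> {
    // Checked arithmetic so oversized inputs cannot wrap around.
    let right = x.checked_add(width)?;
    let bottom = y.checked_add(height)?;
    if width == 0 || height == 0 || right > src_width || bottom > src_height {
        return None;
    }
    let expected_len = src_width.checked_mul(src_height)?.checked_mul(4)?;
    if src.len() < expected_len {
        return None;
    }

    // The crop fits inside the validated source, so these products cannot overflow.
    let mut out = Vec::with_capacity(width * height * 4);
    for row in y..bottom {
        let start = (row * src_width + x) * 4;
        let end = start + width * 4;
        for px in src[start..end].chunks_exact(4) {
            // RGBA -> BGRA: swap red and blue, keep green and alpha.
            out.extend_from_slice(&[px[2], px[1], px[0], px[3]]);
        }
    }
    Some(out)
}
```

Returning `None` for any out-of-range region keeps the bounds checks in one place, so a caller can turn the failure into an error before any native OCR API is invoked.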
Greptile Summary
This PR introduces native OCR to the screenshot editor, automatically running macOS Vision or Windows Media OCR on the full visible image region and overlaying transparent, selectable text spans so users can copy text directly from screenshots. It also ships annotation copy/paste (Cmd+C/V with cascading 16 px offsets), pinch-to-zoom gesture support, a `calculateImageTransform` refactor that correctly propagates `aspectRatio` through the layout pipeline, and size-aware `CameraIssueOverlay` improvements.
- `screenshot_editor.rs`: crops the RGBA source buffer to BGRA in a checked, row-by-row loop, then dispatches off-thread to Vision (`cidre::vn`) on macOS or `Windows.Media.Ocr` on Windows; WinRT is initialised/uninitialised per call via a RAII guard; the line bounds returned by each engine are offset back into full-image coordinates before serialisation.
- `OcrSelectionOverlay.tsx`: reacts to crop changes via a memoised region key, discards stale responses with an incrementing `requestId`, and renders each recognised line as an absolutely positioned transparent `<span>` scaled to fit its bounding box; pointer events are enabled only in "select" tool mode.
- `layout.ts`: `calculateImageTransform`/`getImageRect` now accept an `AspectRatio | null` parameter; an early-exit guard for zero-size inputs and two separate auto vs. fixed-aspect paths were added; all call sites in `context.tsx` and `Preview.tsx` were updated.
Confidence Score: 4/5
Safe to merge on macOS; the Windows OCR path has a known alpha-mode mismatch (flagged in a prior review thread) that may degrade OCR quality on screenshots with semi-transparent pixels.
The Rust OCR implementation, TypeScript overlay component, layout refactor, and copy/paste additions are all structurally sound. The one outstanding concern is on the Windows path in screenshot_editor.rs: the pixel data is straight (non-premultiplied) RGBA converted to BGRA, but SoftwareBitmap is told it is premultiplied, which can cause the OCR engine to mis-interpret colour values for semi-transparent pixels. This was raised in a prior review and is still unaddressed in the current code.
apps/desktop/src-tauri/src/screenshot_editor.rs — specifically the SoftwareBitmap alpha mode used in the Windows OCR path.
Important Files Changed
Comments Outside Diff (1)
apps/desktop/src/routes/screenshot-editor/OcrSelectionOverlay.tsx, line 777-780
`invoke` used instead of the generated type-safe wrapper. `recognize_screenshot_text` is registered with `#[specta::specta]` and added to the invoke handler, so the binding generator should produce a typed `commands.recognizeScreenshotText(...)`. Using raw `invoke` means `ScreenshotOcrResult` is duplicated locally in the component, so any signature change on the Rust side won't be caught at compile time. The same file already imports `commands` from `~/utils/tauri` for `writeClipboardString`; the bindings need to be regenerated to include the new command and then used here.
Reviews (2): Last reviewed commit: "feat(camera): improve preview issue over..."