Koko Whisper

A personal desktop voice transcription app powered by Whisper, built with Svelte 5 and Tauri 2.

All transcription runs locally on your Mac using the Whisper large-v3-turbo model with Metal GPU acceleration: no cloud, no API calls, privacy first. The only network access is the one-time model download from HuggingFace. Built entirely with Claude Code, with human review.

Caution

EARLY STAGE SOFTWARE — DO NOT USE FOR ANYTHING IMPORTANT

This project is in a very early stage of development and is not recommended for use by anyone. There is a real risk of loss of recordings and transcriptions made by the program. Do not rely on this software to preserve any important data.

Expect breaking changes, missing features, and rough edges. The app is built exclusively for macOS on Apple Silicon, with no plans to support other platforms.

Warning

We are not accepting pull requests, issues, or external contributions at this time.

Features

Local transcription — Whisper large-v3-turbo (GGML, quantized Q5_0) via Metal GPU, ~547MB model
Dual-track recording — microphone + system audio (ScreenCaptureKit) with independent controls
Global shortcut — Cmd+Shift+R to start/stop recording from anywhere
System tray — lives in the menu bar with live recording timer, no dock icon
Chunked processing — 5-minute intervals with partial transcripts during recording
Session recovery — automatic crash recovery with WAV chunks and manifest files
Clipboard integration — transcription results copied to clipboard automatically
Audio device selection — choose specific mic and system audio devices with live VU meters
Configurable output — transcriptions saved as text files to a folder of your choice
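
The chunked-processing feature can be pictured with a little arithmetic. Assuming audio is held as mono samples at 16 kHz (the resampling rate listed under Stack), a 5-minute chunk holds 16,000 × 300 = 4,800,000 samples. The sketch below is illustrative only and is not taken from the app's source; `SAMPLE_RATE_HZ`, `CHUNK_SECONDS`, `split_into_chunks`, and `chunk_count` are hypothetical names.

```rust
/// Assumed sample rate after resampling (16 kHz, mono).
const SAMPLE_RATE_HZ: usize = 16_000;
/// Chunk length from the feature list: 5 minutes.
const CHUNK_SECONDS: usize = 5 * 60;
/// Samples in one full chunk: 16_000 * 300 = 4_800_000.
const SAMPLES_PER_CHUNK: usize = SAMPLE_RATE_HZ * CHUNK_SECONDS;

/// Splits a mono sample buffer into consecutive slices of at most
/// `samples_per_chunk` samples. The final slice may be shorter.
fn split_into_chunks(samples: &[f32], samples_per_chunk: usize) -> Vec<&[f32]> {
    assert!(samples_per_chunk > 0, "chunk size must be non-zero");
    samples.chunks(samples_per_chunk).collect()
}

/// Number of chunks needed for a recording of `total_seconds`,
/// rounding up so a trailing partial chunk is counted.
fn chunk_count(total_seconds: usize) -> usize {
    let total_samples = total_seconds * SAMPLE_RATE_HZ;
    (total_samples + SAMPLES_PER_CHUNK - 1) / SAMPLES_PER_CHUNK
}
```

Under these assumptions, a 12-minute recording yields three chunks: two full 5-minute chunks and one 2-minute remainder.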

Stack

Layer	Technology
Frontend	Svelte 5 (runes), SvelteKit, TypeScript
Backend	Tauri 2 (Rust)
Transcription	whisper-rs (whisper.cpp bindings) with Metal GPU
Audio capture	cpal (ScreenCaptureKit fork for system audio)
Audio processing	rubato (resampling to 16 kHz), hound (WAV I/O)
Model download	reqwest (one-time download from HuggingFace)
Package manager	pnpm
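
To make the resampling step in the table concrete, here is a minimal sketch of converting mono audio to 16 kHz. It deliberately does not use rubato: it swaps in plain linear interpolation with no anti-aliasing filter, which is far cruder than a proper resampler and is shown only to illustrate the idea. The function name `resample_linear` is hypothetical.

```rust
/// Resamples mono audio from `from_hz` to `to_hz` by linear interpolation.
/// Illustrative only: there is no low-pass filtering, so downsampling
/// real audio this way can introduce aliasing.
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == 0 || to_hz == 0 {
        return Vec::new();
    }
    // Output length scales with the ratio of the two rates.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance, in input samples, between consecutive output samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            // Clamp indices so the final samples never read out of bounds.
            let a = input[idx.min(last)];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

For a 48 kHz input, the step is exactly 3, so this sketch keeps every third sample; other ratios blend neighbouring samples.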

Getting Started

Prerequisites

macOS 13+ (Ventura or later) on Apple Silicon
Rust (stable)
Node.js 22+
pnpm 10+

Quick Start

# 1. Install frontend dependencies
pnpm install

# 2. Run in dev mode
pnpm tauri dev

On first launch, the app will automatically download the Whisper model (~547MB) from HuggingFace.

Commands

pnpm tauri dev            # Run app in dev mode (frontend + Tauri)
pnpm dev                  # Run frontend only (no Tauri window)
pnpm build                # Build frontend for production
pnpm tauri build          # Build the full desktop app
pnpm check                # TypeScript type checking
cargo test --manifest-path src-tauri/Cargo.toml   # Run Rust tests

Project Structure

src/
  routes/
    +page.svelte            # Main page: model download, init, view routing
  lib/
    RecordingView.svelte    # Recording UI: button, timer, partial transcripts
    Settings.svelte         # Device selection, output folder, VU meters
    TranscriptionList.svelte # Recent transcriptions list

src-tauri/src/
  lib.rs                   # App setup: plugins, shortcuts, tray, audio capture loop
  commands.rs              # Tauri command handlers (record, stop, settings, devices)
  audio.rs                 # Audio capture (cpal), resampling (rubato), WAV I/O (hound)
  transcription.rs         # Whisper transcription with hallucination filtering
  model.rs                 # Model download from HuggingFace (one-time)
  state.rs                 # App state: tracks, buffers, session, config
  session.rs               # Session management: manifest, recovery, crash handling
  config.rs                # App config: devices, output folder, persistence
  tray.rs                  # System tray: menu, recording timer, status updates

IDE Setup

VS Code with the Svelte, Tauri, and rust-analyzer extensions.

Privacy

Privacy is a core value of this project. Koko Whisper is designed to work entirely offline — your audio and transcriptions never leave your machine.

All audio is captured and processed locally
Transcription runs on-device via Whisper large-v3-turbo with Metal GPU acceleration
Transcriptions are saved as plain text files to a local folder
Audio chunks (WAV) are stored locally for session recovery
No analytics, no tracking, no accounts, no sign-up

The only external network call in the entire codebase is:

Call	Where	Why
HuggingFace model download	`src-tauri/src/model.rs`	One-time download of the Whisper GGML model (~547MB). After download, everything runs offline.

A Privacy Check workflow runs on every push and pull request, scanning all .ts and .rs source files for external network calls. Any new external call that is not explicitly approved will fail the build.
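
The workflow's actual rules are not shown in this README. As a rough sketch of how such a check could work, the code below scans source text for http(s) URLs and reports any host that is not on an allowlist. Everything here is an assumption made for illustration: the allowlist contents, the names `ALLOWED_HOSTS`, `find_url_hosts`, and `disallowed_hosts`, and the choice to detect network calls by URL literals alone.

```rust
/// Hosts permitted to appear in source files (assumed allowlist).
const ALLOWED_HOSTS: &[&str] = &["huggingface.co"];

/// Returns the lowercase host of every http(s) URL literal found in `source`.
fn find_url_hosts(source: &str) -> Vec<String> {
    let mut hosts = Vec::new();
    for scheme in ["https://", "http://"] {
        let mut rest = source;
        while let Some(pos) = rest.find(scheme) {
            let after = &rest[pos + scheme.len()..];
            // The host ends at the first character that cannot be part of a hostname.
            let end = after
                .find(|c: char| !(c.is_ascii_alphanumeric() || c == '.' || c == '-'))
                .unwrap_or(after.len());
            let host = &after[..end];
            if !host.is_empty() {
                hosts.push(host.to_ascii_lowercase());
            }
            rest = &after[end..];
        }
    }
    hosts
}

/// Returns hosts that are neither on the allowlist nor a subdomain of an allowed host.
fn disallowed_hosts(source: &str) -> Vec<String> {
    find_url_hosts(source)
        .into_iter()
        .filter(|host| {
            !ALLOWED_HOSTS.iter().any(|ok| {
                host.as_str() == *ok || host.ends_with(format!(".{ok}").as_str())
            })
        })
        .collect()
}
```

A CI step built on this idea would read each `.ts` and `.rs` file, call `disallowed_hosts` on its contents, and exit with a non-zero status if any file returns a non-empty list.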
