thirdeye is a macOS screen capture app with built-in LLM capabilities for live transcription and summary.
The intended way to use this project is as a source-built macOS application. The Python backend runs from the repository virtual environment, and thirdeye.app starts the local services it needs.
Watch the prototype demo on YouTube
This video shows a prototype version of thirdeye being used during a real-time seminar. The final macOS application works similarly, with additional functionality such as local screen capture.
Backstory: I built this for personal use when a Google early career seminar overlapped with my study schedule. The app lets me record and transcribe meetings so I can review the key information later.
This project might or might not be useful for you depending on your goals.
I created this tool because I wished there were a tool that let me join a meeting, get a transcript, and follow the discussion through summaries without listening to the meeting, so I could focus on other things.
Some meetings and sessions do not allow recordings, and using an AI notetaker such as otter.ai is annoying because it notifies the host. That gave me the idea to create this tool.
I personally use this tool in several ways:
- Joining informational sessions through an isolated Docker browser so that they won't distract me while I continue to work on other tasks.
- Joining meetings where I don't need to speak much from my local desktop, so I can mute the meeting and listen to music while still keeping track of what is being discussed.
- Joining meetings and recording a transcript for my personal use, with summaries so I don't have to take notes.
This project is built with technologies chosen to minimize the cost of running it.
I noticed that this app does not work well with the Zoom or Teams applications, or with other applications where the audio is transmitted in a process separate from the core application. For an app like Chrome, the audio is transmitted in the same process family, so capture works. For best results, run everything through a browser.
This is built on Tauri, rather than entirely in Swift for a fully native macOS UI, because I want other developers to be able to customize this tool for their own operating system by adding modules.
The application is built using the following technologies:
- Tauri: it allows me to build a native desktop application using web technologies such as React.
- OpenClaw: this is mainly used as a gateway for LLM communication. OpenClaw lets me use my Codex subscription through its Codex CLI wrapper. This means I use my OpenAI subscription for API calls instead of their separate API billing, which incurs extra cost. You can swap out this component if you want to make direct API calls to your LLM provider. In my setup, I run this through Docker for security purposes, but you can run it anywhere.
- ScreenCaptureKit: a native macOS framework, used from Swift, that lets me capture the screen on macOS.
- Docker: optional; it lets me set up an isolated desktop that I can control, so screen capture does not use any of my local browsers.
- Deepgram: I use its speech-to-text API for transcription. Deepgram gives you 200 dollars of credit, which I think is awesome. It costs about $0.46 to transcribe about 1 hour of recording, so you can get pretty far with the Deepgram API. I find the quality to be acceptable.
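As a rough check on how far that credit goes, the arithmetic can be worked out directly. This is a minimal sketch that uses only the two figures quoted above ($200 of credit and about $0.46 per recorded hour); the function name is illustrative, and actual Deepgram pricing may differ by model and plan.

```python
# Figures quoted in this README; real pricing may differ.
CREDIT_USD = 200.00
COST_PER_HOUR_USD = 0.46


def hours_covered(credit_usd: float, cost_per_hour_usd: float) -> float:
    """Return how many hours of recording a credit balance would cover."""
    if cost_per_hour_usd <= 0:
        raise ValueError("cost_per_hour_usd must be positive")
    return credit_usd / cost_per_hour_usd


if __name__ == "__main__":
    hours = hours_covered(CREDIT_USD, COST_PER_HOUR_USD)
    print(f"about {hours:.0f} hours of transcription")
```

Under those figures, the credit covers somewhere above 430 hours of recording.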
- thirdeye.app: macOS app and main user interface, built with Tauri.
- macos-capture-agent: FastAPI agent on 127.0.0.1:8791 for local macOS displays, applications, and windows.
- controller-api: FastAPI service on 127.0.0.1:8788 for job lifecycle, Deepgram relay, transcript rebroadcast, summaries, artifacts, and recovery.
- desktop: optional on-demand Dockerized Chromium desktops with KasmVNC, a desktop control API, and the ffmpeg capture scripts. Each desktop is created only when requested and binds loopback-only dynamic ports.
- openclaw: optional Docker helper on 127.0.0.1:18789 that acts as a gateway for LLM calls.
Recording and live transcription are decoupled:
- Pipeline A records the X11 display and Pulse monitor to MP4 inside the selected on-demand desktop.
- Pipeline A can alternatively record a macOS display, application, or window through ScreenCaptureKit via macos-capture-agent.
- Pipeline B captures monitor audio, converts it to 16k mono PCM, and streams it to Deepgram from controller-api.
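To make the Pipeline B audio format concrete, here is a minimal sketch of how a 16 kHz mono PCM byte stream could be cut into fixed-duration frames before being sent to a streaming API. It assumes 16-bit signed samples (the bit depth is not stated here) and a 100 ms frame length chosen purely for illustration; the function names are hypothetical and are not taken from controller-api.

```python
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # "16k" from the pipeline description
CHANNELS = 1             # mono
BYTES_PER_SAMPLE = 2     # assumption: 16-bit signed PCM


def frame_size_bytes(frame_ms: int) -> int:
    """Number of bytes that hold frame_ms milliseconds of audio."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * frame_ms // 1000


def split_frames(pcm: bytes, frame_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive frames; the final frame may be shorter."""
    size = frame_size_bytes(frame_ms)
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]
```

With these assumptions, one second of audio is 32,000 bytes, so a 100 ms frame is 3,200 bytes.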
Install these on the host machine before starting setup:
| Prerequisite | Why it is needed | Notes |
|---|---|---|
| Docker Desktop or Docker Engine | Runs on-demand isolated desktops and optional OpenClaw helpers | Docker is required only if you use those features |
| GNU Make | Wraps the common build and run commands | Used by the repo Makefile |
| Bash-compatible shell | Required by the repo scripts | |
| Python 3.12+ | Creates the repository .venv and runs the local FastAPI services | Python 3.14 is recommended for local development |
| Node.js 20.19+ | Runs the Tauri frontend tooling | .nvmrc pins the source-built app version |
| npm | Installs frontend dependencies | Ships with Node.js |
| Rust toolchain | Builds the macOS app shell | Required for make macos-app-dev and make macos-app-build |
| Xcode command line tools | Builds Swift and macOS app components | Required for ScreenCaptureKit helper builds, including app dev/build targets |
| Localhost ports | Exposes the local services | 8788, 8791, optional on-demand desktop ports, and 18789 |
You will need the following runtime requirements configured before real captures will work:
- A Deepgram API key in .env as DEEPGRAM_API_KEY.
- If you want OpenClaw-backed summaries, a readable host config file at ~/.openclaw/openclaw.json containing gateway.auth.token.
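If you want to confirm that the OpenClaw config is readable and contains the token, a check along these lines may help. This is a sketch under assumptions: it treats the file as plain JSON with a nested gateway, auth, token structure, and the helper names are illustrative rather than part of thirdeye or OpenClaw.

```python
import json
from pathlib import Path

# Path named in the runtime requirements above.
DEFAULT_CONFIG = Path.home() / ".openclaw" / "openclaw.json"


def extract_gateway_token(config: dict) -> str:
    """Return gateway.auth.token from an already-parsed config mapping."""
    try:
        token = config["gateway"]["auth"]["token"]
    except (KeyError, TypeError) as exc:
        raise KeyError("gateway.auth.token is missing from the config") from exc
    if not isinstance(token, str) or not token:
        raise ValueError("gateway.auth.token must be a non-empty string")
    return token


def read_gateway_token(path: Path = DEFAULT_CONFIG) -> str:
    """Parse the config file (assumed to be plain JSON) and return the token."""
    return extract_gateway_token(json.loads(path.read_text(encoding="utf-8")))
```

Calling read_gateway_token() without arguments reads the default path and raises if the file is missing, unparsable, or lacks the token.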
thirdeye is designed to run locally on your Mac. The app stores recordings, transcripts, summaries, artifacts, logs, and runtime state on your computer, not in a thirdeye-hosted cloud service.
See SECURITY.md for the local runtime model, storage locations, macOS permissions, and external provider notes.
| Path | Purpose |
|---|---|
| apps/ | Tauri shell and React UI for running thirdeye as a macOS app |
| services/controller-api/ | FastAPI backend package and Python requirements |
| services/desktop-agent/ | Docker desktop agent package and capture scripts |
| services/macos-capture-agent/ | macOS capture agent package and ScreenCaptureKit helper |
| packages/capture_contracts/ | Shared capture request and target contracts |
| infra/ | Docker Compose file and desktop image/s6 configuration |
| tests/python/ | Python test suite |
| config/ | Seed configuration files for desktop helper services |
| runtime/ | Host-mounted runtime state, recordings, artifacts, SQLite DB, and logs |
| scripts/ | Bootstrap, smoke test, export, and OpenClaw remediation scripts |
| docs/ | Architecture, API, security, operations, and troubleshooting notes |
All Python dependencies should live in the repository virtual environment at .venv. Do not install the backend dependencies into your global Python.
This command copies .env.example to .env if needed, creates runtime directories, creates .venv, installs Python dependencies into .venv, and installs the macOS app dependencies under apps/.
make setup
make doctor
After setup, activate the virtual environment:
source .venv/bin/activate
At minimum, update these values before real use:
DEEPGRAM_API_KEY
Important environment variables from .env.example:
| Variable | Required | Purpose |
|---|---|---|
| MACOS_CAPTURE_BASE_URL | Required | Base URL for the host-local macOS capture agent |
| DEEPGRAM_API_KEY | Required | Deepgram live transcription auth |
| OPENCLAW_BASE_URL | Required | Base URL for the helper gateway |
| OPENCLAW_SUMMARY_MODEL | Required | Summary model used via OpenClaw |
| RECORDING_FPS, RECORDING_WIDTH, RECORDING_HEIGHT | Optional | Desktop recording defaults |
| SILENCE_TIMEOUT_MINUTES | Optional | Native inactivity alert timeout |
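A quick way to catch a missing required variable before starting a capture is to check the environment up front. The sketch below uses only the variable names marked Required in the table; the function name is hypothetical, and it assumes the values from .env have already been loaded into the process environment.

```python
import os
from typing import Mapping

# Variables marked "Required" in the table above.
REQUIRED_VARS = (
    "MACOS_CAPTURE_BASE_URL",
    "DEEPGRAM_API_KEY",
    "OPENCLAW_BASE_URL",
    "OPENCLAW_SUMMARY_MODEL",
)


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_required(os.environ)
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```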
The macOS app targets build the ScreenCaptureKit helper automatically. To build the helper by itself for command-line capture agent use, run:
make macos-capture-build
This produces services/macos-capture-agent/bin/macos_capture_helper.
The Tauri app is the main product UI and the recommended way to run thirdeye. It starts the FastAPI controller and the macOS capture agent from .venv, then talks directly to the local API at 127.0.0.1:8788.
Run the source-built macOS app:
make macos-app-dev
This builds the macOS capture helper before launching the app.
Build the macOS app bundle:
make macos-app-build
This builds the macOS capture helper before creating the app bundle and DMG.
The app bundle and DMG are created by Tauri under apps/tauri/target/release/bundle/.
Runtime data created by the app is stored under:
~/Library/Application Support/thirdeye/
Use the app's capture settings button when macOS blocks local screen, app, window, or muted app-audio capture. The packaged app bundles the ScreenCaptureKit helper inside thirdeye.app, so the normal permission entry to allow is thirdeye in Screen & System Audio Recording. The first app build uses ad-hoc signing for local use; Developer ID signing and notarization are separate distribution steps.
Once the stack is up:
- Controller API: http://127.0.0.1:8788
- Controller API docs: http://127.0.0.1:8788/docs
- On-demand isolated desktops: use the Open button after creating a desktop in the app
- macOS capture agent health: http://127.0.0.1:8791/health
- Optional OpenClaw health: http://127.0.0.1:18789/healthz
curl -fsS http://127.0.0.1:8788/api/health
curl -fsS http://127.0.0.1:8788/api/desktops
curl -fsS http://127.0.0.1:8791/health
If OpenClaw is enabled:
curl -fsS http://127.0.0.1:18789/healthz
After the app-managed services are running:
make smoke
The smoke test checks:
- controller API health
- desktop session API reachability
- optional OpenClaw health when enabled
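The same kind of reachability check can be scripted in Python if you prefer it over curl. This is only a sketch that probes the URLs listed above using the standard library; it is not the implementation behind make smoke, and the function names, timeout, and dictionary keys are assumptions made for illustration.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Mapping

# URLs taken from the curl commands above.
ENDPOINTS = {
    "controller-api": "http://127.0.0.1:8788/api/health",
    "desktops": "http://127.0.0.1:8788/api/desktops",
    "macos-capture-agent": "http://127.0.0.1:8791/health",
}
# Only relevant when the optional OpenClaw helper is enabled.
OPENCLAW_ENDPOINT = {"openclaw": "http://127.0.0.1:18789/healthz"}


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeouts, and HTTP errors all count as unhealthy.
        return False


def check_all(
    endpoints: Mapping[str, str],
    probe: Callable[[str], bool] = http_ok,
) -> Dict[str, bool]:
    """Probe each endpoint and map its name to a healthy/unhealthy flag."""
    return {name: probe(url) for name, url in endpoints.items()}
```

For example, check_all({**ENDPOINTS, **OPENCLAW_ENDPOINT}) includes the OpenClaw check, and the probe parameter lets you substitute a stub when no services are running.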
The start form now supports two capture surfaces:
- Isolated desktop: an on-demand Docker desktop created from the capture workspace.
- This Mac: local macOS capture through ScreenCaptureKit.
When This Mac is selected, the controller loads grouped targets from macos-capture-agent:
- Screens
- Apps
- Windows
The UI requires an explicit local target before starting the job. The job metadata records both the backend and the selected target so recovery and stop actions return to the same backend later.
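To illustrate why recording both the backend and the target matters, here is a hypothetical sketch of job metadata that round-trips through JSON so a later stop or recovery action can return to the same backend. Every field name and value below is an assumption made for illustration; the real schema lives in the controller and may look quite different.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CaptureSelection:
    """Hypothetical record of where a job is capturing from."""
    backend: str      # e.g. "isolated_desktop" or "macos" (illustrative values)
    target_kind: str  # e.g. "screen", "app", or "window"
    target_id: str    # opaque identifier reported by the capture backend


def to_metadata(job_id: str, selection: CaptureSelection) -> str:
    """Serialize the job id and its capture selection to JSON."""
    payload = {"job_id": job_id, "capture": asdict(selection)}
    return json.dumps(payload, sort_keys=True)


def from_metadata(raw: str) -> CaptureSelection:
    """Rebuild the capture selection so stop/recovery can reuse it."""
    return CaptureSelection(**json.loads(raw)["capture"])
```

Because the selection is stored with the job, a recovery step only needs the metadata to know which backend to contact.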
On a normal day, the startup sequence is:
make macos-app-dev
This is the main workflow. It builds the macOS capture helper, starts the Tauri app, and lets the app manage the local Python services from .venv.
Optional, when you are running the services manually and need This Mac capture targets:
make macos-capture-up
Optional, to start the OpenClaw helper:
make up-openclaw
To stop the optional OpenClaw helper:
docker compose --project-name thirdeye -f infra/compose.yaml --profile openclaw down
Run the test suite from the virtual environment:
source .venv/bin/activate
make test
The application accepts real provider settings at runtime. Tests that need simulated services use explicit doubles under tests/support and inject them through test fixtures rather than enabling provider simulation from .env.
make test-all
- Open thirdeye.app.
- Click Start Capture.
- Choose This Mac for a local screen, app, or window, or create and choose an Isolated desktop.
- Start playback yourself if needed.
- The controller starts recording and the Deepgram relay in parallel.
- Monitor the live transcript in the app.
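The parallel start of recording and the Deepgram relay can be pictured with asyncio. This is a simplified sketch, not the controller's actual code: both coroutines are placeholders that stand in for the real work, and all names are hypothetical.

```python
import asyncio


async def start_recording(job_id: str) -> str:
    """Placeholder for launching the recording pipeline."""
    await asyncio.sleep(0)  # stands in for real startup work
    return f"recording:{job_id}"


async def start_deepgram_relay(job_id: str) -> str:
    """Placeholder for opening the live transcription relay."""
    await asyncio.sleep(0)  # stands in for real startup work
    return f"relay:{job_id}"


async def start_job(job_id: str) -> list[str]:
    """Start both pipelines concurrently and wait for both to be ready."""
    results = await asyncio.gather(
        start_recording(job_id),
        start_deepgram_relay(job_id),
    )
    return list(results)


if __name__ == "__main__":
    print(asyncio.run(start_job("demo-job")))
```

Running the two startups concurrently means a slow recorder does not delay the first transcript lines, which fits the decoupling of recording and live transcription.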
The command-line development workflow stores state in the repo:
| Path | Contents |
|---|---|
| runtime/desktop-sessions/ | On-demand desktop session registry and per-desktop config |
| runtime/recordings/ | Shared recording output |
| runtime/artifacts/jobs/<job_id>/ | Final job artifacts |
| runtime/logs/jobs/<job_id>/ | Debug logs and raw transcript data |
| runtime/controller/controller.db | SQLite controller database |
The macOS app workflow stores controller state, recordings, artifacts, logs, and capture runtime files under ~/Library/Application Support/thirdeye/.
Final job artifacts are written to:
runtime/artifacts/jobs/<job_id>/
recording.mp4
transcript.md
summary.md
runtime/logs/jobs/<job_id>/
transcript.json
deepgram-events.jsonl
controller-events.jsonl
metadata.json may exist on disk for controller recovery and diagnostics, but it is not exposed as a downloadable artifact or shown in the frontend.
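If you script against these outputs, the layout above can be expressed as a small path helper. This sketch mirrors only the file names listed in this section; the function name and the returned structure are illustrative, and the runtime root is passed in because the macOS app stores the same kinds of files under Application Support instead of the repository.

```python
from pathlib import Path

# File names taken from the layout listed above.
ARTIFACT_FILES = ("recording.mp4", "transcript.md", "summary.md")
LOG_FILES = ("transcript.json", "deepgram-events.jsonl", "controller-events.jsonl")


def job_paths(runtime_root: Path, job_id: str) -> dict[str, list[Path]]:
    """Return the expected artifact and log paths for one job."""
    artifacts_dir = runtime_root / "artifacts" / "jobs" / job_id
    logs_dir = runtime_root / "logs" / "jobs" / job_id
    return {
        "artifacts": [artifacts_dir / name for name in ARTIFACT_FILES],
        "logs": [logs_dir / name for name in LOG_FILES],
    }


def missing_files(runtime_root: Path, job_id: str) -> list[Path]:
    """List expected files that do not exist yet for the given job."""
    paths = job_paths(runtime_root, job_id)
    return [p for group in paths.values() for p in group if not p.exists()]
```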
This project is licensed for personal, non-commercial use only. See LICENSE for details.
