Merged
23 commits
61cd79e
Bump ruff from 0.15.9 to 0.15.14
JE-Chen May 23, 2026
d123310
Bump PySide6 from 6.11.0 to 6.11.1
JE-Chen May 23, 2026
ac60881
Document AnyDesk-style Quick Connect + Phase 4/5 features
JE-Chen May 23, 2026
1746b1c
Add Phase 6 hardening: encryption, regression, semantic replay, resum…
JE-Chen May 23, 2026
f348ffe
Add AdminConsoleClient.fetch_thumbnails for cross-host dashboard
JE-Chen May 23, 2026
a2b8672
Add live thumbnail grid to admin console (Phase 6.5 GUI)
JE-Chen May 23, 2026
aac1656
Add pluggable video codec on TCP/WS path (Phase 6.8)
JE-Chen May 23, 2026
1a2b195
Add USB list RPC + autocontrol-lsp scaffold (Phase 6.9 + 6.10)
JE-Chen May 23, 2026
0bea86d
Add Phase 7 layer: Docker, FSM, tool-use, agent loop, self-healing, W…
JE-Chen May 23, 2026
123042a
Add Phase 7 ops layer: profiler v2, config sync, RBAC, TLS ACME helpers
JE-Chen May 23, 2026
88efa34
Add full ACME v2 client + USB/IP host protocol (Phase 8.1 + 8.2)
JE-Chen May 23, 2026
a1395da
Add Helm chart, action JSON CI, production agent backends (Phase 9.1-…
JE-Chen May 23, 2026
0ce906d
Add PaddleOCR, libusb URB, Android ADB, time-travel debug (Phase 9.3-…
JE-Chen May 23, 2026
0e5efcb
Add Prometheus metrics + OpenTelemetry tracer wrapper (Phase 10.1)
JE-Chen May 23, 2026
50ca5ce
Finish OCR backend refactor: tesseract / easyocr base classes + tests
JE-Chen May 23, 2026
191e986
Drain rejected MCP HTTP bodies before closing to avoid TCP RST on Win…
JE-Chen May 23, 2026
ddfcfb8
Add examples/ + docs for OCR backends and observability
JE-Chen May 23, 2026
b3a695b
Round out examples: recording, variables, windows, hotkeys, triggers,…
JE-Chen May 23, 2026
9720a5c
Address SonarCloud / Bandit / ruff findings
JE-Chen May 23, 2026
79d5099
README: surface examples, OCR backends, observability, uv.lock
JE-Chen May 23, 2026
5dff405
Merge remote-tracking branch 'origin/main' into dev
JE-Chen May 23, 2026
e57a8d6
Address SonarCloud PR #194 issues + hotspots
JE-Chen May 23, 2026
6d72a07
Close remaining Sonar + Codacy findings on PR #194
JE-Chen May 23, 2026
54 changes: 54 additions & 0 deletions .github/workflows/action-json-lint.yml
@@ -0,0 +1,54 @@
# Reusable GitHub Actions workflow: drop this into a repo that hosts
# AutoControl action JSON files (``*.action.json`` by default) and get
# PR-level validation for free. The workflow:
#   1. Installs je_auto_control from PyPI (or a configurable ref).
#   2. Globs every action JSON file matching ``files``.
#   3. Runs ``python -m je_auto_control.utils.action_lint`` over each.
#      Any ``error``-severity finding fails the workflow.

name: action-json-lint

on:
  workflow_call:
    inputs:
      files:
        description: "Glob for action JSON files to lint."
        required: false
        type: string
        default: "**/*.action.json"
      autocontrol_ref:
        description: "Pip spec for je_auto_control (e.g. je_auto_control==0.1.0 or git+https://...)."
        required: false
        type: string
        default: "je_auto_control"

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install je_auto_control
        env:
          AUTOCONTROL_REF: ${{ inputs.autocontrol_ref }}
        run: |
          python -m pip install --upgrade pip
          python -m pip install "$AUTOCONTROL_REF"

      - name: Lint action JSON files
        shell: bash
        env:
          FILES_GLOB: ${{ inputs.files }}
        run: |
          shopt -s globstar nullglob
          files=( $FILES_GLOB )
          if [ ${#files[@]} -eq 0 ]; then
            echo "No files matched $FILES_GLOB; nothing to lint."
            exit 0
          fi
          echo "Linting ${#files[@]} files..."
          python -m je_auto_control.utils.action_lint "${files[@]}"
2 changes: 1 addition & 1 deletion .github/workflows/quality.yml
@@ -81,7 +81,7 @@ jobs:
# for any sub-package the snapshot doesn't include
# (admin, usb, remote_desktop, vision, …).
pip install -e .
-pip install ruff==0.15.13 bandit==1.9.4 pytest==9.0.3 pytest-timeout==2.4.0 pytest-rerunfailures==15.1 PySide6==6.11.1
+pip install ruff==0.15.14 bandit==1.9.4 pytest==9.0.3 pytest-timeout==2.4.0 pytest-rerunfailures==15.1 PySide6==6.11.1

- name: Run headless pytest suite
run: pytest test/unit_test/headless/ -v --tb=short --timeout=120
238 changes: 225 additions & 13 deletions README.md
@@ -37,6 +37,7 @@
- [Event Triggers](#event-triggers)
- [Run History](#run-history)
- [Report Generation](#report-generation)
- [Observability (Prometheus / OpenTelemetry)](#observability-prometheus--opentelemetry)
- [Remote Automation (Socket / REST)](#remote-automation-socket--rest)
- [Plugin Loader](#plugin-loader)
- [Shell Command Execution](#shell-command-execution)
@@ -60,7 +61,7 @@
- **Image Recognition** — locate UI elements on screen using OpenCV template matching with configurable threshold
- **Accessibility Element Finder** — query the OS accessibility tree (Windows UIA / macOS AX) to locate buttons, menus, and controls by name/role
- **AI Element Locator (VLM)** — describe a UI element in plain language and let a vision-language model (Anthropic / OpenAI) find its screen coordinates
- **OCR** — extract text from screen regions using Tesseract; wait for, click, or locate rendered text; regex search and full-region dump
- **OCR** — extract text from screen regions through three pluggable backends (Tesseract for ASCII, EasyOCR for CJK without an external binary, PaddleOCR for highest-quality Chinese / Japanese / Korean). Single unified API + canonical language codes; backend chosen by `backend=` kwarg, `AUTOCONTROL_OCR_BACKEND` env var, or auto-detection. Wait for, click, or locate rendered text; regex search and full-region dump
- **LLM Action Planner** — translate a plain-language description into a validated `AC_*` action list using Claude
- **Runtime Variables & Control Flow** — `${var}` substitution at execution time, plus `AC_set_var` / `AC_inc_var` / `AC_if_var` / `AC_for_each` / `AC_loop` / `AC_retry` for data-driven scripts
- **Remote Desktop** — stream this machine's screen and accept remote input over a token-authenticated TCP protocol, *or* connect to another machine and view + control it (host + viewer GUIs included). Optional TLS (HTTPS-grade encryption), WebSocket transport (ws:// + wss:// for browser / firewall-friendly clients), persistent 9-digit Host ID, host→viewer audio streaming, bidirectional clipboard sync (text + image), and chunked file transfer (drag-drop + progress bar; arbitrary destination path; no size cap). Plus folder sync (additive mirror — local deletions never propagate) and a self-hosted coturn TURN config bundle generator (turnserver.conf + systemd unit + docker-compose + README). **AnyDesk-style popout**: when the viewer authenticates, the live remote desktop opens in its own resizable top-level window so the control panel stays uncluttered. The Remote Desktop tabs are wrapped in `QScrollArea` so the panel stays usable on small windows and stretches edge-to-edge on 4K displays. Driveable headlessly via `je_auto_control` and over MCP through the new `ac_remote_*` tools
@@ -94,6 +95,7 @@
- **OpenAPI 3.1 + Swagger UI** — `GET /openapi.json` (auth-gated, generated from the live route table) + `GET /docs` (browser Swagger UI with bearer token bar). Drift test in CI catches new routes added without metadata.
- **Configuration Bundle** — single-file JSON export/import of user config (admin hosts, address book, trusted viewers, known hosts, host service, IDs). Atomic write with `<name>.bak.<timestamp>` backups; CLI `python -m je_auto_control.utils.config_bundle export|import`; `POST /config/{export,import}`; GUI buttons on the REST API tab.
- **USB Passthrough (experimental, opt-in)** — wire-level protocol over a WebRTC `usb` DataChannel (10 opcodes, CREDIT-based flow control, 16 KiB payload cap). Host-side `UsbPassthroughSession` end-to-end on the Linux libusb backend; Windows `WinUSB` backend with full ctypes wiring (hardware-unverified); macOS `IOKit` skeleton. Viewer-side blocking client (`UsbPassthroughClient` → `ClientHandle.control_transfer / bulk_transfer / interrupt_transfer`). Persistent ACL (`~/.je_auto_control/usb_acl.json`, default deny, mode 0600) with host-side prompt QDialog and tamper-evident audit-log integration. Default off — opt-in via `enable_usb_passthrough(True)` or `JE_AUTOCONTROL_USB_PASSTHROUGH=1`. Phase 2e external security review checklist included; default-on requires sign-off.
- **Observability (Prometheus + OpenTelemetry)** — stdlib-only `Counter` / `Gauge` / `Histogram` registry with a tiny built-in HTTP exporter on `/metrics`, plus an OpenTelemetry-compatible tracer that upgrades to real OTel spans when the SDK is installed. The executor and agent loop emit `autocontrol_action_calls_total{action,outcome}`, `autocontrol_action_duration_seconds`, and `autocontrol_agent_steps_total{tool,outcome}` automatically; point a Prometheus scrape config at the URL and the metrics are ready for a Grafana dashboard with zero per-script wiring.

---

@@ -334,6 +336,14 @@ third-party components and their licenses.

## Quick Start

Looking for copy-pasteable end-to-end scripts instead of API snippets?
The [`examples/`](examples/) directory has 17 self-contained programs
covering screenshot + click, OCR, the headless scheduler, remote
desktop, the agent loop, observability, recording / replay, runtime
variables, window management, hotkeys, image triggers, HTML reports,
the MCP stdio bridge, the REST API, the secrets vault, and plugin
loading.

### Mouse Control

```python
@@ -463,12 +473,26 @@ ac.click_text("Submit")
ac.wait_for_text("Loading complete", timeout=15.0)
```

Backend selection: set `AUTOCONTROL_OCR_BACKEND=tesseract|easyocr|paddleocr`
or pass `backend=` per call; otherwise auto-detection picks the first
backend that imports successfully:

```python
ac.find_text_matches("登入", lang="chi_tra", backend="easyocr")
ac.click_text("Sign in", backend="tesseract")
```
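The precedence described above (per-call kwarg, then environment variable, then auto-detection) can be sketched as a small resolver. This is an illustration, not the library's code: `resolve_ocr_backend`, `BACKEND_MODULES`, the module names probed, and the probe order are all assumptions. It also only checks that a module can be located, whereas the real auto-detection may attempt an actual import (and Tesseract additionally needs its binary).

```python
import importlib.util
import os
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical backend -> module mapping; the probe order is an assumption.
BACKEND_MODULES = {
    "tesseract": "pytesseract",
    "easyocr": "easyocr",
    "paddleocr": "paddleocr",
}


def module_findable(module_name: str) -> bool:
    """Cheap availability check: the module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


def resolve_ocr_backend(
    backend: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    order: Sequence[str] = ("tesseract", "easyocr", "paddleocr"),
    is_available: Callable[[str], bool] = module_findable,
) -> str:
    """Return a backend name: explicit kwarg, then env var, then auto-detect."""
    env = os.environ if env is None else env
    if backend:                       # 1. per-call backend= wins
        return backend
    from_env = env.get("AUTOCONTROL_OCR_BACKEND")
    if from_env:                      # 2. process-wide override
        return from_env
    for name in order:                # 3. first backend whose module is present
        if is_available(BACKEND_MODULES[name]):
            return name
    raise RuntimeError("no OCR backend is installed")
```

Injecting `env` and `is_available` keeps the precedence logic testable without installing any OCR package.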

If Tesseract is not on `PATH`, point at it explicitly:

```python
ac.set_tesseract_cmd(r"C:\Program Files\Tesseract-OCR\tesseract.exe")
```

Backend install paths and the canonical lang-code table are in
[docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst](docs/source/Eng/doc/ocr_backends/ocr_backends_doc.rst)
(or the [繁體中文](docs/source/Zh/doc/ocr_backends/ocr_backends_doc.rst)
version).

Dump every recognised text record in a region (or full screen), or
search by regex when the text varies:

@@ -577,24 +601,175 @@ viewer.send_input({"action": "type", "text": "hello"})
viewer.disconnect()
```

GUI: **Remote Desktop** tab with two sub-tabs.

- **Host** — token field with a *Generate* button, security warning
about the bind address, start / stop controls, refreshing port +
viewer-count status, and a 4 fps preview pane below the controls so
the user being remoted sees what viewers see.
- **Viewer** — address / port / token form, *Connect* / *Disconnect*,
and a custom frame-display widget that paints incoming JPEG frames
scaled with `KeepAspectRatio`. Mouse / wheel / key events on the
display are remapped from widget coordinates back to the remote
screen's pixel space using the latest frame's dimensions, then
forwarded as `INPUT` messages.
GUI: the **Remote Desktop** tab opens to the **Quick Connect** screen
(AnyDesk-style) by default: a large Host ID display on one side and,
on the other, a single input that accepts `host:port`, `ws://`,
`wss://`, or a 9-digit Host ID, with *Connect* and *Start hosting* as
the two primary buttons. Recent connections are remembered across
sessions. Advanced per-transport sub-tabs (legacy TCP / WS host +
viewer, WebRTC host + viewer with manual SDP / custom codecs / TLS
pinning) stay one click away. The WebRTC sub-tabs load lazily, so a
stock install without the `[webrtc]` extra still opens the tab.

> ⚠️ Anyone with the host:port and token gets full mouse / keyboard
> control of the host machine. Default bind is `127.0.0.1`; expose
> externally only via SSH tunnel or TLS front-end. The token is the
> only line of defence — treat it like a password.

**Quick Connect headless API.** The transport coordinator that backs
the GUI input box is also exported, so scripts can dispatch the same
way:

```python
from je_auto_control import parse_remote_desktop_target
parse_remote_desktop_target("192.168.1.10:5555")
# ConnectTarget(kind='tcp', host='192.168.1.10', port=5555, ...)
parse_remote_desktop_target("ws://hub:8765/desk")
# ConnectTarget(kind='ws', host='hub', port=8765, path='/desk')
parse_remote_desktop_target("123-456-789")
# ConnectTarget(kind='webrtc_id', host_id='123456789')
```
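To illustrate how a single input box can dispatch across transports, here is a minimal sketch of that classification. It is not the library's implementation: `Target` and `parse_target` are hypothetical stand-ins for `ConnectTarget` and `parse_remote_desktop_target`, and IPv6 literals are deliberately left out.

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Target:
    """Hypothetical stand-in for ConnectTarget, for illustration only."""
    kind: str                   # "tcp", "ws", "wss" or "webrtc_id"
    host: Optional[str] = None
    port: Optional[int] = None
    path: str = ""
    host_id: Optional[str] = None


def parse_target(text: str) -> Target:
    text = text.strip()
    # 9-digit Host ID, optionally grouped with dashes or spaces ("123-456-789").
    digits = re.sub(r"[\s-]", "", text)
    if re.fullmatch(r"\d{9}", digits):
        return Target(kind="webrtc_id", host_id=digits)
    # WebSocket URLs: urlsplit separates scheme, host, port and path.
    if text.startswith(("ws://", "wss://")):
        parts = urlsplit(text)
        return Target(kind=parts.scheme, host=parts.hostname,
                      port=parts.port, path=parts.path)
    # Anything else must be host:port (IPv6 literals are not handled here).
    host, sep, port = text.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"unrecognised target: {text!r}")
    return Target(kind="tcp", host=host, port=int(port))
```

Checking the Host ID first means a dashed ID never reaches the `host:port` branch.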

**Connection approval + view-only mode.** An optional callback gates
every incoming session, AnyDesk-style. Returning `"view_only"` admits
the viewer but drops their `INPUT` messages; returning a falsy value
(or raising) sends `AUTH_FAIL` with the reason "rejected by host":

```python
from je_auto_control import RemoteDesktopHost, PendingViewer

def gate(p: PendingViewer) -> str:
    if p.address[0].startswith("10."):
        return "view_only"
    return "full"  # or True

host = RemoteDesktopHost(token="tok", on_pending_viewer=gate)
```

**IP allowlist (CIDR + exact IPs).** Reject peers outside the
configured ranges *before* TLS / auth runs, so attackers can't probe
further:

```python
host = RemoteDesktopHost(
token="tok", ip_allowlist=["10.0.0.0/8", "192.168.1.100"],
)
```
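A CIDR-plus-exact-IP check like this maps directly onto the stdlib `ipaddress` module. The sketch below assumes nothing about how `RemoteDesktopHost` does it internally; `build_allowlist` and `is_allowed` are hypothetical helpers. A bare IP parses as a single-address network (`/32` or `/128`).

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def build_allowlist(entries: Iterable[str]) -> List[Network]:
    """Parse CIDR ranges and bare IPs; a bare IP becomes a /32 (or /128)."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(peer_ip: str, allowlist: List[Network]) -> bool:
    """True if peer_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(peer_ip)
    # Membership is simply False across IP versions (v4 address, v6 network).
    return any(addr in network for network in allowlist)
```

Running this on the peer address before the TLS handshake is what keeps rejected peers from probing the auth layer.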

**One-time share codes** — extra tokens that self-destruct on first
successful auth, ideal for client-support workflows:

```python
host = RemoteDesktopHost(token="tok", single_use_tokens=["abc123"])
host.add_single_use_token("9k4ndx") # rotate at runtime
host.revoke_single_use_token("abc123") # cancel before it's used
```
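The "self-destruct on first successful auth" behaviour amounts to removing the token from a set atomically when it matches. Here is a hedged sketch; `SingleUseTokens` is a hypothetical class, and the constant-time comparison is a sensible default rather than something the source documents.

```python
import hmac
import threading
from typing import Iterable


class SingleUseTokens:
    """Hypothetical one-time token store: each token authenticates once."""

    def __init__(self, tokens: Iterable[str] = ()) -> None:
        self._tokens = set(tokens)
        self._lock = threading.Lock()  # auth may run on several threads

    def add(self, token: str) -> None:
        with self._lock:
            self._tokens.add(token)

    def revoke(self, token: str) -> None:
        with self._lock:
            self._tokens.discard(token)

    def consume(self, presented: str) -> bool:
        """Return True and burn the token if it matches a live one."""
        with self._lock:
            matched = None
            for token in self._tokens:
                # Constant-time compare; keep scanning so timing stays uniform.
                if hmac.compare_digest(token.encode(), presented.encode()):
                    matched = token
            if matched is None:
                return False
            self._tokens.discard(matched)
            return True
```

Holding the lock across match-and-discard ensures two viewers racing with the same code cannot both get in.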

**TOTP 2FA (RFC 6238, stdlib only).** Layer a 6-digit OTP on top of
the token; host accepts ±1 step of clock drift:

```python
from je_auto_control.utils.remote_desktop.totp import (
generate_secret, generate_code, provisioning_uri,
)
secret = generate_secret()
print(provisioning_uri(secret, account="alice")) # otpauth:// URI for QR

host = RemoteDesktopHost(token="tok", totp_secret=secret)
viewer = RemoteDesktopViewer(
host=..., token="tok", totp_code=generate_code(secret),
)
```
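For readers curious what RFC 6238 involves, here is an independent, stdlib-only sketch of code generation and a ±1-step drift check. It is not the `totp` module's source: `totp` and `verify` are hypothetical names, and HMAC-SHA1, 30-second steps, and 6 digits are the RFC defaults assumed here.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None, step: int = 30,
         digits: int = 6) -> str:
    """RFC 6238 TOTP (HMAC-SHA1) for a base32-encoded secret."""
    key = base64.b32decode(secret_b32.upper() + "=" * (-len(secret_b32) % 8))
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # RFC 4226 dynamic truncation: low nibble of the last byte picks an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify(secret_b32: str, code: str, at: Optional[float] = None,
           step: int = 30, drift: int = 1) -> bool:
    """Accept codes from the current step and +/- `drift` neighbouring steps."""
    now = time.time() if at is None else at
    return any(
        hmac.compare_digest(totp(secret_b32, now + k * step, step), code)
        for k in range(-drift, drift + 1)
    )
```

With the RFC test secret `12345678901234567890`, the 6-digit code at T=59 s is the last six digits of the RFC's 8-digit vector `94287082`.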

**Multi-monitor selection.** Capture one specific monitor instead of
the combined virtual desktop:

```python
from je_auto_control import list_host_monitors, RemoteDesktopHost
print(list_host_monitors())
# [{'index': 0, 'is_combined': True, ...},
# {'index': 1, 'left': 0, 'top': 0, ...},
# {'index': 2, 'left': 1920, ...}]
host = RemoteDesktopHost(token="tok", monitor_index=1)
```

**Remote cursor overlay.** Host broadcasts cursor position at 30 Hz
(deduped on still desktops); the viewer's popup window draws an arrow
on top of the JPEG stream so you can see exactly where the host's
pointer is. Disable via `enable_cursor_broadcast=False`.
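A 30 Hz cap with deduplication can be sketched as a rate check followed by a position check. `CursorBroadcaster` is a hypothetical name, `send` stands in for whatever puts the cursor message on the wire, and the injectable clock exists only so the throttle is testable.

```python
import time
from typing import Callable, Optional, Tuple


class CursorBroadcaster:
    """Hypothetical throttle: at most `hz` updates per second, skip unchanged positions."""

    def __init__(self, send: Callable[[int, int], None], hz: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._send = send
        self._interval = 1.0 / hz
        self._clock = clock
        self._last_pos: Optional[Tuple[int, int]] = None
        self._last_sent = float("-inf")

    def poll(self, x: int, y: int) -> bool:
        """Call from the capture loop; returns True if an update was sent."""
        now = self._clock()
        if now - self._last_sent < self._interval:
            return False  # too soon: respect the rate cap
        if (x, y) == self._last_pos:
            return False  # pointer has not moved: nothing to send
        self._send(x, y)
        self._last_pos, self._last_sent = (x, y), now
        return True
```

On a still desktop every poll after the first returns early, so no cursor traffic is generated.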

**Multi-viewer collaborative cursors + chat.** Two new message types
are added: `CHAT`, and `CURSOR` carrying a `viewer_id`. Use a
`MultiViewerHost` to relay one viewer's pointer to the others, and pair
it with the chat channel for ad-hoc text between operators:

```python
host = RemoteDesktopHost(
token="tok", on_chat=lambda sender, text: print(sender, ":", text),
)
host.broadcast_chat("session starts in 30s")
host.broadcast_viewer_cursor("alice", 200, 300)

viewer = RemoteDesktopViewer(
host=..., on_chat=lambda s, t: ...,
on_viewer_cursor=lambda vid, x, y: ...,
)
viewer.send_chat("ack")
```

**Relative mouse mode (FPS / CAD).** New input action that sends
deltas instead of absolute coordinates:

```python
viewer.send_input({"action": "mouse_move_relative", "dx": 5, "dy": -3})
```

**Motion-aware capture.** The capture loop now hashes each encoded
JPEG and skips frames identical to the previous one, so a static
desktop uses near-zero bandwidth. New viewers are seeded with the
latest frame on auth so they never see a black popup.
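Hash-based skipping needs only the digest of the last frame sent, and keeping the last frame itself also covers seeding new viewers. The sketch below is illustrative: `FrameDeduper` is hypothetical, and BLAKE2b is an arbitrary choice, since the source does not name the hash.

```python
import hashlib
from typing import Optional


class FrameDeduper:
    """Hypothetical sketch: drop an encoded frame identical to the previous one."""

    def __init__(self) -> None:
        self._last_digest: Optional[bytes] = None
        self.latest: Optional[bytes] = None  # seed frame for newly authed viewers

    def should_send(self, jpeg: bytes) -> bool:
        digest = hashlib.blake2b(jpeg, digest_size=16).digest()
        if digest == self._last_digest:
            return False  # identical bytes: static desktop, skip
        self._last_digest = digest
        self.latest = jpeg
        return True
```

Comparing digests rather than raw frames keeps memory per viewer constant regardless of resolution.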

**Live stats** (FPS / kbps / totals over a 3-second window):

```python
viewer.stats()
# {'fps': 24.3, 'kbps': 4801.2, 'frames': 720.0, 'bytes': 1.8e7, 'uptime': 30.2}
```
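A sliding-window meter like the one behind `stats()` can be kept with a deque of `(timestamp, size)` samples. This is a sketch under assumptions: `LiveStats` is hypothetical, only the output keys mirror the sample above, and dividing by elapsed time early in a session is a design choice, not documented behaviour.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class LiveStats:
    """Hypothetical FPS / kbps meter over a sliding time window."""

    def __init__(self, window: float = 3.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._window = window
        self._clock = clock
        self._start = clock()
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, size)
        self.frames = 0
        self.bytes = 0

    def record(self, size: int) -> None:
        """Call once per received frame with its encoded size in bytes."""
        now = self._clock()
        self._events.append((now, size))
        self.frames += 1
        self.bytes += size
        self._trim(now)

    def _trim(self, now: float) -> None:
        # Forget samples older than the window.
        while self._events and now - self._events[0][0] > self._window:
            self._events.popleft()

    def snapshot(self) -> dict:
        now = self._clock()
        self._trim(now)
        # Early in a session, divide by elapsed time rather than the full window.
        span = min(self._window, max(now - self._start, 1e-9))
        in_window = sum(size for _, size in self._events)
        return {
            "fps": len(self._events) / span,
            "kbps": in_window * 8 / 1000 / span,
            "frames": float(self.frames),
            "bytes": float(self.bytes),
            "uptime": now - self._start,
        }
```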

**JPEG sequence recorder (no PyAV needed).** Records TCP-path
sessions: each frame is written to disk, plus a `manifest.json`, so the
session can be replayed at its original cadence:

```python
from je_auto_control.utils.remote_desktop.jpeg_recorder import (
JpegSequenceRecorder,
)
rec = JpegSequenceRecorder("~/recordings/2026-05-23")
rec.start()
viewer = RemoteDesktopViewer(host=..., on_frame=rec.record_frame)
# ... session ...
rec.stop() # writes manifest.json next to the .jpg files
```
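Replaying at the original cadence means scheduling each frame at its recorded offset from the first one. The manifest schema used here (`{"frames": [{"file": ..., "t": ...}]}`) is a hypothetical assumption, since the real `manifest.json` layout isn't shown, and `replay` and `load_manifest` are illustrative names.

```python
import json
import time
from pathlib import Path
from typing import Callable, List, Tuple


def load_manifest(directory: str) -> List[Tuple[str, float]]:
    """Read a hypothetical manifest: {"frames": [{"file": ..., "t": ...}, ...]}."""
    data = json.loads((Path(directory) / "manifest.json").read_text())
    return [(entry["file"], float(entry["t"])) for entry in data["frames"]]


def replay(directory: str, show: Callable[[bytes], None],
           clock: Callable[[], float] = time.monotonic,
           sleep: Callable[[float], None] = time.sleep) -> None:
    """Hand each frame to `show` at its recorded offset from the first frame."""
    frames = load_manifest(directory)
    if not frames:
        return
    first_t = frames[0][1]
    start = clock()
    for name, t in frames:
        # Schedule against a fixed start so sleep overshoot does not accumulate.
        delay = start + (t - first_t) - clock()
        if delay > 0:
            sleep(delay)
        show((Path(directory) / name).read_bytes())
```

Measuring every deadline from one fixed `start` avoids the drift that chaining per-frame sleeps would introduce.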

**TCP relay (WebRTC fallback).** When P2P fails (strict NAT, mobile
CGNAT, hotel Wi-Fi), both peers connect outbound to a relay and
present the same 32-byte session ID; the relay pairs them and pipes
bytes between them. The same module also ships an
`encode_handshake(role, session_id)` helper for clients:

```python
from je_auto_control.utils.remote_desktop.relay import RelayServer
relay = RelayServer(bind="0.0.0.0", port=9000) # NOSONAR # public relay
relay.start()
```
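At its core, such a relay needs two pieces: a table that pairs connections presenting the same session ID, and a one-way byte pump run in both directions. The sketch below is not `RelayServer`'s code. `PairingTable` and `pump` are hypothetical, it ignores the `role` field, and it assumes the session ID has already been read off the socket.

```python
import socket
import threading
from typing import Dict, Optional

SESSION_ID_LEN = 32  # bytes, per the description above


class PairingTable:
    """Hypothetical relay bookkeeping: pair two connections by session ID."""

    def __init__(self) -> None:
        self._waiting: Dict[bytes, socket.socket] = {}
        self._lock = threading.Lock()

    def arrive(self, session_id: bytes,
               conn: socket.socket) -> Optional[socket.socket]:
        """Register conn; return the waiting peer once both sides are here."""
        if len(session_id) != SESSION_ID_LEN:
            raise ValueError("session ID must be exactly 32 bytes")
        with self._lock:
            peer = self._waiting.pop(session_id, None)
            if peer is None:
                self._waiting[session_id] = conn  # first side waits for its peer
            return peer


def pump(src: socket.socket, dst: socket.socket, bufsize: int = 65536) -> None:
    """Copy bytes one way until src reaches EOF, then half-close dst."""
    while True:
        chunk = src.recv(bufsize)
        if not chunk:
            break
        dst.sendall(chunk)
    dst.shutdown(socket.SHUT_WR)
```

Once `arrive` returns a peer, the relay would start `pump(a, b)` and `pump(b, a)` on two threads; the half-close propagates EOF so each side sees the other hang up.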

**Service installer (unattended host).** `python -m
je_auto_control.utils.remote_desktop.host_service ...`
exposes `configure` / `init` / `run` plus per-platform installers:
`install-windows-service` / `uninstall-windows-service` (pywin32),
`generate-launchd` / `uninstall-launchd`, `generate-systemd` /
`uninstall-systemd`.

**Encrypted transports + alternate protocols.** Pass an `ssl_context`
to either `RemoteDesktopHost` or `RemoteDesktopViewer` to wrap every
connection in TLS. For firewall-friendly access, use the in-tree
@@ -935,6 +1110,36 @@ xml_string = je_auto_control.generate_xml()

Reports include: function name, parameters, timestamp, and exception info (if any) for each recorded action. HTML reports display successful actions in cyan and failed actions in red.

### Observability (Prometheus / OpenTelemetry)

Stdlib-only metric primitives plus an OpenTelemetry-compatible tracer
fallback. The executor and agent loop emit call counts and latency
histograms automatically — no per-script wiring required.

```python
import je_auto_control as ac

# Expose /metrics on http://127.0.0.1:9090 for Prometheus to scrape.
exporter = ac.default_metrics_exporter()
exporter.start()

# Add your own metric — same shapes as prometheus_client.
counter = ac.default_metric_registry().register(ac.MetricCounter(
"myapp_widgets_built_total", "widgets built",
label_names=("kind",),
))
counter.inc(labels={"kind": "blue"})

# Wrap a callable in a span — no-op until opentelemetry-api is installed.
@ac.traced("my_pipeline.process_one")
def process_one(item): ...
```

Built-in metrics are listed in
[docs/source/Eng/doc/observability/observability_doc.rst](docs/source/Eng/doc/observability/observability_doc.rst)
(or the [繁體中文](docs/source/Zh/doc/observability/observability_doc.rst)
version).

### Remote Automation (Socket / REST)

Two servers are available — a raw TCP socket and a stdlib HTTP/REST
@@ -1197,6 +1402,13 @@ cd AutoControl
pip install -r dev_requirements.txt
```

Reproducible installs use the committed `uv.lock`:

```bash
uv sync # install pinned versions across the whole dep tree
uv lock --upgrade # refresh after editing pyproject.toml
```

### Running Tests

```bash