feat(admin): add machine size update endpoint and admin UI #1357

Closed
evanjacobson wants to merge 15 commits into main from improvement/patch-claw-machine

@evanjacobson
Contributor

@evanjacobson evanjacobson commented Mar 21, 2026

Summary

  • Adds a targeted PATCH /api/platform/machine-size endpoint to update a kiloclaw instance's machineSize without re-provisioning (which would wipe envVars, secrets, etc.)
  • Adds updateMachineSize DO method, internal client method, and tRPC admin mutation through the full stack
  • Adds Machine Size editor in the admin panel's Live Worker Status card with valid Fly.io machine configurations (shared-cpu-2x minimum, 3 GB RAM minimum)
  • Memory options are dynamically computed based on cpu_kind and CPU count, respecting Fly's step size and per-preset min/max rules
  • The machine size selection persists on redeploy
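
The memory-option rule in the summary could be sketched roughly as follows. This is not the PR's code: the function name `memoryOptionsMb`, the 256 MB step, and the per-CPU bounds in `PER_CPU_MB` are assumptions made for illustration. Only the 3 GB floor and the 16384 MB cap are taken from this PR's description and commits.

```typescript
// Illustrative sketch only; the authoritative limits live in the PR's
// MachineSizeSchema and in Fly's own documentation.
type CpuKind = "shared" | "performance";

const STEP_MB = 256; // assumed memory step size
const FLOOR_MB = 3 * 1024; // 3 GB minimum RAM (from the PR summary)
const CEILING_MB = 16384; // cap mentioned in a later commit on this PR

// Assumed per-CPU memory bounds for each cpu_kind (hypothetical table).
const PER_CPU_MB: Record<CpuKind, { min: number; max: number }> = {
  shared: { min: 256, max: 2048 },
  performance: { min: 2048, max: 8192 },
};

// Returns every selectable memory size, in MB, for the given CPU settings.
// An empty array means no size satisfies all of the limits at once.
function memoryOptionsMb(cpuKind: CpuKind, cpus: number): number[] {
  const { min, max } = PER_CPU_MB[cpuKind];
  const lo = Math.max(min * cpus, FLOOR_MB);
  const hi = Math.min(max * cpus, CEILING_MB);
  const options: number[] = [];
  for (let mb = lo; mb <= hi; mb += STEP_MB) {
    options.push(mb);
  }
  return options;
}
```

Under these assumed bounds, a single shared CPU yields no options at all, which would be consistent with shared-cpu-2x being the smallest preset offered.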

Kilo team — Loom

https://www.loom.com/share/be4a9d1159e148fd879fa6b157f4f1b3

Test plan

  • Verify pnpm tsc --noEmit passes in both kiloclaw/ and root
  • Test the admin UI: open an instance detail, verify Machine Size displays correctly
  • Test editing: changing CPU Kind auto-adjusts CPU and memory options
  • Test Save: confirm mutation succeeds and toast shows "Takes effect on next restart"
  • Test Reset to Default: confirm sends null and displays "Default (shared-cpu-2x / 3 GB)"
  • Verify new size is applied after a machine restart (via guestFromSize)

Add a targeted PATCH endpoint to update a kiloclaw instance's machineSize
without re-provisioning (which would wipe envVars, secrets, etc.).
New size takes effect on the next machine restart.

The admin UI presents valid Fly.io machine configurations with dynamic
memory options based on cpu_kind and CPU count, enforcing shared-cpu-2x
as the minimum preset and 3 GB as the minimum RAM.

Instead of waiting for a restart, the "Save & Apply Now" button
immediately reconfigures the Fly machine by fetching its current
config and patching only the guest (CPU/memory) field. This
preserves the running image (version pin), env vars, mounts, and
all other config exactly as-is — equivalent to `fly machine update`.

A secondary "Save for Restart" button retains the old behavior.
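
As a rough illustration of "patching only the guest field", the sketch below builds the updated config from the fetched one. The `Guest` and `MachineConfig` shapes and the helper name `withGuest` are hypothetical and simplified; they are not taken from the PR or from Fly's API types.

```typescript
// Hypothetical, simplified shapes; a real machine config has many more fields.
interface Guest {
  cpu_kind: "shared" | "performance";
  cpus: number;
  memory_mb: number;
}

interface MachineConfig {
  image: string;
  env?: Record<string, string>;
  guest?: Guest;
  [key: string]: unknown; // mounts, services, and so on pass through untouched
}

// Returns a copy of the current config with only `guest` replaced.
// Every other field (image, env, mounts, ...) is carried over as-is,
// and the input object is not mutated.
function withGuest(current: MachineConfig, guest: Guest): MachineConfig {
  return { ...current, guest: { ...guest } };
}
```

Starting from the config that was just fetched, rather than rebuilding one from stored settings, is what would keep the running image pin and env vars intact.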
@evanjacobson evanjacobson marked this pull request as ready for review March 24, 2026 16:25
Comment thread src/app/admin/components/KiloclawInstances/KiloclawInstanceDetail.tsx Outdated
Comment thread kiloclaw/src/durable-objects/kiloclaw-instance/index.ts Outdated
Comment thread kiloclaw/src/durable-objects/kiloclaw-instance/index.ts Outdated
Comment thread kiloclaw/src/durable-objects/kiloclaw-instance/index.ts Outdated
@kilo-code-bot
Contributor

kilo-code-bot bot commented Mar 24, 2026

Code Review Summary

Status: 1 Issue Found | Recommendation: Address before merge

Overview

| Severity   | Count |
|------------|-------|
| CRITICAL   | 0     |
| WARNING    | 0     |
| SUGGESTION | 1     |

Issue Details

SUGGESTION

| File | Line | Issue |
|------|------|-------|
| packages/trpc/dist/index.d.ts | 628 | Generated dist output still includes unrelated changes that should be removed from the PR. |

Other Observations (not in diff)

N/A

Files Reviewed (8 files)
  • kiloclaw/src/durable-objects/kiloclaw-instance/index.ts
  • kiloclaw/src/routes/platform.ts
  • kiloclaw/src/schemas/instance-config.ts
  • kiloclaw/worker-configuration.d.ts
  • packages/trpc/dist/index.d.ts - 1 issue
  • src/app/admin/components/KiloclawInstances/KiloclawInstanceDetail.tsx
  • src/lib/kiloclaw/kiloclaw-internal-client.ts
  • src/routers/admin-kiloclaw-instances-router.ts

Reviewed by gpt-5.4-20260305 · 2,377,097 tokens

1. Cap memory picker presets at 16384 MB to match MachineSizeSchema max
2. Defer persisting machineSize until after successful Fly apply so a
   failed update doesn't leave stale size in storage
3. Handle stopped/created machines in applyNow by calling startMachine
   explicitly (matching restartMachineInBackground behavior)
4. Pass updated.instance_id to waitForState so it waits for the new
   machine version, not a stale pre-update started state
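
Points 2 and 4 above describe an ordering, which might look roughly like the sketch below. The `Deps` interface and its method names are hypothetical stand-ins for the Fly client and Durable Object storage, and persisting before the wait (rather than after it) is an assumption.

```typescript
interface MachineSize {
  cpu_kind: "shared" | "performance";
  cpus: number;
  memory_mb: number;
}

// Hypothetical dependencies standing in for the Fly client and DO storage.
interface Deps {
  applyToFly(size: MachineSize): Promise<{ instanceId: string }>;
  persist(size: MachineSize): Promise<void>;
  waitForStarted(instanceId: string): Promise<void>;
}

async function applyThenPersist(size: MachineSize, deps: Deps): Promise<void> {
  // If the Fly update throws, nothing has been written to storage yet,
  // so a failed apply cannot leave a stale size behind.
  const updated = await deps.applyToFly(size);

  // Only store the new size once Fly has accepted it.
  await deps.persist(size);

  // Wait on the instance id returned by the update, not on whatever
  // "started" state the machine was in before the update.
  await deps.waitForStarted(updated.instanceId);
}
```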
Comment thread kiloclaw/src/durable-objects/kiloclaw-instance/index.ts
…troy

Re-read status from storage after Fly API calls and bail if the instance
was destroyed in the interim. Prevents persist() from recreating a
partial row with only machineSize after finalizeDestroyIfComplete()
has cleared all storage.
Comment thread kiloclaw/src/durable-objects/kiloclaw-instance/index.ts Outdated
DurableObjectStorage.get() returns undefined (not null) for missing
keys, so the strict === null check never matched after storage was
cleared by finalizeDestroyIfComplete().
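
A minimal sketch of the check described in this commit message, assuming a storage key named `status` and a helper named `isDestroyed` (both hypothetical). `StorageLike` mirrors only the single-key `get` of `DurableObjectStorage`, which resolves to `undefined` for a missing key.

```typescript
// Narrow stand-in for the part of DurableObjectStorage used here.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
}

// Re-read the status after the external API calls. A missing key comes
// back as undefined, so a strict `=== null` comparison would never match;
// `== null` covers both undefined and null.
async function isDestroyed(storage: StorageLike): Promise<boolean> {
  const status = await storage.get<string>("status");
  return status == null;
}
```

A caller would return early when `isDestroyed` resolves to `true`, so that nothing is written back into storage that has already been cleared.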
…ch-claw-machine

# Conflicts:
#	kiloclaw/worker-configuration.d.ts
#	src/app/admin/components/KiloclawInstances/KiloclawInstanceDetail.tsx
#	src/routers/kiloclaw-billing-router.test.ts
Main removed the trial_end logic from subscription_data in
createSubscriptionCheckout. These tests were incorrectly kept
during merge conflict resolution.
@evanjacobson evanjacobson enabled auto-merge March 24, 2026 22:48
@evanjacobson evanjacobson self-assigned this Mar 24, 2026
identity: undefined;
generated: undefined;
}, {}, {}>;
discord_server_membership_verified_at: drizzle_orm_pg_core.PgColumn<{
Contributor

Seems like you got some unrelated changes in here

Contributor Author

@evanjacobson evanjacobson Mar 25, 2026

These never got generated previously but are now merged into main. When I update the branch, they should disappear.

Contributor

@RSO RSO left a comment

There are changes here that don't belong, but other than that, the change looks good.

One thought: Is this tRPC call going through the main tRPC endpoint? Otherwise, we may have to bump the maxDuration of the endpoint.

Contributor

@pandemicsyn pandemicsyn left a comment

> One thought: Is this tRPC call going through the main tRPC endpoint? Otherwise, we may have to bump the maxDuration of the endpoint.

@RSO @evanjacobson instead of adding endpoints, which we will probably have to work around or change when we go to do multiple instance sizes, what if we just make the instance size an env var? Then you can just override it in .dev.vars if you want to provision something else.

@evanjacobson
Contributor Author

> One thought: Is this tRPC call going through the main tRPC endpoint? Otherwise, we may have to bump the maxDuration of the endpoint.
>
> @RSO @evanjacobson instead of adding endpoints, which we will probably have to work around or change when we go to do multiple instance sizes, what if we just make the instance size an env var? Then you can just override it in .dev.vars if you want to provision something else.

The goal was to make this available in prod for our machines, in addition to dev. @pandemicsyn

auto-merge was automatically disabled March 25, 2026 16:56

Pull request was closed
