Replies: 1 comment
Consider also testing for energy consumption using turbostat. AFAIK, PyInfra takes less energy than Ansible on both the target and the control node; this can also help with metrics about optimizations.
1. The Idea
PyInfra's CI is fine. Unit tests pass, linting runs, the world keeps spinning. But here's the thing: PyInfra's job is to do real things to real operating systems. Install packages. Manage services. Rewrite config files in `/etc`. Nuke users. Restart daemons. And right now, none of that gets tested against an actual OS before it ships.

We want to fix that.
The proposal: a second CI pipeline, completely separate from the existing one, that spins up real OS instances, restores them to a known-good snapshot, hammers them with destructive PyInfra operations, and tells you whether your PR broke Alpine's openrc connector or made Void Linux's xbps handler sad. No mocks. No Docker containers pretending to be a full OS. No "it worked on my Ubuntu laptop with glibc 2.35 and three PPAs". Real systems. Real package managers. Real init systems.
The OS matrix covers ten operating systems, all at their latest stable version as of March 2026.
That's 5 different init systems. That's 7 different package managers. That is the actual surface area PyInfra has to cover — and currently validates in exactly zero real environments before shipping.
One important thing: the remote instances have zero extra packages installed, and none is needed. PyInfra executes entirely on the runner side and talks to targets over SSH with nothing more than a shell on the other end.
The pipeline is opt-in. Maintainers drop a `run-vm-tests` label on a PR when connector-relevant code changes. Nobody gets their documentation fix blocked by a 20-minute VM test run.

2. The Concept
The host
A single bare metal node on Scaleway Elastic Metal — 100% European infrastructure, no shared neighbours, no hypervisor tax, no cloud vendor lock-in. The Beryllium range gives us a dedicated Intel Xeon with NVMe storage and unlimited bandwidth, at a fixed monthly price. Kalvad provisions it, maintains it, and pays for it.
The host runs Incus — the Linux Containers community fork of LXD. Incus manages both system containers and full VMs through a single unified API and CLI. Most of the Linux targets run as lightweight system containers sharing the host kernel. FreeBSD and Gentoo run as proper QEMU-backed VMs because their kernels are not the host's kernel and there's no negotiating that.
The snapshot strategy
Each instance has exactly one snapshot named `golden`. It represents a fully booted OS with only what's needed for testing: SSH daemon running, CI key injected, nothing else. No Python. No agents. No extra services.

Before every test run, one command:
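A minimal sketch of what that command looks like, assuming the `pyinfra-<os>` instance naming used elsewhere in this proposal:

```sh
# Roll the instance back to its known-good state before every run
incus snapshot restore pyinfra-alpine golden
```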
On a ZFS storage backend, this is a copy-on-write pointer swap, 2 to 5 seconds for containers. The VM instances take a little longer. Either way, it's not the bottleneck. The instance comes up in a provably clean state, every single time, with zero drift from the previous run.
After the restore, the runner polls SSH until the instance responds, then fires the PyInfra test suite targeting that IP over SSH, exactly how PyInfra works in production. No local execution tricks, no subprocess mocking. The orchestrator is the runner, the target is the instance, and the transport is plain SSH.
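A rough sketch of that sequence, with the target address and the test entry point as placeholders rather than a settled design:

```sh
TARGET=10.0.3.21   # placeholder: the instance address as reported by `incus list`

# Poll until sshd inside the freshly restored instance answers
until ssh -o ConnectTimeout=2 -o BatchMode=yes "root@$TARGET" true 2>/dev/null; do
    sleep 2
done

# Hypothetical entry point; the actual tests/vm layout is part of what we want feedback on
pytest tests/vm --ssh-target "root@$TARGET"
```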
After the run: we do nothing. The state is irrelevant. Next run restores the snapshot again.
Building the golden images
Most Linux targets come straight from images.linuxcontainers.org, which ships Incus-ready images for everything in the matrix except Gentoo and FreeBSD.

FreeBSD uses Incus's QEMU backend with an official cloud image. SSH key injection via `cloud-init` on first boot, then snapshot. Boot time is longer (~30 seconds), but the golden snapshot means test runs start from an already-booted state.

Gentoo has no pre-built Incus image — we bootstrap from a stage3 tarball, compile a minimal world, and snapshot the result. The initial build takes a while (it's Gentoo, this is not surprising). Portage sync is disabled during test runs — we're not testing Gentoo's update cycle, we're testing PyInfra's portage connector against a known state.
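For the plain container targets, the golden build should be little more than a handful of Incus commands. A sketch for Alpine (the image alias, package names, and key-injection step are indicative, not a finished build script):

```sh
# Launch from the upstream image, add only sshd and the CI key, then snapshot
incus launch images:alpine/3.21 pyinfra-alpine
incus exec pyinfra-alpine -- apk add openssh
incus exec pyinfra-alpine -- rc-update add sshd default
incus exec pyinfra-alpine -- sh -c \
  'mkdir -p /root/.ssh && echo "<CI public key>" >> /root/.ssh/authorized_keys'
incus exec pyinfra-alpine -- rc-service sshd start
incus snapshot create pyinfra-alpine golden
```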
Rolling releases (Arch, openSUSE Tumbleweed, Void, Gentoo) get their golden snapshots refreshed on a quarterly cron. A simple `--reuse` flag replaces the existing snapshot after an in-place update:

```sh
incus exec pyinfra-arch -- pacman -Syu --noconfirm
incus snapshot create pyinfra-arch golden --reuse
```

The GitHub Actions integration
The Scaleway host registers as a self-hosted runner scoped to the PyInfra repo with a `vm-host` label. The existing workflow is completely untouched.

Maintainers apply `run-vm-tests` to trigger the full 10-OS matrix, or `run-vm-alpine` to target a single OS when only one connector is in play. The runner is idle 95% of the time. No nightly runs. No push-to-main triggers. It fires when a human decides it should.

What actually gets tested
Things that are impossible to validate without a real OS, and that belong in `tests/vm/` (a concrete sketch follows the list):

- Package installs verified not just by `returncode == 0`, but by `apk info | grep` or `dpkg -l | grep`
- User management checked against `/etc/passwd` and `/etc/shadow` directly
- Config file writes in `/etc/`, permission and ownership enforcement, symlinks, idempotency on second run
- Reboot handling with `--reboot`, verifying PyInfra reconnects cleanly post-boot
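To make that concrete, here is the flavour of check we have in mind, sketched as plain shell rather than committed test code. The inline `pyinfra` invocations, the target IP, and the exact assertions are illustrative; the real suite would live in `tests/vm/` and be parametrised per OS:

```sh
TARGET=10.0.3.21   # placeholder IP of the restored Alpine instance; SSH user/key wiring omitted

# Package install, verified by asking apk itself rather than trusting the exit code
pyinfra "$TARGET" apk.packages curl
ssh "root@$TARGET" 'apk info | grep -qx curl'

# User creation, verified against /etc/passwd directly
pyinfra "$TARGET" server.user ci-test
ssh "root@$TARGET" 'grep -q "^ci-test:" /etc/passwd'
```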
3. Why?

Kalvad is a lean engineering firm based in Dubai. We do emergency technical response, infrastructure automation, and technical due diligence for investors and regulated-sector clients. We don't sell a platform. We don't have a PyInfra plugin to promote. We use it because it's genuinely the right tool: infrastructure as Python that actually runs, not YAML that hopes.
PyInfra is load-bearing in everything we operate. Government infrastructure. Fintech environments. Production systems where a broken `apk` connector at 2am is a bad time. We know exactly what connector regressions look like in production because we've been on the receiving end. The kind that only surface because Alpine changed a flag in `apk add`, or because openSUSE's zypper error output format shifted slightly and the parser broke silently, or because runit on Void doesn't behave the way the tests assumed. These bugs are not caught by the existing test suite. They can't be: you cannot mock your way to discovering that FreeBSD 14.4's `pkg install` returns exit code 1 on a dry run.

We have a Scaleway Elastic Metal server we can dedicate to this. We chose Scaleway deliberately: European infrastructure, no US cloud vendor with a CLOUD Act problem, no shared hypervisor. The same reasons we pick Scaleway for our own client work. We do not have the Incus expertise yet (we are usually team xcp-ng). We have the Alpine, FreeBSD, and OpenRC operational knowledge that makes this setup less painful than it looks.
The monthly cost is real and Kalvad absorbs it entirely. It's modest compared to what PyInfra saves us in engineering time every month.
This is not a branding exercise. We are not asking for our name anywhere. We want the project to be more reliable because we depend on it being more reliable: pure self-interest, executed correctly.
If this catches one connector regression before it reaches a production Alpine server, it has already paid for itself.
We're posting this before writing a single line of code. If maintainers have opinions on the test structure, the trigger policy, which OS to prioritize, or why some part of this is misguided, that's exactly what we want to hear first.