Replies: 1 comment
Consider also testing for energy consumption using turbostat. AFAIK, PyInfra takes less energy than Ansible on both the target and the control node; this can also help with metrics about optimizations.
1. The Idea
PyInfra's CI is fine. Unit tests pass, linting runs, the world keeps spinning. But here's the thing: PyInfra's job is to do real things to real operating systems. Install packages. Manage services. Rewrite config files in `/etc`. Nuke users. Restart daemons. And right now, none of that gets tested against an actual OS before it ships.

We want to fix that.
The proposal: a second CI pipeline, completely separate from the existing one, that spins up real OS instances, restores them to a known-good snapshot, hammers them with destructive PyInfra operations, and tells you whether your PR broke Alpine's openrc connector or made Void Linux's xbps handler sad. No mocks. No Docker containers pretending to be a full OS. No "it worked on my Ubuntu laptop with glibc 2.35 and three PPAs". Real systems. Real package managers. Real init systems.
The OS matrix covers ten operating systems, all at their latest stable version as of March 2026.
That's 5 different init systems. That's 7 different package managers. That is the actual surface area PyInfra has to cover — and currently validates in exactly zero real environments before shipping.
One important thing: the remote instances have zero extra packages installed, and none is needed. PyInfra executes entirely on the runner side and talks to targets over SSH with nothing more than a shell on the other end.
The pipeline is opt-in. Maintainers drop a `run-vm-tests` label on a PR when connector-relevant code changes. Nobody gets their documentation fix blocked by a 20-minute VM test run.

2. The Concept
The host
A single bare metal node on Scaleway Elastic Metal — 100% European infrastructure, no shared neighbours, no hypervisor tax, no cloud vendor lock-in. The Beryllium range gives us a dedicated Intel Xeon with NVMe storage and unlimited bandwidth, at a fixed monthly price. Kalvad provisions it, maintains it, and pays for it.
The host runs Incus — the Linux Containers community fork of LXD. Incus manages both system containers and full VMs through a single unified API and CLI. Most of the Linux targets run as lightweight system containers sharing the host kernel. FreeBSD and Gentoo run as proper QEMU-backed VMs because their kernels are not the host's kernel and there's no negotiating that.
The snapshot strategy
Each instance has exactly one snapshot named `golden`. It represents a fully booted OS with only what's needed for testing: SSH daemon running, CI key injected, nothing else. No Python. No agents. No extra services.

Before every test run, one command:
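A minimal sketch of what that command looks like, assuming the `pyinfra-<os>` instance naming used elsewhere in this proposal:

```sh
# Roll the instance back to its known-good state before every run
incus snapshot restore pyinfra-alpine golden
```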
On a ZFS storage backend, this is a copy-on-write pointer swap, 2 to 5 seconds for containers. The VM instances take a little longer. Either way, it's not the bottleneck. The instance comes up in a provably clean state, every single time, with zero drift from the previous run.
After the restore, the runner polls SSH until the instance responds, then fires the PyInfra test suite targeting that IP over SSH, exactly how PyInfra works in production. No local execution tricks, no subprocess mocking. The orchestrator is the runner, the target is the instance, and the transport is plain SSH.
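A rough sketch of that sequence, with the target address and the test entry point as placeholders rather than a settled design:

```sh
TARGET=10.0.3.21   # placeholder: the instance address as reported by `incus list`

# Poll until sshd inside the freshly restored instance answers
until ssh -o ConnectTimeout=2 -o BatchMode=yes "root@$TARGET" true 2>/dev/null; do
    sleep 2
done

# Hypothetical entry point; the actual tests/vm layout is part of what we want feedback on
pytest tests/vm --ssh-target "root@$TARGET"
```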
After the run: we do nothing. The state is irrelevant. Next run restores the snapshot again.
Building the golden images
Most Linux targets come straight from images.linuxcontainers.org, which ships Incus-ready images for everything in the matrix except Gentoo and FreeBSD.

FreeBSD uses Incus's QEMU backend with an official cloud image. SSH key injection via `cloud-init` on first boot, then snapshot. Boot time is longer (~30 seconds), but the golden snapshot means test runs start from an already-booted state.

Gentoo has no pre-built Incus image — we bootstrap from a stage3 tarball, compile a minimal world, and snapshot the result. The initial build takes a while (it's Gentoo, this is not surprising). Portage sync is disabled during test runs — we're not testing Gentoo's update cycle, we're testing PyInfra's portage connector against a known state.
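For the plain container targets, the golden build should be little more than a handful of Incus commands. A sketch for Alpine (the image alias, package names, and key-injection step are indicative, not a finished build script):

```sh
# Launch from the upstream image, add only sshd and the CI key, then snapshot
incus launch images:alpine/3.21 pyinfra-alpine
incus exec pyinfra-alpine -- apk add openssh
incus exec pyinfra-alpine -- rc-update add sshd default
incus exec pyinfra-alpine -- sh -c \
  'mkdir -p /root/.ssh && echo "<CI public key>" >> /root/.ssh/authorized_keys'
incus exec pyinfra-alpine -- rc-service sshd start
incus snapshot create pyinfra-alpine golden
```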
Rolling releases (Arch, openSUSE Tumbleweed, Void, Gentoo) get their golden snapshots refreshed on a quarterly cron. A simple `--reuse` flag replaces the existing snapshot after an in-place update:

```sh
incus exec pyinfra-arch -- pacman -Syu --noconfirm
incus snapshot create pyinfra-arch golden --reuse
```

The GitHub Actions integration
The Scaleway host registers as a self-hosted runner scoped to the PyInfra repo with a `vm-host` label. The existing workflow is completely untouched.

Maintainers apply `run-vm-tests` to trigger the full 10-OS matrix, or `run-vm-alpine` to target a single OS when only one connector is in play. The runner is idle 95% of the time. No nightly runs. No push-to-main triggers. It fires when a human decides it should.

What actually gets tested
Things that are impossible to validate without a real OS, and that belong in `tests/vm/` (a concrete sketch follows the list):

- Package installs verified not just by `returncode == 0`, but by `apk info | grep` or `dpkg -l | grep`
- User management checked against `/etc/passwd` and `/etc/shadow` directly
- Config file writes in `/etc/`, permission and ownership enforcement, symlinks, idempotency on second run
- Reboot handling with `--reboot`, verifying PyInfra reconnects cleanly post-boot
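To make that concrete, here is the flavour of check we have in mind, sketched as plain shell rather than committed test code. The inline `pyinfra` invocations, the target IP, and the exact assertions are illustrative; the real suite would live in `tests/vm/` and be parametrised per OS:

```sh
TARGET=10.0.3.21   # placeholder IP of the restored Alpine instance; SSH user/key wiring omitted

# Package install, verified by asking apk itself rather than trusting the exit code
pyinfra "$TARGET" apk.packages curl
ssh "root@$TARGET" 'apk info | grep -qx curl'

# User creation, verified against /etc/passwd directly
pyinfra "$TARGET" server.user ci-test
ssh "root@$TARGET" 'grep -q "^ci-test:" /etc/passwd'
```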
3. Why?

Kalvad is a lean engineering firm based in Dubai. We do emergency technical response, infrastructure automation, and technical due diligence for investors and regulated-sector clients. We don't sell a platform. We don't have a PyInfra plugin to promote. We use it because it's genuinely the right tool: infrastructure as Python that actually runs, not YAML that hopes.
PyInfra is load-bearing in everything we operate. Government infrastructure. Fintech environments. Production systems where a broken `apk` connector at 2am is a bad time. We know exactly what connector regressions look like in production because we've been on the receiving end. The kind that only surface because Alpine changed a flag in `apk add`, or because openSUSE's zypper error output format shifted slightly and the parser broke silently, or because runit on Void doesn't behave the way the tests assumed. These bugs are not caught by the existing test suite. They can't be: you cannot mock your way to discovering that FreeBSD 14.4's `pkg install` returns exit code 1 on a dry run.

We have a Scaleway Elastic Metal server we can dedicate to this. We chose Scaleway deliberately: European infrastructure, no US cloud vendor with a CLOUD Act problem, no shared hypervisor. The same reasons we pick Scaleway for our own client work. We do not have the Incus expertise yet (we are usually team xcp-ng). We have the Alpine, FreeBSD, and OpenRC operational knowledge that makes this setup less painful than it looks.
The monthly cost is real and Kalvad absorbs it entirely. It's modest compared to what PyInfra saves us in engineering time every month.
This is not a branding exercise. We are not asking for our name anywhere. We want the project to be more reliable because we depend on it being more reliable: pure self-interest, executed correctly.
If this catches one connector regression before it reaches a production Alpine server, it has already paid for itself.
We're posting this before writing a single line of code. If maintainers have opinions on the test structure, the trigger policy, which OS to prioritize, or why some part of this is misguided, that's exactly what we want to hear first.