Skip to content

kernel-module: detect nft_expr_ops.validate signature from headers#2085

Closed
andrewyager wants to merge 1 commit intosipwise:masterfrom
andrewyager:fix/nft-validate-ubuntu-kernel-compat
Closed

kernel-module: detect nft_expr_ops.validate signature from headers#2085
andrewyager wants to merge 1 commit intosipwise:masterfrom
andrewyager:fix/nft-validate-ubuntu-kernel-compat

Conversation

@andrewyager
Copy link
Copy Markdown
Contributor

Summary

The current LINUX_VERSION_CODE < KERNEL_VERSION(6,12,0) check for the nft_expr_ops.validate callback signature breaks on distribution kernels that backport the API change without updating LINUX_VERSION_CODE.

Upstream commit: eaf9b2c875ec — "netfilter: nf_tables: drop unused 3rd argument from validate callback ops" (Florian Westphal, 2024-09-03)

Affected: Ubuntu 24.04 kernel 6.8.0-106 (stable patchset 2026-01-27, LP: #2139158) backports this commit to a 6.8 kernel, causing the DKMS build to fail:

error: initialization of 'int (*)(const struct nft_ctx *, const struct nft_expr *)'
from incompatible pointer type 'int (*)(const struct nft_ctx *, const struct nft_expr *,
const struct nft_data **)' [-Werror=incompatible-pointer-types]

Fix

Replace the LINUX_VERSION_CODE check with compile-time header detection:

  1. Makefile: Inspect the installed kernel headers for the actual validate callback signature. If nft_data appears in the validate parameter list, define NFT_EXPR_OPS_VALIDATE_HAS_DATA.
  2. nft_rtpengine.c: Use #if defined(NFT_EXPR_OPS_VALIDATE_HAS_DATA) instead of the version check.

This correctly handles:

  • Mainline kernels < 6.12 (3-param) ✅
  • Mainline kernels >= 6.12 (2-param) ✅
  • Distribution kernels with backported API changes (e.g. Ubuntu 6.8.0-106) ✅

Testing

Compiled and verified against:

  • Ubuntu 24.04 6.8.0-90-generic (old 3-param API) — builds successfully
  • Ubuntu 24.04 6.8.0-106-generic (backported 2-param API) — builds successfully
  • DKMS install on Ubuntu 24.04 with kernel 6.8.0-106 — module loads correctly

@rfuchs
Copy link
Copy Markdown
Member

rfuchs commented Mar 23, 2026

Ubuntu still doesn't have their own versioning define (akin to RHEL_RELEASE_CODE) to be able to detect this?

@andrewyager
Copy link
Copy Markdown
Contributor Author

Ubuntu does have UTS_UBUNTU_RELEASE_ABI (defined in include/generated/utsrelease.h) — it's 90 on 6.8.0-90 and 106 on 6.8.0-106. However, it's not really practical for this use case:

  1. You'd need to hardcode ABI thresholds per kernel series. The backport landed somewhere between ABI 90 and 106 on the Noble 6.8 kernel, but the exact cutover is unknown without bisecting the changelogs. And the ABI number would be different for Jammy HWE (6.8.0-xx~22.04.x), Oracular, etc.

  2. The ABI number has no series context. Unlike RHEL_RELEASE_CODE which encodes major.minor, Ubuntu's ABI is just a bare incrementing integer — UTS_UBUNTU_RELEASE_ABI >= 100 means completely different things on Noble vs Jammy HWE vs a future derivative.

  3. Other distros backport too. SUSE, Debian backports, or any downstream could make the same backport. A version-based check only helps for the specific distro you code for. While this hasn't made any bug reports yet, that's not to say users aren't out there silently being confused why their dkms builds are failing.

The header grep approach sidesteps all of this — it detects the actual API the kernel headers expose, regardless of which distro or ABI version you're building against.

@rfuchs
Copy link
Copy Markdown
Member

rfuchs commented Mar 23, 2026

Thanks ChatGPT.

I find a simple grep to be a bit too fragile for this, as there's no guarantee that the pattern (*validate) won't appear elsewhere in the same file in a future version. Using a #define provided by the distro explicitly for this purpose is actually preferable.

Either that, or a more sophisticated method to parse the header file to extract the necessary information.

@andrewyager
Copy link
Copy Markdown
Contributor Author

Point taken, but no I don't use ChatGPT.

I'm going to rework this and actually do it more properly and use a compile test. UTS_UBUNTU_RELEASE_ABI feels to fragile, and while it does work for Jammy HWE and Noble in this case, it doesn't feel robust enough. your comment on (*validate) potentially appearing somewhere else is fair.

@andrewyager
Copy link
Copy Markdown
Contributor Author

OK - this fix should be better.

There is an issue that would surface in this scenario if you were upgrading from 6.8.0-90 to 6.8.0-106 where the signature changes, because the Makefile runs

KSRC ?= /lib/modules/$(shell uname -r)/build

which will always pick the current Kernel headers. KSRC in the module Makefile is never overridden — it still points to the running kernel. This means any logic in the Makefile or its helper scripts that depends on KSRC will inspect the running kernel's headers, not the target kernel's headers; but this was how it always worked.

@rfuchs
Copy link
Copy Markdown
Member

rfuchs commented Mar 23, 2026

which will always pick the current Kernel headers

Well, that's a bit of a problem then, isn't it. 😄

dkms seems to set KERNELRELEASE to the target version, so maybe use that?

Alternatively since this script is called from make, the MAKE variable should be, which may include the required information, and could be used in place of just plain make? (I haven't checked this in more detail)

Or perhaps there's a way to integrate this into the main makefile somehow... I need to spin up some VMs to play with this.

@andrewyager
Copy link
Copy Markdown
Contributor Author

andrewyager commented Mar 24, 2026

Yep - KERNELRELEASE seems like the a good choice here, and in my testing has correctly resolved around the problem releases. I've fixed this in the scope of this patch, but the issue remains in the Makefile (where I pulled the pattern from).

If this were to be patched, it would look like:

diff --git a/kernel-module/Makefile b/kernel-module/Makefile
index be0f349..XXXXXXX 100644
--- a/kernel-module/Makefile
+++ b/kernel-module/Makefile
@@ -1,4 +1,4 @@
-KSRC   ?= /lib/modules/$(shell uname -r)/build
+KSRC   ?= /lib/modules/$(or $(KERNELRELEASE),$(shell uname -r))/build
 KBUILD := $(KSRC)
 M      ?= $(CURDIR)

I can add this to this PR, or it can go as a seperate concern.

@rfuchs
Copy link
Copy Markdown
Member

rfuchs commented Mar 24, 2026

Within this PR is fine, either as a separate commit or squashed into the other ones. Either way, please squash the other commits into one to have a clean base.

@andrewyager andrewyager force-pushed the fix/nft-validate-ubuntu-kernel-compat branch from 6d09edd to dfb0059 Compare March 24, 2026 22:00
@andrewyager
Copy link
Copy Markdown
Contributor Author

Done!

@rfuchs
Copy link
Copy Markdown
Member

rfuchs commented Mar 25, 2026

Sadly this doesn't work, the compile test unconditionally fails with

warning: the compiler differs from the one used to build the kernel
  The kernel was built by: x86_64-linux-gnu-gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
  You are using:           gcc-13 (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
/tmp/tmp.FgG2u5FQyj/gen-rtpengine-kmod-flags >/tmp/tmp.FgG2u5FQyj/rtpengine-kmod.mk
/bin/sh: 1: /tmp/tmp.FgG2u5FQyj/gen-rtpengine-kmod-flags: not found
/var/lib/dkms/ngcp-rtpengine/14.2.0.0+0~mr14.2.0.0/build/Makefile:7: /tmp/tmp.FgG2u5FQyj/rtpengine-kmod.mk: No such file or directory
make[4]: *** [/var/lib/dkms/ngcp-rtpengine/14.2.0.0+0~mr14.2.0.0/build/Makefile:29: /tmp/tmp.FgG2u5FQyj/rtpengine-kmod.mk] Error 127
make[4]: *** Deleting file '/tmp/tmp.FgG2u5FQyj/rtpengine-kmod.mk'
make[3]: *** [Makefile:1936: /var/lib/dkms/ngcp-rtpengine/14.2.0.0+0~mr14.2.0.0/build] Error 2

and so the HAS_DATA flag is never set. Looks like it's still pulling in the upper level makefile for some reason.

The current LINUX_VERSION_CODE check for the nft_expr_ops.validate
callback signature breaks on distribution kernels that backport the
API change (mainline commit eaf9b2c875ec, merged in 6.12) without
updating LINUX_VERSION_CODE.

For example, Ubuntu 24.04's 6.8.0-103+ kernel (stable patchset
2026-01-27, LP: #2139158) includes this backport, causing DKMS
builds to fail with -Werror=incompatible-pointer-types.

Replace the version-based #if with a compile test in the existing
gen-rtpengine-kmod-flags configure script. The test tries to assign
a 3-param function to .validate -- if it compiles, the old API is
present and NFT_EXPR_OPS_VALIDATE_HAS_DATA is set. If it fails, the
kernel has the new 2-param version.

Also use kbuild's KERNELRELEASE variable (instead of uname -r) to
resolve the kernel build directory, so that compile tests and KSRC
target the correct kernel during cross-version DKMS builds.

Tested against Ubuntu 6.8.0-90 (3-param) and 6.8.0-106 (2-param),
including cross-kernel builds where the running kernel differs from
the DKMS target.
@andrewyager andrewyager force-pushed the fix/nft-validate-ubuntu-kernel-compat branch from dfb0059 to 39f340d Compare March 26, 2026 03:39
@andrewyager
Copy link
Copy Markdown
Contributor Author

Ahh; this was introduced by a difference in my build patch for our repo build (we are patching this with script for our builds on our mirror at https://mirror.realcompute.io/rtpengine/) but I didn't merge it properly back into this patch. Fixed now, and I've taken this and test built from it on a clean Ubuntu box which hopefully matches what you will do...

@rfuchs
Copy link
Copy Markdown
Member

rfuchs commented Mar 26, 2026

FWIW if you build packages, I suggest you use the included packaging and backports script (https://github.com/sipwise/rtpengine/tree/master/pkg/deb) which remove the ngcp- prefix from the packages among other things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants