Summary
When CapturePolicy.NAMED_ONLY is used, Reggie converts unnamed capturing groups (...) to non-capturing (?:...) before compiling. This renumbers the surviving (named) groups sequentially, so the indices accepted by MatchResult.group(int) are still 1-based but no longer match JDK group numbering. Any caller that pre-computed group indices from the raw pattern string using JDK counting rules will silently extract the wrong field (or null).
Reproducer
// Pattern has two unnamed groups before the named group.
// JDK: group 1 = first unnamed, group 2 = second unnamed, group 3 = named
// Reggie NAMED_ONLY: unnamed -> (?:...), so named group becomes group 1
String pattern = "(a)(b)(?<name>c)";
ReggieOptions opts = ReggieOptions.builder()
        .capturePolicy(CapturePolicy.NAMED_ONLY)
        .build();
ReggieMatcher m = Reggie.compile(pattern, opts);
MatchResult r = m.match("abc");
// JDK-compatible expectation:
System.out.println(r.group(3)); // expected "c", actual: IndexOutOfBoundsException or null
System.out.println(r.group(1)); // expected "a" (unnamed), actual: "c" (first named)
The same issue manifests in practice with any grok-style pattern that contains unnamed structural groups (e.g. the ipv6 rule, which expands to ((([0-9A-Fa-f]{1,4}:){7}...)) with ~30 unnamed groups). Code that resolves named field positions by counting all capturing parens in the raw string — as java.util.regex.Pattern does — receives wrong values for every field when NAMED_ONLY is active.
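For comparison, the JDK numbering that such callers rely on can be shown with java.util.regex alone. This is a minimal sketch that does not touch Reggie; the class and helper names (JdkGroupBaseline, jdkGroup) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JdkGroupBaseline {
    /** Returns the text of the 1-based group `index` under JDK numbering. */
    static String jdkGroup(String regex, String input, int index) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match for: " + input);
        }
        return m.group(index);
    }

    public static void main(String[] args) {
        String regex = "(a)(b)(?<name>c)";
        // The JDK counts every capturing paren, named or not, left to right.
        System.out.println(jdkGroup(regex, "abc", 1)); // a
        System.out.println(jdkGroup(regex, "abc", 2)); // b
        System.out.println(jdkGroup(regex, "abc", 3)); // c

        // The named group is reachable by name and by its JDK index 3.
        Matcher m = Pattern.compile(regex).matcher("abc");
        m.matches();
        System.out.println(m.group("name").equals(m.group(3))); // true
    }
}
```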
Expected behaviour
MatchResult.group(int) should maintain JDK-compatible 1-based group numbering regardless of CapturePolicy. Groups that were unnamed and made non-capturing by NAMED_ONLY should occupy their original index slot and return null (consistent with JDK behaviour for a non-participating group). This makes Reggie a drop-in replacement for java.util.regex.Matcher from the perspective of numeric group access.
Alternatively, document clearly that NAMED_ONLY invalidates group(int) for numeric access, and that callers must switch to group(String name) — but this breaks the java.util.regex.Matcher drop-in contract.
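The "original index slot returns null" behaviour requested above mirrors what java.util.regex already does for a group that did not participate in the match. The sketch below uses only the JDK; the class and method names (JdkNullSlot, groups) are illustrative.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JdkNullSlot {
    /** Returns groups 0..groupCount() of a full match, keeping null slots. */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("no match for: " + input);
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i); // null when group i did not participate
        }
        return out;
    }

    public static void main(String[] args) {
        // Group 1 does not participate when the input is "b", yet the named
        // group keeps index 2 rather than sliding down to index 1.
        System.out.println(Arrays.toString(groups("(a)|(?<name>b)", "b")));
        // prints [b, null, b]
    }
}
```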
Actual behaviour
With NAMED_ONLY, named groups are renumbered starting from 1 in their order of appearance in the transformed (unnamed-groups-removed) pattern. A call to group(n) using the JDK-computed index returns a different group or throws.
Workaround
Callers can pre-compute an int[] jdkToReggieIndex remapping at compile time and translate indices before calling group(int), but this forces every Reggie adapter to re-implement the group-counting logic that Reggie already performs internally.
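A rough sketch of such a remapping follows. It assumes the behaviour described in this report, namely that NAMED_ONLY numbers the named groups from 1 in order of appearance. The class name GroupIndexRemap and the -1 sentinel for dropped groups are illustrative choices, not Reggie API. The scanner is deliberately simplified: it handles backslash escapes, \Q...\E quoting and character classes, but not every corner of Java regex syntax (for example, comments in COMMENTS mode).

```java
import java.util.ArrayList;
import java.util.List;

class GroupIndexRemap {
    /**
     * Maps each JDK group index to the assumed NAMED_ONLY index.
     * Slot 0 is the whole match; unnamed groups map to -1.
     */
    static int[] jdkToReggieIndex(String regex) {
        List<Integer> map = new ArrayList<>();
        map.add(0);          // group 0 is always the whole match
        int named = 0;       // named groups seen so far
        int classDepth = 0;  // nesting depth inside [...] character classes
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (c == '\\') {
                if (regex.startsWith("\\Q", i)) {
                    // Skip the quoted literal up to and including \E.
                    int end = regex.indexOf("\\E", i + 2);
                    i = (end < 0) ? regex.length() : end + 1;
                } else {
                    i++; // skip the escaped character
                }
            } else if (c == '[') {
                classDepth++;
            } else if (c == ']' && classDepth > 0) {
                classDepth--;
            } else if (c == '(' && classDepth == 0) {
                boolean special = i + 1 < regex.length() && regex.charAt(i + 1) == '?';
                if (!special) {
                    map.add(-1); // unnamed capturing group, dropped by NAMED_ONLY
                } else if (regex.startsWith("(?<", i) && i + 3 < regex.length()
                        && regex.charAt(i + 3) != '=' && regex.charAt(i + 3) != '!') {
                    map.add(++named); // (?<name>...) but not lookbehind
                }
                // anything else starting with (? is non-capturing
            }
        }
        int[] out = new int[map.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = map.get(i);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] map = jdkToReggieIndex("(a)(b)(?<name>c)");
        System.out.println(java.util.Arrays.toString(map)); // [0, -1, -1, 1]
    }
}
```

An adapter would then look up map[jdkIndex] and return null for -1, otherwise pass the translated index to group(int). This is exactly the duplicated counting logic the report argues callers should not have to maintain.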
Environment
- Reggie version: 0.3.0
- Java: 21
- Discovered while integrating Reggie into the Datadog logs-backend grok parsing stack as a drop-in replacement for java.util.regex