
WIP: Multiplicity — per-runtime isolation for concurrent Perl interpreters #480

Merged
fglock merged 38 commits into master from feature/multiplicity on Apr 11, 2026


fglock commented Apr 10, 2026

Status: WIP — Implementation complete, pending optimization

The implementation is fully functional and passes all tests:

  • make passes (all unit tests green)
  • 122/126 unit tests pass when run as 126 concurrent interpreters (only tie_*.t still fail, a pre-existing DESTROY TODO)
  • Per-runtime isolation for: global variables, I/O handles, regex state, caller stack, dynamic scope, CWD, $$, method caches, special blocks

Pending: Performance optimization before merge. Benchmarks show 5-7% general slowdown from ThreadLocal routing (acceptable), but closure (-34%) and method dispatch (-27%) have larger regressions that should be addressed first. See the optimization plan in dev/design/concurrency.md.


Summary

Implements multiplicity for PerlOnJava — a PerlRuntime class with ThreadLocal-based isolation so multiple independent Perl interpreters can coexist within the same JVM process. This follows the JRuby-inspired design in dev/design/concurrency.md (Phases 0-5).

What changed

  • PerlRuntime.java — ThreadLocal-based runtime context holder. Each thread gets its own independent Perl interpreter state via PerlRuntime.current().
  • All mutable runtime state migrated from static fields into PerlRuntime instance fields:
    • CallerStack (caller() function data)
    • DynamicVariableManager — Perl local variable stack
    • RuntimeScalar.dynamicStateStack — dynamic state save/restore
    • SpecialBlock — END/INIT/CHECK blocks
    • RuntimeIO — stdout/stderr/stdin, selected handle, last-accessed handles
    • InheritanceResolver — MRO caches, method cache, overload cache, ISA tracking
    • GlobalVariable — all 17 symbol table maps (variables, arrays, hashes, code refs, IO refs, formats, aliases, etc.)
    • RuntimeRegex — 14 regex state fields ($1, $&, match positions, etc.)
    • RuntimeCode — eval caches, method handle cache, anonymous/interpreted subs
    • 16 local save/restore stacks — GlobalRuntimeScalar, GlobalRuntimeArray, GlobalRuntimeHash, RuntimeGlob, etc.
  • Per-runtime CWD: chdir() updates PerlRuntime.current().cwd instead of the JVM-global System.setProperty("user.dir")
  • Per-runtime $$ — unique PID per interpreter via AtomicLong counter
  • Pipe thread binding — background stderr/stdout consumer threads inherit parent PerlRuntime
  • Compilation thread safety: a ReentrantLock (COMPILE_LOCK) serializes parsing/emitting
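A minimal sketch of the ThreadLocal holder pattern listed above, assuming a single map stands in for the many migrated fields. `initialize()`, `current()`, and `setCurrent()` are names that appear in this PR; the field, the `main::x` key, and `IsolationDemo` are hypothetical, and the real class carries far more state.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PerlRuntime: only the ThreadLocal binding is shown.
final class PerlRuntime {
    private static final ThreadLocal<PerlRuntime> CURRENT = new ThreadLocal<>();

    // Former static state now lives in instance fields (illustrative field).
    final Map<String, Object> globalVariables = new HashMap<>();

    // Create fresh interpreter state and bind it to the calling thread.
    static PerlRuntime initialize() {
        PerlRuntime rt = new PerlRuntime();
        CURRENT.set(rt);
        return rt;
    }

    // Bind an existing runtime (e.g. a parent's) to the calling thread.
    static void setCurrent(PerlRuntime rt) {
        CURRENT.set(rt);
    }

    // Look up this thread's runtime; fail loudly if none is bound.
    static PerlRuntime current() {
        PerlRuntime rt = CURRENT.get();
        if (rt == null) {
            throw new IllegalStateException("no runtime bound");
        }
        return rt;
    }
}

final class IsolationDemo {
    // One "interpreter" per thread, each writing its own $main::x.
    static Object[] runInterpreters(int n) {
        Object[] seen = new Object[n];
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                PerlRuntime.initialize().globalVariables.put("main::x", id);
                seen[id] = PerlRuntime.current().globalVariables.get("main::x");
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join(); // join() also makes the writes to seen[] visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(runInterpreters(4)));
    }
}
```

Each worker sees only the value it stored, because `current()` resolves to the runtime bound on that thread; a thread with no binding fails fast instead of silently sharing state.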

Key design decisions

  • Zero API change for callers: original static method signatures are preserved, and methods delegate to PerlRuntime.current() internally.
  • Public accessor methods for cross-package access (e.g., GlobalVariable.getGlobalVariablesMap())
  • Safety net: PerlLanguageProvider.ensureRuntimeInitialized() prevents "no runtime bound" errors
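A rough sketch of how these decisions fit together: a static method keeps its old signature but reads state from the current thread's runtime, and a safety net binds a runtime only when none exists. `getGlobalVariable`, `getGlobalVariablesMap`, and `ensureRuntimeInitialized` are names from this PR; their bodies, `currentOrNull()`, and the use of `StringBuilder` as a scalar stand-in are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal runtime: only what the facade below needs.
final class PerlRuntime {
    private static final ThreadLocal<PerlRuntime> CURRENT = new ThreadLocal<>();

    // Per-runtime symbol table (a StringBuilder stands in for a Perl scalar).
    final Map<String, StringBuilder> globalVariables = new HashMap<>();

    static PerlRuntime currentOrNull() {
        return CURRENT.get();
    }

    static void setCurrent(PerlRuntime rt) {
        CURRENT.set(rt);
    }

    static PerlRuntime current() {
        PerlRuntime rt = CURRENT.get();
        if (rt == null) {
            throw new IllegalStateException("no runtime bound");
        }
        return rt;
    }
}

// Callers keep the old static API; the state behind it is now per-runtime.
final class GlobalVariable {
    static StringBuilder getGlobalVariable(String name) {
        return PerlRuntime.current().globalVariables
                .computeIfAbsent(name, k -> new StringBuilder());
    }

    // Accessor replacing direct access to the former static field.
    static Map<String, StringBuilder> getGlobalVariablesMap() {
        return PerlRuntime.current().globalVariables;
    }
}

final class PerlLanguageProvider {
    // Safety net: bind a fresh runtime only if this thread has none yet.
    static void ensureRuntimeInitialized() {
        if (PerlRuntime.currentOrNull() == null) {
            PerlRuntime.setCurrent(new PerlRuntime());
        }
    }
}
```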

Benchmark comparison (master vs branch)

| Benchmark | Change | Notes |
|---|---|---|
| lexical | -4.9% | Within ThreadLocal overhead budget |
| global | -5.4% | Within ThreadLocal overhead budget |
| eval_string | -4.8% | Within ThreadLocal overhead budget |
| regex | -7.0% | Within ThreadLocal overhead budget |
| string | +6.5% | Improved |
| closure | -34.1% | Needs optimization (14-17 ThreadLocal lookups/call) |
| method | -26.9% | Needs optimization (12-14 lookups on cache miss) |
| memory | unchanged | |

See dev/design/concurrency.md for the three-tier optimization plan.

Test plan

  • make passes (all unit tests)
  • 126-interpreter stress test: 122/126 pass
  • directory.t + glob.t pass concurrently (CWD isolation)
  • io_read.t + io_seek.t + io_pipe.t pass concurrently (PID/pipe isolation)
  • Benchmarks run and compared against master
  • Optimize closure/method ThreadLocal hotspots (Tier 1-2)
  • Re-benchmark after optimization

Generated with Devin

fglock and others added 24 commits April 10, 2026 11:49
… 1-4)

Introduce PerlRuntime class with ThreadLocal-based per-thread state,
enabling multiple independent Perl interpreters in the same JVM.

Migrated state into PerlRuntime:
- CallerStack (caller info stack)
- DynamicVariableManager (local() variable stack)
- RuntimeScalar.dynamicStateStack (dynamic state)
- SpecialBlock (END/INIT/CHECK blocks)
- RuntimeIO (stdout/stderr/stdin, selected handle, last-accessed handles)

Key changes:
- PerlRuntime.java: ThreadLocal holder with initialize()/current() API
- Main.java: calls PerlRuntime.initialize() at startup
- PerlLanguageProvider: ensureRuntimeInitialized() safety net
- EmitOperator: uses RuntimeIO setter instead of PUTSTATIC
- All RuntimeIO consumers updated to use getter/setter accessors
- Test setUp methods initialize PerlRuntime before each test

Design: dev/design/concurrency.md

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
… phase 5a)

Move all 7 mutable static fields from InheritanceResolver into PerlRuntime:
- linearizedClassesCache, packageMRO, methodCache
- overloadContextCache, isaStateCache
- autoloadEnabled, currentMRO

InheritanceResolver methods now delegate to PerlRuntime.current() internally.
External callers (DFS, C3, SubroutineParser, OperatorParser, StatementParser,
Attributes) updated to use getter/setter accessors instead of direct field access.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…ty phase 5b)

Move all 17 mutable static fields from GlobalVariable into PerlRuntime:
- Symbol tables: globalVariables, globalArrays, globalHashes, globalCodeRefs
- IO/Format: globalIORefs, globalFormatRefs
- Aliasing: stashAliases, globAliases, globalGlobs
- Caches: packageExistsCache, pinnedCodeRefs, isSubs
- Classloader: globalClassLoader
- Declared tracking: declaredGlobalVariables/Arrays/Hashes

GlobalVariable methods now delegate to PerlRuntime.current() internally.
Static accessor methods (getGlobalVariablesMap(), etc.) added for external
code. 20 consumer files updated to use accessors instead of direct field
access.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Phases 1-5 completed: PerlRuntime with ThreadLocal isolation,
CallerStack, DynamicScope, SpecialBlocks, RuntimeIO, InheritanceResolver,
and GlobalVariable symbol tables all migrated to per-runtime instance fields.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…partial)

Move all 14 mutable regex static fields from RuntimeRegex into PerlRuntime:
- globalMatcher, globalMatchString
- lastMatchedString, lastMatchStart, lastMatchEnd
- lastSuccessfulMatch* (4 fields)
- lastSuccessfulPattern, lastMatchUsedPFlag, lastMatchUsedBackslashK
- lastCaptureGroups, lastMatchWasByteString

RuntimeRegex now provides static getter/setter methods that delegate to
PerlRuntime.current(). Internal methods use PerlRuntime.current() directly.

Updated consumers:
- RegexState.java (save/restore uses getter/setter methods)
- ScalarSpecialVariable.java (reads $1, $&, etc.)
- HashSpecialVariable.java (reads %+, %-)

Regex pattern caches (regexCache, optimizedRegexCache) remain global
since compiled patterns are immutable and shareable.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…plicity phase 5d)

Move evalBeginIds, evalCache, methodHandleCache, anonSubs, interpretedSubs,
evalContext, evalDepth, and inline method cache arrays from RuntimeCode static
fields to PerlRuntime instance fields with ThreadLocal-based access.

Key changes:
- Add static getter methods on RuntimeCode (getEvalBeginIds(), getEvalCache(),
  getAnonSubs(), getInterpretedSubs(), getEvalContext())
- Add incrementEvalDepth()/decrementEvalDepth()/getEvalDepth() static methods
- Change EmitterMethodCreator bytecode from GETSTATIC/PUTSTATIC to INVOKESTATIC
  for evalDepth access
- Change EmitSubroutine bytecode from GETSTATIC to INVOKESTATIC for interpretedSubs
- Update 13 consumer files across frontend, backend, and runtime packages
- evalCache/methodHandleCache are per-runtime (simpler, no sharing needed)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Demonstrates PerlOnJava's multiplicity feature with:
- MultiplicityDemo.java: Creates N threads, each with its own PerlRuntime
- CountDownLatch synchronization (replaces deadlock-prone CyclicBarrier)
- Per-thread STDOUT capture showing isolated output
- 3 sample scripts proving independent $_, $shared_test, regex state, @INC
- Shell script for easy invocation via the fat JAR

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
The $(cat <<'EOF') heredoc pattern fails when commit messages contain
single quotes (common in Perl context: $_, don't, etc.). Document the
git commit -F /tmp/commit_msg.txt workaround with tool-agnostic
placeholders referencing AI_POLICY.md.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Audit of parser (11 fields), emitter (8 fields), and class loader
identified all shared mutable static state that makes concurrent
eval "string" unsafe. Plan: AtomicInteger for counters, global
compile lock for EvalStringHandler, new-instance for controlFlowDetector.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Make concurrent eval "string" safe when multiple PerlRuntime instances
compile Perl code simultaneously:

- Add COMPILE_LOCK (ReentrantLock) to PerlLanguageProvider, acquired in
  compilePerlCode() and both EvalStringHandler.evalString() overloads.
  Serializes all parsing/emitting; execution runs outside the lock.
- Replace 4 non-atomic static counters with AtomicInteger:
  EmitterMethodCreator.classCounter, BytecodeCompiler.nextCallsiteId,
  EmitRegex.nextCallsiteId, Dereference.nextMethodCallsiteId
- Fix LargeBlockRefactorer: replace shared static controlFlowDetector
  singleton with new instance per call (avoids reset/scan race)
- Mark EmitterMethodCreator.skipVariables as final (never mutated)

The lock is reentrant so nested evals (eval inside eval) work without
deadlock. Future work will migrate parser/emitter static state to
per-PerlRuntime instances, eliminating the lock.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
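The locking scheme in this commit can be sketched as follows, assuming a `Supplier` stands in for compiled code and a `"BEGIN "` prefix stands in for a BEGIN block that triggers a nested eval during compilation. `COMPILE_LOCK` and `classCounter` are names from the commit message; `CompileLockSketch` and the method bodies are illustrative.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Sketch of "compile under a global lock, execute outside it".
// compile() is a placeholder for tokenize/parse/emit; names are hypothetical.
final class CompileLockSketch {
    static final ReentrantLock COMPILE_LOCK = new ReentrantLock();

    // Formerly a plain static int; AtomicInteger keeps increments race-free.
    static final AtomicInteger classCounter = new AtomicInteger();

    static String evalString(String source) {
        Supplier<String> code;
        COMPILE_LOCK.lock();
        try {
            code = compile(source); // shared parser/emitter state touched only here
        } finally {
            COMPILE_LOCK.unlock();
        }
        return code.get(); // execution runs outside the lock, concurrently
    }

    private static Supplier<String> compile(String source) {
        if (source.startsWith("BEGIN ")) {
            // A BEGIN block runs during compilation. The nested eval re-acquires
            // the lock on the same thread; ReentrantLock makes this deadlock-free.
            evalString(source.substring("BEGIN ".length()));
        }
        String className = "anon" + classCounter.incrementAndGet();
        return () -> className + ":" + source;
    }
}
```

With a non-reentrant lock, the nested `evalString` call would block forever on the lock its own thread already holds.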
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
RuntimeCode has two additional compilation paths that bypass
EvalStringHandler and were missing the COMPILE_LOCK:

- evalStringHelper(): JVM compilation path (Lexer -> Parser ->
  EmitterMethodCreator.createClassWithMethod). Lock covers entire
  method since it only returns a Class<?>, no execution.

- evalStringWithInterpreter(): Interpreter compilation path
  (Lexer -> Parser -> BytecodeCompiler). Lock covers parsing and
  compilation, released before execution. Uses isHeldByCurrentThread()
  in finally block to handle both success path (lock released before
  execution) and error path (lock still held at catch/return).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…gWithInterpreter

The isHeldByCurrentThread() check in the finally block over-decrements
the ReentrantLock hold count when evalStringWithInterpreter is called
in a nested scenario (e.g., BEGIN block triggers inner eval while
outer compilation holds the lock). Replace with a boolean flag that
tracks whether the success-path unlock already happened.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
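The over-decrement can be reproduced with a small sketch that holds the lock in an "outer" frame and calls an inner compile that releases early on its success path. Both methods are hypothetical reductions of the two finally-block strategies described in this commit, not the actual `evalStringWithInterpreter()` code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Contrasts two finally-block strategies for releasing the compile lock
// early on the success path. Names are illustrative.
final class UnlockFlagSketch {
    static final ReentrantLock COMPILE_LOCK = new ReentrantLock();

    // Buggy: isHeldByCurrentThread() is also true when an OUTER caller
    // (e.g. an enclosing compilation running a BEGIN block) holds the lock.
    static String evalBuggy(String source) {
        COMPILE_LOCK.lock();
        try {
            String compiled = "code(" + source + ")";
            COMPILE_LOCK.unlock(); // release before execution
            return compiled;
        } finally {
            if (COMPILE_LOCK.isHeldByCurrentThread()) {
                COMPILE_LOCK.unlock(); // may release the outer caller's hold
            }
        }
    }

    // Fixed: a flag records whether this frame already released its own hold.
    static String evalFixed(String source, boolean failCompile) {
        COMPILE_LOCK.lock();
        boolean unlocked = false;
        try {
            if (failCompile) {
                return "compile error"; // error path: our hold is still taken
            }
            String compiled = "code(" + source + ")";
            COMPILE_LOCK.unlock(); // success path: release before execution
            unlocked = true;
            return compiled;
        } finally {
            if (!unlocked) {
                COMPILE_LOCK.unlock();
            }
        }
    }
}
```

When an outer frame already holds the lock, the buggy variant drops the hold count from 1 to 0, so the outer frame's later `unlock()` would throw `IllegalMonitorStateException`; the flag-based variant always leaves the outer hold intact.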
…ign doc

Documents the ReentrantLock reentrancy behavior for nested compilation
(eval → BEGIN → require), the isHeldByCurrentThread() over-decrement
bug, and why releasing the lock during BEGIN execution is unsafe due
to shared parser state.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Three architectural fixes for multiplicity correctness:

1. Move globalInitialized from shared static boolean to per-PerlRuntime
   field. Previously thread 1 set it to true, causing threads 2-N to
   skip initializeGlobals() entirely (no $_, @INC, built-in modules).

2. Wrap executePerlCode() compilation phase in COMPILE_LOCK. The lock
   covers tokenize/parse/compile, then releases before execution so
   compiled code runs concurrently.

3. Simplify MultiplicityDemo to use executePerlCode() instead of
   compilePerlCode() + apply(). Removes redundant demo-level lock.
   INIT/CHECK/UNITCHECK blocks now execute correctly (fixes
   begincheck.t failures in 10-interpreter demo).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
… to per-runtime

Root cause: All dynamic state stacks used by Perl's `local` operator
(save/restore mechanism) were shared static fields. When multiple
interpreters ran concurrently, they pushed/popped from the same stacks,
causing cross-runtime contamination — each interpreter would restore
another interpreter's saved state.

Migrated stacks (all now per-PerlRuntime instance fields):
- GlobalRuntimeScalar.localizedStack (caused scalar local failures)
- GlobalRuntimeArray.localizedStack
- GlobalRuntimeHash.localizedStack
- RuntimeArray.dynamicStateStack
- RuntimeHash.dynamicStateStack
- RuntimeStash.dynamicStateStack
- RuntimeGlob.globSlotStack
- RuntimeHashProxyEntry.dynamicStateStack
- RuntimeArrayProxyEntry.dynamicStateStackInt + dynamicStateStack
- ScalarSpecialVariable.inputLineStateStack
- OutputAutoFlushVariable.stateStack
- OutputRecordSeparator.orsStack
- OutputFieldSeparator.ofsStack
- ErrnoVariable.errnoStack + messageStack

Each class now uses a static accessor method that delegates to
PerlRuntime.current().<field>, following the same pattern already
established for RuntimeScalar.dynamicStateStack.

Fixes: local.t (74/74), chomp.t, defer.t, local_glob_dynamic.t
now pass with concurrent interpreters.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…g issues

- Document the local save/restore stack fix (16 stacks migrated to per-runtime)
- Update multiplicity demo results: 118/126 tests pass with 126 interpreters
- Add Next Steps for per-runtime CWD and file position isolation
- Categorize remaining 8 failures: DESTROY TODO, shared CWD, shared temp files

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add `cwd` field to PerlRuntime so each interpreter has its own
current working directory. Previously chdir() called
System.setProperty("user.dir"), which is JVM-global and caused
directory.t and glob.t to fail under concurrent interpreters.

Changes:
- PerlRuntime: add `cwd` field (initialized from user.dir) and
  static getCwd() accessor with fallback
- Directory.chdir(): update PerlRuntime.current().cwd instead of
  System.setProperty("user.dir")
- RuntimeIO.resolvePath(): resolve relative paths against
  PerlRuntime.getCwd() instead of user.dir
- Updated all 21 remaining System.getProperty("user.dir") call
  sites across SystemOperator, FileSpec, POSIX, Internals,
  IPCOpen3, XMLParserExpat, ScalarGlobOperator, DirectoryIO,
  PipeInputChannel, PipeOutputChannel
- ArgumentParser kept as-is (sets initial user.dir before runtime
  creation for -C flag)

Stress test: directory.t and glob.t now pass with 126 concurrent
interpreters (were failing before).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
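A minimal sketch of per-runtime CWD handling, assuming the cwd is a plain string field seeded from `user.dir`. `getCwd()` and `resolvePath()` mirror names in this commit, but `RuntimeCwd` and all method bodies are hypothetical; a real `chdir()` must also check that the target exists and is a directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Each runtime carries its own cwd; chdir() changes that field rather than
// the JVM-wide "user.dir" property. Class and method bodies are hypothetical.
final class RuntimeCwd {
    // Each thread's runtime starts from the process-wide user.dir.
    private static final ThreadLocal<RuntimeCwd> CURRENT =
            ThreadLocal.withInitial(() -> new RuntimeCwd(System.getProperty("user.dir")));

    private String cwd;

    private RuntimeCwd(String initial) {
        this.cwd = initial;
    }

    static String getCwd() {
        return CURRENT.get().cwd;
    }

    // Existence and permission checks that a real chdir() needs are omitted.
    static void chdir(String dir) {
        CURRENT.get().cwd = resolvePath(dir).toString();
    }

    // Relative paths resolve against this runtime's cwd, not user.dir.
    static Path resolvePath(String p) {
        Path path = Paths.get(p);
        return path.isAbsolute() ? path : Paths.get(getCwd()).resolve(path).normalize();
    }
}
```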
…ding

Two fixes for concurrent interpreter isolation:

1. Per-runtime unique PID ($$ variable):
   Previously all interpreters shared the same JVM PID via
   ProcessHandle.current().pid(), causing temp file collisions when
   tests use $$ in filenames (io_read.t, io_seek.t, io_layers.t).
   Now PerlRuntime assigns each instance a unique PID from an
   AtomicLong counter starting at the real JVM PID.

2. Pipe background thread runtime binding:
   PipeInputChannel and PipeOutputChannel spawn daemon threads for
   stderr/stdout consumption, but those threads had no PerlRuntime
   bound, so GlobalVariable lookups for STDOUT/STDERR failed with
   IllegalStateException and fell back to System.out/System.err.
   Now the parent PerlRuntime is captured and bound to the child
   thread via PerlRuntime.setCurrent().

Stress test: 122/126 pass with 126 concurrent interpreters
(up from 117). Only tie_*.t remain (pre-existing DESTROY TODO).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
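Both fixes can be sketched together. This assumes `getAndIncrement()`, so the first runtime reports the real JVM PID, which is one reading of "starting at the real JVM PID". `setCurrent()` is named in the commit; `PipeThreads` and `startConsumer` are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical runtime holding a per-interpreter $$ value.
final class PerlRuntime {
    // Seeded with the real JVM PID; each new runtime takes the next value.
    private static final AtomicLong NEXT_PID = new AtomicLong(ProcessHandle.current().pid());
    private static final ThreadLocal<PerlRuntime> CURRENT = new ThreadLocal<>();

    final long pid = NEXT_PID.getAndIncrement(); // what $$ would report

    static void setCurrent(PerlRuntime rt) {
        CURRENT.set(rt);
    }

    static PerlRuntime current() {
        PerlRuntime rt = CURRENT.get();
        if (rt == null) {
            throw new IllegalStateException("no runtime bound");
        }
        return rt;
    }
}

final class PipeThreads {
    // Capture the parent's runtime on the calling thread, then bind it inside
    // the consumer thread so STDOUT/STDERR lookups hit the right interpreter.
    static Thread startConsumer(Runnable body) {
        PerlRuntime parent = PerlRuntime.current();
        Thread t = new Thread(() -> {
            PerlRuntime.setCurrent(parent);
            body.run();
        });
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

The runtime must be captured before `new Thread(...)`: calling `PerlRuntime.current()` inside the lambda would run on the new thread, which has nothing bound yet.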
…ixes

Document per-runtime CWD isolation (commit c30eeb4) and per-runtime
PID + pipe thread binding (commit 0179c88). Update stress test
results to 122/126 with only tie_*.t (DESTROY) remaining.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Move all multiplicity demo files into a dedicated subdirectory for
better visibility. Update all cross-references bidirectionally:

- concurrency.md links to dev/sandbox/multiplicity/
- run_multiplicity_demo.sh and MultiplicityDemo.java link back to
  dev/design/concurrency.md
- dev/design/README.md updated to reference concurrency.md as the
  primary doc (supersedes multiplicity.md, fork.md, threads.md)
- Added dev/sandbox/multiplicity/README.md with quick start guide
  and link to design document

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Benchmark comparison (master vs feature/multiplicity):
- Most benchmarks show 5-7% slowdown from ThreadLocal routing
- Closure: -34% (14-17 ThreadLocal lookups per call from WarningBits/HintHash)
- Method: -27% (12-14 PerlRuntime.current() lookups on cache miss)
- Memory: unchanged

Three-tier optimization plan:
1. Cache PerlRuntime.current() in local variables (low risk, mechanical)
2. Consolidate WarningBits/HintHash stacks into PerlRuntime (medium risk)
3. Warm inline method cache for multiplicity use case (low risk)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock changed the title from "feat: PerlRuntime multiplicity — ThreadLocal-based runtime isolation" to "WIP: Multiplicity — per-runtime isolation for concurrent Perl interpreters" on Apr 10, 2026
fglock and others added 3 commits April 10, 2026 16:29
Add concrete guidance for each optimization tier:
- Goal: reduce closure/method regressions to under 10%
- Step-by-step methodology: commit, make, benchmark 3x median, compare
- Revert criteria: revert if no measurable gain AND no architectural benefit
- Tier 1: before/after code pattern, exact files, expected impact per step
- Tier 2: table of all 8 ThreadLocals to migrate, concrete steps, revert threshold (15%)
- Tier 3: diagnostic-first approach (measure cache hit rate before coding)
- Gate between tiers: benchmark before proceeding to next tier

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
- Branch workflow: work on feature/multiplicity-opt, merge back on
  success, document and delete on failure
- Added 'Failed Optimization Attempts' section for recording what
  was tried and why it was reverted

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock added a commit that referenced this pull request Apr 10, 2026
…weaken

Link to PR #480 (Multiplicity) and PR #464 (DESTROY/weaken) from the
v5.42.3 Work in Progress section with terse status summaries.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock added a commit that referenced this pull request Apr 10, 2026
…weaken (#482)

Link to PR #480 (Multiplicity) and PR #464 (DESTROY/weaken) from the
v5.42.3 Work in Progress section with terse status summaries.

Generated with [Devin](https://cli.devin.ai/docs)

Co-authored-by: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
fglock and others added 11 commits April 10, 2026 19:11
Cache the ThreadLocal lookup result at method entry instead of calling
PerlRuntime.current() multiple times per method. This eliminates
redundant ThreadLocal lookups in hot paths:

- GlobalVariable.getGlobalCodeRef(): 4 lookups → 1
- GlobalVariable.getGlobalVariable/Array/Hash(): 2 lookups → 1
- GlobalVariable.definedGlob(): 7 lookups → 1
- GlobalVariable.isPackageLoaded(): 3 lookups → 1
- InheritanceResolver.findMethodInHierarchy(): ~8 lookups → 1
- InheritanceResolver.linearizeHierarchy(): ~5 lookups → 1
- InheritanceResolver.invalidateCache(): 4 lookups → 1

Also optimized several other GlobalVariable accessors:
defineGlobalCodeRef, replacePinnedCodeRef, aliasGlobalVariable,
setGlobAlias, getGlobalIO, getGlobalFormatRef, definedGlobalFormatAsScalar,
resetGlobalVariables, resolveStashAlias.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
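The before/after pattern from this commit, sketched with a demo-only lookup counter. The map fields and method bodies are simplified assumptions (the real `getGlobalCodeRef()` does more), and `ThreadLocal.withInitial` is used only to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical runtime with a lookup counter for illustration only.
final class PerlRuntime {
    private static final ThreadLocal<PerlRuntime> CURRENT = ThreadLocal.withInitial(PerlRuntime::new);
    static int lookups; // demo instrumentation, not thread-safe

    final Map<String, Object> globalCodeRefs = new HashMap<>();
    final Map<String, String> stashAliases = new HashMap<>();

    static PerlRuntime current() {
        lookups++;
        return CURRENT.get();
    }
}

final class CodeRefs {
    // Before: every field access goes through the ThreadLocal again.
    static Object getGlobalCodeRefBefore(String name) {
        String resolved = PerlRuntime.current().stashAliases.getOrDefault(name, name);
        Object code = PerlRuntime.current().globalCodeRefs.get(resolved);
        if (code == null) {
            code = new Object(); // placeholder for an empty code slot
            PerlRuntime.current().globalCodeRefs.put(resolved, code);
        }
        return code;
    }

    // After: one lookup at method entry, then plain field access.
    static Object getGlobalCodeRefAfter(String name) {
        PerlRuntime rt = PerlRuntime.current();
        String resolved = rt.stashAliases.getOrDefault(name, name);
        return rt.globalCodeRefs.computeIfAbsent(resolved, k -> new Object());
    }
}
```

Caching the reference in a local is safe because a thread's binding does not change in the middle of one of these accessor calls.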
Consolidate 11 separate ThreadLocal stacks from WarningBitsRegistry,
HintHashRegistry, and RuntimeCode into PerlRuntime instance fields.
This reduces ThreadLocal lookups per subroutine call from ~14-17
(one per ThreadLocal.get()) to 1 (PerlRuntime.current(), then direct
field access).

Migrated ThreadLocals:
- WarningBitsRegistry: currentBitsStack, callSiteBits, callerBitsStack,
  callSiteHints, callerHintsStack, callSiteHintHash, callerHintHashStack
- HintHashRegistry: callSiteSnapshotId, callerSnapshotIdStack
- RuntimeCode: evalRuntimeContext, argsStack

The shared static ConcurrentHashMaps (WarningBitsRegistry.registry,
HintHashRegistry.snapshotRegistry) remain static as they are shared
across runtimes and only written at compile time.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Add pushCallerState/popCallerState and pushSubState/popSubState
batch methods to PerlRuntime, replacing 8-12 separate
PerlRuntime.current() calls per subroutine call with just 2.

Closure: 569 -> 601 ops/s (+5.6%)
Method: 319 -> 336 ops/s (+5.3%)
Lexical: 375K -> 458K ops/s (+22.2%)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
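A sketch of the batched push/pop idea: once the stacks are plain fields, a single `current()` call reaches all of them. `pushCallerState`/`popCallerState` are named in this commit and the field names come from the ThreadLocals listed in the previous commit, but the field types, the reduced set of stacks, and `SubCall` are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.IntSupplier;

// Hypothetical runtime: stacks that used to be separate ThreadLocals are
// plain fields, so one current() call reaches all of them.
final class PerlRuntime {
    private static final ThreadLocal<PerlRuntime> CURRENT = ThreadLocal.withInitial(PerlRuntime::new);

    final Deque<String> callerBitsStack = new ArrayDeque<>();
    final Deque<Integer> callerHintsStack = new ArrayDeque<>();
    final Deque<Integer> callerSnapshotIdStack = new ArrayDeque<>();

    // Values recorded at the call site before the call (illustrative types).
    String callSiteBits = "";
    int callSiteHints;
    int callSiteSnapshotId;

    static PerlRuntime current() {
        return CURRENT.get();
    }

    // One ThreadLocal lookup, then three field pushes.
    static void pushCallerState() {
        PerlRuntime rt = current();
        rt.callerBitsStack.push(rt.callSiteBits);
        rt.callerHintsStack.push(rt.callSiteHints);
        rt.callerSnapshotIdStack.push(rt.callSiteSnapshotId);
    }

    // One ThreadLocal lookup, then three field pops.
    static void popCallerState() {
        PerlRuntime rt = current();
        rt.callerBitsStack.pop();
        rt.callerHintsStack.pop();
        rt.callerSnapshotIdStack.pop();
    }
}

final class SubCall {
    // A call costs two lookups in total instead of one per stack operation.
    static int call(IntSupplier body) {
        PerlRuntime.pushCallerState();
        try {
            return body.getAsInt();
        } finally {
            PerlRuntime.popCallerState();
        }
    }
}
```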
RegexState.save() and restore() each called 13 individual RuntimeRegex
static accessors, each doing its own PerlRuntime.current() ThreadLocal
lookup. Replaced with a single PerlRuntime.current() call and direct
field access in both constructor and dynamicRestoreState().

Eliminates 24 ThreadLocal lookups per subroutine call.

JFR profiling showed RegexState was the dominant ThreadLocal overhead
source (126 of 143 PerlRuntime.current() samples in closure benchmark).

Closure: 601 -> 814 ops/s (+35%, now -5.7% vs master)
Method: 336 -> 399 ops/s (+19%, now -8.5% vs master)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…cy.md

Document the profiling findings (RegexState was dominant overhead),
optimization tiers applied, benchmark results, and remaining opportunities.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
EmitterMethodCreator unconditionally emitted RegexState.save() at every
subroutine entry, creating and pushing a 13-field snapshot even when the
subroutine never uses regex. Now uses RegexUsageDetector to check the AST
at compile time and only emits save/restore when the body contains regex
operations or eval STRING (which may introduce regex at runtime).

This is safe because subroutines without regex don't modify regex state,
and any callees that use regex do their own save/restore at their boundary.

Closure: 814 -> 1177 ops/s (+44%, now +36% FASTER than master)
Method: 399 -> 417 ops/s (+5%, now -4.4% vs master)

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
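A compile-time check of this kind can be sketched as a recursive AST walk. Only `RegexUsageDetector` is named in the commit; the node classes and `needsRegexSave` are hypothetical. Call nodes do not trigger a save by themselves because, per the commit's reasoning, a callee that uses regex does its own save/restore.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, minimal AST; the real PerlOnJava node classes differ.
abstract class Node {
    final List<Node> children;

    Node(Node... children) {
        this.children = Arrays.asList(children);
    }
}

final class Block extends Node { Block(Node... c) { super(c); } }
final class Call extends Node { Call(Node... c) { super(c); } }
final class Match extends Node { Match() { super(); } }           // m//, s///, etc.
final class EvalString extends Node { EvalString() { super(); } } // may add regex at run time
final class Literal extends Node { Literal() { super(); } }

final class RegexUsageDetector {
    // True if this body may touch regex state, so save/restore must be emitted.
    static boolean needsRegexSave(Node node) {
        if (node instanceof Match || node instanceof EvalString) {
            return true;
        }
        for (Node child : node.children) {
            if (needsRegexSave(child)) {
                return true;
            }
        }
        return false;
    }
}
```

A match inside a call's argument list still counts, because the argument is evaluated in the caller's own body.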
Closure is now +36% faster than master (was -34%).
Method is now -4.4% vs master (was -27%).

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Migrate Mro caches (packageGenerations, isaRevCache, pkgGenIsaState),
RuntimeIO.openHandles LRU cache, RuntimeRegex.optimizedRegexCache,
OutputFieldSeparator.internalOFS, OutputRecordSeparator.internalORS,
and ByteCodeSourceMapper (all 7 fields via new State inner class)
to per-PerlRuntime instance fields for multiplicity thread-safety.

No performance regression vs baseline benchmarks.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Previously, BytecodeCompiler.visitAnonymousSubroutine() always compiled
anonymous sub bodies to InterpretedCode. Hot closures created via eval
STRING (e.g., Benchmark.pm's timing wrapper) ran in the bytecode
interpreter instead of as native JVM bytecode.

Now tries JVM compilation first via EmitterMethodCreator.createClassWithMethod(),
falling back to the interpreter on any failure. A new JvmClosureTemplate
class holds the JVM-compiled class and instantiates closures with captured
variables via reflection.

Measured 4.5x speedup for eval STRING closures in isolation (6.4M iter/s
vs 1.4M iter/s). Updated benchmark results in concurrency.md - all
previously regressed benchmarks now match or exceed master.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
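A simplified sketch of the try-JVM-then-interpret strategy with a reflective closure template. `JvmClosureTemplate` is named in the commit, but its constructor, `instantiate`, and the other types here are assumptions. The commit says the fallback happens "on any failure"; this sketch catches `RuntimeException` and `LinkageError` (which covers `VerifyError` and `ClassFormatError`) as one plausible interpretation.

```java
import java.lang.reflect.Constructor;
import java.util.function.Supplier;

// Hypothetical code-object interface.
interface Closure {
    Object apply(Object arg);
}

// Stand-in for a JVM-generated anonymous sub class with one captured variable.
final class GeneratedAddN implements Closure {
    private final Integer captured;

    GeneratedAddN(Integer captured) {
        this.captured = captured;
    }

    @Override
    public Object apply(Object arg) {
        return (Integer) arg + captured;
    }
}

// Holds a compiled class; each closure creation passes the captured values
// to its constructor via reflection.
final class JvmClosureTemplate {
    private final Constructor<? extends Closure> ctor;

    JvmClosureTemplate(Class<? extends Closure> cls, Class<?>... captureTypes) {
        try {
            this.ctor = cls.getDeclaredConstructor(captureTypes);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    Closure instantiate(Object... captured) {
        try {
            return ctor.newInstance(captured);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class AnonSubCompiler {
    // Try the JVM backend first; fall back to the interpreter if it fails.
    static Closure compile(Supplier<JvmClosureTemplate> jvmBackend,
                           Supplier<Closure> interpreter,
                           Object... captured) {
        try {
            return jvmBackend.get().instantiate(captured);
        } catch (RuntimeException | LinkageError e) {
            return interpreter.get();
        }
    }
}
```

The template is built once per compiled sub body, so the comparatively expensive class generation is paid once while each closure creation is only a constructor call.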
perf: optimize ThreadLocal overhead and JVM-compile eval STRING closures
# Conflicts:
#	src/main/java/org/perlonjava/backend/bytecode/BytecodeInterpreter.java
#	src/main/java/org/perlonjava/backend/bytecode/EvalStringHandler.java
#	src/main/java/org/perlonjava/backend/bytecode/OpcodeHandlerExtended.java
#	src/main/java/org/perlonjava/backend/jvm/ByteCodeSourceMapper.java
#	src/main/java/org/perlonjava/backend/jvm/EmitRegex.java
#	src/main/java/org/perlonjava/backend/jvm/astrefactor/LargeBlockRefactorer.java
#	src/main/java/org/perlonjava/core/Configuration.java
#	src/main/java/org/perlonjava/frontend/parser/ParsePrimary.java
#	src/main/java/org/perlonjava/runtime/regex/RuntimeRegex.java
#	src/main/java/org/perlonjava/runtime/runtimetypes/CallerStack.java
#	src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeCode.java
#	src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeGlob.java
#	src/main/java/org/perlonjava/runtime/runtimetypes/RuntimeScalar.java
fglock merged commit 57af1ad into master on Apr 11, 2026
2 checks passed
fglock added a commit that referenced this pull request Apr 11, 2026
This reverts commit 57af1ad, reversing
changes made to 00c0dde.
fglock added a commit that referenced this pull request Apr 11, 2026
revert: undo multiplicity merge (PR #480) to investigate Moo regression
fglock added a commit that referenced this pull request Apr 11, 2026
PR #480 was reverted due to a Scalar::Util version parsing error
that broke Moo tests. Root cause: regex timeout executor threads
had no PerlRuntime bound via ThreadLocal.

This plan breaks the original 96-file monolithic PR into 16
independently testable phases, each with its own validation gate.

Generated with [Devin](https://cli.devin.ai/docs)

Co-Authored-By: Devin <158243242+devin-ai-integration[bot]@users.noreply.github.com>
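One common way to avoid this class of bug is to wrap every task handed to an executor so it carries the submitting thread's runtime. This is a sketch under that assumption, not necessarily the fix the re-landing plan adopts; `RuntimeBinding`, `bindToCurrent`, and `TimeoutDemo` are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical runtime holder with a name for identification.
final class PerlRuntime {
    private static final ThreadLocal<PerlRuntime> CURRENT = new ThreadLocal<>();
    final String name;

    PerlRuntime(String name) {
        this.name = name;
    }

    static PerlRuntime currentOrNull() {
        return CURRENT.get();
    }

    static void setCurrent(PerlRuntime rt) {
        if (rt == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(rt);
        }
    }
}

final class RuntimeBinding {
    // Capture the submitter's runtime now and bind it on the worker later,
    // restoring the worker's previous binding because pool threads are reused.
    static <T> Callable<T> bindToCurrent(Callable<T> task) {
        PerlRuntime captured = PerlRuntime.currentOrNull();
        return () -> {
            PerlRuntime previous = PerlRuntime.currentOrNull();
            PerlRuntime.setCurrent(captured);
            try {
                return task.call();
            } finally {
                PerlRuntime.setCurrent(previous);
            }
        };
    }
}

final class TimeoutDemo {
    // Runs a task on a pool thread (as a regex timeout executor might) and
    // reports which runtime, if any, the task observed.
    static String runOnPool(boolean bind) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Callable<String> task = () -> {
            PerlRuntime rt = PerlRuntime.currentOrNull();
            return rt == null ? "<unbound>" : rt.name;
        };
        try {
            return pool.submit(bind ? RuntimeBinding.bindToCurrent(task) : task)
                       .get(5, TimeUnit.SECONDS);
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Without the wrapper, the pool thread starts with no runtime bound, which matches the failure mode described above.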