Add context reset and reload functionality #12

Open
benoitc wants to merge 7 commits into main from feature/context-reset

benoitc commented Mar 7, 2026

Summary

  • py:reset_context/1 - Soft reset that clears namespace (keeps builtins/erlang)
  • py:reload_context/2 - Reload specified modules using importlib.reload()
  • NIF functions: context_reset, context_reload
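As a rough illustration of what a soft reset means on the Python side, the sketch below clears a namespace dict while preserving a small allow-list of names. The function name `soft_reset` and the `KEEP` set are hypothetical; the actual `context_reset` NIF is implemented in C and may work differently.

```python
import builtins

# Names preserved across a reset. Assumption: mirrors "keeps builtins/erlang".
KEEP = frozenset({"__builtins__", "erlang"})


def soft_reset(namespace, keep=KEEP):
    """Delete every name in `namespace` except those listed in `keep`."""
    for name in list(namespace):  # copy the keys: the dict is mutated in the loop
        if name not in keep:
            del namespace[name]


# Usage: define a variable, reset, and check that it is gone.
ns = {"__builtins__": builtins}
exec("x = 42", ns)
soft_reset(ns)
try:
    eval("x", ns)
    x_survived = True
except NameError:
    x_survived = False
```

Objects that are still referenced elsewhere (imported modules in `sys.modules`, for example) are untouched by this kind of reset, which is why it is described as soft.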

API

%% Reset a context (clears user-defined variables)
Ctx = py:context(1),
ok = py:exec(Ctx, <<"x = 42">>),
ok = py:reset_context(Ctx),
{error, _} = py:eval(Ctx, <<"x">>).  %% x is gone

%% Reload modules (hot reload for development)
ok = py:exec(Ctx, <<"import mymodule">>),
%% ... modify mymodule.py on disk ...
ok = py:reload_context(Ctx, [<<"mymodule">>]).
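On the Python side, the reload step presumably comes down to `importlib.reload()`, as the summary states. The sketch below shows that mechanism in isolation with a throwaway module; the helper name `reload_modules`, the module name `demo_reload_mod`, and the choice to import names that are not loaded yet are assumptions of this example, not the PR's behaviour.

```python
import importlib
import pathlib
import sys
import tempfile


def reload_modules(names):
    """Reload each named module; import it first if it is not loaded yet."""
    reloaded = []
    for name in names:
        module = sys.modules.get(name)
        if module is None:
            module = importlib.import_module(name)
        else:
            module = importlib.reload(module)  # re-executes the module in place
        reloaded.append(module)
    return reloaded


with tempfile.TemporaryDirectory() as tmp:
    source = pathlib.Path(tmp) / "demo_reload_mod.py"
    source.write_text("VALUE = 1\n")
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module("demo_reload_mod")
        before = module.VALUE
        # Simulate editing the file on disk, then reload it.
        source.write_text("VALUE = 200\n")
        reload_modules(["demo_reload_mod"])
        after = module.VALUE
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("demo_reload_mod", None)
```

Because `importlib.reload()` reuses the existing module object, references such as `module` above see the new values, while names copied out earlier with `from module import name` keep pointing at the old objects.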

Use Cases

  • Hot code reload during development
  • Memory cleanup in long-running contexts
  • Reset after running untrusted code
  • Clean state between request batches

benoitc and others added 7 commits March 7, 2026 02:40
* Add worker thread pool for high-throughput Python operations

Implement a general-purpose worker thread pool that eliminates per-request
GIL acquisition overhead. Each worker holds the GIL (or has its own
subinterpreter with OWN_GIL on Python 3.12+) and processes requests from
a shared MPSC queue.

Key features:
- Sync API: call, apply, eval, exec, asgi_run, wsgi_run
- Async API: all *_async variants returning request_id for non-blocking calls
- await/1,2 for waiting on async results
- Per-worker module caching to avoid reimport overhead
- Support for FREE_THREADED (3.13+), SUBINTERP (3.12+), and FALLBACK modes

* Fix eval locals_term initialization and add benchmark results

- Fix potential crash when locals_term is uninitialized (check for 0)
- Add benchmark results directory with baseline comparisons

Known issue: ~0.5-1% of concurrent sync calls may time out under high
load (100+ concurrent callers). The async API is unaffected.

* Fix two race conditions in worker pool

1. Use-after-free on request_id: Save request_id BEFORE enqueueing
   the request to the worker pool. Once enqueued, a worker can
   process and free the request at any time. Accessing req->request_id
   after py_pool_enqueue() is undefined behavior.

2. Double-free of msg_env: After a successful enif_send(), the message
   environment is consumed/invalidated by the Erlang runtime. We must
   set req->msg_env = NULL to prevent py_pool_request_free() from
   calling enif_free_env() on an already-freed environment.

These bugs caused ~0.5-1% of concurrent calls to time out under high load
because request IDs could be corrupted, leading to message/response
mismatches.

Also adds debug counters (responses_sent, responses_failed) to pool stats
for monitoring send success rate.

* Fix worker pool ASGI to use hornbeam run_asgi interface

Changed py_pool_process_asgi to call run_asgi(module_name, callable_name,
scope, body) instead of run(app, scope, body), matching hornbeam's
hornbeam_asgi_runner interface.

Also updated extract_asgi_response to handle both dict and tuple return
formats, supporting hornbeam's dict-based response.

* Add py_resource_pool and subinterpreter support with mutex locking

- Add compile-time detection of PyInterpreterConfig_OWN_GIL (Python 3.12+)
- Add mutex to py_subinterp_worker_t for thread-safe parallel access
- Add nif_subinterp_asgi_run for ASGI on subinterpreters
- Add py_resource_pool module with lock-free round-robin scheduling
- Benchmark shows 8-10x improvement with subinterpreters enabled

* Implement process-per-context architecture with reentrant callbacks

Replace worker pool with process-per-context model where each Python context
is owned by a dedicated Erlang process. Enables reentrant callbacks via
suspension-based mechanism without deadlock.

- Add py_context.erl with recursive receive pattern for inline callback handling
- Add py_context_router.erl for scheduler-affinity based routing
- Add nif_context_resume for Python replay with cached callback results
- Support sequential callbacks via callback_results array accumulation
- Remove old pool modules (py_pool, py_worker, py_worker_pool, etc.)

* Fix timeout handling and add contexts_started helper

- Pass timeout parameter through py:eval/3 and do_call/5
- Add py:contexts_started/0 and py_context_router:is_started/0
- Fix test_timeout to use time.sleep for reliable delay
- Fix thread callback suite to check existing contexts

* Fix thread worker handlers not re-registering after app restart

When the application restarts, py_thread_handler registers as the new
coordinator, but existing thread workers in the NIF-level pool still
had has_handler=true from the previous run. This caused them to skip
spawning new handler processes and write to dead pipes.

Reset has_handler=false on all existing workers when a new coordinator
is registered.

* Fix subinterpreter cleanup and thread worker re-registration

Two fixes:

1. suspended_context_state_destructor: For subinterpreters with OWN_GIL,
   use PyThreadState_Swap to switch to the correct interpreter before
   releasing Python objects. PyGILState_Ensure only works for the main
   interpreter and causes memory corruption with subinterpreter objects.

2. thread_worker_set_coordinator: Reset has_handler=false on all existing
   workers when a new coordinator registers (e.g., after app restart).
   Old workers kept has_handler=true but their handler processes were dead.

* Unify erlang Python module with callback and event loop API

- Rename priv/erlang/ to priv/_erlang_impl/ to avoid C module shadowing
- Add _extend_erlang_module() helper in py_callback.c to re-export
  Python package functions (run, new_event_loop, EventLoopPolicy, etc.)
- Update py_event_loop.erl to call extension during initialization
- Delete buggy erlang_asyncio.py (blocking sleep replaced by proper
  asyncio.sleep backed by Erlang timers via call_later)
- Add test infrastructure in priv/tests/ for event loop integration

The unified erlang module now provides uvloop-compatible API:
- erlang.run(coro) - run async code with Erlang event loop
- erlang.new_event_loop() - create ErlangEventLoop instance
- erlang.install() - install ErlangEventLoopPolicy (deprecated 3.12+)
- erlang.call() / erlang.async_call() - call Erlang functions
- asyncio.sleep() works via Erlang timers

* Fix tests to use erlang.run() instead of removed erlang_asyncio module

- Update py_erlang_sleep_SUITE to use erlang.run() with standard asyncio
  instead of the removed erlang_asyncio module
- Skip py_asyncio_compat_SUITE: tests create standalone ErlangEventLoop
  instances via erlang.new_event_loop() and call loop.run_forever().
  Timer scheduling for standalone loops needs work - timers fire
  immediately instead of after the scheduled delay.

* Fix timer scheduling for standalone ErlangEventLoop instances

- Add isolated parameter to ErlangEventLoop.__init__() that creates
  a per-loop capsule via _loop_new() for proper event routing
- Update all loop methods (call_at, _run_once, stop, close, add_reader,
  remove_reader, add_writer, remove_writer) to use per-loop capsule APIs
  when running as isolated instance
- new_event_loop() now passes isolated=True by default
- Fix run_forever() to honor stop() called before run_forever() by not
  resetting _stopping flag at start
- Simplify async_test_runner to run tests synchronously without
  erlang.run() wrapper, avoiding nested event loop issues
- Add timeout fallback to test_add_remove_writer to prevent hanging
- Remove skip from py_asyncio_compat_SUITE to enable tests
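The `stop()` before `run_forever()` case matches what the stock asyncio loop does: it runs one iteration and returns. The sketch below demonstrates that reference behaviour with the standard loop, not with `ErlangEventLoop`, which is not available outside the Erlang VM.

```python
import asyncio

loop = asyncio.new_event_loop()
ran = []

loop.call_soon(ran.append, "callback")  # already scheduled before the loop starts
loop.stop()                             # stop requested before run_forever()
loop.run_forever()                      # runs a single iteration, then returns
still_running = loop.is_running()
loop.close()
```

A loop that reset its stopping flag on entry would block here forever, which is the failure mode the fix above avoids.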

Test results: 46 tests run, 42 passed, 4 failures (edge cases)

* Replace async worker pthread backend with event loop model

The pthread+usleep polling async workers have been replaced with an
event-driven model using py_event_loop and enif_select:

- Add _run_and_send wrapper in Python for result delivery via erlang.send()
- Add nif_event_loop_run_async NIF for direct coroutine submission
- Add py_event_loop:run_async/2 Erlang API
- Add py_event_loop_pool.erl for managing event loop-based async execution
- Rewrite py_async_pool.erl to delegate to event_loop_pool
- Update supervisor tree to include py_event_loop_pool
- Remove py_async_worker.erl and py_async_worker_sup.erl
- Stub deprecated async_worker NIFs to return errors
- Remove async_event_loop_thread and async_future_callback C code
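The result-delivery wrapper can be pictured roughly as follows. This is a minimal sketch, not the PR's `_run_and_send`: the `send` callable stands in for `erlang.send()`, and the `("async_result", ref, ...)` message shape is a hypothetical choice made for the example.

```python
import asyncio


async def run_and_send(coro, send, caller, ref):
    """Await `coro` and deliver its outcome to `caller` through `send`."""
    try:
        result = await coro
    except Exception as exc:
        # Hypothetical error shape; the real wire format may differ.
        send(caller, ("async_result", ref, ("error", repr(exc))))
    else:
        send(caller, ("async_result", ref, ("ok", result)))


async def add(a, b):
    await asyncio.sleep(0)
    return a + b


async def fail():
    raise ValueError("boom")


mailbox = []


def fake_send(pid, message):
    """Test double that records messages instead of sending them to Erlang."""
    mailbox.append((pid, message))


asyncio.run(run_and_send(add(1, 2), fake_send, "caller_pid", "ref1"))
asyncio.run(run_and_send(fail(), fake_send, "caller_pid", "ref2"))
```

Pushing the result to the caller as a message is what removes the need for polling: the waiting side simply receives it.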

Performance improvements:
- Latency: ~10-20ms polling -> <1ms (enif_select)
- CPU idle: 100 wakeups/sec -> Zero
- Threads: N pthreads -> 0 extra threads

API unchanged: py:async_call/3,4 and py:await/1,2 work the same.

* Remove global state from py_event_loop.c for per-interpreter isolation

Replace global variables with module state structure stored in the
Python module, enabling proper per-interpreter/per-context event
loop isolation.

Changes:
- Add py_event_loop_module_state_t struct containing event_loop,
  shared_router, shared_router_valid, and isolation_mode
- Update PyModuleDef to allocate module state (m_size)
- Update get_interpreter_event_loop() to read from module state
- Update set_interpreter_event_loop() to write to module state
- Update nif_set_python_event_loop() to use module state
- Update nif_set_isolation_mode() to use module state
- Update nif_set_shared_router() to use module state
- Update py_get_isolation_mode() to read from module state
- Update py_loop_new() to read shared_router from module state
- Update event_loop_destructor() to clear module state
- Update create_default_event_loop() to use module state
- Remove g_python_event_loop, g_shared_router, g_shared_router_valid,
  and g_isolation_mode global variables

* Fix py_asyncio_compat_SUITE tests and consolidate erlang module

- Remove erlang_loop.py, use _erlang_impl as the single implementation
- Add get_event_loop_policy() export to _erlang_impl and erlang module
- Fix signal tests: ErlangEventLoop has limited signal support (SIGINT,
  SIGTERM, SIGHUP only), other signals raise ValueError
- Skip subprocess tests for Erlang (not yet implemented)
- Update all imports to use erlang module (public API) with _erlang_impl
  as internal fallback
- Update docs and examples to use erlang module imports

* Fix unawaited coroutine warnings in tests

- test_run_until_complete_nested_raises: Use asyncio.sleep(0.1) to ensure
  timer path (not fast path), properly close coroutine in finally block
- test_run_until_complete_on_closed_raises: Store coroutine in variable
  and close it in finally block
- tearDown: Cancel pending tasks and shutdown async generators before
  closing loop to prevent resource leaks
- Add test_asyncio_sleep_zero_fast_path: Verify sleep(0) uses fast path
- test_add_remove_writer: Use socketpair for reliable write readiness
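The coroutine-closing pattern from the first two items looks roughly like this with the stock asyncio loop. A coroutine object that is created but never awaited triggers a "coroutine ... was never awaited" RuntimeWarning when it is garbage collected, so the test closes it explicitly.

```python
import asyncio

loop = asyncio.new_event_loop()
loop.close()

coro = asyncio.sleep(0.1)  # coroutine object; nothing runs until it is awaited
raised = False
try:
    # A closed loop rejects the call before the coroutine is ever started.
    loop.run_until_complete(coro)
except RuntimeError:
    raised = True
finally:
    coro.close()  # mark the coroutine as finished so no warning is emitted
```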

* Fix FD stealing and UDP connected socket issues

- Share fd_resource per fd to prevent enif_select stealing errors
- Add NIF functions for fd resource management
- Use send() instead of sendto() for connected UDP sockets
- Fix TCP EOF handling to call connection_lost properly
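For the UDP item, the difference is easy to show with plain sockets: once a datagram socket has been connected, `send()` needs no destination, whereas passing an address to `sendto()` can fail with `EISCONN` on some platforms (BSD-derived ones, for instance). A small sketch over loopback, independent of the event loop code:

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # let the OS pick a free port
receiver.settimeout(2.0)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.connect(receiver.getsockname())  # fixes the peer address for this socket

sent = sender.send(b"ping")  # no address argument on a connected socket
data, _addr = receiver.recvfrom(1024)

sender.close()
receiver.close()
```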

* Fix context test expectations for Python contextvars behavior

await coro() runs in shared context (changes visible to caller),
while create_task(coro()) runs in copied context (changes isolated).
Updated test_context_in_task and test_multiple_context_vars to
reflect correct Python behavior.
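The behaviour described above can be reproduced with the standard library alone. In this sketch, a value set by a directly awaited coroutine is visible to the caller, while a value set inside a task is not, because the task runs in a copy of the context.

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="initial")


async def setter(value):
    var.set(value)


async def main():
    await setter("awaited")                     # runs in the caller's context
    after_await = var.get()
    await asyncio.create_task(setter("task"))   # runs in a copied context
    after_task = var.get()                      # the task's change is not visible
    return after_await, after_task


result = asyncio.run(main())
```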

* Remove subprocess support from ErlangEventLoop

Subprocess is not supported because Python's subprocess module uses
fork() which corrupts the Erlang VM when called from within the NIF.

Users should use Erlang ports directly via erlang.call() instead,
which provides superior subprocess management with built-in
supervision, monitoring, and fault tolerance.

Changes:
- Replace _subprocess.py with NotImplementedError stub and docs
- Remove subprocess event handling from _loop.py
- Remove subprocess functions from py_event_loop.c
- Update tests to verify NotImplementedError is raised
- Set HAS_SUBPROCESS_SUPPORT = False in test base

* Add ETF encoding for pids/refs and fix executor/socket tests

ETF encoding for pids and references:
- Add decode_etf_string() helper in py_callback.c to convert
  __etf__:base64 encoded strings back to Erlang terms
- Add ETF encoding in term_to_python_repr for pids and refs
  in py_context.erl and py_thread_handler.erl
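To make the string format concrete, here is a minimal sketch of recognising such a value and recovering the raw term bytes. The `__etf__:` prefix and the base64 payload come from the commit message; the helper name, the exact layout, and the version-byte check are assumptions of this example. The real decoding is done in C by `decode_etf_string()`.

```python
import base64
import binascii

ETF_PREFIX = "__etf__:"
ETF_VERSION = 131  # leading version byte of the Erlang External Term Format


def extract_etf_payload(text):
    """Return the raw ETF bytes carried by `text`, or None for a plain string."""
    if not text.startswith(ETF_PREFIX):
        return None
    try:
        payload = base64.b64decode(text[len(ETF_PREFIX):], validate=True)
    except binascii.Error as exc:
        raise ValueError("malformed base64 in ETF string") from exc
    if not payload or payload[0] != ETF_VERSION:
        raise ValueError("payload does not start with the ETF version byte")
    return payload


# term_to_binary(7) in Erlang yields <<131,97,7>> (SMALL_INTEGER_EXT).
encoded = ETF_PREFIX + base64.b64encode(bytes([131, 97, 7])).decode("ascii")
payload = extract_etf_payload(encoded)
plain = extract_etf_payload("just a string")
```

Round-tripping opaque terms such as pids and refs as encoded bytes lets Python hand them back to Erlang unchanged without needing a native Python representation.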

Test fixes:
- Skip ProcessPoolExecutor test inside Erlang NIF (fork issues)
- Use 'spawn' multiprocessing context instead of 'fork'
- Accept OSError in addition to TimeoutError for connect timeout test

Cleanup:
- Remove obsolete multi_loop test files

* Add erlang.reactor module for fd-based protocol handling

Implement low-level fd-based API where Erlang handles I/O scheduling
via enif_select and Python handles protocol logic.

- Add priv/_erlang_impl/_reactor.py with Protocol base class and registry
- Add src/py_reactor_context.erl for Erlang reactor context process
- Expose erlang.reactor via sys.modules for 'import erlang.reactor' syntax
- Add test suite (py_reactor_SUITE.erl) with 6 tests
- Add Python tests (py_test_reactor.py) with 3 tests
- Add examples/reactor_echo.erl as usage example

Works with any fd - TCP, UDP, Unix sockets, pipes, etc.
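The division of labour can be sketched as follows: something external decides when an fd is ready, and Python only runs protocol logic. All names here (`Protocol`, `EchoProtocol`, `register`, `on_read_ready`, `on_write_ready`, and the returned action strings) are hypothetical and are not taken from `_reactor.py`. The demo calls the handlers by hand where Erlang would call them after `enif_select` fires, and it uses `os.read`/`os.write` on socket fds, which assumes a Unix-like system.

```python
import os
import socket


class Protocol:
    """Hypothetical base class: one instance per registered fd."""

    def __init__(self, fd):
        self.fd = fd
        self.write_buffer = bytearray()

    def data_received(self, data):
        raise NotImplementedError


class EchoProtocol(Protocol):
    def data_received(self, data):
        self.write_buffer += data  # queue the bytes to be echoed back


_protocols = {}  # registry: fd -> Protocol instance


def register(fd, protocol_cls):
    _protocols[fd] = protocol_cls(fd)


def on_read_ready(fd):
    """Called when the scheduler reports that fd is readable."""
    proto = _protocols[fd]
    data = os.read(fd, 65536)
    if not data:  # peer closed the connection
        del _protocols[fd]
        return "close"
    proto.data_received(data)
    return "write_pending" if proto.write_buffer else "continue"


def on_write_ready(fd):
    """Called when the scheduler reports that fd is writable."""
    proto = _protocols[fd]
    written = os.write(fd, proto.write_buffer)
    del proto.write_buffer[:written]
    return "write_pending" if proto.write_buffer else "continue"


# Usage: drive the handlers manually over a local socket pair.
server_side, client_side = socket.socketpair()
client_side.settimeout(2.0)
register(server_side.fileno(), EchoProtocol)
client_side.sendall(b"hello")
read_action = on_read_ready(server_side.fileno())
write_action = on_write_ready(server_side.fileno())
echoed = client_side.recv(1024)
server_side.close()
client_side.close()
```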

* Add audit hook sandbox and remove signal support

- Add _sandbox.py with Python audit hooks (PEP 578) to block dangerous
  operations: fork, exec, spawn, subprocess, os.system, os.popen
- Install sandbox automatically when running inside Erlang VM
- Remove signal handling support (not applicable in Erlang context)
- Update policy to always return ErlangEventLoop
- Fix ExecutionMode test to check correct enum values
- Remove signal tests and AIO subprocess tests from test suite
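For readers unfamiliar with PEP 578, an audit hook that blocks process creation can be as small as the sketch below. It is not the contents of `_sandbox.py`: the event list is an illustrative subset of CPython's documented audit events, and the `sandbox_enabled` flag exists only so this demo can switch itself off again, since an installed audit hook cannot be removed.

```python
import os
import sys

# Illustrative subset of CPython audit events related to process creation.
BLOCKED_EVENTS = frozenset({
    "os.fork",
    "os.exec",
    "os.spawn",
    "os.posix_spawn",
    "os.system",
    "subprocess.Popen",
})

sandbox_enabled = True  # demo-only switch; a real sandbox would not expose this


def _audit_hook(event, args):
    """Raise before the audited operation runs, which aborts it."""
    if sandbox_enabled and event in BLOCKED_EVENTS:
        raise PermissionError(f"{event} is blocked inside the sandbox")


sys.addaudithook(_audit_hook)

# Usage: the shell command below never executes.
try:
    os.system("echo should not run")
    blocked = False
except PermissionError:
    blocked = True
finally:
    sandbox_enabled = False
```

`os.popen` is covered indirectly because it is implemented on top of `subprocess.Popen`, which raises its own audit event.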

* Update CHANGELOG for unreleased changes since 1.8.1

* Add security and reactor documentation, update asyncio docs

New documentation:
- docs/security.md: Document audit hook sandbox, blocked operations
  (fork, exec, subprocess), and Erlang port alternatives
- docs/reactor.md: Document erlang.reactor module for FD-based
  protocol handling with Protocol base class and examples

Updated documentation:
- docs/asyncio.md: Update for unified erlang module, mark
  erlang.install() as deprecated in 3.12+, add Limitations section
  for subprocess/signal handling, add ExecutionMode documentation
- docs/getting-started.md: Add Security Considerations section,
  update asyncio section to use erlang.run()
- README.md: Add security sandbox to features, add doc links

Also fixed edoc errors in source files:
- src/py_nif.erl: Fix angle bracket syntax in reactor function docs
- src/py_context_router.erl: Replace markdown code blocks with <pre>

* Rename call_async to cast and add benchmark

API change: py:call_async/3,4 renamed to py:cast/3,4 following
gen_server convention (call=sync, cast=async).

Add benchmark_compare.erl for comparing performance between versions.
Current version shows ~2-3x improvement over v1.8.1:
- Sync calls: 0.011ms -> 0.004ms (2.9x faster)
- Cast single: 0.011ms -> 0.004ms (2.8x faster)
- Throughput: ~90K -> ~250K calls/sec

* Add migration guide for v1.8.x to v2.0

Covers:
- py:call_async -> py:cast rename
- py:bind/unbind removal (use py_context_router)
- py:ctx_* removal (use py_context directly)
- erlang_asyncio -> erlang module consolidation
- Subprocess removal (use Erlang ports)
- Signal handler removal (use Erlang level)
- New features: context router, reactor, erlang.send()
- Performance comparison table

* Add subinterpreter event loop isolation

Each subinterpreter context now gets its own event worker for asyncio
support. This ensures asyncio.sleep() and timers work correctly in
subinterpreter contexts.

Changes:
- Add nif_context_get_event_loop/1 NIF to retrieve event loop reference
- Create dedicated event worker per subinterpreter context in py_context
- Extend erlang module with run/new_event_loop in each subinterpreter
- Handle EXIT signals properly (shutdown from supervisor vs normal exits)
- Initialize event loop for worker pool subinterpreters

Worker mode contexts (Python < 3.12) continue to use the shared router.

* Skip tests incompatible with subinterpreters

test_memory_stats and test_reload use modules (tracemalloc) that don't
support Python subinterpreters. Skip these tests when running with
subinterpreter support enabled (Python 3.12+).

* Add OWN_GIL subinterpreter support for true Python parallelism

Subinterpreters with PyInterpreterConfig_OWN_GIL run in dedicated threads,
each with its own GIL, enabling true parallel Python execution on Python 3.12+.

Key changes:
- Thread pool manages subinterpreter lifecycle and context switching
- Atomic state machine for thread-safe subinterpreter state management
- Support blocking callbacks in thread-model subinterpreters
- ProcessError exception class lookup for correct identity in subinterpreters
- Test adjustments for subinterpreter path isolation and error messages

* Fix ASan/TSan LD_PRELOAD in CI

* Simplify CI: ASan only, fix LD_PRELOAD for all steps

* Remove debug counters step from ASan builds

* Document OWN_GIL subinterpreter parallelism

* Fix py_async_e2e_SUITE for subinterpreters

- Use explicit context for all tests
- Relax timing constraint (0.3s instead of 0.15s) for CI
- Add diagnostic messages to assertions

- reactor_register_fd now calls dup() on the fd automatically
- Sets owns_fd=true so the duplicated fd is closed on cleanup
- Users can safely close Erlang's socket after handoff
- Updated documentation to reflect automatic dup()

- py:reset_context/1 - Soft reset that clears namespace (keeps builtins/erlang)
- py:reload_context/2 - Reload specified modules using importlib.reload()
- NIF functions: context_reset, context_reload
- Full test coverage in py_context_reset_SUITE (8 tests)