Merged
35 commits
c816245
Add worker thread pool for high-throughput Python operations
benoitc Feb 25, 2026
44efddc
Fix eval locals_term initialization and add benchmark results
benoitc Feb 26, 2026
bc97a07
Fix two race conditions in worker pool
benoitc Feb 26, 2026
9956584
Fix worker pool ASGI to use hornbeam run_asgi interface
benoitc Feb 26, 2026
1189715
Add py_resource_pool and subinterpreter support with mutex locking
benoitc Feb 26, 2026
d1617dc
Implement process-per-context architecture with reentrant callbacks
benoitc Feb 27, 2026
0eca656
Fix timeout handling and add contexts_started helper
benoitc Feb 27, 2026
1f6bf04
Fix thread worker handlers not re-registering after app restart
benoitc Feb 28, 2026
21255f5
Fix subinterpreter cleanup and thread worker re-registration
benoitc Feb 28, 2026
f61b83a
Unify erlang Python module with callback and event loop API
benoitc Feb 28, 2026
c262241
Fix tests to use erlang.run() instead of removed erlang_asyncio module
benoitc Feb 28, 2026
3665128
Fix timer scheduling for standalone ErlangEventLoop instances
benoitc Feb 28, 2026
29c8a41
Merge remote-tracking branch 'origin/main' into feature/py-worker-pool
benoitc Mar 1, 2026
2c9a451
Replace async worker pthread backend with event loop model
benoitc Mar 1, 2026
54bd549
Remove global state from py_event_loop.c for per-interpreter isolation
benoitc Mar 1, 2026
10dbea7
Fix py_asyncio_compat_SUITE tests and consolidate erlang module
benoitc Mar 1, 2026
5032ec6
Fix unawaited coroutine warnings in tests
benoitc Mar 1, 2026
1bbb3ba
Fix FD stealing and UDP connected socket issues
benoitc Mar 1, 2026
89ff775
Fix context test expectations for Python contextvars behavior
benoitc Mar 1, 2026
cbf324a
Remove subprocess support from ErlangEventLoop
benoitc Mar 2, 2026
4a07e1d
Add ETF encoding for pids/refs and fix executor/socket tests
benoitc Mar 2, 2026
cde0a8d
Add erlang.reactor module for fd-based protocol handling
benoitc Mar 2, 2026
8e86d77
Add audit hook sandbox and remove signal support
benoitc Mar 2, 2026
4da4378
Update CHANGELOG for unreleased changes since 1.8.1
benoitc Mar 2, 2026
e22331f
Add security and reactor documentation, update asyncio docs
benoitc Mar 2, 2026
8f3e379
Rename call_async to cast and add benchmark
benoitc Mar 2, 2026
e09b15a
Add migration guide for v1.8.x to v2.0
benoitc Mar 2, 2026
fd236ce
Add subinterpreter event loop isolation
benoitc Mar 3, 2026
3cb5854
Skip tests incompatible with subinterpreters
benoitc Mar 3, 2026
363772f
Add OWN_GIL subinterpreter support for true Python parallelism
benoitc Mar 7, 2026
55751da
Fix ASan/TSan LD_PRELOAD in CI
benoitc Mar 7, 2026
316df81
Simplify CI: ASan only, fix LD_PRELOAD for all steps
benoitc Mar 7, 2026
84a8bb8
Remove debug counters step from ASan builds
benoitc Mar 7, 2026
3d0add0
Document OWN_GIL subinterpreter parallelism
benoitc Mar 7, 2026
df32f2a
Fix py_async_e2e_SUITE for subinterpreters
benoitc Mar 7, 2026
52 changes: 52 additions & 0 deletions .github/workflows/ci.yml
@@ -186,6 +186,58 @@ jobs:
'
continue-on-error: true # Free-threading is experimental

  # ASan builds for detecting memory issues
  test-asan:
    name: ASan / Python ${{ matrix.python }}
    runs-on: ubuntu-24.04

    strategy:
      fail-fast: false
      matrix:
        python: ["3.12", "3.13"]

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python }}

      - name: Set up Erlang
        uses: erlef/setup-beam@v1
        with:
          otp-version: "27.0"
          rebar3-version: "3.24"

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y cmake

      - name: Set Python library path
        run: |
          PYTHON_LIB=$(python3 -c "import sysconfig; print(sysconfig.get_config_var('LIBDIR'))")
          echo "LD_LIBRARY_PATH=${PYTHON_LIB}:${LD_LIBRARY_PATH}" >> $GITHUB_ENV

      - name: Clean and compile with ASan
        run: |
          rm -rf _build/cmake
          mkdir -p _build/cmake
          cd _build/cmake
          cmake ../../c_src -DENABLE_ASAN=ON -DENABLE_UBSAN=ON
          cmake --build . -- -j $(nproc)
          cd ../..
          rebar3 compile

      - name: Run tests with ASan
        env:
          ASAN_OPTIONS: detect_leaks=0:abort_on_error=1
        run: |
          export LD_PRELOAD=$(gcc -print-file-name=libasan.so)
          rebar3 ct --readable=compact

  lint:
    name: Lint
    runs-on: ubuntu-24.04
131 changes: 125 additions & 6 deletions CHANGELOG.md
@@ -4,6 +4,27 @@

### Added

- **OWN_GIL Subinterpreter Thread Pool** - True parallelism with Python 3.12+ subinterpreters
- Each subinterpreter runs in its own thread with its own GIL (`PyInterpreterConfig_OWN_GIL`)
- Thread pool manages N subinterpreters for parallel Python execution
- `py:context(N)` returns the Nth context PID for explicit context selection
- `py_context_router` provides scheduler-affinity routing for automatic distribution
- Cast operations are 25-30% faster than in worker mode
- Full isolation between subinterpreters (separate namespaces, modules, state)
- New C files: `py_subinterp_pool.c`, `py_subinterp_pool.h`

- **`erlang.reactor` module** - FD-based protocol handling for building custom servers
- `reactor.Protocol` - Base class for implementing protocols
- `reactor.serve(sock, protocol_factory)` - Serve connections using a protocol
- `reactor.run_fd(fd, protocol_factory)` - Handle a single FD with a protocol
- Integrates with Erlang's `enif_select` for efficient I/O multiplexing
- Zero-copy buffer management for high-throughput scenarios

- **ETF encoding for PIDs and References** - Full Erlang term format support
- Erlang PIDs encode/decode properly in ETF binary format
- Erlang References encode/decode properly in ETF binary format
- Enables proper serialization for distributed Erlang communication
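
As a rough illustration of what ETF encoding of a PID involves, the sketch below round-trips a PID through the `NEW_PID_EXT` layout of the external term format (tag 88, followed by the node atom and three 32-bit big-endian fields). It is not the project's implementation: the `Pid` tuple, `encode_pid`, and `decode_pid` are hypothetical names used only here, and only the `SMALL_ATOM_UTF8_EXT` atom encoding is handled.

```python
import struct
from typing import NamedTuple

ETF_VERSION = 131          # leading version byte of the external term format
NEW_PID_EXT = 88           # tag for a PID with a 32-bit creation field
SMALL_ATOM_UTF8_EXT = 119  # tag for an atom whose UTF-8 name fits in 255 bytes


class Pid(NamedTuple):
    """Hypothetical stand-in for erlang.Pid, used only in this sketch."""
    node: str
    id: int
    serial: int
    creation: int


def encode_pid(pid: Pid) -> bytes:
    """Encode a Pid as: version, NEW_PID_EXT, node atom, id, serial, creation."""
    name = pid.node.encode("utf-8")
    if len(name) > 255:
        raise ValueError("node name too long for SMALL_ATOM_UTF8_EXT")
    atom = bytes([SMALL_ATOM_UTF8_EXT, len(name)]) + name
    body = struct.pack(">III", pid.id, pid.serial, pid.creation)
    return bytes([ETF_VERSION, NEW_PID_EXT]) + atom + body


def decode_pid(data: bytes) -> Pid:
    """Decode the byte layout produced by encode_pid."""
    if data[0] != ETF_VERSION or data[1] != NEW_PID_EXT:
        raise ValueError("not a NEW_PID_EXT term")
    if data[2] != SMALL_ATOM_UTF8_EXT:
        raise ValueError("unsupported atom encoding in this sketch")
    name_end = 4 + data[3]
    node = data[4:name_end].decode("utf-8")
    pid_id, serial, creation = struct.unpack(">III", data[name_end:name_end + 12])
    return Pid(node, pid_id, serial, creation)
```

A call such as `decode_pid(encode_pid(Pid("nonode@nohost", 85, 0, 0)))` returns the original tuple, which is the round-trip property the changelog entry describes.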

- **PID serialization** - Erlang PIDs now convert to `erlang.Pid` objects in Python
and back to real PIDs when returned to Erlang. Previously, PIDs fell through to
`None` (Erlang→Python) or string representation (Python→Erlang).
@@ -16,12 +37,111 @@
Subclass of `Exception`, so it's catchable with `except Exception` or
`except erlang.ProcessError`.

- **Audit hook sandbox** - Block dangerous operations when running inside Erlang VM
- Uses Python's `sys.addaudithook()` (PEP 578) for low-level blocking
- Blocks: `os.fork`, `os.system`, `os.popen`, `os.exec*`, `os.spawn*`, `subprocess.Popen`
- Raises `RuntimeError` with clear message about using Erlang ports instead
- Automatically installed when `py_event_loop` NIF is available
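
The blocking mechanism can be sketched with the standard library alone. The example below is an assumption about the general shape of such a hook, not the code shipped in this PR: `BLOCKED_EVENTS`, `install_sandbox`, and `disable_sandbox` are hypothetical names, and the on/off switch exists only because audit hooks cannot be removed once registered.

```python
import os
import sys

# Audit event names that CPython raises before the corresponding operation runs.
BLOCKED_EVENTS = frozenset({
    "os.fork", "os.forkpty", "os.system", "os.exec",
    "os.spawn", "os.posix_spawn", "subprocess.Popen",
})

# Hypothetical switch: a registered hook stays for the life of the process,
# so the sketch makes it inert instead of removing it.
_sandbox = {"installed": False, "active": False}


def _audit_hook(event: str, args: tuple) -> None:
    """Raise before a blocked operation is carried out."""
    if _sandbox["active"] and event in BLOCKED_EVENTS:
        raise RuntimeError(
            f"{event} is blocked inside the Erlang VM; use an Erlang port instead"
        )


def install_sandbox() -> None:
    """Register the hook once and switch blocking on."""
    if not _sandbox["installed"]:
        sys.addaudithook(_audit_hook)
        _sandbox["installed"] = True
    _sandbox["active"] = True


def disable_sandbox() -> None:
    """Switch blocking off again (the hook itself stays registered)."""
    _sandbox["active"] = False
```

With the sandbox installed, `os.system("exit 0")` raises `RuntimeError` before any shell is started, because the audit event fires ahead of the operation.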

- **Process-per-context architecture** - Each Python context runs in a dedicated process
- `py_context_process` - `gen_server` managing a single Python context
- `py_context_sup` - Supervisor for context processes
- `py_context_router` - Routes calls to appropriate context process
- Improved isolation between contexts
- Better crash recovery and resource management

- **Worker thread pool** - High-throughput Python operations
- Configurable pool size for parallel execution
- Efficient work distribution across threads

- **`py:contexts_started/0`** - Helper to check if contexts are ready

### Changed

- **`py:call_async` renamed to `py:cast`** - Follows gen_server convention where
`call` is synchronous and `cast` is asynchronous. The semantics are identical,
only the name changed.

- **Unified `erlang` Python module** - Consolidated callback and event loop APIs
- `erlang.run(coro)` - Run coroutine with ErlangEventLoop (like uvloop.run)
- `erlang.new_event_loop()` - Create new ErlangEventLoop instance
- `erlang.install()` - Install ErlangEventLoopPolicy (deprecated in 3.12+)
- `erlang.EventLoopPolicy` - Alias for ErlangEventLoopPolicy
- Removed separate `erlang_asyncio` module - all functionality now in `erlang`
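
For readers unfamiliar with the `uvloop.run` style, the sketch below shows the general pattern of a `run` helper built around a loop factory. It uses the standard asyncio loop as a stand-in for ErlangEventLoop; the `run` function and its `loop_factory` parameter are illustrative assumptions, not the signature of `erlang.run`.

```python
import asyncio
from typing import Any, Callable, Coroutine


def run(
    coro: Coroutine[Any, Any, Any],
    loop_factory: Callable[[], asyncio.AbstractEventLoop] = asyncio.new_event_loop,
) -> Any:
    """Run `coro` to completion on a loop from `loop_factory`, then close the loop."""
    loop = loop_factory()
    try:
        return loop.run_until_complete(coro)
    finally:
        loop.close()


async def add_later(a: int, b: int) -> int:
    """Small coroutine that yields to the loop once before returning."""
    await asyncio.sleep(0)
    return a + b


result = run(add_later(2, 3))
```

Passing a different factory is the only change needed to run the same coroutine on another loop implementation, which is the appeal of this entry-point style.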

- **Async worker backend replaced with event loop model** - The pthread+usleep
polling async workers have been replaced with an event-driven model using
`py_event_loop` and `enif_select`:
- Removed `py_async_worker.erl` and `py_async_worker_sup.erl`
- Removed `py_async_worker_t` and `async_pending_t` structs from C code
- Deprecated `async_worker_new`, `async_call`, `async_gather`, `async_stream` NIFs
- Added `py_event_loop_pool.erl` for managing event loop-based async execution
- Added `py_event_loop:run_async/2` for submitting coroutines to event loops
- Added `nif_event_loop_run_async` NIF for direct coroutine submission
- Added `_run_and_send` wrapper in Python for result delivery via `erlang.send()`
- **Internal change**: `py:async_call/3,4` and `py:await/1,2` API unchanged

- **`SuspensionRequired` base class** - Now inherits from `BaseException` instead
of `Exception`. This prevents ASGI/WSGI middleware `except Exception` handlers
from intercepting the suspension control flow used by `erlang.call()`.
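
The effect of the base-class change can be reproduced in plain Python. In this sketch the `SuspensionRequired` class, the middleware, and the handlers are simplified stand-ins written for illustration; only the `BaseException` versus `Exception` behaviour is taken from the language itself.

```python
class SuspensionRequired(BaseException):
    """Stand-in for the control-flow signal; derives from BaseException."""


def middleware(handler):
    """Typical middleware that turns application errors into a 500 response."""
    def wrapped():
        try:
            return handler()
        except Exception as exc:  # does not match BaseException-only subclasses
            return f"500: {exc}"
    return wrapped


def failing_handler():
    raise ValueError("boom")


def suspending_handler():
    raise SuspensionRequired("resume after the Erlang call returns")


def run_with_suspension(app):
    """Stand-in for the runtime layer that is meant to see the suspension."""
    try:
        return app()
    except SuspensionRequired:
        return "suspended"
```

Here `middleware(failing_handler)()` yields the 500 string, while `run_with_suspension(middleware(suspending_handler))` yields `"suspended"`: the signal passes through the middleware untouched.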

- **Per-interpreter isolation in py_event_loop.c** - Removed global state for
proper subinterpreter support. Each interpreter now has isolated event loop state.

- **ErlangEventLoopPolicy always returns ErlangEventLoop** - Previously it returned
an ErlangEventLoop only on the main thread; it now does so on every thread.

### Removed

- **Context affinity functions** - Removed `py:bind`, `py:unbind`, `py:is_bound`,
`py:with_context`, and `py:ctx_*` functions. The new `py_context_router` provides
automatic scheduler-affinity routing. For explicit context control, use
`py_context_router:bind_context/1` and `py_context:call/5`.

- **Signal handling support** - Removed `add_signal_handler`/`remove_signal_handler`
from ErlangEventLoop. Signal handling should be done at the Erlang VM level.
Methods now raise `NotImplementedError` with guidance.

- **Subprocess support** - ErlangEventLoop raises `NotImplementedError` for
`subprocess_shell` and `subprocess_exec`. Use Erlang ports (`open_port/2`)
for subprocess management instead.

### Fixed

- **FD stealing and UDP connected socket issues** - Fixed file descriptor handling
for UDP sockets in connected mode

- **Context test expectations** - Updated tests for Python contextvars behavior

- **Unawaited coroutine warnings** - Fixed warnings in test suite

- **Timer scheduling for standalone ErlangEventLoop** - Fixed timer callbacks not
firing for loops created outside the main event loop infrastructure

- **Subinterpreter cleanup and thread worker re-registration** - Fixed cleanup
issues when subinterpreters are destroyed and recreated

- **ProcessError exception class identity in subinterpreters** - Fixed exception
class mismatch when raising `erlang.ProcessError` in subinterpreter contexts.
The exception class is now looked up from the current interpreter's `erlang`
module at runtime instead of using a global variable.

- **Thread worker handlers not re-registering after app restart** - Workers now
properly re-register when the application restarts

- **Timeout handling** - Improved timeout handling across the codebase

- **Eval locals_term initialization** - Fixed uninitialized variable in eval

- **Two race conditions in worker pool** - Fixed concurrent access issues

### Performance

- **Async coroutine latency reduced from ~10-20ms to <1ms** - The event loop model
eliminates pthread polling overhead
- **Zero CPU usage when idle** - Event-driven instead of usleep-based polling
- **No extra threads** - Coroutines run on the existing event loop infrastructure

## 1.8.1 (2026-02-25)

### Fixed
@@ -102,16 +222,15 @@
### Added

- **Shared Router Architecture for Event Loops**
-- Single `py_event_router` process handles all event loops (both shared and isolated)
+- Single `py_event_router` process handles all event loops
- Timer and FD messages include loop identity for correct dispatch
- Eliminates need for per-loop router processes
- Handle-based Python C API using PyCapsule for loop references

-- **Isolated Event Loops** - Create isolated event loops with `ErlangEventLoop(isolated=True)`
-- Default (`isolated=False`): uses the shared global loop managed by Erlang
-- Isolated (`isolated=True`): creates a dedicated loop with its own pending queue
-- Full asyncio support (timers, FD operations) for both modes
-- Useful for multi-threaded Python applications where each thread needs its own loop
+- **Per-Loop Capsule Architecture** - Each `ErlangEventLoop` instance has its own isolated capsule
+- Dedicated pending queue per loop for proper event routing
+- Full asyncio support (timers, FD operations) with correct loop isolation
+- Safe for multi-threaded Python applications where each thread needs its own loop
- See `docs/asyncio.md` for usage and architecture details

## 1.6.1 (2026-02-22)
103 changes: 103 additions & 0 deletions PLAN.md
@@ -0,0 +1,103 @@
# Plan: Fix Remaining Test Failures

## Current Status

| Python | Passed | Failed | Skipped | Notes |
|--------|--------|--------|---------|-------|
| 3.9 | 203 | 0 | 12 | All pass (subinterp tests skipped) |
| 3.13 | 224 | 3 | 2 | py_pid_send_SUITE failures |
| 3.14 | 219 | 8 | 2 | py_pid_send_SUITE failures |

## Root Cause

The `py_pid_send_SUITE` tests fail on Python 3.12+ with subinterpreter mode because:

1. `init_per_suite` sets `sys.path` via `py:exec()` on the main context
2. When tests run, they go through the context router, which routes calls to subinterpreter contexts
3. Each subinterpreter has its own isolated `sys.path`, so the modification made on the main context does not propagate
4. Result: `ModuleNotFoundError: No module named 'py_test_pid_send'`

## Fix Options

### Option 1: Set sys.path per-context (Recommended)

Modify `py_context:init/1` to accept an optional `sys_path` list and set it when creating the context. This ensures each subinterpreter has the correct path.

```erlang
%% In py_context.erl
init(#{sys_path := Paths} = Opts) ->
    %% After context creation (elided here), Ctx is the new context handle.
    %% Prepend each path to sys.path inside that context.
    lists:foreach(fun(P) ->
        py_nif:context_exec(Ctx, <<"import sys; sys.path.insert(0, '", P/binary, "')">>)
    end, Paths),
    ...
```

### Option 2: Fix test setup to use context-aware path setting

Modify `py_pid_send_SUITE:init_per_suite/1` to set the path on all contexts:

```erlang
init_per_suite(Config) ->
    {ok, _} = application:ensure_all_started(erlang_python),
    TestDir = list_to_binary(code:lib_dir(erlang_python, test)),
    %% Set path on all contexts
    NumContexts = py_context_router:num_contexts(),
    [begin
         Ctx = py_context_router:get_context(I),
         py_context:exec(Ctx, <<"import sys; sys.path.insert(0, '", TestDir/binary, "')">>)
     end || I <- lists:seq(1, NumContexts)],
    Config.
```

### Option 3: Use absolute imports in tests

Modify tests to use `importlib` with absolute file paths instead of relying on sys.path.
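
A minimal sketch of this option, using only documented `importlib.util` calls, is shown below. The helper name `load_module_from_path` and the throwaway `demo_helper` module are hypothetical; a real test would point at the existing test module file instead.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path
from types import ModuleType


def load_module_from_path(name: str, path: str) -> ModuleType:
    """Import the file at `path` as module `name` without touching sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register first so later imports find it
    spec.loader.exec_module(module)
    return module


# Self-contained demo: write a throwaway module and load it by absolute path.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "demo_helper.py"
    source.write_text("def ping():\n    return 'pong'\n")
    demo = load_module_from_path("demo_helper", str(source))
    result = demo.ping()
    path_untouched = tmp not in sys.path
```

Because nothing is added to `sys.path`, the load works the same way in any interpreter that runs it, at the cost of every test having to know the absolute path of the module it needs.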

## Implementation Plan

1. **Fix py_pid_send_SUITE init_per_suite** (Option 2)
- Modify to set sys.path on all contexts, not just main
- This is the minimal fix that doesn't require API changes

2. **Add sys_path option to py_context:init/1** (Option 1, for future)
- Add `sys_path` option to context creation
- Apply to both worker and subinterpreter modes
- Document in API

3. **Test validation**
- Run full test suite on Python 3.9, 3.13, 3.14
- Ensure all tests pass

## Remaining Issue

### test_send_dead_process_raises_process_error (1 failure on 3.13/3.14)

**Problem:** The test calls `erlang.send(dead_pid, msg)`, which raises `ProcessError`.
The Python code tries to catch it with `except erlang.ProcessError`, but the handler never matches.

**Root cause:** In subinterpreter mode, the `erlang` module is created separately
in each subinterpreter. The `ProcessError` exception class created by the NIF
when raising the error is from a different module instance than the one the
Python code imports. This is a class identity issue:

```python
# In subinterpreter
import erlang  # Gets subinterpreter's erlang module
try:
    erlang.send(dead_pid, msg)  # Raises ProcessError from NIF's erlang module
except erlang.ProcessError:  # This is a different class!
    return True  # Never reached
```

**Fix options:**
1. Store exception classes globally and share across subinterpreters
2. Use string-based exception matching in tests
3. Ensure the NIF uses the same exception class as the subinterpreter's erlang module
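
The class identity problem, and why looking the class up from the current module (fix option 3) resolves it, can be reproduced without subinterpreters. The sketch below simulates two `erlang` module instances with `types.ModuleType`; all names in it are illustrative and none of it is the NIF code.

```python
import types


def make_erlang_module() -> types.ModuleType:
    """Build a fresh module with its own ProcessError class,
    mimicking one `erlang` module instance per subinterpreter."""
    module = types.ModuleType("erlang")

    class ProcessError(Exception):
        pass

    module.ProcessError = ProcessError
    return module


nif_side = make_erlang_module()     # instance the raising code holds on to
interp_side = make_erlang_module()  # instance the Python code imported


def send_with_global_class():
    """Raise using a class captured from another module instance."""
    raise nif_side.ProcessError("process is dead")


def send_with_runtime_lookup(current_module: types.ModuleType):
    """Raise using the class looked up from the caller's own module."""
    raise current_module.ProcessError("process is dead")


def caught(raise_fn, *args) -> bool:
    """Report whether `except interp_side.ProcessError` matches the raised error."""
    try:
        raise_fn(*args)
    except interp_side.ProcessError:
        return True
    except Exception:
        return False
```

Both classes are named `ProcessError`, yet `caught(send_with_global_class)` is `False` and `caught(send_with_runtime_lookup, interp_side)` is `True`, since `except` matches on class identity rather than on name.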

## Timeline

1. ~~Fix py_pid_send_SUITE init_per_suite~~ - DONE
2. ~~Validate all Python versions pass~~ - DONE (203/226/226 passed on Python 3.9/3.13/3.14)
3. Fix ProcessError class identity issue - follow-up PR
4. Add sys_path option to context API - follow-up PR
7 changes: 5 additions & 2 deletions README.md
@@ -32,6 +32,7 @@ Key features:
- **AI/ML ready** - Examples for embeddings, semantic search, RAG, and LLMs
- **Logging integration** - Python logging forwarded to Erlang logger
- **Distributed tracing** - Span-based tracing from Python code
- **Security sandbox** - Blocks fork/exec operations that would corrupt the VM

## Requirements

@@ -66,7 +67,7 @@ application:ensure_all_started(erlang_python).
{ok, 25} = py:eval(<<"x * y">>, #{x => 5, y => 5}).

%% Async calls
-Ref = py:call_async(math, factorial, [100]),
+Ref = py:cast(math, factorial, [100]),
{ok, Result} = py:await(Ref).

%% Streaming from generators
@@ -443,7 +444,7 @@ escript examples/logging_example.erl
{ok, Result} = py:call(Module, Function, Args, KwArgs, Timeout).

%% Async
-Ref = py:call_async(Module, Function, Args).
+Ref = py:cast(Module, Function, Args).
{ok, Result} = py:await(Ref).
{ok, Result} = py:await(Ref, Timeout).
```
@@ -573,6 +574,8 @@ py:execution_mode(). %% => free_threaded | subinterp | multi_executor
- [Threading](docs/threading.md)
- [Logging and Tracing](docs/logging.md)
- [Asyncio Event Loop](docs/asyncio.md) - Erlang-native asyncio with TCP/UDP support
- [Reactor](docs/reactor.md) - FD-based protocol handling
- [Security](docs/security.md) - Sandbox and blocked operations
- [Web Frameworks](docs/web-frameworks.md) - ASGI/WSGI integration
- [Changelog](https://github.com/benoitc/erlang-python/releases)

10 changes: 10 additions & 0 deletions benchmark_results/baseline_20260224_133948.txt.log
@@ -0,0 +1,10 @@
Error! Failed to eval:
application:ensure_all_started(erlang_python),
Results = py_scalable_io_bench:run_all(),
py_scalable_io_bench:save_results(Results, "/Users/benoitc/Projects/erlang-python/benchmark_results/baseline_20260224_133948.txt"),
init:stop()


Runtime terminating during boot ({undef,[{py_scalable_io_bench,run_all,[],[]},{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,920}]},{erl_eval,expr,6,[{file,"erl_eval.erl"},{line,668}]},{erl_eval,exprs,6,[{file,"erl_eval.erl"},{line,276}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done
10 changes: 10 additions & 0 deletions benchmark_results/current_20260224_133950.txt.log
@@ -0,0 +1,10 @@
Error! Failed to eval:
application:ensure_all_started(erlang_python),
Results = py_scalable_io_bench:run_all(),
py_scalable_io_bench:save_results(Results, "/Users/benoitc/Projects/erlang-python/benchmark_results/current_20260224_133950.txt"),
init:stop()


Runtime terminating during boot ({undef,[{py_scalable_io_bench,run_all,[],[]},{erl_eval,do_apply,7,[{file,"erl_eval.erl"},{line,920}]},{erl_eval,expr,6,[{file,"erl_eval.erl"},{line,668}]},{erl_eval,exprs,6,[{file,"erl_eval.erl"},{line,276}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]})

Crash dump is being written to: erl_crash.dump...done