For about half a year, I've been running code on an ESP32-S3 that periodically makes HTTP GET requests. Occasionally it would completely lock up and need to be restarted. After some experimentation, I was able to reproduce the lockup by making an HTTP GET request to a server that was reachable but not listening on the target port. So my hunch is that the HTTP servers I was querying would occasionally go down briefly for maintenance or similar, and a badly timed request would lock things up.
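For reference, the trigger can be sketched in plain Erlang. This is a hedged sketch, not the actual code I was running: the host and port are made up, and it uses `gen_tcp` directly rather than an HTTP client, since the error tuple in the logs points at `gen_tcp`.

```erlang
%% Minimal repro sketch (hypothetical host/port; gen_tcp API as in
%% OTP, which AtomVM largely mirrors). Connecting to a reachable host
%% on a closed port is what triggered the lockup in the old driver.
repro(Host, Port) ->
    case gen_tcp:connect(Host, Port, [binary, {active, false}]) of
        {ok, Socket} ->
            %% Port unexpectedly open; clean up and report.
            gen_tcp:close(Socket),
            ok;
        {error, Reason} ->
            %% With a working driver this returns an error tuple, e.g.
            %% {error, econnrefused} or {error, econnaborted}, instead
            %% of never returning at all.
            {error, Reason}
    end.
```

Calling e.g. `repro("foobar", 81).` against a host that is up but not listening on that port should return promptly with an error rather than hanging forever.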
It turns out there was a deadlock in the existing driver, fixed in #2200.
I gave the fix a quick test:
1970-01-01T00:00:05.485Z [info] <0.18.0> Tick!
1970-01-01T00:00:06.569Z [info] <0.18.0> Tick!
1970-01-01T00:00:06.993Z [debug] <0.33.0> HTTP GET http://foobar/baz
1970-01-01T00:00:25.570Z [debug] <0.33.0> HTTP Error: {error,{gen_tcp,econnaborted}}
1970-01-01T00:00:25.677Z [info] <0.18.0> Tick!
1970-01-01T00:00:27.098Z [info] <0.18.0> Tick!
So that does look like an ~18 s stall, which is a lot better than locking up permanently.
But it still stalls the scheduler for 18 seconds. We could tune that down to maybe half, but the call would still be blocking.
@pikdum contributed a non-blocking PoC in #2197, which I progressed to a PR-worthy level (according to an LLM, so all caveats apply!) here: https://github.com/petermm/AtomVM/tree/test-2197
So this issue is to track replacing the old driver.