Exception handling in dl-py: driplineorg/dripline-python#166
In general, the ultimate decision for how an exception should affect the operation of an executable (or a thread within an executable) is by the application's/thread's execute() or other primary operation (e.g. listen()) functions. Here is how exceptions are currently handled:
- ✅ agent
- No error handling
- Creates sub-agent and executes it (see sub_agent_request[reply] below)
- ✅ concurrent_receiver
- try block around main while loop -- will catch exceptions from all message processing that comes through here
- receiver.cc (439),
execute()
- catch
std::exception: uses signal_handler::cancel_all() to shutdown application
- ⚠️ heartbeater
- try block around sending a message -- each time a heartbeat message is sent
- heartbeater.cc (53),
execute()
- catch
message_ptr_t and connection_error: assume offline mode; log error and don't do anything
- catch
dripline_error: log error and don't do anything
- Question: do we really want to do nothing for these?
- Question: what about other code in the main while loop outside of sending?
- ✅ monitor
- try block around starting and joining thread -- this is the main operation of a monitor
- monitor.cc (149),
execute()
- Same treatment as service
- ⚠️ relayer
- try block around sending a message -- each time a message is sent
- relayer.cc (28),
execute()
- catch
message_ptr_t: assume offline mode; log error and do nothing
- catch
connection_error: log error and do nothing
- catch
dripline_error: log error and do nothing
- Question: do we really want to do nothing for these?
- Question: why is treatment of
connection_error slightly inconsistent with heartbeater?
- Question: what about other code in the main while loop outside of sending?
- ❌ scheduler
- ✅ service
- try block around starting and joining threads -- this is the main operation of a service
- service.cc (195),
listen()
- catch
std::system_error: assume didn't connect; return false to exit service
- catch
dripline_error: report error; return false to exit service
- catch
std::exception: report error; return false to exit service
- ⚠️ sub_agent_request[reply]
- try block around sending a message
- agent.cc (153[298]),
create_and_send_message()
- Same handling as service except the return value is set in
f_agent instead of returning false
- Question: what about other code, putting together the message, etc? This is unhandled.
There are some other parts of the error-handling chain that should be considered during this exercise:
- ❓
service::submit_message()
- service.cc (470)
- Call sequence
concurrent_receiver::execute() [as part of service]
service::submit_message() [link between concurrent_receiver and endpoint]
endpoint::sort_message() [as part of service]
- Catches several exception types; logs as
KTERROR; then rethrows
dripline_error
amqp::amqp_exception
amqp::amqp_lib_exception
std::exception
- Question Do we even need to do the catching and rethrowing here? Is there value in the log messages at this stage?
- ✅
endpoint::on_request_message()
- endpoint.cc (88)
- We catch exceptions here so that we can send a reply message
- Call sequence
endpoint::sort_message()
endpoint::on_request_message()
endpoint::__do_[OP]_request()
- The try block surrounds the calling of the different
__do_[OP]_request() functions.
- Catches and actions
throw_reply
- This object is only used to send a reply message
- The return code could be success, a warning, or an error
- After sending, control returns to the thrower
pybind11::error_already_set
- This is how all raised Python exceptions show up in C++, regardless of Python type
- How this is handled depends on a keyword that is checked for in the
what() string
- If it's indicated that the Python exception was a ThrowReply, then it's handled as above for
throw_reply
- Otherwise an unhandled exception reply is sent and the exception is rethrown
std::exception
- An unhandled exception reply is sent and the exception is rethrown
Exception handling in dl-py: driplineorg/dripline-python#166
In general, the ultimate decision for how an exception should affect the operation of an executable (or a thread within an executable) is by the application's/thread's
execute()or other primary operation (e.g.listen()) functions. Here is how exceptions are currently handled:execute()std::exception: usessignal_handler::cancel_all()to shutdown applicationexecute()message_ptr_tandconnection_error: assume offline mode; log error and don't do anythingdripline_error: log error and don't do anythingexecute()execute()message_ptr_t: assume offline mode; log error and do nothingconnection_error: log error and do nothingdripline_error: log error and do nothingconnection_errorslightly inconsistent with heartbeater?execute()listen()std::system_error: assume didn't connect; return false to exit servicedripline_error: report error; return false to exit servicestd::exception: report error; return false to exit servicecreate_and_send_message()f_agentinstead of returning falseThere are some other parts of the error-handling chain that should be considered during this exercise:
service::submit_message()concurrent_receiver::execute()[as part of service]service::submit_message()[link between concurrent_receiver and endpoint]endpoint::sort_message()[as part of service]KTERROR; then rethrowsdripline_erroramqp::amqp_exceptionamqp::amqp_lib_exceptionstd::exceptionendpoint::on_request_message()endpoint::sort_message()endpoint::on_request_message()endpoint::__do_[OP]_request()__do_[OP]_request()functions.throw_replypybind11::error_already_setwhat()stringthrow_replystd::exception