Conversation
…rvice availability Service calls could occur before the corresponding server was available, leading to intermittent failures in distributed setups. This change ensures the service is available before making the call.
|
Please target the |
|
|
||
| // Function to check if the service server is available | ||
| auto wait_for_service = [this](auto& client, const std::string& service_name) { | ||
| if (!client->wait_for_service(std::chrono::seconds(10))) |
There was a problem hiding this comment.
We are already seeing problems where MoveIt is not tolerant enough to startup delays on large systems. I think this will cause more startup crashes on our system.
If there is going to be a hard timeout, could it at least be configurable?
There was a problem hiding this comment.
I feel there are two three possible fixes. Either we can make the block indefinite because there is no point in proceeding if rviz plugin can't find the list of planners available.
Secondly the action above the service has a standard 30 seconds delay which cannot be configured. We can add a similarly large delay.
Third we can declare a parameter which can be passed using yaml file as the plugin uses rviz node for all the communication and user should be able to configure it during launch
There was a problem hiding this comment.
My personal preference would be for this to use a configurable timeout, like the wait_for_servers timeout used in the action client above.
I'm not sure what the MoveIt maintainers would prefer though.
There was a problem hiding this comment.
Wait for servers uses a constant 30 seconds delay. It is not configurable
There was a problem hiding this comment.
I thought wait_for_servers was a constructor argument of MoveGroupInterface, and so user-configurable?
Maybe we're talking about different waits? Where is the 30 second wait you're talking about defined?
There was a problem hiding this comment.
Ah, yeah, I understand now, thanks for the clarification.
There was a problem hiding this comment.
I think it would be useful for the plugin to have configurable waits, but I'm guessing that's out of scope for this PR.
There was a problem hiding this comment.
So I'll add another argument to the constructor call for service wait time, with default value of 10 seconds. Is that okay?
There was a problem hiding this comment.
Cool, thanks for the inputs.
…to include a rclcpp::Duration for wait_for_service delay time, which can be configured at the place of construction of MoveGroupInterfaceImpl object
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## humble #3711 +/- ##
==========================================
+ Coverage 51.21% 51.22% +0.01%
==========================================
Files 407 407
Lines 30667 30667
==========================================
+ Hits 15704 15705 +1
+ Misses 14963 14962 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Summary
Fix a race condition in
MoveGroupInterfaceImplinitialization where service calls could be made before the corresponding service servers are available.Problem
During initialization,
MoveGroupInterfaceImplqueries services such asquery_planner_interface. In distributed setups, service discovery can be delayed due to asynchronous DDS initialization.This can lead to a race condition where the service client attempts to call the service before it is available, resulting in incomplete initialization (e.g. missing markers in RViz) or blocking behavior.
This issue is more likely to appear in multi-machine setups and when additional discovery latency is introduced (e.g. DDS security).
For full reproduction details, see the related issue.
Solution
Add a guard to ensure that required services are available before issuing service requests during initialization.
This is implemented by waiting for the service using
wait_for_service()before making the call.Testing
Related Issue
#3698