Fix state not showing up fast enough with TestDelayedEvents (de-flake v2)#869
Fix state not showing up fast enough with TestDelayedEvents (de-flake v2)#869MadLittleMods wants to merge 7 commits intomainfrom
TestDelayedEvents (de-flake v2)#869Conversation
Follow-up to #830 As experienced in https://github.com/element-hq/synapse-rust-apps/actions/runs/24910122124/job/72949760158?pr=360 (element-hq/synapse-rust-apps#360) ``` ❌ TestDelayedEvents/delayed_state_events_are_kept_on_server_restart (10.12s) delayed_event_test.go:425: StopServer hs1 delayed_event_test.go:429: StartServer hs1 delayed_event_test.go:443: CSAPI.MustDo GET http://127.0.0.1:32978/_matrix/client/v3/rooms/%21MbDncghrqxTzEmQhCP:hs1/state/com.example.test/1 returned non-2xx code: 404 Not Found - body: {"errcode":"M_NOT_FOUND","error":"Event not found."} ``` This is most likely caused because the main process handles processing delayed events (and serving `/state`) but the state will be persisted on one of event_persister workers so the main process might be serving stale data until the invalidation comes through.
Traditional `/sync` expects you to use `state` as the base and layer state from the `timeline` on top (flawed because of state res). So we have to check in the `timeline` instead of `state` for the update. We can also consider this good here because we don't expect any state to be rejected because of state res issues.
Use a sync filter so `timeline` doesn't mess with us Don't use incremental sync because no need
| // Check that the `state` section for `roomID` has an event which passes the check function. | ||
| // | ||
| // Note that the `state` section of a sync response only contains the change in state up | ||
| // to the start of the `timeline` and will not contain the entire state of the room for | ||
| // incremental or `lazy_load_members` syncs. | ||
| func SyncStateHas(roomID string, check func(gjson.Result) bool) SyncCheckOpt { |
There was a problem hiding this comment.
This was the reason the previous PR didn't work, #865
Just re-formatted this a bit 🤷 (can revert if not useful)
|
|
||
| // And includes the correct content | ||
| // | ||
| // FIXME: This assertion seems superfluous to this test and should be it's own test |
There was a problem hiding this comment.
Added some FIXME for the messiness I'd like to see cleaned up but not doing in this PR so the purpose is more focused and understandable.
|
|
||
| stateKey := "to_send_on_timeout" | ||
|
|
||
| // Schedule a delayed event |
There was a problem hiding this comment.
Added comments to understand these tests better
| // No more delayed events | ||
| matchDelayedEvents(t, user, delayedEventsNumberEqual(0)) |
There was a problem hiding this comment.
Moved this check until after we've actually waited for the state to show up and we know the delayed event was processed.
| // Check for the state change from the delayed state event (using `MustSyncUntil` to | ||
| // account for any processing or worker replication delays) | ||
| user.MustSyncUntil(t, client.SyncReq{Filter: NoTimelineSyncFilter}, client.SyncStateHas(roomID, func(ev gjson.Result) bool { | ||
| return ev.Get("type").Str == eventType && ev.Get("state_key").Str == stateKey | ||
| })) |
There was a problem hiding this comment.
The main fix to all of these tests is using MustSyncUntil to wait and account for any processing or worker/replication delays.
| @@ -215,10 +215,11 @@ func SyncTimelineHasEventID(roomID string, eventID string) SyncCheckOpt { | |||
| }) | |||
| } | |||
|
|
|||
There was a problem hiding this comment.
@reivilibre assigning you since you have the context from #865
Same sort of fix but applied to all of tests here
Fix state not showing up fast enough with
TestDelayedEvents(de-flake tests)Re-introducing the changes from #865 which were reverted in #867
Follow-up to #830
As experienced when running this test against the worker-based Synapse setup we use alongside the Synapse Pro Rust apps, https://github.com/element-hq/synapse-rust-apps/actions/runs/24910122124/job/72949760158?pr=360 (https://github.com/element-hq/synapse-rust-apps/pull/360)
Also updating the rest of the
TestDelayedEventstests to better account for state changes that might not show up immediately because processing/worker/replication delayWhy does this flake happen?
Discussed in #865 (comment)
Dev notes
MSC4140 Synapse implementation added in element-hq/synapse#17326
Testing with Synapse:
Pull Request Checklist