[FLINK-39241][table] Fix MultiJoinStateViews by changing GenericRowData to BinaryRowData#27764
Conversation
|
Hi @gustavodemorais |
|
Hey @snuyanzin Could you review, please? |
| } | ||
|
|
||
| @TestTemplate | ||
| def testThreeWayMultiJoinWithoutPk(): Unit = { |
There was a problem hiding this comment.
Ideally would be great to follow new store restore test approach
would be great if you can do that instead
There was a problem hiding this comment.
@snuyanzin Thanks for review
This test, as in the previous PR with a similar problem, is only temporary. It was decided to expand this PR with the necessary tests.
Is it ok, what do you think?
There was a problem hiding this comment.
Yeah, let's add also this one to the new suite. I'll take a look at the new suite next
…ta to BinaryRowData
|
Thanks for review @gustavodemorais . |
gustavodemorais
left a comment
There was a problem hiding this comment.
LGTM. Btw, it's usually easier to ping me on the Community Slack if I can help with something.
Also, @ldadima, thinking of next possible topics. How did you perform the benchmark comparison you shared in the last PR? It'd be nice to have an open source benchmark between the MultiJoin and regular Joins. Among other things, it'd be interesting to have numbers on the effects of the cardinality of the common join key. I know we have https://github.com/apache/flink-benchmarks but I haven't contributed to it yet
|
Thanks @gustavodemorais , I will use this opportunity (Slack) if necessary. Besides comparing performance, it is also worth noting that regular join supports Window and mini-batch improvements. |
|
I know Q23 and it'd be nice to have some statistic on it for different type of data coming in. In the example of Q23, how does the performance progresses if "B.bidder" has only one record in each table, or 10 records, or 1000 records etc. The same would be for a larger join, like 10 joins. Do you think https://github.com/apache/flink-benchmarks would be a good way to have this? |
|
I think a general performance comparison is beyond the scope of the task described. In the Jira ticket I created, I wanted to highlight the issue of multi-join with two inputs. I don't think it's necessary to consider the case where the multi-join rule generate multi-join with more than two inputs. |
I wasn't referring to this PR or another jira you created. I'm talking about another possible new ticket. A general performance comparison considering some aspects would be great to share with the community. Wdyt? |
Oh, sorry. I didn't understand you earlier. Indeed, it would be good to make smth like that |
gustavodemorais
left a comment
There was a problem hiding this comment.
Sounds good. Anyway, this PR looks good. Thanks for the contribution, @ldadima.
|
Thanks for the PR @ldadima |
What is the purpose of the change
To fix FLINK-39241
Verifying this change
Run JoinITCase#testThreeWayMultiJoinWithoutPk
Does this pull request potentially affect one of the following parts:
@Public(Evolving): ( no)Documentation