Skip to content

[FLINK-39241][table] Fix MultiJoinStateViews by changing GenericRowData to BinaryRowData#27764

Merged
snuyanzin merged 1 commit intoapache:masterfrom
ldadima:FLINK-39241
Mar 31, 2026
Merged

[FLINK-39241][table] Fix MultiJoinStateViews by changing GenericRowData to BinaryRowData#27764
snuyanzin merged 1 commit intoapache:masterfrom
ldadima:FLINK-39241

Conversation

@ldadima
Copy link
Copy Markdown
Contributor

@ldadima ldadima commented Mar 13, 2026

What is the purpose of the change

To fix FLINK-39241

Verifying this change

Run JoinITCase#testThreeWayMultiJoinWithoutPk

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): ( no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? ( no)
  • If yes, how is the feature documented? (not applicable)

@flinkbot
Copy link
Copy Markdown
Collaborator

flinkbot commented Mar 13, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@ldadima
Copy link
Copy Markdown
Contributor Author

ldadima commented Mar 13, 2026

Hi @gustavodemorais
Can you review, please?

@ldadima
Copy link
Copy Markdown
Contributor Author

ldadima commented Mar 18, 2026

Hey @snuyanzin Could you review, please?

}

@TestTemplate
def testThreeWayMultiJoinWithoutPk(): Unit = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally would be great to follow new store restore test approach
would be great if you can do that instead

Copy link
Copy Markdown
Contributor Author

@ldadima ldadima Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snuyanzin Thanks for review
This test, as in the previous PR with a similar problem, is only temporary. It was decided to expand this PR with the necessary tests.
Is it ok, what do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's add also this one to the new suite. I'll take a look at the new suite next

Copy link
Copy Markdown
Contributor

@gustavodemorais gustavodemorais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @ldadima. A few comments, mostly for consistency with #27508.

@github-actions github-actions Bot added the community-reviewed PR has been reviewed by the community. label Mar 27, 2026
@ldadima
Copy link
Copy Markdown
Contributor Author

ldadima commented Mar 27, 2026

Thanks for review @gustavodemorais .
Code changed according your comments

Copy link
Copy Markdown
Contributor

@gustavodemorais gustavodemorais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Btw, it's usually easier to ping me on the Community Slack if I can help with something.

Also, @ldadima, thinking of next possible topics. How did you perform the benchmark comparison you shared in the last PR? It'd be nice to have an open source benchmark between the MultiJoin and regular Joins. Among other things, it'd be interesting to have numbers on the effects of the cardinality of the common join key. I know we have https://github.com/apache/flink-benchmarks but I haven't contributed to it yet

@ldadima
Copy link
Copy Markdown
Contributor Author

ldadima commented Mar 27, 2026

Thanks @gustavodemorais , I will use this opportunity (Slack) if necessary.
I used this benchmark (https://github.com/nexmark/nexmark)
There is a prepared set of SQL queries. I took part of the set (significant queries with joins) and compared the results with multi-join optimization turned off and on. Also in the nexmark benchmarks, there is Q23 - a query with multiple joins, where the results of multi-join optimization can be seen.

Besides comparing performance, it is also worth noting that regular join supports Window and mini-batch improvements.

@gustavodemorais
Copy link
Copy Markdown
Contributor

I know Q23 and it'd be nice to have some statistic on it for different type of data coming in. In the example of Q23, how does the performance progresses if "B.bidder" has only one record in each table, or 10 records, or 1000 records etc.

The same would be for a larger join, like 10 joins. Do you think https://github.com/apache/flink-benchmarks would be a good way to have this?

@ldadima
Copy link
Copy Markdown
Contributor Author

ldadima commented Mar 27, 2026

I think a general performance comparison is beyond the scope of the task described. In the Jira ticket I created, I wanted to highlight the issue of multi-join with two inputs. I don't think it's necessary to consider the case where the multi-join rule generate multi-join with more than two inputs.

@gustavodemorais
Copy link
Copy Markdown
Contributor

gustavodemorais commented Mar 27, 2026

I think a general performance comparison is beyond the scope of the task described.

I wasn't referring to this PR or another jira you created. I'm talking about another possible new ticket. A general performance comparison considering some aspects would be great to share with the community. Wdyt?

@ldadima
Copy link
Copy Markdown
Contributor Author

ldadima commented Mar 30, 2026

A general performance comparison considering some aspects would be great to share with the community. Wdyt?

Oh, sorry. I didn't understand you earlier. Indeed, it would be good to make smth like that
But we need to understand what we want to compare. There is Q23 (different data distributions) and flink-benchmarks
Maybe it's also worth creating a Jira ticket for this.

Copy link
Copy Markdown
Contributor

@gustavodemorais gustavodemorais left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Anyway, this PR looks good. Thanks for the contribution, @ldadima.

@snuyanzin
Copy link
Copy Markdown
Contributor

Thanks for the PR @ldadima
thanks for the review @gustavodemorais

@snuyanzin snuyanzin merged commit 0c4ab25 into apache:master Mar 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants