Skip to content

pkgcheck/bash: sort captures by line and column#778

Closed
laumann wants to merge 1 commit intomasterfrom
tree-sitter-sort-captures
Closed

pkgcheck/bash: sort captures by line and column#778
laumann wants to merge 1 commit intomasterfrom
tree-sitter-sort-captures

Conversation

@laumann
Copy link
Copy Markdown
Contributor

@laumann laumann commented Mar 28, 2026

It has been observed that the order of nodes returned by tree-sitter's QueryCursor is not necessarily in order wrt line and column in the source, but it appears that some of pkgcheck's usage of tree-sitter assumes that captured nodes are returned in order.

There doesn't appear to any features in QueryCursor (or other places) that allows one to specify that returned nodes should be ordered in a specific way, so instead we introduce a decorator on QueryCursor that takes dict of captured nodes and sorts each list of nodes by line and column.

@laumann laumann requested a review from arthurzam March 28, 2026 12:12
@thesamesam
Copy link
Copy Markdown
Member

LGTM but please link to the bug in the commit message.

@laumann laumann force-pushed the tree-sitter-sort-captures branch from a7e2b9c to 957d98c Compare March 28, 2026 12:56
@laumann
Copy link
Copy Markdown
Contributor Author

laumann commented Mar 28, 2026

I haven't checked all the call sites for bash.*_query invocations, but the usage I've seen the most is something like this:

bash.cmd_query.captures(...).get("call", ())

these uses should not be negatively impacted by this change.

An alternative (and more conservative) approach is to create parallel variables like bash.cmd_query_sorted and switch only those call sites where we know that the checks depend on nodes being ordered.

EDIT: I also don't have any numbers for how much overhead the node sorting adds.

@laumann laumann force-pushed the tree-sitter-sort-captures branch from 957d98c to ee4f9bd Compare March 28, 2026 14:38
@ferringb
Copy link
Copy Markdown
Contributor

ferringb commented Mar 28, 2026

It sounds bike-sheddy, but I'd drop the sorted_query name. Just name it query, and rename the original to unstable_query- make that particular symbol name explicit that you cannot trust it to have a stable ordering. Make the default case (what people reach for- query) 'work', and leave the unsorted version available for people who know what they're doing and are optimizing. Hell, even if they know what they're doing, it's a likely footgun, hence my suggestion to rename query so the footgun is part of the name.

Either way, that's not a simple thing you ran down ;)

@laumann
Copy link
Copy Markdown
Contributor Author

laumann commented Mar 29, 2026

It sounds bike-sheddy, but I'd drop the sorted_query name. Just name it query, and rename the original to unstable_query- make that particular symbol name explicit that you cannot trust it to have a stable ordering. Make the default case (what people reach for- query) 'work', and leave the unsorted version available for people who know what they're doing and are optimizing. Hell, even if they know what they're doing, it's a likely footgun, hence my suggestion to rename query so the footgun is part of the name.

Sure, I can do that.

Either way, that's not a simple thing you ran down ;)

Was a fun detective hunt ;)

It has been observed that the order of nodes returned by tree-sitter's
QueryCursor is not necessarily in order wrt line and column in the
source, but it appears that some of pkgcheck's usage of tree-sitter
assumes that captured nodes are returned in order.

There doesn't appear to any features in QueryCursor (or other places)
that allows one to specify that returned nodes should be ordered in a
specific way, so instead we introduce a decorator on QueryCursor that
takes dict of captured nodes and sorts each list of nodes by line and
column.

Fixes: #702
Signed-off-by: Thomas Bracht Laumann Jespersen <t@laumann.xyz>
@laumann laumann force-pushed the tree-sitter-sort-captures branch from ee4f9bd to 886b352 Compare March 29, 2026 14:38
@gentoo-bot gentoo-bot closed this Mar 29, 2026
@gentoo-bot gentoo-bot deleted the tree-sitter-sort-captures branch March 29, 2026 16:28
@ferringb
Copy link
Copy Markdown
Contributor

ferringb commented Mar 29, 2026

Ignore the 'unmerged' message; this is merged- thanks.

For future commits, please setup gpg signing- in this case I added my own 'signed-off', so I signed the commit (your authorship is obviously preserved). That resolved the hard requirement that all commits must be gpg signed, but in the future, it's simpler if you have it enabled.

Sans that- I noticed this particular primitive has no unit tests. I'm not sure they would've flushed this out, but do you mind taking a look at adding some basic test assertions? Your fix has no coverage, but what caught my eye is the various utility queries- cmd_query for example- none of them also have direct test coverage.

Downstream tests likely verify this, but if treesitter changes something, it's preferable we catch it at this level, rather than doing the "wtf?" detective thing you had to do ;)

Thanks!

@laumann
Copy link
Copy Markdown
Contributor Author

laumann commented Mar 30, 2026

Thanks for merging, I've set up gpg signing now. I can look at adding some unit tests to the bash module.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants