Skip to content

fix(gazelle): replace cgo tree-sitter binding with gotreesitter#3786

Draft
odvcencio wants to merge 3 commits into
bazel-contrib:mainfrom
odvcencio:gotreesitter-cross-compile
Draft

fix(gazelle): replace cgo tree-sitter binding with gotreesitter#3786
odvcencio wants to merge 3 commits into
bazel-contrib:mainfrom
odvcencio:gotreesitter-cross-compile

Conversation

@odvcencio
Copy link
Copy Markdown

@odvcencio odvcencio commented May 17, 2026

Potential fix for #1913.

The Gazelle Python parser currently depends on github.com/smacker/go-tree-sitter, which brings in the C tree-sitter runtime through cgo. That makes the Gazelle binary harder to cross-compile, especially for Darwin targets, even though the parser is only used as an implementation detail of the Python extension.

This replaces that binding with github.com/odvcencio/gotreesitter v0.17.4, a pure-Go tree-sitter-compatible parser/runtime, and updates the Gazelle dependency metadata to use the generated Python grammar from gotreesitter. The parser behavior stays tree-sitter based; the before/after difference is that Gazelle no longer needs the cgo tree-sitter dependency for Python source parsing.

The implementation uses a shared gotreesitter parser pool and releases parsed trees after Gazelle extracts imports/comments/main metadata, so repeated package scans do not retain parser arenas longer than needed.

No CHANGELOG entry is included because this is an implementation dependency change for the Gazelle extension rather than a new user-facing API.

Validation:

  • bazel test //python:python_test_with_nested_import_statements --test_output=all --nocache_test_results
  • bazel test //python:all
  • bazel build --platforms=@io_bazel_rules_go//go/toolchain:darwin_amd64_cgo //python:gazelle_binary
  • bazel build --platforms=@io_bazel_rules_go//go/toolchain:darwin_arm64_cgo //python:gazelle_binary
  • bazel build --platforms=@io_bazel_rules_go//go/toolchain:darwin_amd64 //python:gazelle_binary
  • bazel build --platforms=@io_bazel_rules_go//go/toolchain:linux_amd64 //python:gazelle_binary
  • bazel build --platforms=@io_bazel_rules_go//go/toolchain:windows_amd64 //python:gazelle_binary

Local performance check:

  • On the repo's Gazelle Python testdata corpus, the full FileParser.Parse path is in the same range as the previous cgo binding in local measurements after parser-pool/tree-release fixes.
  • On a largest-file Python stdlib sample, gotreesitter is still slower than cgo, so this remains a draft while larger-file parser/runtime performance is evaluated.

- Replace the `smacker/go-tree-sitter` dependency with the `odvcencio/gotreesitter` fork so the Python parser keeps working with the current API.
- Refresh Bazel and Go module metadata for the new dependency set and its transitive requirements.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request migrates the Gazelle Python parser from smacker/go-tree-sitter to odvcencio/gotreesitter, involving updates to Bazel and Go dependency files and refactoring the parser logic to use the new API. Feedback focuses on improving robustness by adding checks for child node counts and string lengths before access, which prevents potential nil pointer dereferences and out-of-bounds panics when handling malformed source files.

Comment on lines +110 to +111
if child.Type(pythonLanguage) == sitterNodeTypeIfStatement &&
child.Child(1).Type(pythonLanguage) == sitterNodeTypeComparisonOperator && child.Child(1).Child(1).Type(pythonLanguage) == "==" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The parser can return a tree with errors if the Python source is malformed. Accessing child.Child(1) and its children without checking ChildCount() can lead to a nil pointer dereference and panic. Since ParseCode explicitly allows returning a root node with errors, we should be defensive here.

Suggested change
if child.Type(pythonLanguage) == sitterNodeTypeIfStatement &&
child.Child(1).Type(pythonLanguage) == sitterNodeTypeComparisonOperator && child.Child(1).Child(1).Type(pythonLanguage) == "==" {
if child.Type(pythonLanguage) == sitterNodeTypeIfStatement && child.ChildCount() >= 2 &&
child.Child(1).Type(pythonLanguage) == sitterNodeTypeComparisonOperator && child.Child(1).ChildCount() >= 3 && child.Child(1).Child(1).Type(pythonLanguage) == "==" {

Comment on lines +118 to +119
if a.Type(pythonLanguage) == sitterNodeTypeIdentifier && a.Text(p.code) == "__name__" &&
b.Type(pythonLanguage) == sitterNodeTypeString && string(p.code[b.StartByte()+1:b.EndByte()-1]) == "__main__" {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

When stripping quotes from a string node, we should ensure the node is at least 2 bytes long (for the quotes) to avoid a slice out-of-bounds panic on malformed or empty string nodes.

Suggested change
if a.Type(pythonLanguage) == sitterNodeTypeIdentifier && a.Text(p.code) == "__name__" &&
b.Type(pythonLanguage) == sitterNodeTypeString && string(p.code[b.StartByte()+1:b.EndByte()-1]) == "__main__" {
if a.Type(pythonLanguage) == sitterNodeTypeIdentifier && a.Text(p.code) == "__name__" &&
b.Type(pythonLanguage) == sitterNodeTypeString && b.EndByte()-b.StartByte() >= 2 && string(p.code[b.StartByte()+1:b.EndByte()-1]) == "__main__" {

Comment on lines +176 to +177
} else if node.Type(pythonLanguage) == sitterNodeTypeImportFromStatement {
from := node.Child(1).Text(p.code)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

For import_from_statement, we should verify that the node has enough children before accessing node.Child(1) to avoid a nil pointer dereference on malformed input.

Suggested change
} else if node.Type(pythonLanguage) == sitterNodeTypeImportFromStatement {
from := node.Child(1).Text(p.code)
} else if node.Type(pythonLanguage) == sitterNodeTypeImportFromStatement && node.ChildCount() >= 2 {
from := node.Child(1).Text(p.code)

odvcencio added 2 commits May 17, 2026 15:52
- Share a parser pool across Python parses instead of creating a new parser for each request
- Reduce allocation and parser setup overhead in the Gazelle Python file parser
- Factor tree parsing into a shared helper so callers can work with the full tree when needed
- Keep `ParseCode` returning the root node, while `FileParser.Parse` now releases the tree after parsing
- Prevent tree-sitter resources from lingering during Gazelle file parsing
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant