fix(gazelle): replace cgo tree-sitter binding with gotreesitter#3786
fix(gazelle): replace cgo tree-sitter binding with gotreesitter#3786odvcencio wants to merge 3 commits into
Conversation
- Replace the `smacker/go-tree-sitter` dependency with the `odvcencio/gotreesitter` fork so the Python parser keeps working with the current API. - Refresh Bazel and Go module metadata for the new dependency set and its transitive requirements.
There was a problem hiding this comment.
Code Review
This pull request migrates the Gazelle Python parser from smacker/go-tree-sitter to odvcencio/gotreesitter, involving updates to Bazel and Go dependency files and refactoring the parser logic to use the new API. Feedback focuses on improving robustness by adding checks for child node counts and string lengths before access, which prevents potential nil pointer dereferences and out-of-bounds panics when handling malformed source files.
| if child.Type(pythonLanguage) == sitterNodeTypeIfStatement && | ||
| child.Child(1).Type(pythonLanguage) == sitterNodeTypeComparisonOperator && child.Child(1).Child(1).Type(pythonLanguage) == "==" { |
There was a problem hiding this comment.
The parser can return a tree with errors if the Python source is malformed. Accessing child.Child(1) and its children without checking ChildCount() can lead to a nil pointer dereference and panic. Since ParseCode explicitly allows returning a root node with errors, we should be defensive here.
| if child.Type(pythonLanguage) == sitterNodeTypeIfStatement && | |
| child.Child(1).Type(pythonLanguage) == sitterNodeTypeComparisonOperator && child.Child(1).Child(1).Type(pythonLanguage) == "==" { | |
| if child.Type(pythonLanguage) == sitterNodeTypeIfStatement && child.ChildCount() >= 2 && | |
| child.Child(1).Type(pythonLanguage) == sitterNodeTypeComparisonOperator && child.Child(1).ChildCount() >= 3 && child.Child(1).Child(1).Type(pythonLanguage) == "==" { |
| if a.Type(pythonLanguage) == sitterNodeTypeIdentifier && a.Text(p.code) == "__name__" && | ||
| b.Type(pythonLanguage) == sitterNodeTypeString && string(p.code[b.StartByte()+1:b.EndByte()-1]) == "__main__" { |
There was a problem hiding this comment.
When stripping quotes from a string node, we should ensure the node is at least 2 bytes long (for the quotes) to avoid a slice out-of-bounds panic on malformed or empty string nodes.
| if a.Type(pythonLanguage) == sitterNodeTypeIdentifier && a.Text(p.code) == "__name__" && | |
| b.Type(pythonLanguage) == sitterNodeTypeString && string(p.code[b.StartByte()+1:b.EndByte()-1]) == "__main__" { | |
| if a.Type(pythonLanguage) == sitterNodeTypeIdentifier && a.Text(p.code) == "__name__" && | |
| b.Type(pythonLanguage) == sitterNodeTypeString && b.EndByte()-b.StartByte() >= 2 && string(p.code[b.StartByte()+1:b.EndByte()-1]) == "__main__" { |
| } else if node.Type(pythonLanguage) == sitterNodeTypeImportFromStatement { | ||
| from := node.Child(1).Text(p.code) |
There was a problem hiding this comment.
For import_from_statement, we should verify that the node has enough children before accessing node.Child(1) to avoid a nil pointer dereference on malformed input.
| } else if node.Type(pythonLanguage) == sitterNodeTypeImportFromStatement { | |
| from := node.Child(1).Text(p.code) | |
| } else if node.Type(pythonLanguage) == sitterNodeTypeImportFromStatement && node.ChildCount() >= 2 { | |
| from := node.Child(1).Text(p.code) |
- Share a parser pool across Python parses instead of creating a new parser for each request - Reduce allocation and parser setup overhead in the Gazelle Python file parser
- Factor tree parsing into a shared helper so callers can work with the full tree when needed - Keep `ParseCode` returning the root node, while `FileParser.Parse` now releases the tree after parsing - Prevent tree-sitter resources from lingering during Gazelle file parsing
Potential fix for #1913.
The Gazelle Python parser currently depends on
github.com/smacker/go-tree-sitter, which brings in the C tree-sitter runtime through cgo. That makes the Gazelle binary harder to cross-compile, especially for Darwin targets, even though the parser is only used as an implementation detail of the Python extension.This replaces that binding with
github.com/odvcencio/gotreesitterv0.17.4, a pure-Go tree-sitter-compatible parser/runtime, and updates the Gazelle dependency metadata to use the generated Python grammar from gotreesitter. The parser behavior stays tree-sitter based; the before/after difference is that Gazelle no longer needs the cgo tree-sitter dependency for Python source parsing.The implementation uses a shared gotreesitter parser pool and releases parsed trees after Gazelle extracts imports/comments/main metadata, so repeated package scans do not retain parser arenas longer than needed.
No CHANGELOG entry is included because this is an implementation dependency change for the Gazelle extension rather than a new user-facing API.
Validation:
bazel test //python:python_test_with_nested_import_statements --test_output=all --nocache_test_resultsbazel test //python:allbazel build --platforms=@io_bazel_rules_go//go/toolchain:darwin_amd64_cgo //python:gazelle_binarybazel build --platforms=@io_bazel_rules_go//go/toolchain:darwin_arm64_cgo //python:gazelle_binarybazel build --platforms=@io_bazel_rules_go//go/toolchain:darwin_amd64 //python:gazelle_binarybazel build --platforms=@io_bazel_rules_go//go/toolchain:linux_amd64 //python:gazelle_binarybazel build --platforms=@io_bazel_rules_go//go/toolchain:windows_amd64 //python:gazelle_binaryLocal performance check:
FileParser.Parsepath is in the same range as the previous cgo binding in local measurements after parser-pool/tree-release fixes.