
BrokkAi/tree-sitter-ng


Tree Sitter NG


Next-generation Tree-sitter binding for Java. A "Java-first" fork optimized for modern developer experience, safety, and ecosystem breadth.

Why this fork?

  • Ecosystem Breadth: Expanded support for the modern stack (Kotlin, Zig, Angular, Vue.js), supplementing the official grammars maintained by the upstream project.
  • Modern Java Ergonomics: Moving away from C-style wrappers toward a library that feels native to modern Java, while keeping a low Java 11+ baseline requirement.
    • Strict Null Safety: Integration with JSpecify and Error Prone for compile-time safety at the JNI boundary.
    • Idiomatic Patterns: Lazy collection views (e.g., getNamedChildren()) and strict variants that throw instead of returning null (e.g., parseStringOrThrow()).
    • Advanced Query Support: First-class support for Tree-sitter predicates and directives (e.g., #eq?, #match?, #set!) directly within the Java API.
    • Resource Management: Automatic native memory management via the Cleaner API, with AutoCloseable support.

    Note on Building: While the compiled library targets Java 11, building the project from source requires JDK 17+ (Gradle 9 requires Java 17 or later to run).
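The lazy collection pattern mentioned above can be sketched with `java.util.AbstractList`: the list is only a view, and each `get()` fetches one child on demand instead of copying all children up front. This is a minimal sketch under assumptions: the two functional parameters stand in for hypothetical native accessors (a child count and a child-at-index call) and are not this library's actual API.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.Objects;
import java.util.function.IntFunction;
import java.util.function.IntSupplier;

public class LazyChildrenDemo {
    // Returns a read-only List view. Nothing is copied up front: size() and get()
    // call back into the (hypothetical) native accessors every time they are used.
    static <T> List<T> lazyNamedChildren(IntSupplier namedChildCount, IntFunction<T> namedChildAt) {
        return new AbstractList<T>() {
            @Override
            public T get(int index) {
                Objects.checkIndex(index, size()); // throws IndexOutOfBoundsException
                return namedChildAt.apply(index);
            }

            @Override
            public int size() {
                return namedChildCount.getAsInt();
            }
        };
    }

    public static void main(String[] args) {
        String[] fakeChildren = {"number", "null"}; // stand-in for native child nodes
        int[] fetches = {0};
        List<String> children = lazyNamedChildren(
                () -> fakeChildren.length,
                i -> { fetches[0]++; return fakeChildren[i]; });

        System.out.println(fetches[0]); // 0: creating the view fetched nothing
        for (String type : children) {
            System.out.println(type);   // number, then null
        }
        System.out.println(fetches[0]); // 2
    }
}
```

Because the view holds no copies, it stays cheap to create even for nodes with many children; the trade-off is that every `get()` pays for a call into the underlying accessor.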

Start hacking!

try (TSParser parser = new TSParser();
     TSLanguage json = new TreeSitterJson()) {

    parser.setLanguage(json);
    
    // Use parseStringOrThrow for strict null handling
    try (TSTree tree = parser.parseStringOrThrow(null, "[1, null]")) {
        TSNode rootNode = tree.getRootNode();
        
        // Access children via index
        TSNode arrayNode = rootNode.getNamedChild(0);

        // Or use the new lazy list pattern for easier iteration
        for (TSNode child : arrayNode.getNamedChildren()) {
            System.out.println(child.getType());
        }
    }
}

Supported Grammars

We maintain both official and high-demand community grammars.

| Language | Source | Support Level |
|---|---|---|
| Java, Python, C++, Go, etc. | Official | Upstream grammars, bundled here |
| Kotlin, Zig | Community | Maintained & packaged in this fork |
| Vue, Angular | Community | Extended support for web stack |

Technical Design

JNI Safety & Memory Management

We bridge the gap between Java's GC and C's manual memory management using a dual-layered approach:

  1. AutoCloseable: Primary resources (Parsers, Trees, Cursors) implement AutoCloseable for deterministic cleanup via try-with-resources.
  2. Cleaner API: A Cleaner fallback ensures that if a Java object is garbage collected without being closed, the underlying native memory is still freed, preventing leaks in long-running processes.
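A minimal sketch of this dual-layer pattern with `java.lang.ref.Cleaner` (Java 9+). `NativeHandle`, `FreeAction` and the `FREED` counter are hypothetical names, and the JNI free call is simulated by incrementing a counter; the library's real classes may be structured differently.

```java
import java.lang.ref.Cleaner;
import java.util.concurrent.atomic.AtomicInteger;

public class NativeHandleDemo {
    // One shared Cleaner (and its background daemon thread) for all handles.
    private static final Cleaner CLEANER = Cleaner.create();

    // Counts simulated native frees so the behaviour is observable.
    static final AtomicInteger FREED = new AtomicInteger();

    static final class NativeHandle implements AutoCloseable {
        // The cleanup action must not capture the NativeHandle itself; otherwise the
        // handle stays reachable from the Cleaner and is never collected.
        private static final class FreeAction implements Runnable {
            private final long pointer; // stand-in for a native address

            FreeAction(long pointer) {
                this.pointer = pointer;
            }

            @Override
            public void run() {
                // A real binding would pass `pointer` to a JNI free function here.
                if (pointer != 0L) {
                    FREED.incrementAndGet();
                }
            }
        }

        private final Cleaner.Cleanable cleanable;

        NativeHandle(long pointer) {
            // Fallback: the action runs if the handle becomes unreachable without close().
            this.cleanable = CLEANER.register(this, new FreeAction(pointer));
        }

        @Override
        public void close() {
            // Deterministic path. Cleanable.clean() runs the action at most once,
            // so a double close() or a later GC cannot double-free.
            cleanable.clean();
        }
    }

    public static void main(String[] args) {
        try (NativeHandle handle = new NativeHandle(0x1234L)) {
            // ... use the handle ...
        }
        System.out.println(FREED.get()); // 1
    }
}
```

The key design choice is that `FreeAction` is a static nested class holding only the native pointer: if it referenced the `NativeHandle`, the Cleaner would keep the handle reachable forever and the fallback would never run.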

Type Safety

By using JSpecify annotations and Error Prone static analysis, we enforce null-safety at compile time across the JNI boundary. This helps keep the C-heavy nature of Tree-sitter from surfacing as NullPointerExceptions or JVM crashes in your Java application.
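The strict `...OrThrow` style can be illustrated with plain Java. In this sketch, `nativeParse` is a hypothetical stand-in for a JNI call that yields `null` on failure (the tree-sitter C function `ts_parser_parse` returns `NULL`, for example, when no language is set). Under JSpecify, `nativeParse` would be annotated `@Nullable` inside a `@NullMarked` scope and `parseOrThrow` left unannotated, meaning non-null; the annotations are omitted here so the sketch compiles with the JDK alone.

```java
import java.util.Objects;

public class OrThrowDemo {
    // Hypothetical native call: returns null on failure, as a raw JNI wrapper would
    // when the underlying C function returns NULL.
    static String nativeParse(String source, boolean languageSet) {
        return languageSet ? "(document (array (number) (null)))" : null;
    }

    // Strict wrapper: validates its input and never returns null,
    // so callers need no null checks downstream.
    static String parseOrThrow(String source, boolean languageSet) {
        Objects.requireNonNull(source, "source");
        String tree = nativeParse(source, languageSet);
        if (tree == null) {
            throw new IllegalStateException("parse failed; is a language set on the parser?");
        }
        return tree;
    }

    public static void main(String[] args) {
        System.out.println(parseOrThrow("[1, null]", true));
        try {
            parseOrThrow("[1, null]", false);
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```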

Zig Cross-Compilation

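We use Zig as our C/C++ compiler toolchain. Because Zig ships the libc headers it needs for each target, we can produce matched native binaries for all supported Linux, macOS, and Windows targets (x86_64 everywhere, plus aarch64 on Linux and macOS) from a single CI environment, without installing separate cross-compilation toolchains or sysroots.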

Commands

# Compile Java and native modules
./gradlew compile

# Build and test all subprojects
./gradlew build

# (Re)generate NodeType/NodeField/NodeSchema sources for a language module
# Produces/updates src/main/java/org/treesitter/<Lang>NodeType.java (+ NodeField/NodeSchema) from upstream node-types.json
./gradlew :tree-sitter-tsx:generateNodeTypes

# (Re)generate NodeType/NodeField/NodeSchema sources for all language modules
./gradlew :generateNodeTypes

Node Type & Schema API (NodeType / NodeField / NodeSchema)

Most upstream tree-sitter grammars publish a node-types.json file describing supported node types. This repo can generate and ship:

  • org.treesitter.<Lang>NodeType — named node types (enum)
  • org.treesitter.<Lang>NodeField — field names (enum)
  • org.treesitter.<Lang>NodeSchema — lightweight schema helpers derived from node-types.json

Example (TSX): org.treesitter.TsxNodeType.ABSTRACT_CLASS_DECLARATION.getType().equals("abstract_class_declaration") is true. You can also call TsxNodeType.from(node) or TsxNodeType.fromType(node.getType()); both return TsxNodeType.__NULL__ for null input or unknown types.
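For illustration, a hand-written miniature of such a generated enum might look like the following. `DemoNodeType` and its lookup map are hypothetical; only the `getType()` / `fromType()` / `__NULL__` behavior described above is mirrored, and the real generated code may be implemented differently.

```java
import java.util.HashMap;
import java.util.Map;

public class NodeTypeLookupDemo {
    // Hypothetical miniature of a generated <Lang>NodeType enum; a real generated
    // enum would have one constant per named node type in node-types.json.
    enum DemoNodeType {
        __NULL__(null), // sentinel for null or unknown input (hypothetical: no type string)
        ABSTRACT_CLASS_DECLARATION("abstract_class_declaration"),
        FUNCTION_DECLARATION("function_declaration");

        private static final Map<String, DemoNodeType> BY_TYPE = new HashMap<>();

        static {
            for (DemoNodeType t : values()) {
                if (t.type != null) {
                    BY_TYPE.put(t.type, t);
                }
            }
        }

        private final String type;

        DemoNodeType(String type) {
            this.type = type;
        }

        String getType() {
            return type;
        }

        // Never returns null: unknown or null type strings map to __NULL__.
        static DemoNodeType fromType(String type) {
            return type == null ? __NULL__ : BY_TYPE.getOrDefault(type, __NULL__);
        }
    }

    public static void main(String[] args) {
        System.out.println(DemoNodeType.fromType("function_declaration")); // FUNCTION_DECLARATION
        System.out.println(DemoNodeType.fromType("no_such_type"));         // __NULL__
        System.out.println(DemoNodeType.fromType(null));                   // __NULL__
    }
}
```

Returning a sentinel instead of `null` lets callers use the result directly in a `switch` or `equals` comparison without a separate null check.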

Field/schema usage (TSX):

Set<TsxNodeField> possibleFields = TsxNodeSchema.fields(TsxNodeType.FUNCTION_DECLARATION);
Set<TsxNodeType> allowedNameTypes = TsxNodeSchema.allowedTypes(TsxNodeType.FUNCTION_DECLARATION, TsxNodeField.NAME);
boolean isNameRequired = TsxNodeSchema.isRequired(TsxNodeType.FUNCTION_DECLARATION, TsxNodeField.NAME);

Set<TsxNodeType> allowedChildTypes = TsxNodeSchema.allowedChildTypes(TsxNodeType.FUNCTION_DECLARATION);

Generation is done via the Gradle task :tree-sitter-<lang>:generateNodeTypes and the generated sources are checked in under each subproject’s src/main/java so they ship in published artifacts without requiring codegen at consumer build time.
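The schema helpers can be thought of as lookup tables built from the `fields` entries of node-types.json, where each field records `required`, `multiple`, and the allowed `types`. The following sketch assumes that model; the enums, `FieldInfo`, and the data are hypothetical and illustrative, not copied from the real TSX grammar or from the generated `TsxNodeSchema`.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class NodeSchemaDemo {
    enum NodeType { FUNCTION_DECLARATION, IDENTIFIER, FORMAL_PARAMETERS, STATEMENT_BLOCK }

    enum NodeField { NAME, PARAMETERS, BODY }

    // Mirrors one entry of a "fields" object in node-types.json, e.g.
    // { "required": true, "multiple": false, "types": [ { "type": "identifier", "named": true } ] }
    static final class FieldInfo {
        final boolean required;
        final Set<NodeType> types;

        FieldInfo(boolean required, Set<NodeType> types) {
            this.required = required;
            this.types = types;
        }
    }

    // Illustrative data only.
    private static final Map<NodeType, Map<NodeField, FieldInfo>> SCHEMA = new EnumMap<>(NodeType.class);

    static {
        Map<NodeField, FieldInfo> fn = new EnumMap<>(NodeField.class);
        fn.put(NodeField.NAME, new FieldInfo(true, EnumSet.of(NodeType.IDENTIFIER)));
        fn.put(NodeField.PARAMETERS, new FieldInfo(true, EnumSet.of(NodeType.FORMAL_PARAMETERS)));
        fn.put(NodeField.BODY, new FieldInfo(true, EnumSet.of(NodeType.STATEMENT_BLOCK)));
        SCHEMA.put(NodeType.FUNCTION_DECLARATION, fn);
    }

    private static FieldInfo info(NodeType type, NodeField field) {
        return SCHEMA.getOrDefault(type, Collections.emptyMap()).get(field);
    }

    static Set<NodeField> fields(NodeType type) {
        Map<NodeField, FieldInfo> byField = SCHEMA.get(type);
        return byField == null ? Collections.emptySet() : Collections.unmodifiableSet(byField.keySet());
    }

    static Set<NodeType> allowedTypes(NodeType type, NodeField field) {
        FieldInfo fi = info(type, field);
        return fi == null ? Collections.emptySet() : Collections.unmodifiableSet(fi.types);
    }

    static boolean isRequired(NodeType type, NodeField field) {
        FieldInfo fi = info(type, field);
        return fi != null && fi.required;
    }

    public static void main(String[] args) {
        System.out.println(fields(NodeType.FUNCTION_DECLARATION));                       // [NAME, PARAMETERS, BODY]
        System.out.println(allowedTypes(NodeType.FUNCTION_DECLARATION, NodeField.NAME)); // [IDENTIFIER]
        System.out.println(isRequired(NodeType.FUNCTION_DECLARATION, NodeField.NAME));   // true
    }
}
```

`EnumMap` and `EnumSet` keep lookups array-backed and iterate in declaration order, which suits tables generated once and then only read.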

Releases

The project distinguishes between the Java library version (libVersion) and the upstream grammar version (upstreamVersion). We are not currently working on publishing to Maven Central; for now, we provide a pre-bundled ZIP so that all native binaries exactly match the library version.

Lockstep Versioning

We use a lockstep versioning strategy for releases. This means that every module in the repository shares the exact same libVersion (e.g., 0.1.0). When a new release is cut, all modules are published with this new version number, regardless of whether their specific parser or upstream grammar changed.

This provides a simple and predictable experience: you only ever need to specify one version number for all ai.brokk:tree-sitter-* dependencies in your build file, and they are guaranteed to be perfectly compatible with each other.
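As an illustration of what lockstep versioning buys you, a build check only has to confirm that a single version appears across all `ai.brokk:tree-sitter-*` coordinates. `sharedTreeSitterVersion` is a hypothetical helper, not part of this project or of Gradle.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class LockstepCheckDemo {
    // Hypothetical build-check helper: collects the versions of all
    // ai.brokk:tree-sitter-* coordinates and requires exactly one.
    static String sharedTreeSitterVersion(List<String> coordinates) {
        Set<String> versions = new TreeSet<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":"); // group:artifact:version
            if (parts.length == 3 && parts[0].equals("ai.brokk") && parts[1].startsWith("tree-sitter-")) {
                versions.add(parts[2]);
            }
        }
        if (versions.size() != 1) {
            throw new IllegalStateException("expected one shared libVersion, found " + versions);
        }
        return versions.iterator().next();
    }

    public static void main(String[] args) {
        List<String> deps = List.of(
                "ai.brokk:tree-sitter-json:0.1.0",
                "ai.brokk:tree-sitter-java:0.1.0",
                "org.example:unrelated:2.0");
        System.out.println(sharedTreeSitterVersion(deps)); // 0.1.0
    }
}
```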

When building or publishing a new release of the Java bindings, specify the libVersion:

# Build with version
./gradlew build -PlibVersion=0.1.0

# Publish with version
./gradlew publish -PlibVersion=0.1.0

The upstreamVersion is managed in each subproject's gradle.properties and controls which version of the native tree-sitter C code is downloaded and compiled.

Note: Native binaries are generated into src/main/resources/lib during the build process and are ignored by Git. They are built automatically in CI and do not need to be committed to the repository.

Features

  • Wide Compatibility: Low Java 11 minimum requirement.
  • 100% Tree-sitter API coverage.
  • Easy-to-bootstrap cross-compiling environments powered by Zig.
  • Built-in official parsers.
  • Load parsers as shared objects from disk.

Supported CPUs and OSes

  • x86_64-windows
  • x86_64-macos
  • aarch64-macos
  • x86_64-linux
  • aarch64-linux
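How the library locates its bundled binary at runtime is not described here; as a hypothetical sketch, a loader could map the JVM's `os.name` and `os.arch` system properties onto the target names above. `targetFor` and the mapping are assumptions, not the library's actual loader.

```java
import java.util.Locale;
import java.util.Set;

public class TargetNameDemo {
    // The five targets listed above.
    private static final Set<String> SUPPORTED = Set.of(
            "x86_64-windows", "x86_64-macos", "aarch64-macos", "x86_64-linux", "aarch64-linux");

    // Maps JVM system properties (os.name, os.arch) onto one of the target names.
    static String targetFor(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);

        String cpu;
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            cpu = "x86_64"; // Windows/Linux JVMs report amd64, Intel macOS reports x86_64
        } else if (arch.equals("aarch64") || arch.equals("arm64")) {
            cpu = "aarch64";
        } else {
            throw new UnsupportedOperationException("unsupported CPU: " + osArch);
        }

        String platform;
        if (os.startsWith("windows")) {
            platform = "windows";
        } else if (os.startsWith("mac")) {
            platform = "macos"; // os.name is "Mac OS X"
        } else if (os.startsWith("linux")) {
            platform = "linux";
        } else {
            throw new UnsupportedOperationException("unsupported OS: " + osName);
        }

        String target = cpu + "-" + platform;
        if (!SUPPORTED.contains(target)) {
            throw new UnsupportedOperationException("no bundled binary for " + target);
        }
        return target;
    }

    public static void main(String[] args) {
        try {
            System.out.println(targetFor(System.getProperty("os.name"), System.getProperty("os.arch")));
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```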

Installation

Currently, we distribute perfectly matched native binaries via a pre-bundled ZIP to avoid Git history bloat. For full instructions on how to automate fetching and caching these dependencies via Gradle flatDir, please see our Installation Guide.

Contributing

Want to add a new community grammar? Check out our Guide to Adding Parsers to see how our code-generation task handles the boilerplate.

Built-in Parsers

| Name | Grammar Version | Source |
|---|---|---|
| tree-sitter-agda | 1.3.3 | official |
| tree-sitter-angular | 0.8.3 | community |
| tree-sitter-bash | 0.25.1 | official |
| tree-sitter-c | 0.24.1 | official |
| tree-sitter-c-sharp | 0.23.1 | official |
| tree-sitter-cpp | 0.23.4 | official |
| tree-sitter-css | 0.25.0 | official |
| tree-sitter-embedded-template | 0.25.0 | official |
| tree-sitter-go | 0.25.0 | official |
| tree-sitter-haskell | 0.23.1 | official |
| tree-sitter-html | 0.23.2 | official |
| tree-sitter-java | 0.23.5 | official |
| tree-sitter-javascript | 0.25.0 | official |
| tree-sitter-json | 0.24.8 | official |
| tree-sitter-julia | 0.25.0 | official |
| tree-sitter-kotlin | 0.3.8 | community |
| tree-sitter-ocaml | 0.23.2 | official |
| tree-sitter-php | 0.24.2 | official |
| tree-sitter-python | 0.25.0 | official |
| tree-sitter-regex | 0.25.0 | official |
| tree-sitter-ruby | 0.23.1 | official |
| tree-sitter-rust | 0.24.0 | official |
| tree-sitter-scala | 0.24.0 | official |
| tree-sitter-tsx | 0.23.2 | official |
| tree-sitter-typescript | 0.23.2 | official |
| tree-sitter-verilog | 1.0.3 | official |
| tree-sitter-vue | ce8011a4 | community |
| tree-sitter-zig | 6479aa13 | community |

API Tour

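import java.io.File;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

import org.treesitter.*;

class Main {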
    public static void main(String[] args) throws Exception {
        String jsonSource = "[1, null]";

        // TSParser, TSLanguage, TSTree, TSQuery, TSQueryCursor, TSTreeCursor implement AutoCloseable.
        // They are also registered in the Cleaner, but explicit closing via try-with-resources is recommended.
        try (TSParser parser = new TSParser();
             TSLanguage json = new TreeSitterJson()) {

            // Set language parser
            parser.setLanguage(json);

            // Parse with string input
            try (TSTree tree = parser.parseString(null, jsonSource)) {
                assert tree != null;

                parser.reset();
                // Or parse with encoding
                try (TSTree tree2 = parser.parseStringEncoding(null, jsonSource, TSInputEncoding.TSInputEncodingUTF8)) {
                    // ...
                }

                parser.reset();
                // Or parse with a custom reader that serves the source in chunks
                byte[] sourceBytes = jsonSource.getBytes(StandardCharsets.UTF_8);
                byte[] buffer = new byte[1024];
                TSReader reader = (buf, offset, position) -> {
                    // offset is the byte index the parser wants to continue reading from
                    if (offset >= sourceBytes.length) {
                        return 0; // end of input
                    }
                    int length = Math.min(buf.length, sourceBytes.length - offset);
                    System.arraycopy(sourceBytes, offset, buf, 0, length);
                    return length;
                };
                try (TSTree tree3 = parser.parse(buffer, null, reader, TSInputEncoding.TSInputEncodingUTF8)) {
                    assert tree3 != null;
                }

                // Traverse the syntax tree with DOM-like APIs
                TSNode rootNode = tree.getRootNode();

                // Access children as a standard Java List
                List<TSNode> children = rootNode.getChildren();
                TSNode arrayNode = rootNode.getNamedChild(0);

                // Or traverse the tree with a cursor
                try (TSTreeCursor rootCursor = new TSTreeCursor(rootNode)) {
                    rootCursor.gotoFirstChild();
                }

                // Or query the tree with an S-expression pattern, using the Stream API
                try (TSQuery query = new TSQuery(json, "((document) @root)");
                     TSQueryCursor cursor = new TSQueryCursor()) {
                    cursor.exec(query, rootNode);

                    // Use .stream() for functional patterns.
                    // Note: use .copy() if you need to collect matches, as the cursor reuses match objects!
                    List<TSQueryMatch> matches = cursor.stream()
                            .map(TSQueryMatch::copy)
                            .collect(Collectors.toList());

                    // Or use the enhanced for-loop (Iterable).
                    // The stream above has consumed the cursor, so execute the query again first.
                    cursor.exec(query, rootNode);
                    for (TSQueryMatch match : cursor) {
                        System.out.println("Pattern index: " + match.getPatternIndex());
                    }
                }

                // Debug the parser with a logger (takes effect on subsequent parses)
                TSLogger logger = (type, message) -> {
                    System.out.println(message);
                };
                parser.setLogger(logger);

                // Or write DOT debugging graphs of the parsing process to a file
                // (also takes effect on subsequent parses)
                File dotFile = File.createTempFile("json", ".dot");
                parser.printDotGraphs(dotFile);
            }
        }
    }
}
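The `.copy()` note in the tour above exists because a cursor that reuses one mutable match object makes every collected element alias the same instance. The sketch below reproduces that effect with hypothetical `Match` and `matches()` stand-ins rather than the real `TSQueryMatch` and `TSQueryCursor`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class ReusedMatchDemo {
    // Hypothetical mutable match object, standing in for TSQueryMatch.
    static final class Match {
        int patternIndex;

        Match copy() {
            Match snapshot = new Match();
            snapshot.patternIndex = patternIndex;
            return snapshot;
        }
    }

    // Hypothetical cursor: like a native query cursor, it overwrites one Match in place.
    static Stream<Match> matches(int count) {
        Match shared = new Match();
        return IntStream.range(0, count).mapToObj(i -> {
            shared.patternIndex = i;
            return shared;
        });
    }

    static List<Integer> indices(List<Match> ms) {
        return ms.stream().map(m -> m.patternIndex).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Match> withoutCopy = matches(3).collect(Collectors.toList());
        List<Match> withCopy = matches(3).map(Match::copy).collect(Collectors.toList());
        System.out.println(indices(withoutCopy)); // [2, 2, 2]: three references to one object
        System.out.println(indices(withCopy));    // [0, 1, 2]: independent snapshots
    }
}
```

Copying inside the pipeline works because a sequential stream pushes each element through `map` before the cursor produces the next one, so each snapshot is taken while the shared object still holds that match's data.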

About

Modern, null-safe Tree-Sitter bindings for Java. Features JSpecify support, bundled cross-compiled binaries via Zig, and expanded community grammars (Kotlin, Vue, Angular).
