🐛 Bug: searchIndex.json incorrectly extracts Racket code tokens (e.g., #lang, #t, #f) as tags #740

@jrtxio

Description

When using the Digital Garden plugin, the generated searchIndex.json incorrectly extracts code content as tags.
This is especially problematic for Racket code blocks, because Racket uses many tokens that begin with # (e.g., #lang, #t, #f) and none of them are Obsidian tags.

As a result, the plugin ends up adding a large number of false tags into the tags array.

Expected Behavior

  • Only tags explicitly written as Obsidian tags (e.g., #Lisp, #Racket) or frontmatter tags should be included.
  • Any # that appears in code, including fenced code blocks, inline code, or code-like JSON, must not be treated as a tag.

Actual Behavior

The plugin scans the entire note and appears to treat any word starting with # as a tag — including valid Racket syntax.

Examples that were incorrectly extracted as tags:

  • #lang
  • #t
  • #f
  • identifiers and keywords inside code blocks

Example of problematic output in searchIndex.json:

{
  "title": "Lexical Closure Guide",
  "tags": [
    "Lisp",
    "Racket",
    "lang",
    "t",
    "f",
    "note"
  ]
}

None of the code-derived entries above should appear as tags.

Reproduction Steps

  1. Create a note containing Racket code blocks such as:

    #lang racket
    
    (define x #t)
    (define y #f)

  2. Publish using the Digital Garden plugin.

  3. Open the generated searchIndex.json.

  4. Observe that lang, t, f (extracted from #lang, #t, #f) and other code tokens appear in the tags array.

Why This Happens (Likely Cause)

The plugin seems to extract tags by scanning for # patterns in plain text, without:

  • respecting fenced code blocks (```racket … ```)
  • ignoring inline code (`#t`)
  • distinguishing between Obsidian tags and language syntax

Because Racket uses # heavily in its language syntax, simple text scanning yields many false positives.
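
To illustrate the suspected cause, here is a minimal sketch of a naive scanner. It is not the plugin's actual code: the function name naiveExtractTags and the regular expression are assumptions made only to show how matching # patterns over raw note text picks up Racket tokens.

```typescript
// Hypothetical naive extractor (NOT the plugin's real implementation):
// it scans the raw note text for "#word" patterns with no awareness of code.
function naiveExtractTags(note: string): string[] {
  const tags: string[] = [];
  // "#" followed by a letter, then word characters, "/" or "-".
  const pattern = /#([A-Za-z][\w\/-]*)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(note)) !== null) {
    // Keep each tag name once, in order of first appearance.
    if (tags.indexOf(match[1]) === -1) tags.push(match[1]);
  }
  return tags;
}

// Build the fence marker by concatenation so this example contains no
// literal triple backtick.
const FENCE = "`" + "`" + "`";

const note = [
  "#Lisp #Racket",
  "",
  FENCE + "racket",
  "#lang racket",
  "",
  "(define x #t)",
  "(define y #f)",
  FENCE,
].join("\n");

console.log(naiveExtractTags(note));
// → ["Lisp", "Racket", "lang", "t", "f"]
```

The two real tags are found, but so are lang, t, and f from inside the code block, which matches the false positives seen in searchIndex.json.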

Environment

  • Obsidian: 1.10.6
  • Digital Garden plugin: 2.64.1
  • OS: Windows 11

Request

Could you please improve the tag extraction logic so that code blocks and language tokens (especially Racket’s #lang, #t, #f) are not treated as tags?

This issue makes the search index noisy for anyone writing about languages with leading-# syntax (Racket, Lisp, Scheme, etc.).
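
One possible direction, sketched under assumptions: skip fenced code blocks and inline code spans before looking for tags. The helper name extractTagsOutsideCode and the simplified tag rule below are hypothetical and not taken from the plugin. Frontmatter tags are not handled here.

```typescript
// Hypothetical helper, not the plugin's actual API: collects #tags from a
// note while skipping fenced code blocks and inline code spans.
function extractTagsOutsideCode(note: string): string[] {
  const tags: string[] = [];
  let openFence: string | null = null; // marker that opened the current block

  for (const line of note.split("\n")) {
    // A fence line starts with three or more backticks or tildes.
    const marker = /^ {0,3}(`{3,}|~{3,})/.exec(line);

    if (openFence !== null) {
      // Inside a fenced block: ignore everything until a matching closer.
      if (
        marker !== null &&
        marker[1].charAt(0) === openFence.charAt(0) &&
        marker[1].length >= openFence.length
      ) {
        openFence = null;
      }
      continue;
    }
    if (marker !== null) {
      openFence = marker[1];
      continue;
    }

    // Drop inline code spans (text between single backticks) on this line.
    const prose = line.replace(/`[^`]*`/g, "");

    // Simplified tag rule (an assumption): "#" at line start or after
    // whitespace, followed by a letter, then word characters, "/" or "-".
    const pattern = /(?:^|\s)#([A-Za-z][\w\/-]*)/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(prose)) !== null) {
      if (tags.indexOf(match[1]) === -1) tags.push(match[1]);
    }
  }
  return tags;
}

// Fence marker built by concatenation to avoid a literal triple backtick.
const FENCE_MARK = "`" + "`" + "`";

const sampleNote = [
  "#Lisp #Racket",
  "",
  FENCE_MARK + "racket",
  "#lang racket",
  "",
  "(define x #t)",
  "(define y #f)",
  FENCE_MARK,
  "",
  "Inline `#t` and `#f` are code, not tags.",
].join("\n");

console.log(extractTagsOutsideCode(sampleNote));
// → ["Lisp", "Racket"]
```

A line-by-line pass is used here only because it is easy to follow. Reusing the tags Obsidian has already parsed, or a real Markdown parser, would likely be more robust than any hand-written scan.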

Thanks for maintaining this plugin — it’s extremely helpful for the community!
