Description
Describe the bug
The ParseMessage function in genkit/go/ai/format_json.go fails to parse JSON content when the message text contains other markdown code blocks (such as ```yaml, ```bash, etc.). The function uses base.ExtractJSONFromMarkdown() with a regex pattern that incorrectly matches any code block, not just JSON code blocks, so YAML or other content ends up being parsed as JSON.
Full error message:
error: model failed to generate output matching expected schema: data is not valid JSON: invalid character 'y' looking for beginning of value
To Reproduce
Steps to reproduce the behavior:
- Create a Message with Content containing markdown code blocks (e.g., ```yaml)
- Call ParseMessage on a jsonHandler instance
- The function calls base.ExtractJSONFromMarkdown(), which incorrectly extracts YAML content due to the faulty regex
- The YAML content gets passed to JSON validation, causing the error
Code sample that triggers the bug:
message := &Message{
Content: []*Part{
{Text: "{\"status\": \"ok\", \"config\": \"```yaml\\nkey: value\\n```\"}"}, // interpreted string: a Go raw string cannot contain backticks
},
}
handler := &jsonHandler{...}
_, err := handler.ParseMessage(message) // This will fail with "invalid character 'y'"
The issue occurs because the message contains a ```yaml code block. The original regex ```(json)?((\n|.)*?)``` incorrectly matches the ```yaml block and tries to parse the captured text (the "yaml" language tag followed by "key: value") as JSON, causing the "invalid character 'y'" error.
Expected behavior
When a message contains non-JSON markdown code blocks, ParseMessage should still parse the JSON content correctly.
Root Cause Analysis
The issue is in the ExtractJSONFromMarkdown function in go/internal/base/json.go. The problematic regex pattern:
var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")
The (json)? part makes the "json" identifier optional, which means the regex will match ANY code block (```yaml, ```bash, ```python, etc.), not just ```json blocks. When it encounters a ```yaml block, the second capture group picks up everything after the opening fence, including the "yaml" language tag, and the function attempts to parse that text as JSON. The result is the error "invalid character 'y' looking for beginning of value" (where 'y' is the first character of the "yaml" tag).
Runtime:
- OS: macOS (Darwin Kernel Version 24.6.0)
- Version: macOS Sequoia
Go version
go version go1.25.0 darwin/arm64
Files Affected
- go/internal/base/json.go (lines 122-132): Contains the buggy ExtractJSONFromMarkdown function
- go/ai/format_json.go (line 94): Calls ExtractJSONFromMarkdown in ParseMessage
- go/ai/generate.go (line 742): Calls ExtractJSONFromMarkdown in ModelResponse.Output
- go/internal/base/json.go (line 137): Internal usage in GetJsonObjectLines function
Applied Fix
I introduced a simple but unverified fix.
-var jsonMarkdownRegex = regexp.MustCompile("```(json)?((\n|.)*?)```")
+var jsonMarkdownRegex = regexp.MustCompile("```json((?s:.*?))?```")
func ExtractJSONFromMarkdown(md string) string {
// TODO: improve this
matches := jsonMarkdownRegex.FindStringSubmatch(md)
if matches == nil {
return md
}
- return matches[2]
+ return matches[1]
}
This ensures that only ```json code blocks are matched and processed, which prevents YAML or other non-JSON code blocks from being parsed as JSON content. However, it can't handle a Message with ``` json. I think it's better to use another way to mark PartJson instead of markdown.
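To make the trade-off of the proposed pattern concrete, here is a rough standalone check of what it does and does not match. extractFixed is a hypothetical copy of the patched function, and the third input assumes that the "``` json" case mentioned above means a fence with a space before the label.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// fence is a run of three backticks, built at runtime for readability.
var fence = strings.Repeat("`", 3)

// fixedRegex mirrors the pattern from the diff in this issue.
var fixedRegex = regexp.MustCompile(fence + "json((?s:.*?))?" + fence)

// extractFixed is a hypothetical copy of the patched extraction logic:
// it returns capture group 1, or the input unchanged when nothing matches.
func extractFixed(md string) string {
	matches := fixedRegex.FindStringSubmatch(md)
	if matches == nil {
		return md
	}
	return matches[1]
}

func main() {
	inputs := []string{
		fence + "json\n{\"a\": 1}\n" + fence,  // labelled JSON block: content is extracted
		fence + "yaml\nkey: value\n" + fence,  // other language: input returned unchanged
		fence + " json\n{\"a\": 1}\n" + fence, // space before the label: not matched either
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, extractFixed(in))
	}
}
```

In the third case the whole fenced string is passed through untouched, so a later JSON parse would still fail on the leading backtick. That is the limitation the note above points at.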