@Aman071106

Title

feat(core): add approximate image token counting support

Description

This PR addresses a TODO in langchain_core/messages/utils.py to support approximate token counting for image blocks in messages.

Issue Addressed

Previously, count_tokens_approximately fell back to repr(content) for any non-string content, including lists of blocks. For image_url blocks, this counted only the characters of the URL and its surrounding dict, typically yielding 30-40 tokens instead of the roughly 85 tokens an image actually costs.
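To illustrate the undercount, here is a minimal sketch of a repr-based estimate. The helper name `repr_fallback_tokens`, the 4-characters-per-token ratio, and the example URL are assumptions made for this illustration, not code taken from the PR.

```python
import math


def repr_fallback_tokens(content, chars_per_token=4.0):
    """Hypothetical helper: estimate tokens from the repr of arbitrary content."""
    return math.ceil(len(repr(content)) / chars_per_token)


# An image block with a short, made-up URL.
image_block = [
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}}
]

# The estimate scales with the length of the URL string,
# not with what the image costs the model.
estimate = repr_fallback_tokens(image_block)
print(estimate)
```

Because the result depends only on how many characters the block's repr contains, a short URL gives an estimate well below the 85-token figure.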

Changes

  • Updated count_tokens_approximately in libs/core/langchain_core/messages/utils.py to iterate through content blocks.
  • Added handling for type: image_url blocks, assigning each image a constant cost of 85 tokens (the standard low-detail approximation).
  • Maintained fallback behavior for unknown dictionary blocks to ensure no regression for generic JSON content.
  • Added test_count_tokens_approximately_with_image to tests/unit_tests/messages/test_utils.py to verify the fix.
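The block-by-block approach described in the list above could look roughly like the following. This is a sketch under stated assumptions, not the PR's actual diff: the helper name `approx_content_tokens`, the 4-characters-per-token ratio, and the handling of text blocks are all hypothetical. Only the 85-token image cost and the repr fallback for unknown blocks come from the description.

```python
import math

IMAGE_TOKEN_COST = 85  # constant per-image cost named in the PR description
CHARS_PER_TOKEN = 4.0  # assumed ratio for this sketch


def approx_content_tokens(content, chars_per_token=CHARS_PER_TOKEN):
    """Hypothetical helper: approximate the token count of message content."""
    if isinstance(content, str):
        return math.ceil(len(content) / chars_per_token)
    if not isinstance(content, list):
        # Non-list, non-string content: fall back to its repr.
        return math.ceil(len(repr(content)) / chars_per_token)

    chars = 0
    image_tokens = 0
    for block in content:
        if isinstance(block, str):
            chars += len(block)
        elif isinstance(block, dict) and block.get("type") == "image_url":
            # Images get a flat cost instead of a character count.
            image_tokens += IMAGE_TOKEN_COST
        elif isinstance(block, dict) and block.get("type") == "text":
            # Assumption: count only the text payload of text blocks.
            chars += len(block.get("text", ""))
        else:
            # Unknown blocks keep the repr-based fallback.
            chars += len(repr(block))
    return math.ceil(chars / chars_per_token) + image_tokens


message_content = [
    {"type": "text", "text": "What is in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
print(approx_content_tokens(message_content))
```

In this sketch the 22-character text contributes 6 tokens and the image contributes 85, so the URL length no longer affects the result.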

Verification

  • Verified with a reproduction script: the token count for a standard image message increased from ~36 to 94.
  • Ran existing unit tests: uv run --group test pytest tests/unit_tests/messages/test_utils.py
  • All 91 tests passed (including regression tests for generic list content).

@Aman071106 Aman071106 requested a review from eyurtsev as a code owner December 29, 2025 18:06
@github-actions github-actions bot added the core `langchain-core` package issues & PRs label Dec 29, 2025
@codspeed-hq

codspeed-hq bot commented Dec 29, 2025

CodSpeed Performance Report

Merging #34529 will not alter performance

Comparing Aman071106:feat/core-add-approx-image-token-counting (c32baf2) with master (03ae397) [1]

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

Summary

✅ 13 untouched
⏩ 21 skipped [2]

Footnotes

  1. No successful run was found on master (9ecf636) during the generation of this report, so 03ae397 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

  2. 21 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@Aman071106 Aman071106 changed the title from "Feat(core): add approximate image token counting support" to "feat(core): add approximate image token counting support" Dec 29, 2025
@github-actions github-actions bot added the feature For PRs that implement a new feature; NOT A FEATURE REQUEST label Dec 29, 2025