mtmd : add methods to access `mtmd_image_tokens` #12906

ngxson · 2025-04-11T20:20:50Z

Part of this PR is extracted from #12898 for easier review.

Note: we currently hash the post-processed image (f32 image). It would probably be better to hash the internal downscaled image (u8 image), but clip_image_preprocess doesn't expose it. This can be implemented in a future version of clip.cpp.
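For illustration only, here is a minimal sketch of what hashing an f32 pixel buffer into an ID could look like. It uses FNV-1a, which is an assumption: the PR does not say which hash function is used, and `fnv1a_64` and `image_id_from_f32` are hypothetical names, not part of mtmd or clip.cpp. Note that a later commit in this PR moves the hash to a user-defined ID.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// FNV-1a, 64-bit: a simple non-cryptographic hash
// (standard offset basis and prime). Assumed here, not taken from the PR.
static uint64_t fnv1a_64(const unsigned char * data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;
    }
    return h;
}

// Hypothetical helper: hash the raw bytes of an f32 pixel buffer and
// return the result as a 16-character hex string usable as an image ID.
static std::string image_id_from_f32(const std::vector<float> & pixels) {
    const auto * bytes = reinterpret_cast<const unsigned char *>(pixels.data());
    const uint64_t h = fnv1a_64(bytes, pixels.size() * sizeof(float));
    char buf[17];
    snprintf(buf, sizeof(buf), "%016llx", (unsigned long long) h);
    return std::string(buf);
}
```

Hashing the f32 buffer means the ID depends on the exact preprocessing output, which is one reason hashing the smaller u8 image, or letting the caller supply the ID, may be preferable.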

ggerganov · 2025-04-13T19:20:17Z

examples/llava/mtmd.cpp

@@ -123,14 +130,14 @@ mtmd_input_chunks * mtmd_tokenize(mtmd_context * ctx,
            std::move(tokens),
            {},
        };
-        output->emplace_back(std::move(chunk));
+        output.emplace_back(std::move(chunk));

        if (&parts.back() != &part) {


Not 100% sure, but I think this logic does not handle the case where the text ends with an image marker:

<some_text><image_marker>

For Gemma3 this will not happen because we wrap the image marker with text on both sides, but maybe it could happen for other models? If it definitely cannot happen, then this `if` should become an assert.

The current logic will produce an empty text chunk at the end if the image marker is placed at the end of the input prompt. This is a side effect of string_split_str.

For example, this code:

    auto test = string_split_str("123aa456aa", "aa");
    for (auto & p : test)
        printf("'%s'\n", p.c_str());

Will output:

    '123'
    '456'
    ''

~~I think having an empty chunk in the end is expected for now, but I should document it better.~~

~~If we don't want this empty chunk, the proper way is to stop using string_split_str and to write our own code to do string matching / splitting.~~

In practice, this will almost never happen because the user always inputs a prompt with a generation prefix, something like <s>user\nwhat do you see?<image></s><s>assistant\n

Sorry I missed one line of code:

    auto tokens = mtmd_tokenize_text_internal(...);
    if (tokens.empty()) {
        continue;
    }

So no empty chunk is added, and the case where the image marker is placed at the end is handled correctly. This also handles the case where 2 image markers are placed one next to the other.
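As a rough, self-contained sketch of the two cases discussed here (a trailing image marker and two adjacent markers), the following splits a prompt on a marker and skips empty text pieces. It is not the code in mtmd.cpp: `chunk`, `split_str`, and `make_chunks` are hypothetical names, the real function works on tokens rather than strings, and in this sketch the empty check guards only the text chunk so that an image chunk is still emitted between adjacent markers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunk type for illustration; not the mtmd structs.
struct chunk {
    bool        is_image;
    std::string text; // empty for image chunks
};

// Stand-in for string_split_str: splits on every occurrence of delim and
// keeps empty pieces, including a trailing one.
static std::vector<std::string> split_str(std::string s, const std::string & delim) {
    std::vector<std::string> parts;
    if (delim.empty()) {
        // an empty delimiter would never advance; return the input unsplit
        parts.push_back(s);
        return parts;
    }
    size_t pos;
    while ((pos = s.find(delim)) != std::string::npos) {
        parts.push_back(s.substr(0, pos));
        s.erase(0, pos + delim.size());
    }
    parts.push_back(s);
    return parts;
}

// Emits a text chunk for every non-empty piece and an image chunk
// between every pair of consecutive pieces.
static std::vector<chunk> make_chunks(const std::string & prompt, const std::string & marker) {
    std::vector<chunk> out;
    const auto parts = split_str(prompt, marker);
    for (size_t i = 0; i < parts.size(); i++) {
        if (!parts[i].empty()) {
            out.push_back({false, parts[i]}); // skip empty text pieces
        }
        if (i + 1 < parts.size()) {
            out.push_back({true, ""});        // one image per marker
        }
    }
    return out;
}
```

With this structure, `make_chunks("hi<img>", "<img>")` yields a text chunk followed by an image chunk, with no empty text chunk at the end, and `make_chunks("a<img><img>b", "<img>")` yields text, image, image, text.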

(This piece of code was first introduced in my first attempt to refactor the vision API, so yeah, it's quite hacky.)

I think I will refactor this function later on. Could you review the rest of this PR? Thanks!!

* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* fix prompt_modified
* rm redundant data member

ngxson added 2 commits April 11, 2025 21:49

mtmd : add more api around mtmd_image_tokens

a46b6db

mtmd : ability to calc image hash

7ac0b7b

ngxson requested review from ggerganov and slaren April 11, 2025 20:20

github-actions bot added the examples label Apr 11, 2025

slaren reviewed Apr 12, 2025

shared_ptr for mtmd_image_tokens

58c4767

ngxson force-pushed the xsn/mtmd_image_api branch from bfbabea to c3587e2 Compare April 12, 2025 08:56

move hash to user-define ID (fixed)

d3c3e20

ngxson force-pushed the xsn/mtmd_image_api branch from c3587e2 to d3c3e20 Compare April 12, 2025 09:03

ngxson changed the title ~~mtmd : add methods to access mtmd_image_tokens, add ability to get image hash~~ mtmd : add methods to access mtmd_image_tokens Apr 12, 2025

ggerganov reviewed Apr 13, 2025

ngxson added 2 commits April 13, 2025 23:39

fix prompt_modified

cd5dc6b

rm redundant data member

dbb257c

ggerganov approved these changes Apr 17, 2025

ngxson merged commit b9154ec into ggml-org:master Apr 18, 2025
51 checks passed

Comments: ngxson (Apr 11, 2025); ggerganov (Apr 13, 2025, edited); ngxson (Apr 13, 2025, edited); ngxson (Apr 13, 2025, edited); ngxson (Apr 14, 2025)