The framework supports multi-turn conversations with VLMs, allowing you to conduct dialogues that can involve multiple images.

- **Multi-Turn Prompts**: To engage in a conversation, provide multiple prompts sequentially using the `--prompt` argument. Each string will be treated as a separate turn.
- **Multiple Images**: You can supply multiple images (from URLs or local paths) using the `--image_path` argument.
- **Flexible Image Placement**: Use the `<image>` token within your prompt to specify exactly where each image's embeddings should be placed. The images provided via `--image_path` will replace the `<image>` tokens in the order they appear.

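Because images are matched to `<image>` tokens purely by order, a miscounted token can silently shift every later image. The following is a minimal pre-flight sketch, not part of the framework: the `count_image_tokens` helper and the image file names are hypothetical, and it assumes that the total number of `<image>` tokens across all prompts should equal the number of images passed to `--image_path`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check; the framework itself may validate differently.

# Count occurrences of the literal "<image>" token in one prompt string.
count_image_tokens() {
  local prompt="$1"
  local token="<image>"
  local stripped
  stripped=${prompt//$token/}
  # Each removed token shortens the string by the token's length.
  echo $(( (${#prompt} - ${#stripped}) / ${#token} ))
}

prompts=(
  "<image><image>Compare these images above and list the differences."
  "Answer the question: What's the main object in first image?"
  "<image>Caption this image."
)
# Placeholder paths (assumption); substitute your own URLs or local files.
images=("image1.jpg" "image2.jpg" "image3.jpg")

total=0
for p in "${prompts[@]}"; do
  total=$(( total + $(count_image_tokens "$p") ))
done

if [ "$total" -eq "${#images[@]}" ]; then
  echo "OK: $total image tokens for ${#images[@]} images"
else
  echo "Mismatch: $total image tokens but ${#images[@]} images" >&2
fi
```

Running a check like this before launching a long conversation makes an off-by-one in token placement visible immediately, rather than as a confusing model answer several turns later.
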
**Example**:

In this example, the first turn compares two images, the second turn asks a follow-up question about the first image, and the third turn asks for a caption for a third image.

```bash
# Define image URLs and prompts for a 3-turn conversation
```
- **Turn 1**: The prompt `"<image><image>Compare these images above and list the differences."` uses the first two images (`$IMAGE1_URL`, `$IMAGE2_URL`).
- **Turn 2**: The prompt `"Answer the question: What's the main object in first image?"` is a text-only follow-up. The conversation context is maintained from the previous turn.
- **Turn 3**: The prompt `"<image>Caption this image."` uses the third image (`$IMAGE3_URL`).
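
The turn-by-turn breakdown above can be reproduced with a short sketch that walks the prompts left to right and hands out images in order. This only mimics the documented in-order replacement rule; it is not the framework's implementation, and the `IMAGE1_URL`-style strings are stand-in labels rather than real URLs.

```bash
#!/usr/bin/env bash
# Illustrative only: mimics the documented "replace <image> tokens in order" rule.

token="<image>"
prompts=(
  "<image><image>Compare these images above and list the differences."
  "Answer the question: What's the main object in first image?"
  "<image>Caption this image."
)
# Stand-in labels (assumption) for the three images given via --image_path.
images=("IMAGE1_URL" "IMAGE2_URL" "IMAGE3_URL")

next=0          # index of the next unused image
assignments=()  # one entry per turn, for inspection afterwards

for i in "${!prompts[@]}"; do
  rest=${prompts[$i]}
  used=""
  # Consume one image for each <image> token, scanning left to right.
  while [[ $rest == *"$token"* ]]; do
    rest=${rest#*$token}
    used+="${used:+ }${images[$next]}"
    next=$(( next + 1 ))
  done
  assignments+=("${used:-none}")
  echo "Turn $(( i + 1 )): ${used:-no new image (text-only)}"
done
```

The single running index `next` is what carries image order across turns: Turn 2 contains no token, so it consumes nothing, and Turn 3 picks up the third image.
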