
Conversation

@whereisfarukk
Collaborator

…and simplify output

- Add support for video URL(s) in chat completion
- Add support for file URL(s) in chat completion (PDFs, documents, etc.)
- Add three model options: vlmrun-orion-1:fast, vlmrun-orion-1:auto (default), vlmrun-orion-1:pro
- Add "Simplify Output" toggle to return OpenAI-compatible format
- Update ChatMessage type to support video_url and file_url content types (see the sketch after this list)
- Remove max tokens and response format fields from UI
- Improve input type selection with separate fields for each media type
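For orientation, here is a minimal TypeScript sketch of what the updated ChatMessage shape could look like. The type names and exact structure are assumptions for illustration, not copied from the diff; only the part types (text, image_url, video_url, file_url) come from the description above.

// Hypothetical sketch of the multi-modal message shape; names are illustrative.
type UrlContentPart =
	| { type: 'image_url'; image_url: { url: string } }
	| { type: 'video_url'; video_url: { url: string } }
	| { type: 'file_url'; file_url: { url: string } };

type TextContentPart = { type: 'text'; text: string };

interface ChatMessage {
	role: 'system' | 'user' | 'assistant';
	content: string | Array<TextContentPart | UrlContentPart>;
}

// Example: a user message combining a text prompt with a video URL.
const message: ChatMessage = {
	role: 'user',
	content: [
		{ type: 'text', text: 'Summarize this clip.' },
		{ type: 'video_url', video_url: { url: 'https://example.com/clip.mp4' } },
	],
};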
@gemini-code-assist
Contributor

Summary of Changes

Hello @whereisfarukk, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the chat completion capabilities by introducing multi-modal input support, allowing users to include video and file URLs in their prompts. It also provides more granular control over the models used for chat completions and offers an option to simplify the output format for better compatibility. These changes aim to make the chat completion feature more versatile and user-friendly.

Highlights

- Multi-modal Chat Completion: Added support for including video and file URLs (PDFs, documents) directly within chat completion requests, alongside existing image URL support.
- Flexible Model Selection: Introduced three new model options (vlmrun-orion-1:fast, auto, pro) for chat completions, with auto as the default.
- Simplified Output Option: A new toggle allows users to receive chat completion responses in an OpenAI-compatible simplified format (see the example after this list).
- Enhanced UI for Input Types: Improved the user interface for selecting and providing different media input types (image, video, file URLs) for chat completions.
- API Client Enhancements: The API client now includes a client identifier header and improved error logging for better debugging and tracking.
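For reference, the "OpenAI-compatible" format presumably mirrors OpenAI's documented chat completion response schema. A minimal sketch under that assumption; all values are placeholders, and the fields VLM Run actually returns may differ:

// Hypothetical example based on OpenAI's chat completion schema, not this PR's diff.
const simplifiedResponse = {
	id: 'chatcmpl-…',
	object: 'chat.completion',
	created: 1700000000,
	model: 'vlmrun-orion-1:auto',
	choices: [
		{
			index: 0,
			message: { role: 'assistant', content: 'The video shows …' },
			finish_reason: 'stop',
		},
	],
};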
@gemini-code-assist
Contributor
Code Review

This pull request significantly enhances the VLM Run node by adding chat completion capabilities with support for multiple models and various media types like video and file URLs. The code is well-structured, but there are a few areas with code duplication that could be refactored for better maintainability. Specifically, the logic for processing different URL types and constructing message parts in VlmRun.node.ts is repeated. Similarly, in VlmRunClient.ts, the makeRequest and makeAgentRequest methods are nearly identical and could be merged. I've left specific suggestions on how to address these points.

I am having trouble creating individual review comments; my feedback is listed below instead.

nodes/VlmRun/VlmRun.node.ts (685-733)

medium

The logic for processing imageUrls, videoUrls, and fileUrls is repeated three times. This can be extracted into a helper function to improve maintainability and reduce code duplication.

// Collect trimmed, non-empty URL strings from the named node parameter.
const extractUrls = (paramName: string): string[] => {
	const urls: string[] = [];
	const urlsParam = this.getNodeParameter(paramName, i) as IDataObject;
	if (urlsParam && urlsParam.url) {
		const urlEntries = Array.isArray(urlsParam.url) ? urlsParam.url : [urlsParam.url];
		for (const entry of urlEntries) {
			if (entry && typeof entry === 'object' && entry.url) {
				const url = entry.url as string;
				if (url && url.trim()) {
					urls.push(url.trim());
				}
			}
		}
	}
	return urls;
};

const imageUrls = inputType === 'image' ? extractUrls('imageUrls') : [];
const videoUrls = inputType === 'video' ? extractUrls('videoUrls') : [];
const fileUrls = inputType === 'file' ? extractUrls('fileUrls') : [];

nodes/VlmRun/VlmRun.node.ts (759-787)

medium

The logic for adding image, video, and file URLs to contentParts is repetitive. This can be refactored into a more generic helper function to improve readability and maintainability.

const addUrlsToContent = (
	urls: string[],
	type: 'image_url' | 'video_url' | 'file_url',
) => {
	for (const url of urls) {
		contentParts.push({
			type,
			// Computed key matches the part type, e.g. { type: 'video_url', video_url: { url } }
			[type]: { url },
		});
	}
};

addUrlsToContent(imageUrls, 'image_url');
addUrlsToContent(videoUrls, 'video_url');
addUrlsToContent(fileUrls, 'file_url');

nodes/VlmRun/VlmRunClient.ts (154-162)

medium

This detailed error logging is very helpful for debugging. However, the makeAgentRequest method does not have similar logging, making the error handling inconsistent across the client. It would be beneficial to apply this enhanced logging to makeAgentRequest as well. Consolidating the two request methods, as suggested in another comment, would be an effective way to ensure consistency.

nodes/VlmRun/VlmRunClient.ts (179-181)

medium

This headers object is also defined in the makeRequest method. The makeAgentRequest and makeRequest methods are very similar and could be consolidated into a single private method to reduce code duplication. A single method could take the base URL as a parameter, which would also help in making the error handling consistent.
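To make the suggested consolidation concrete, here is a minimal sketch of a single shared request method that also addresses the logging-consistency point from the previous comment. The class shape, method name send, header names, and console logging are all assumptions for illustration; the comments above only establish that makeRequest and makeAgentRequest duplicate headers and should share error handling.

// Sketch only; names and details are hypothetical, not the node's actual code.
class VlmRunClientSketch {
	constructor(
		private apiKey: string,
		private baseUrl: string,      // used by makeRequest
		private agentBaseUrl: string, // used by makeAgentRequest
	) {}

	// One private method owns the headers, the request, and the error logging,
	// so the two public methods cannot drift apart.
	private async send<T>(baseUrl: string, path: string, body: unknown): Promise<T> {
		const headers = {
			Authorization: `Bearer ${this.apiKey}`,
			'Content-Type': 'application/json',
			'X-Client-Id': 'n8n-nodes-vlmrun', // hypothetical client identifier header
		};
		try {
			const response = await fetch(`${baseUrl}${path}`, {
				method: 'POST',
				headers,
				body: JSON.stringify(body),
			});
			if (!response.ok) {
				throw new Error(`Request failed with status ${response.status}`);
			}
			return (await response.json()) as T;
		} catch (error) {
			// Shared logging path; real code would use the node's logger instead.
			console.error(`VLM Run request to ${baseUrl}${path} failed`, error);
			throw error;
		}
	}

	makeRequest<T>(path: string, body: unknown): Promise<T> {
		return this.send<T>(this.baseUrl, path, body);
	}

	makeAgentRequest<T>(path: string, body: unknown): Promise<T> {
		return this.send<T>(this.agentBaseUrl, path, body);
	}
}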
