feat: enhance chat completion with video/file URLs, multiple models, … #72
base: main
Conversation
…and simplify output

- Add support for video URL(s) in chat completion
- Add support for file URL(s) in chat completion (PDFs, documents, etc.)
- Add three model options: vlmrun-orion-1:fast, vlmrun-orion-1:auto (default), vlmrun-orion-1:pro
- Add "Simplify Output" toggle to return OpenAI-compatible format
- Update ChatMessage type to support video_url and file_url content types
- Remove max tokens and response format fields from UI
- Improve input type selection with separate fields for each media type
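Illustrative only: assuming the updated ChatMessage type follows the OpenAI multi-modal content-part convention described in the PR (the exact field names in the node are not shown here), a user message mixing text with a video and a PDF URL might look like this. The URLs and prompt text are placeholders.

```typescript
// Hypothetical sketch of the content-part shapes implied by the PR description.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; image_url: { url: string } }
  | { type: 'video_url'; video_url: { url: string } }
  | { type: 'file_url'; file_url: { url: string } };

interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string | ContentPart[];
}

// A user message combining a text prompt with a video URL and a file URL.
const message: ChatMessage = {
  role: 'user',
  content: [
    { type: 'text', text: 'Summarize this video and document.' },
    { type: 'video_url', video_url: { url: 'https://example.com/demo.mp4' } },
    { type: 'file_url', file_url: { url: 'https://example.com/report.pdf' } },
  ],
};

console.log(JSON.stringify(message, null, 2));
```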
Summary of Changes

Hello @whereisfarukk, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly enhances the chat completion capabilities by introducing multi-modal input support, allowing users to include video and file URLs in their prompts. It also provides more granular control over the models used for chat completions and offers an option to simplify the output format for better compatibility. These changes aim to make the chat completion feature more versatile and user-friendly.
Code Review
This pull request significantly enhances the VLM Run node by adding chat completion capabilities with support for multiple models and various media types like video and file URLs. The code is well-structured, but there are a few areas with code duplication that could be refactored for better maintainability. Specifically, the logic for processing different URL types and constructing message parts in VlmRun.node.ts is repeated. Similarly, in VlmRunClient.ts, the makeRequest and makeAgentRequest methods are nearly identical and could be merged. I've left specific suggestions on how to address these points.
I am having trouble creating individual review comments, so my feedback is consolidated below.
nodes/VlmRun/VlmRun.node.ts (685-733)
The logic for processing imageUrls, videoUrls, and fileUrls is repeated three times. This can be extracted into a helper function to improve maintainability and reduce code duplication.
```typescript
const extractUrls = (paramName: string): string[] => {
	const urls: string[] = [];
	const urlsParam = this.getNodeParameter(paramName, i) as IDataObject;
	if (urlsParam && urlsParam.url) {
		const urlEntries = Array.isArray(urlsParam.url) ? urlsParam.url : [urlsParam.url];
		for (const entry of urlEntries) {
			if (entry && typeof entry === 'object' && entry.url) {
				const url = entry.url as string;
				if (url && url.trim()) {
					urls.push(url.trim());
				}
			}
		}
	}
	return urls;
};

const imageUrls = inputType === 'image' ? extractUrls('imageUrls') : [];
const videoUrls = inputType === 'video' ? extractUrls('videoUrls') : [];
const fileUrls = inputType === 'file' ? extractUrls('fileUrls') : [];
```

nodes/VlmRun/VlmRun.node.ts (759-787)
The logic for adding image, video, and file URLs to contentParts is repetitive. This can be refactored into a more generic helper function to improve readability and maintainability.
```typescript
const addUrlsToContent = (
	urls: string[],
	type: 'image_url' | 'video_url' | 'file_url',
) => {
	for (const url of urls) {
		contentParts.push({
			type,
			[type]: { url },
		});
	}
};

addUrlsToContent(imageUrls, 'image_url');
addUrlsToContent(videoUrls, 'video_url');
addUrlsToContent(fileUrls, 'file_url');
```

nodes/VlmRun/VlmRunClient.ts (154-162)
This detailed error logging is very helpful for debugging. However, the makeAgentRequest method does not have similar logging, making the error handling inconsistent across the client. It would be beneficial to apply this enhanced logging to makeAgentRequest as well. Consolidating the two request methods, as suggested in another comment, would be an effective way to ensure consistency.
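One way to keep the logging consistent, whether or not the two request methods are merged, is to route both through a shared formatting helper. This is only a sketch; the helper name, its fields, and the example URL are hypothetical and not taken from the actual VlmRunClient code.

```typescript
// Hypothetical helper: centralizes detailed error reporting so that both
// makeRequest and makeAgentRequest describe failures identically.
interface RequestErrorDetails {
  method: string;
  url: string;
  statusCode?: number;
}

function formatRequestError(err: Error, details: RequestErrorDetails): string {
  // Include the HTTP status only when one was actually received.
  const status = details.statusCode !== undefined ? ` (HTTP ${details.statusCode})` : '';
  return `${details.method} ${details.url} failed${status}: ${err.message}`;
}

const msg = formatRequestError(new Error('timeout'), {
  method: 'POST',
  url: 'https://example.com/v1/chat',
  statusCode: 504,
});
console.log(msg);
// → POST https://example.com/v1/chat failed (HTTP 504): timeout
```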
nodes/VlmRun/VlmRunClient.ts (179-181)
This headers object is also defined in the makeRequest method. The makeAgentRequest and makeRequest methods are very similar and could be consolidated into a single private method to reduce code duplication. A single method could take the base URL as a parameter, which would also help in making the error handling consistent.
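The consolidation could be sketched as below. All names here (the class, the private builder, the base URLs) are illustrative assumptions, not the actual VlmRunClient API; the point is that one private method owns the header construction and URL assembly, parameterized by the base URL.

```typescript
// Hypothetical sketch of merging makeRequest and makeAgentRequest into a
// single private helper parameterized by the base URL.
type HttpMethod = 'GET' | 'POST';

interface RequestConfig {
  url: string;
  method: HttpMethod;
  headers: Record<string, string>;
  body?: string;
}

class VlmRunClientSketch {
  constructor(
    private apiKey: string,
    private apiBase: string,
    private agentBase: string,
  ) {}

  // One place builds the request, so headers and error handling stay consistent.
  private buildRequest(baseUrl: string, method: HttpMethod, path: string, body?: unknown): RequestConfig {
    return {
      url: `${baseUrl}${path}`,
      method,
      headers: {
        Authorization: `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json',
      },
      body: body === undefined ? undefined : JSON.stringify(body),
    };
  }

  makeRequest(method: HttpMethod, path: string, body?: unknown): RequestConfig {
    return this.buildRequest(this.apiBase, method, path, body);
  }

  makeAgentRequest(method: HttpMethod, path: string, body?: unknown): RequestConfig {
    return this.buildRequest(this.agentBase, method, path, body);
  }
}

const client = new VlmRunClientSketch('sk-test', 'https://api.example.com', 'https://agent.example.com');
console.log(client.makeAgentRequest('POST', '/chat', { q: 'hi' }).url);
```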