How to deal with 20 million + files #10450
alita-moore asked this question in Help
-
Hi @alita-moore! Thanks for reaching out! Yes, DVC can be slow for a dataset of that scale. We have a new tool that we've been building for exactly these purposes, and we will be releasing it on June 25th. You can learn more in this recent talk our CEO, @dmpetrov, gave at OSS4AI (at the 1:02 mark). The tool is built for processing image, text, video, and audio data at scale for computer vision, LLM, or multimodal applications. More info can be found at https://dvc.ai, and if you'd like to talk about your use case and see a demo, you can book a meeting here: https://calendly.com/dmitry-at-iterative/dmitry-petrov-30-minutes
-
I want to use DVC to manage 20 million+ small files, but it seems quite slow when dealing with that many files. Is there a common way of handling cases like this, such as using an intermediate zip file or something to that effect? Is 20 million files beyond the scope of the tool, or an abuse of it? Should I use a different tool instead?
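For what it's worth, the intermediate-archive idea from the question is a workaround people do use with DVC: tracking a few hundred tar shards is far cheaper than tracking 20 million individual files. Below is a minimal sketch, not an official DVC feature; the directory names and shard size are placeholders you would adapt to your layout.

```python
import tarfile
from pathlib import Path

RAW_DIR = Path("data/raw")        # hypothetical source directory containing the many small files
SHARD_DIR = Path("data/shards")   # archives to be tracked with `dvc add` instead
FILES_PER_SHARD = 100_000         # arbitrary shard size; tune for your storage and change patterns


def make_shards() -> None:
    """Bundle the small files into a manageable number of tar shards."""
    SHARD_DIR.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in RAW_DIR.rglob("*") if p.is_file())
    for start in range(0, len(files), FILES_PER_SHARD):
        shard_path = SHARD_DIR / f"shard-{start // FILES_PER_SHARD:05d}.tar"
        with tarfile.open(shard_path, "w") as tar:
            for f in files[start:start + FILES_PER_SHARD]:
                # store paths relative to RAW_DIR so extraction recreates the original layout
                tar.add(f, arcname=str(f.relative_to(RAW_DIR)))


if __name__ == "__main__":
    make_shards()
```

After sharding, `dvc add data/shards` tracks only the archives. The trade-off is losing per-file deduplication and random access, so this mostly pays off when files change in bulk rather than one at a time.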