Returned "path" of HTTPReader and GDriveReader diverges #451
Comments
I agree that we should align the two, and that defaulting to the file name is likely better than defaulting to the URL. I wonder if we should add an optional argument which can be either:
I think adding 2 to … cc: @ejguan
I am not sure about it. If a URL contains a directory path, I don't know if this is the ideal behavior. For example:

    from torchdata.datapipes.iter import IterableWrapper

    dp = IterableWrapper(
        [
            "https://abc.com/folder1/file1.txt",
            "https://abc.com/folder2/file1.txt",
        ]
    )

Returning the file name by default becomes a problem for users, as there will be duplicate file names. I do have a question about the use case of … One potential use case might be multi-model. But for multi-model with different data sources, users also need different pipelines to run pre-processing. Then, IMO, it makes more sense to have a separate …
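To make the collision concrete, here is a small sketch that derives a "file name" from each URL above by taking the last segment of the URL path (an assumed basename-style default, not the readers' actual code) and, as one hypothetical alternative, keeps the host plus the full URL path so the entries stay distinct.

```python
import posixpath
from collections import Counter
from urllib.parse import urlparse

urls = [
    "https://abc.com/folder1/file1.txt",
    "https://abc.com/folder2/file1.txt",
]

def basename_of(url: str) -> str:
    # Last segment of the URL path, e.g. "file1.txt"
    return posixpath.basename(urlparse(url).path)

def host_and_path_of(url: str) -> str:
    # Hypothetical alternative key: host + full path, e.g. "abc.com/folder1/file1.txt"
    parsed = urlparse(url)
    return parsed.netloc + parsed.path

names = [basename_of(u) for u in urls]
keys = [host_and_path_of(u) for u in urls]

# Both URLs collapse to the same file name ...
duplicates = [name for name, count in Counter(names).items() if count > 1]
print(duplicates)  # ['file1.txt']
# ... while host + path keeps them distinct.
print(len(set(keys)) == len(keys))  # True
```

The trade-off is that a host-plus-path key is no longer a plain file name, so it would not match what a Google Drive download could provide.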
Another thing is the other related … I understand Google Drive is a special case because its URLs don't contain any file name or file path. So, in order to have an aligned result from …
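Because a Google Drive URL carries no file name, a name can only come from the server's response. A minimal sketch, assuming the name arrives in a `Content-Disposition` response header (the header values below are made up for illustration), that reuses the standard library's `email.message.Message.get_filename` to parse the `filename` parameter:

```python
from email.message import Message
from typing import Optional

def filename_from_content_disposition(header_value: str) -> Optional[str]:
    # Reuse the stdlib MIME header parser to read the "filename" parameter,
    # which also strips the surrounding quotes; returns None if it is absent.
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()

# Illustrative header values, not captured from a real Google Drive response.
print(filename_from_content_disposition('attachment; filename="file1.txt"'))  # file1.txt
print(filename_from_content_disposition("inline"))  # None
```

This is why a name-based "path" for Google Drive is only known after the request has been made, whereas a plain HTTP URL can be turned into a name up front.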
I don't think so, no. But that shouldn't be the issue, TBH. This is the same argument I'm making in pytorch/vision#6060 (comment) for loading archives. In both cases, as a user I'm willing to trade specific control for convenience. Note that I'm not saying that we shouldn't have the individual classes. If, for example, one only wants to perform plain HTTP requests, they can use the …
Understood about the convenience for users. I am more concerned about how to maintain this … So, to reach a common ground, we might need to:
HTTPReader returns the URL as the "path" (data/torchdata/datapipes/iter/load/online.py, line 46 in 13b574c), while GDriveReader returns the file name (data/torchdata/datapipes/iter/load/online.py, line 129 in 13b574c).
Since OnlineReader determines at runtime whether to call the HTTP or the GDrive download (data/torchdata/datapipes/iter/load/online.py, lines 198 to 203 in 13b574c), the "path" of the yielded tuples is impossible to predict.
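The mismatch can be reproduced without network access. The sketch below is not torchdata code: `http_read`, `gdrive_read`, `online_read`, the hard-coded Drive file name, and the host-based dispatch are hypothetical stand-ins that only mimic the behavior described above (URL as "path" for plain HTTP, file name for Google Drive).

```python
import io
from typing import BinaryIO, Tuple
from urllib.parse import urlparse

def http_read(url: str) -> Tuple[str, BinaryIO]:
    # Stand-in for the plain HTTP branch: the URL itself is the "path".
    return url, io.BytesIO(b"")

def gdrive_read(url: str) -> Tuple[str, BinaryIO]:
    # Stand-in for the Google Drive branch: a file name that would normally
    # come from the response headers (hard-coded here for illustration).
    return "file1.txt", io.BytesIO(b"")

def online_read(url: str) -> Tuple[str, BinaryIO]:
    # Simplified runtime dispatch on the host name.
    if urlparse(url).netloc == "drive.google.com":
        return gdrive_read(url)
    return http_read(url)

for url in [
    "https://abc.com/folder1/file1.txt",
    "https://drive.google.com/uc?id=FILE_ID",
]:
    path, _ = online_read(url)
    print(path)
```

Both inputs could refer to the same logical `file1.txt`, yet one yields a full URL and the other a bare file name, so downstream code cannot rely on either shape.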
We should align the two. My vote goes to aligning on the file name. Still, returning the URL could also be useful if redirect logic, as discussed in pytorch/vision#6060 (review), is added to HTTPReader.
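One way to reconcile both preferences, sketched under assumptions: a hypothetical `path_mode` option (not an existing torchdata parameter) that defaults to the file name but can return the URL instead, where `final_url` stands for the URL reached after following redirects. Here `final_url` is passed in directly rather than obtained from a real request.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

def resolve_path(
    requested_url: str,
    final_url: Optional[str] = None,
    path_mode: str = "filename",
) -> str:
    # Hypothetical helper choosing what to yield as the "path".
    # final_url models the URL after redirects; fall back to the requested one.
    url = final_url or requested_url
    if path_mode == "url":
        return url
    if path_mode == "filename":
        return posixpath.basename(urlparse(url).path)
    raise ValueError(f"unknown path_mode: {path_mode!r}")

print(resolve_path("https://abc.com/latest", final_url="https://cdn.abc.com/v2/file1.txt"))
# file1.txt
print(resolve_path("https://abc.com/latest", final_url="https://cdn.abc.com/v2/file1.txt", path_mode="url"))
# https://cdn.abc.com/v2/file1.txt
```

The redirect case shows why the URL can still be worth exposing: the requested URL (`/latest`) names no file at all, and only the final URL does. For Google Drive, the "filename" branch would have to use a header-derived name instead, since its URLs contain none.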