Following the discussion today, I made some changes to the idea in #42 and am providing an implementation, rather than a document, to show my point. The idea is:
- Each resource is identified by its URI scheme. The file scheme and a null (absent) scheme are reserved for the default implementation, which is backed by java.nio.file.Path. This has the following advantages:
  - Non-path resources can wrap a location handled by java.nio.file.Path. One use case is GenomicsDB, which could be specified as gendb:hdfs://${path} or gendb:${local_path}; SRA accessions/files are another.
  - It supports Windows paths, which cannot be converted to a URI.
  - It supports installed FileSystemProviders out of the box.
- Each resource is responsible for defining how its data is read and written. This is designed as a low-level abstraction that hides the implementation, and most library users will not need to cast the interfaces.
- Resources should be retrieved ONLY through the IOResourceFactory get methods. An IOResourceProvider can be registered either through the service API or directly in the factory. The default implementation (java.nio.file.Path) is not exposed and cannot be overridden. This makes it possible to fall back to java.nio.file.Path in every case, while staying flexible enough to let users define their own providers.
A custom implementation of an IOResource might look like this:
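As a rough sketch only: the interface below is a hypothetical stand-in for `IOResource` (its method names are assumptions, not the ones in this branch), and `GenDbResource` is an invented example for the `gendb` scheme mentioned above. It shows a non-path resource that wraps a nested location and delegates stream handling to `java.nio.file.Path`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical low-level abstraction; the interface in the PR may differ.
interface IOResource {
    URI getURI();
    InputStream openInputStream() throws IOException;
    OutputStream openOutputStream() throws IOException;
}

// Hypothetical resource for the "gendb" scheme. The part after "gendb:" is
// either a plain local path or a nested URI handled by an installed
// FileSystemProvider (for example hdfs://..., if such a provider is present).
final class GenDbResource implements IOResource {
    static final String SCHEME = "gendb";

    private final URI uri;
    private final Path nested;

    GenDbResource(final URI uri) {
        if (!SCHEME.equals(uri.getScheme())) {
            throw new IllegalArgumentException("Expected scheme '" + SCHEME + "': " + uri);
        }
        this.uri = uri;
        // Inspect the scheme-specific part to see whether it carries its own scheme.
        final URI inner = URI.create(uri.getRawSchemeSpecificPart());
        this.nested = inner.getScheme() == null
                ? Paths.get(uri.getSchemeSpecificPart()) // plain local path
                : Paths.get(inner);                      // nested URI, needs a provider
    }

    Path getNestedPath() {
        return nested;
    }

    @Override
    public URI getURI() {
        return uri;
    }

    // A real implementation would define its own read/write logic here;
    // this sketch simply streams the nested path.
    @Override
    public InputStream openInputStream() throws IOException {
        return Files.newInputStream(nested);
    }

    @Override
    public OutputStream openOutputStream() throws IOException {
        return Files.newOutputStream(nested);
    }
}
```

With this shape, `new GenDbResource(URI.create("gendb:/some/local/dir"))` resolves the nested location as a local path, while a nested URI whose scheme has no installed provider fails with `FileSystemNotFoundException` at construction time.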
And a consumer would use it like this (this makes many assumptions, because we don't yet know how the high-level record interfaces will be implemented, but the reader/writer will take care of the IOResource, that is, its scheme and/or input/output):
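A minimal sketch of the consumer side, under stated assumptions: `IOResource`, `IOResourceProvider` and `IOResourceFactory` are reduced, hypothetical versions of the types named above (method names are guesses), and `LineCounter` stands in for a higher-level reader. It illustrates the scheme lookup, the reserved default, and the fallback to `java.nio.file.Path`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, reduced versions of the interfaces discussed in this PR.
interface IOResource {
    InputStream openInputStream() throws IOException;
}

interface IOResourceProvider {
    String getScheme();
    IOResource getResource(String location) throws IOException;
}

final class IOResourceFactory {
    private static final Map<String, IOResourceProvider> PROVIDERS = new ConcurrentHashMap<>();

    static {
        // Providers can be discovered through the service API...
        for (final IOResourceProvider provider : ServiceLoader.load(IOResourceProvider.class)) {
            register(provider);
        }
    }

    private IOResourceFactory() {}

    // ...or registered directly. The default scheme cannot be taken over.
    static void register(final IOResourceProvider provider) {
        final String scheme = provider.getScheme();
        if (scheme == null || scheme.equals("file")) {
            throw new IllegalArgumentException("Reserved scheme: " + scheme);
        }
        PROVIDERS.put(scheme, provider);
    }

    static IOResource get(final String location) throws IOException {
        final String scheme = schemeOf(location);
        if (scheme != null) {
            final IOResourceProvider provider = PROVIDERS.get(scheme);
            if (provider != null) {
                return provider.getResource(location);
            }
        }
        // Fallback: plain paths and any scheme with an installed FileSystemProvider.
        final Path path = scheme == null ? Paths.get(location) : Paths.get(URI.create(location));
        return () -> Files.newInputStream(path);
    }

    // Returns null for plain paths, including strings that are not valid URIs
    // and single-letter "schemes" that are more likely Windows drive letters.
    private static String schemeOf(final String location) {
        try {
            final String scheme = new URI(location).getScheme();
            return (scheme == null || scheme.length() == 1) ? null : scheme;
        } catch (URISyntaxException e) {
            return null;
        }
    }
}

// Stand-in for a high-level reader: it never needs to know the concrete resource type.
final class LineCounter {
    static long countLines(final String location) throws IOException {
        final IOResource resource = IOResourceFactory.get(location);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(resource.openInputStream(), StandardCharsets.UTF_8))) {
            return reader.lines().count();
        }
    }
}
```

The consumer only passes a location string to the factory; whether that string ends up at a custom provider or at the `Path` fallback is invisible to it.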
One observation: there isn't actually that much overlap between this PR and mine (though there is some), and in some ways, I think the two branches complement each other.
Much of my PR is about taking a raw, unstructured input string provided by a user, and turning it into a structured (PathSpecifier/URI) object that is always guaranteed to have a valid scheme, and which can subsequently be used to locate a "reader" that can handle that scheme. The reader resolution service in progress in my other branch (not yet a PR) looks remarkably like the one in this branch, except that the providers are at a higher level than IOResource is here; they're more like a SAMReader. There is a registry service that, given a PathSpecifier, queries registered providers to find one that claims to be able to render records (of whatever type is being requested) from that PathSpecifier. The winner is then instantiated using the same PathSpecifier.
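To make the resolution step concrete, here is a small sketch. Every type in it is hypothetical: `PathSpecifier` is reduced to a URI wrapper that enforces the scheme invariant, and `ReaderProvider`, `RecordReader` and `ReaderRegistry` are invented names, not the ones in either branch.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical identifier type: always carries a scheme.
final class PathSpecifier {
    private final URI uri;

    PathSpecifier(final URI uri) {
        if (uri.getScheme() == null) {
            throw new IllegalArgumentException("Missing scheme: " + uri);
        }
        this.uri = uri;
    }

    URI getURI() {
        return uri;
    }
}

// Hypothetical high-level reader; it keeps its originating identifier
// for error reporting and sibling resolution.
interface RecordReader<T> {
    PathSpecifier getSource();
}

// Hypothetical provider: claims sources it can render records from.
interface ReaderProvider<T> {
    Class<T> recordType();
    boolean canRead(PathSpecifier source);
    RecordReader<T> open(PathSpecifier source);
}

final class ReaderRegistry {
    private final List<ReaderProvider<?>> providers = new CopyOnWriteArrayList<>();

    void register(final ReaderProvider<?> provider) {
        providers.add(provider);
    }

    // The first provider that matches the requested record type and claims
    // the source wins, and is instantiated with the same PathSpecifier.
    @SuppressWarnings("unchecked")
    <T> Optional<RecordReader<T>> open(final Class<T> recordType, final PathSpecifier source) {
        for (final ReaderProvider<?> provider : providers) {
            if (provider.recordType().equals(recordType) && provider.canRead(source)) {
                return Optional.of(((ReaderProvider<T>) provider).open(source));
            }
        }
        return Optional.empty();
    }
}
```

Picking the first claimant is a simplification made for this sketch; a real registry would need a policy for the case where several providers claim the same source.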
Having said all that:
Having something more strongly typed than a "string" as the "identifier" currency, something that maintains the invariant that there is always a scheme, is super useful. It enables finding a matching plugin, finding sibling files, etc.
We should try to preserve the "raw input string -> URI" capability present in the other PR; there are many different cases that make this challenging (see all the test cases in my branch), and we should provide a single, canonical way to do that transformation.
The actual "readers" (by which I mean something like a SAMReader) will need to know the originating identifier/source, so they can use it for error reporting, sibling resolution, etc. I also imagined they would need some service that can be used to turn a PathSpecifier, which is an identifier, into something more concrete, such as a stream. The "IOResourceProvider" service in this branch is basically that middle layer. So I think it should be possible to reconcile these two branches.
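To illustrate the "raw input string -> URI" point above, here is a deliberately small sketch. `RawInputs.toUri` is a hypothetical helper, not the conversion from the other PR, and it covers only a few of the cases that make the real transformation hard.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.file.Paths;

// Hypothetical helper: always returns a URI with a scheme.
final class RawInputs {
    private RawInputs() {}

    static URI toUri(final String raw) {
        try {
            final URI uri = new URI(raw);
            // Keep the input if it already names a scheme. A single letter is
            // treated as a Windows drive letter rather than a scheme.
            if (uri.getScheme() != null && uri.getScheme().length() > 1) {
                return uri;
            }
        } catch (URISyntaxException e) {
            // Not a valid URI (spaces, backslashes, ...): treat it as a local path.
        }
        // Local paths become absolute file: URIs, so the scheme is never missing.
        return Paths.get(raw).toAbsolutePath().normalize().toUri();
    }
}
```

Under these assumptions, `gendb:hdfs://host/path` keeps its `gendb` scheme, while a relative path or a path containing spaces comes back as a `file:` URI.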
This is the alternative that I have in mind instead of #34.