Improve reference metadata handling for EventSource #2648
The recent addition of attaching the reference metadata of input files to the provenance information only implemented reading the reference metadata directly by ctapipe from the input file. This meant that either ctapipe had to support all possible input file types, or plugin event sources could not provide this metadata on their own. It also assumes 1 EventSource = 1 input file, which is not true for some event sources.

This issue is solved by:
* Adding the possibility to directly provide the reference metadata to `add_input_file`
* Moving the responsibility of calling `add_input_file` from the EventSource base class to the implementation
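A minimal sketch of the two changes described above, using hypothetical stand-in classes rather than the actual ctapipe code. `ToyProvenance`, `ToyPluginSource`, the `reference_meta` keyword, the role string and all metadata values are assumptions made for illustration only:

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvenance:
    """Hypothetical stand-in for a provenance recorder (not the ctapipe class)."""

    inputs: list = field(default_factory=list)

    def add_input_file(self, filename, role=None, reference_meta=None):
        # If the caller supplies the metadata it is stored as-is. A real
        # recorder might otherwise try to read it from the file; that
        # fallback is left out here, so the entry simply holds None.
        self.inputs.append(
            {"url": filename, "role": role, "reference_meta": reference_meta}
        )


class ToyPluginSource:
    """Hypothetical plugin source for a format the core library cannot read."""

    def __init__(self, input_url, provenance):
        self.input_url = input_url
        # The implementation, not a base class, registers its own input.
        provenance.add_input_file(
            input_url,
            role="event source input",  # assumed role name
            reference_meta=self._read_reference_meta(),
        )

    def _read_reference_meta(self):
        # Placeholder values; a real source would parse its own file header.
        return {"product_id": "example-id", "format": "custom"}


prov = ToyProvenance()
ToyPluginSource("run001.custom", prov)
```

With this split, the recorder never needs to understand the plugin's file format; it only stores what the source hands over.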
Looks like a good idea. Probably need some tests, particularly to ensure that anyone implementing an EventSource plugin doesn't forget to add the input file to the provenance.
Mmh. The only way we could enforce this, I think, is to add an abstract method to the [...]

However, I want to support event sources that have multiple input files, as I know those exist (and we'll have e.g. parallel zfits streams also for the ACADA data). So the API should be something like this: [...]

But the issue here is that I know that some event source implementations only open files one-by-one. So I think the solution above is not right. It's really the event sources that have to internally call [...]

I don't think we can have a unit test here in ctapipe to enforce this.
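To illustrate the one-by-one case, here is a hedged sketch of a source with several input files that registers each file only at the moment it is opened. All names (`ToyProvenanceLog`, `ToyMultiFileSource`, `_read_events`, the file names) are hypothetical and not taken from ctapipe:

```python
class ToyProvenanceLog:
    """Hypothetical recorder that just collects registered input files."""

    def __init__(self):
        self.inputs = []

    def add_input_file(self, filename, role=None):
        self.inputs.append({"url": filename, "role": role})


class ToyMultiFileSource:
    """Hypothetical source that reads several files sequentially."""

    def __init__(self, input_urls, provenance):
        self.input_urls = list(input_urls)
        self.provenance = provenance

    def __iter__(self):
        for url in self.input_urls:
            # Register the file when it is actually opened, not up front.
            self.provenance.add_input_file(url, role="event source input")
            yield from self._read_events(url)

    def _read_events(self, url):
        # Stand-in for real file reading: two fake events per file.
        for index in range(2):
            yield (url, index)


log = ToyProvenanceLog()
source = ToyMultiFileSource(["stream_a.dat", "stream_b.dat"], log)
registered_before = len(log.inputs)  # nothing is opened yet
events = list(source)
```

Because the registration happens inside the implementation's own read loop, a base class has no generic hook to check it, which matches the point that a unit test in the core library cannot enforce it.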
Added explicit tests now for the two [...]
eebb3d7 to 4c2aafe
The reference metadata was designed specifically to be an output, not an input, so it may not be necessary to go that deep here. All we need to get is the info that we will eventually need to write out, so e.g. the full list of input product_ids is not necessary. The back-links to that info are what would be done in a provenance database derived from the provenance logs. So I think the issue with multiple inputs in the first element in the processing chain is: how to assign a [...]
I am surprised to hear this... in the original issue #2571 (comment) and the implementation #2598 you supported the idea of getting the product ids for inputs |
I think I didn't explain well: I do support this ([...]). The other issue is the difference between "Local Provenance" (inputs and outputs of an Activity) and [...]

TLDR: we do need to read the ReferenceMetadata of the inputs, as it is necessary for provenance. It is not propagated to the output ReferenceMetadata, however. For cases where we have no product_id, we could just suggest that the EventSources put in the obs_id or something like that (I don't really like mixing them though, so maybe we need to think a bit on the model). In any case, this PR is good.
That's how this works here. It is attaching the reference metadata of input files to the provenance, not to the output reference meta. |
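As a rough illustration of that distinction, the following sketch keeps the input reference metadata in a provenance record while the output gets its own, independent reference metadata. The function name, dictionary layout and ids are hypothetical and do not reflect the ctapipe data model:

```python
import uuid


def run_toy_activity(input_metas):
    """Return (provenance_record, output_meta) for a hypothetical activity."""
    # Local provenance: remember exactly which inputs this activity used.
    record = {
        "inputs": [{"reference_meta": dict(meta)} for meta in input_metas],
        "outputs": [],
    }
    # The output gets fresh reference metadata; nothing from the inputs
    # is copied into it.
    output_meta = {"product_id": str(uuid.uuid4())}
    record["outputs"].append({"reference_meta": output_meta})
    return record, output_meta


record, output_meta = run_toy_activity(
    [{"product_id": "input-a"}, {"product_id": "input-b"}]
)
```

The link from output back to inputs then lives only in the provenance record, from which a provenance database could later derive the back-links.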
ca55312 to 0981806
This is an alternative implementation to #2644.