Stats calc tool #2628

Merged
maxnoe merged 19 commits into main from stats_calc_tool on Jan 29, 2025

@TjarkMiener
Member

@TjarkMiener TjarkMiener commented Oct 28, 2024

This PR adds a generic stats-calculation tool utilizing the PixelStatisticsCalculator.

Related #2542

@TjarkMiener TjarkMiener added the module:calib issues related to ctapipe.calib label Oct 28, 2024
@TjarkMiener TjarkMiener self-assigned this Oct 28, 2024

Comment thread pyproject.toml Outdated
Comment thread src/ctapipe/resources/stats_calc_config.yaml Outdated
Comment thread src/ctapipe/tools/stats_calculation.py Outdated
),
).tag(config=True)

dl1a_column_name = CaselessStrEnum(

Contributor

Is DL1a/b "official"? Also, I'd perhaps use a generic input_column_name, similar to the output one.

Member

No, in ctapipe we use DL1_IMAGES and DL1_PARAMETERS to distinguish between per-pixel data and single quantities per event.

https://ctapipe.readthedocs.io/en/latest/api/ctapipe.io.DataLevel.html

Member

@maxnoe maxnoe Oct 28, 2024

I'd also not make this an enum. In the generic tool, users should be able to choose any column that has a compatible shape. Just provide a clear error when the column is not found in the input file.
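A minimal sketch of the "clear error" idea, not the code merged in this PR. The exception class is a local stand-in so the snippet runs without ctapipe, and the helper name `check_input_column` and the example column names are hypothetical.

```python
class ToolConfigurationError(Exception):
    """Local stand-in for ctapipe's ToolConfigurationError (assumption: same role)."""


def check_input_column(available_columns, input_column_name):
    """Return the column name if it exists, otherwise raise a descriptive error."""
    if input_column_name not in available_columns:
        raise ToolConfigurationError(
            f"Column '{input_column_name}' not found in the input data. "
            f"Available columns: {', '.join(sorted(available_columns))}"
        )
    return input_column_name


# Hypothetical column names, for illustration only.
columns = ["image", "peak_time", "is_valid"]
selected = check_input_column(columns, "image")
print(selected)
```

Listing the available columns in the message lets the user correct a typo without opening the file.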

Member

This could also be a list of columns, to compute on multiple at the same time.

Member Author

Ok, I changed the column name and also polished the references to DL1a data by using "pixel-wise image data", which is more descriptive. A ToolConfigurationError is raised if the column is not found. Having a list of columns seems like a bit of overkill here and would just make the code more complex. Maybe the aggregation config could be shared between the columns, but the outlier detection in particular will differ between the columns.

Member

> ToolConfigurationError is raised once the column is not found. Having list of columns seems a little bit of an overkill here, which would just make the code more complex. Maybe the aggregation config could be shared between the columns, but especially the outlier detection will be different between the columns.

I think the case where you only want to know about a single column is quite rare, you are usually interested in multiple. So having to read all data again to compute metrics on a new column seems very limiting and a loop over columns shouldn't make the code much more complex.
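As a rough illustration of that point, the following sketch walks the event axis once and computes per-pixel statistics for several columns per chunk. It is plain NumPy and does not use ctapipe's aggregator classes; `aggregate_columns`, the column names, and the array shapes are assumptions made for the example.

```python
import numpy as np


def aggregate_columns(table, column_names, chunk_size):
    """Compute per-pixel mean/median/std per chunk for several columns.

    `table` maps column name -> array of shape (n_events, n_pixels).
    A trailing, incomplete chunk is dropped in this sketch.
    """
    n_events = len(next(iter(table.values())))
    results = {name: [] for name in column_names}
    # Walk the event axis once; every chunk is sliced for each requested column.
    for start in range(0, n_events - chunk_size + 1, chunk_size):
        stop = start + chunk_size
        for name in column_names:
            chunk = table[name][start:stop]
            results[name].append(
                {
                    "mean": chunk.mean(axis=0),
                    "median": np.median(chunk, axis=0),
                    "std": chunk.std(axis=0),
                }
            )
    return results


# Synthetic data standing in for an input file (hypothetical columns).
rng = np.random.default_rng(0)
table = {
    "image": rng.normal(10.0, 2.0, size=(200, 4)),
    "peak_time": rng.normal(20.0, 1.0, size=(200, 4)),
}
stats = aggregate_columns(table, ["image", "peak_time"], chunk_size=100)
```

The only addition compared with the single-column case is the inner loop; per-column outlier settings would still need their own configuration, which is the concern raised earlier in this thread.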

Comment thread src/ctapipe/resources/stats_calc_config.yaml Outdated
Comment thread src/ctapipe/tools/stats_calculation.py
Comment thread src/ctapipe/resources/stats_calc_config.yaml Outdated
Comment thread src/ctapipe/tools/stats_calculation.py Outdated

mexanick previously approved these changes Oct 28, 2024

Member

@kosack kosack left a comment

Currently fails to read prod3 files (which have no EFFECTIVE focal length information). The tool currently fails with a focal_length_choice exception; however, there seems to be no way to set the focal length choice, since the TableLoader is not set up to be configurable.

Comment thread src/ctapipe/tools/calculate_pixel_stats.py Outdated
Comment thread src/ctapipe/tools/calculate_pixel_stats.py Outdated
Comment thread src/ctapipe/tools/calculate_pixel_stats.py Outdated
Comment thread src/ctapipe/tools/calculate_pixel_stats.py Outdated
Comment thread src/ctapipe/tools/calculate_pixel_stats.py Outdated
@maxnoe
Member

maxnoe commented Nov 6, 2024

> Currently fails to read prod5 files (which have no EFFECTIVE focal length information).

prod 5 files should have effective focal length

Comment thread src/ctapipe/tools/calculate_pixel_stats.py
@kosack
Member

kosack commented Nov 6, 2024

image

In the output, how can I tell which column was aggregated? It is always named "statistics", and there is no metadata in the group or tables that contains that information. Wouldn't it be better to name the group like monitoring/statistics/{input_column_name}, i.e. maybe set the default of output_column_name to the value of input_column_name? Also, add the input column name to the output table's metadata (table.meta['input_column_name'] = input_column_name).
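The naming suggestion could look roughly like this. The helper `output_location` is hypothetical and only builds the group path and metadata dictionary; it does not write any file or reflect what the tool does after this PR.

```python
def output_location(input_column_name, output_column_name=None):
    """Build a group path and metadata for the aggregated statistics (sketch)."""
    # Default the output name to the input column, as proposed in the comment.
    name = output_column_name or input_column_name
    path = f"monitoring/statistics/{name}"
    # Record which column was aggregated so the output is self-describing.
    meta = {"input_column_name": input_column_name}
    return path, meta


path, meta = output_location("image")
print(path, meta)
```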

Member

@kosack kosack left a comment

A more general comment: with very minor changes, this could be turned into ctapipe-calculate-stats, i.e. the ability to compute stats for any column, not just pixel-wise ones.

  • Expose TableLoader as a configurable component (needed anyhow, see above)
  • Minor modifications to drop the assumption on data shape in calculator.py.

I would expect e.g. to be able to do:

ctapipe-calculate-pixel-statistics -i events-prod5.DL1.h5 \
    --StatisticsAggregator.chunk_size=100 \
    --StatisticsCalculatorTool.input_column_name hillas_length \
    -o length.h5

and get the stats on the length parameter. This is perhaps outside the scope of this PR, but should be kept in mind. It also relates to @maxnoe's comment that we could change the API to accept a mapping of columns to Aggregators.

kosack previously requested changes Nov 6, 2024

Member

@kosack kosack left a comment

A common error is to have a chunk size larger than the input table (here 2500 vs. 853 rows), but this now results in a very ugly error with a full traceback and exception, along with an UnclosedFileWarning (bug?).

  • The former (unexpected exception) should be caught and re-raised as a ToolConfigurationError, so the user gets a nice message. And please explain in the message which parameter controls this, i.e. say "Change --StatisticsAggregator.chunk_size to decrease this."
  • The latter (unclosed file) seems to be a bug to fix.
2024-11-06 15:14:43,361 ERROR [ctapipe.StatisticsCalculatorTool] (tool.run): Caught unexpected exception: The length of the provided table (853) is insufficient to meet the required statistics for a single chunk of size (2500).
2024-11-06 15:14:43,361 ERROR [ctapipe.StatisticsCalculatorTool] (tool.run): Caught unexpected exception: The length of the provided table (853) is insufficient to meet the required statistics for a single chunk of size (2500).
Traceback (most recent call last):
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/core/tool.py", line 431, in run
    self.start()
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/tools/calculate_pixel_stats.py", line 134, in start
    aggregated_stats = self.stats_calculator.first_pass(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/monitoring/calculator.py", line 169, in first_pass
    aggregated_stats = aggregator(
                       ^^^^^^^^^^^
  File "/Users/kkosack/Projects/CTA/Working/ctapipe/src/ctapipe/monitoring/aggregator.py", line 86, in __call__
    raise ValueError(
ValueError: The length of the provided table (853) is insufficient to meet the required statistics for a single chunk of size (2500).
2024-11-06 15:14:43,377 INFO [ctapipe.StatisticsCalculatorTool] (tool.write_provenance): Output:
/Users/kkosack/miniconda3/envs/ctapipe-0.21/lib/python3.12/site-packages/tables/file.py:113: UnclosedFileWarning: Closing remaining open file: /Users/kkosack/Projects/CTA/PipeWork/v0.21.3/events-prod5.DL1.h5
  warnings.warn(UnclosedFileWarning(msg))
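
One way the requested behaviour could be sketched: catch the low-level ValueError and re-raise it as a configuration error that names the controlling option. The exception class and the `aggregate` function below are local stand-ins (assumptions) so the snippet runs without ctapipe; only the error text and the option name are taken from the log and comment above.

```python
class ToolConfigurationError(Exception):
    """Local stand-in for ctapipe's ToolConfigurationError (assumption: same role)."""


def aggregate(table_length, chunk_size):
    """Hypothetical stand-in for the aggregator call that raised in the traceback."""
    if table_length < chunk_size:
        raise ValueError(
            f"The length of the provided table ({table_length}) is insufficient "
            f"to meet the required statistics for a single chunk of size ({chunk_size})."
        )
    return table_length // chunk_size


def run_first_pass(table_length, chunk_size):
    """Translate the low-level ValueError into a user-facing configuration error."""
    try:
        return aggregate(table_length, chunk_size)
    except ValueError as err:
        # Keep the original message and add the option that controls the chunk size.
        raise ToolConfigurationError(
            f"{err} Change --StatisticsAggregator.chunk_size to decrease this."
        ) from err


try:
    run_first_pass(table_length=853, chunk_size=2500)
except ToolConfigurationError as err:
    print(err)
```

Chaining with `from err` keeps the original exception available for debugging while the user sees only the short message.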

@maxnoe
Member

maxnoe commented Jan 28, 2025

Needs update with main to fix docs build

@ctao-dpps-sonarqube

Passed

Analysis Details

0 Issues

  • 0 Bugs
  • 0 Vulnerabilities
  • 0 Code Smells

Coverage and Duplications

  • 87.50% Coverage (94.00% estimated after merge)
  • 0.00% Duplicated Code (0.70% estimated after merge)

Project ID: cta-observatory_ctapipe_AY52EYhuvuGcMFidNyUs

@maxnoe maxnoe merged commit ef73aa8 into main Jan 29, 2025
@maxnoe maxnoe deleted the stats_calc_tool branch January 29, 2025 07:46
Labels

module:calib issues related to ctapipe.calib

5 participants