Disp reconstruction by LukasBeiske · Pull Request #2138 · cta-observatory/ctapipe

LukasBeiske · 2022-11-25T15:48:45Z

This is a continuation of #2064 after #1767 was merged.

For now I only implemented a simple weighted average to combine single telescope predictions into event predictions. The estimation of the error of this average seems to underestimate the error a bit, based on a small MC study I did.
I will look into other ways to combine the telescope predictions in the future (including other error estimates), but for now this is better than no error at all, in my opinion.

codecov · 2022-11-25T16:04:09Z

Codecov Report

Base: 92.75% // Head: 92.78% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (596811b) compared to base (0e4ffea).
Patch coverage: 94.27% of modified lines in pull request are covered.

❗ Current head 596811b differs from pull request most recent head 22c1ed0. Consider uploading reports for the commit 22c1ed0 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2138      +/-   ##
==========================================
+ Coverage   92.75%   92.78%   +0.02%     
==========================================
  Files         216      216              
  Lines       18050    18385     +335     
==========================================
+ Hits        16742    17058     +316     
- Misses       1308     1327      +19

Impacted Files	Coverage Δ
ctapipe/tools/process.py	`87.87% <ø> (+1.21%)`	⬆️
ctapipe/tools/train_energy_regressor.py	`85.29% <ø> (ø)`
ctapipe/tools/train_particle_classifier.py	`86.58% <ø> (ø)`
ctapipe/tools/train_disp_reconstructor.py	`87.05% <87.05%> (ø)`
ctapipe/conftest.py	`94.42% <91.66%> (-0.17%)`	⬇️
ctapipe/tools/apply_models.py	`95.04% <92.00%> (+1.94%)`	⬆️
ctapipe/reco/sklearn.py	`94.60% <93.41%> (-0.89%)`	⬇️
ctapipe/reco/stereo_combination.py	`95.31% <96.15%> (+0.27%)`	⬆️
ctapipe/containers.py	`100.00% <100.00%> (ø)`
ctapipe/coordinates/disp.py	`100.00% <100.00%> (ø)`
... and 14 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

LukasBeiske · 2022-11-30T17:49:53Z

I seem to have missed that commit 71672d8 breaks apply-models for the energy regressor (haven't tested the particle classifier yet) which is also why some tests are failing atm.
However, I don't understand which change causes this error, especially since I did not actually change anything at the point in the code where the error occurs.

Traceback (most recent call last):
  File "/scratch/lbeiske/envs/cta/lib/python3.8/site-packages/tables/table.py", line 2192, in append
    wbufRA = np.rec.array(rows, dtype=self._v_dtype)
  File "/scratch/lbeiske/envs/cta/lib/python3.8/site-packages/numpy/core/records.py", line 1078, in array
    new = obj.view(dtype)
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/net/nfshome/home/lbeiske/ctapipe/ctapipe/core/tool.py", line 358, in run
    self.start()
  File "/net/nfshome/home/lbeiske/ctapipe/ctapipe/tools/apply_models.py", line 178, in start
    self._apply(reconstructor)
  File "/net/nfshome/home/lbeiske/ctapipe/ctapipe/tools/apply_models.py", line 272, in _apply
    write_table(
  File "/net/nfshome/home/lbeiske/ctapipe/ctapipe/io/astropy_helpers.py", line 211, in write_table
    h5_table.append(table.as_array())
  File "/scratch/lbeiske/envs/cta/lib/python3.8/site-packages/tables/table.py", line 2194, in append
    raise ValueError("rows parameter cannot be converted into a "
ValueError: rows parameter cannot be converted into a recarray object compliant with table '/dl2/event/telescope/energy/RandomForestRegressor/tel_001 (Table(18158,)fletcher32, shuffle, blosc:zstd(5)) '''. The error was: <When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.>

nbiederbeck · 2022-12-02T15:14:29Z

We found the issue in

ctapipe/ctapipe/io/tableloader.py

Lines 326 to 329 in f789773

    
           observation_table = self.read_observation_information(start=start, stop=stop) 
        
           table = join_allow_empty( 
        
               table, observation_table, keys="obs_id", join_type="left" 
        
           )

(Pdb) table['obs_id'].dtype
dtype('int32')
(Pdb) observation_table['obs_id'].dtype
dtype('uint64')

The join makes a float64 out of this.

Do we need to reprocess the files after #2096?

maxnoe · 2022-12-08T15:15:57Z

@LukasBeiske Could you show some performance plots if you think this is ready?

LukasBeiske · 2022-12-08T15:21:08Z

Sure. cv_disp.pdf these are the performance plots for the model I am currently using.

This is the config I am using atm:

TrainDispReconstructor:
  CrossValidator:
    n_cross_validations: 5

  random_seed: 42

  DispReconstructor:
    norm_cls: RandomForestRegressor
    norm_config:
      n_estimators: 69
      max_features: 0.5227
      max_samples: 0.7138
      min_samples_leaf: 0.000013
      n_jobs: 40

    log_target: True

    sign_cls: RandomForestClassifier
    sign_config:
      n_estimators: 343
      max_features: 0.6587
      max_samples: 0.5815
      min_samples_leaf: 0.000035
      n_jobs: 40

    features:
      - log_RandomForestRegressor_energy
      - log_tel_impact_distance
      - log_RandomForestRegressor_tel_energy
      - log_abs_timing_slope
      - peak_time_std
      - concentration_pixel
      - hillas_length
      - concentration_cog
      - timing_deviation
      - scaled_length
      - HillasReconstructor_h_max
      - RandomForestRegressor_energy
      - area
      - peak_time_kurtosis
      - RandomForestRegressor_tel_energy
      - timing_slope
      - hillas_skewness
      - HillasReconstructor_core_x
      - HillasReconstructor_core_y

    QualityQuery:
      quality_criteria:
        - ["enough intensity", "hillas_intensity > 50"]
        - ["Positive width", "hillas_width > 0"]
        - ["enough pixels", "morphology_n_pixels > 3"]
        - ["not clipped", "leakage_intensity_width_2 < 0.5"]
        - ["HillasValid", "HillasReconstructor_is_valid"]

    FeatureGenerator:
      features:
        - ["area", "hillas_width * hillas_length"]
        - ["log_RandomForestRegressor_energy", "log(RandomForestRegressor_energy)"]
        - ["log_RandomForestRegressor_tel_energy", "log(RandomForestRegressor_tel_energy)"]
        - ["log_tel_impact_distance", "log(HillasReconstructor_tel_impact_distance)"]
        - ["log_abs_timing_slope", "log(abs(timing_slope))"]

maxnoe · 2022-12-08T15:26:39Z

Why is core-y so important? That's confusing me a bit.

Could you show the single LST performance? I.e. no stereo features for LST?

LukasBeiske · 2022-12-08T16:33:25Z

For mono I used this config:

TrainDispReconstructor:
  CrossValidator:
    n_cross_validations: 5

  random_seed: 42

  DispReconstructor:
    norm_cls: RandomForestRegressor
    norm_config:
      n_estimators: 100
      max_features: "sqrt"
      min_samples_leaf: 0.00001
      n_jobs: 40

    log_target: True

    sign_cls: RandomForestClassifier
    sign_config:
      n_estimators: 100
      max_features: "sqrt"
      min_samples_split: 0.00001
      n_jobs: 40

    features:
      - hillas_intensity
      - hillas_length
      - hillas_width
      - hillas_skewness
      - hillas_kurtosis
      - timing_slope
      - timing_deviation
      - peak_time_std
      - concentration_cog
      - leakage_intensity_width_2
      - morphology_n_pixels
      - scaled_length
      - scaled_width
      - RandomForestRegressor_energy
      - log_abs_timing_slope

    QualityQuery:
      quality_criteria:
        - ["enough intensity", "hillas_intensity > 50"]
        - ["Positive width", "hillas_width > 0"]
        - ["enough pixels", "morphology_n_pixels > 3"]
        - ["not clipped", "leakage_intensity_width_2 < 0.5"]

    FeatureGenerator:
      features:
        - ["log_abs_timing_slope", "log(abs(timing_slope))"]

And the performance plots still look pretty good imo (sign performs slightly worse): cv_disp_mono.pdf