Implement functions to drop and interpolate over low-confidence datapoints #116


Merged
26 commits, merged Mar 12, 2024

Conversation

@b-peri (Collaborator) commented Feb 12, 2024

Description

This PR introduces the filtering module and adds two major user-facing functions to movement. The first, filter_by_confidence(), replaces pose track datapoints with NaN wherever the associated confidence value falls below a user-defined threshold. The second, interpolate_over_time(), interpolates over such NaN values along the time axis using xarray's built-in interpolate_na().
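In xarray terms these two operations boil down to masking by confidence and interpolating the resulting NaNs along time. The same logic can be sketched with plain NumPy (a simplified stand-in for the xarray-based implementation; the arrays, function signatures, and threshold below are illustrative, not movement's actual API):

```python
import numpy as np

def filter_by_confidence(position, confidence, threshold=0.6):
    """Replace positions whose confidence falls below threshold with NaN."""
    filtered = position.astype(float).copy()
    filtered[confidence < threshold] = np.nan
    return filtered

def interpolate_over_time(values):
    """Linearly interpolate NaN values along the time axis (1D case)."""
    values = values.astype(float).copy()
    t = np.arange(len(values))
    nans = np.isnan(values)
    # Interpolate NaN positions from the surrounding valid samples
    values[nans] = np.interp(t[nans], t[~nans], values[~nans])
    return values

position = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
confidence = np.array([0.9, 0.2, 0.8, 0.1, 0.95])

filtered = filter_by_confidence(position, confidence, threshold=0.6)
cleaned = interpolate_over_time(filtered)  # low-confidence frames filled back in
```

In the actual module the masking step maps onto `ds.where(ds.confidence >= threshold)` and the interpolation onto `interpolate_na(dim="time")`.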

Edit 11/03/2024:
This PR also introduces two new convenience functions for the filtering module:

  1. The log_to_attrs() decorator adds a log of any operations performed (in the form of a dictionary) to the affected xarray.Dataset's attributes, under ["log"].
  2. The filter_diagnostics() function counts and logs the number of datapoints that have been filtered from every individual keypoint.
  • Bug fix
  • Addition of a new feature
  • Other
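The log_to_attrs() mechanism described above can be sketched as a decorator. This is a minimal sketch, not movement's actual implementation: a plain class with an attrs dict stands in for xarray.Dataset, and the entry format follows the dictionary layout discussed in this PR:

```python
from datetime import datetime
from functools import wraps

class FakeDataset:
    """Minimal stand-in for xarray.Dataset: just carries an attrs dict."""
    def __init__(self):
        self.attrs = {}

def log_to_attrs(func):
    """Append a record of the operation to the returned dataset's attrs["log"]."""
    @wraps(func)
    def wrapper(ds, *args, **kwargs):
        result = func(ds, *args, **kwargs)
        entry = {
            "operation": func.__name__,
            "kwargs": kwargs,
            "datetime": str(datetime.now()),
        }
        # Create the log list on first use, then append
        result.attrs.setdefault("log", []).append(entry)
        return result
    return wrapper

@log_to_attrs
def drop_below(ds, threshold=0.6):
    # Hypothetical no-op stand-in for a real filtering function
    return ds

ds = drop_below(FakeDataset(), threshold=0.5)
```

After the call, `ds.attrs["log"]` holds one entry recording the operation name, its keyword arguments, and a timestamp.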

References

Closes #97

Checklist:

  • The code has been tested locally
  • Tests have been added to cover all new functionality
  • The documentation has been updated to reflect any changes
  • The code has been formatted with pre-commit

@b-peri marked this pull request as draft February 12, 2024 15:49

codecov bot commented Feb 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.28%. Comparing base (aaba495) to head (78004d1).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #116      +/-   ##
==========================================
+ Coverage   99.21%   99.28%   +0.06%     
==========================================
  Files           8        9       +1     
  Lines         509      556      +47     
==========================================
+ Hits          505      552      +47     
  Misses          4        4              


@b-peri (Collaborator, Author) commented Feb 12, 2024

Some particular items I would like to receive some feedback on:

  • Operation logs are now stored as a list of dictionaries under the logs attribute, with each entry representing a past operation. Each entry follows roughly this format: {'operation': operation, 'parameter_1': argument1, 'parameter_2': argument2, ..., 'datetime': str(datetime.now())}. Very open to discussing alternative ways of doing this!
  • Both functions currently act on both the confidence and pose_tracks variables, but I'm unsure whether this makes sense (e.g. I don't think interpolating over confidence values is likely to be particularly informative). Shall I restrict the filter_confidence() function to act only on the pose_tracks variable?
  • filter_confidence() currently reports the proportion of frames/predictions that have been filtered (both as a percentage and as an absolute number of frames), but I was unsure whether it might be useful to include anything else. I was also thinking we could add some logic to let users (optionally) plot confidence distributions and the imposed thresholds when running this function, but maybe this is overkill. Any feedback on this would be appreciated!
  • Function names, particularly that of interp_pose. I think a more generic name (maybe even simply "interpolate") may be more informative, but given that the use case here is specifically pose tracks, I'm unsure how specific to go.
  • I've put these functions in a new module called filtering.py (under /movement/analysis/, following on from Compute locomotion features #106). If a different module name would be more appropriate, please let me know!

To-Do:

  • Add tests

@niksirbi (Member) left a comment

Thanks a lot @b-peri! You did a great job, and opening this draft PR was very effective at laying out the problems and the solutions. Here are my high-level thoughts (details can be found in the individual comments):

  • I like the logging, but I think we can generalise it to any filtering function via an appropriate decorator (see example decorator in the respective comment).
  • Filtering should only operate on pose_tracks, and in the future perhaps on velocity, acceleration, etc., but definitely not on confidence.
  • Accordingly, I propose renaming the functions to interpolate_over_time() and filter_by_confidence().
  • I think we should completely scrap the "inplace" idea (for reasons given in the comment). Make all functions return a new dataset.
  • I think it's enough to have a text report of dropped values; plots can come later. But the current implementation of this feature is incorrect and should be fixed.
  • For now, I would put the module in movement/filtering.py, as I'm not sure it counts as "analysis". When it grows, we can consider making it into a subfolder.
  • The interpolation makes use of the bottleneck library, which is not installed by default. The dependency needs to change to xarray[accel] instead of "vanilla" xarray (see explanation).

Let me know your thoughts on the above.
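For the dependency point above, the change might look roughly like this in a pyproject.toml (illustrative only; movement's actual dependency declaration and the exact packages pulled in by the extra may differ):

```toml
[project]
dependencies = [
    # "xarray",       # vanilla xarray: interpolate_na() can fail without bottleneck
    "xarray[accel]",  # extra pulls in acceleration dependencies such as bottleneck
]
```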

ds_thresholded = ds.where(ds.confidence >= threshold)

# Diagnostics
print("\nDatapoints Filtered:\n")
Member:

We should log this as "info" instead of printing it:

import logging

logger = logging.getLogger(__name__)
logger.info("text goes here")

@niksirbi mentioned this pull request Feb 22, 2024
@niksirbi (Member) commented:
@b-peri takeaways from today's chat on this:

  • do the filtering only on "pose_tracks", not on "confidence", and (in the future) also not on "velocity" or other derived variables. Basically, we want people to clean first, analyse later.
  • drop the inplace idea for good; return a new dataset.
  • the logging and reporting should probably have been separate PRs (this is mostly my fault for shoving these features in here). Feel free to merge just the filtering bit and deal with logging and/or reporting separately. Since you have already done most of the work for this PR, you are also free to merge it with these bits included; but if one of them becomes tedious, drop it, merge without it, and open a separate issue for the missing piece.

@b-peri (Collaborator, Author) left a comment

Many thanks for your extensive feedback @niksirbi! I've now implemented all of your suggestions and added testing for the new functions!

I just had a few minor points that I think would be useful to get some feedback on, which I'll put below.

@b-peri marked this pull request as ready for review March 11, 2024 09:22
lochhh and others added 11 commits March 11, 2024 11:42
* Check for expected `dims` and `data_vars` in dataset

* Fix `missing_dim_dataset` fixture

* Rename `poses` accessor to `move`

* Rename `PoseAccessor` class to `MoveAccessor`

* Rename `poses_accessor.py` to `move_accessor.py`

* Move `move_accessor.py` to the top level

* Fix accessor docstring formatting
* Draft compute velocity

* Add test for displacement

* Fix confidence values in `valid_pose_dataset` fixture

* Refactor kinematics test and functions

* Vectorise kinematic functions

* Refactor repeated calls to compute magnitude + direction

* Displacement to return 0 instead of NaN

* Return x y components in kinematic functions

* Refactor kinematics tests

* Remove unnecessary instantiations

* Improve time diff calculation

* Prefix kinematics methods with `compute_`

* Add kinematic properties to `PosesAccessor`

* Update `test_property` docstring

* Refactor `_vector` methods and kinematics tests

* Update `expected_dataset` docstring

* Rename `poses` to `move` in `PosesAccessor`

* Refactor properties in `PosesAccessor`

* Remove vector util functions and tests

* Update `not_a_dataset` fixture description

* Validate dataset upon accessor property access

* Update `poses_accessor` test description

* Validate input data in kinematic functions

* Remove unused fixture

* Parametrise kinematics tests

* Set `compute_derivative` as internal function

* Update `kinematics.py` docstrings

* Add new modules to API docs

* Update `move_accessor` docstrings

* Rename `test_move_accessor` filename
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@v3...v4)

---
updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.2.0 → v0.3.0](astral-sh/ruff-pre-commit@v0.2.0...v0.3.0)
- [github.com/psf/black: 24.1.1 → 24.2.0](psf/black@24.1.1...24.2.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add codecov token to test_and_deploy.yml

* use test action from main branch

* switch back to using v2 of the test action

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@niksirbi (Member) left a comment

Hey @b-peri, thanks a lot for implementing the suggestions!

I made some final tweaks:

  • I merged the main branch to resolve conflicts.
  • I renamed the filter_diagnostics function to report_nan_values, to more explicitly reflect its purpose. I also changed the way it's invoked by the filtering/interpolation functions, so that the user gets an overview of the change in NaNs made by the operation.
  • I switched to using xarray's built-in isnull() and copy() methods (instead of numpy.isnan and copy.copy).
  • I rephrased some of the docstrings.
  • I added fill_value="extrapolate" to the interpolate_na method, to also take care of NaN values at the start and end of the array (if the gaps are smaller than max_gap).
  • I added the new functions to the API index in the docs.
  • I wrote a new example for the docs, showcasing how the new functions can be used to clean data.
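The report_nan_values idea can be sketched in plain NumPy. This is a simplified illustration, not movement's actual implementation: it assumes a hypothetical frames-by-keypoints array layout and keypoint names:

```python
import numpy as np

def report_nan_values(data, keypoint_names):
    """Count NaN-containing frames per keypoint.

    data: array of shape (n_frames, n_keypoints) -- a hypothetical layout.
    Returns a dict mapping keypoint name to NaN count.
    """
    counts = {}
    for i, name in enumerate(keypoint_names):
        counts[name] = int(np.isnan(data[:, i]).sum())
    return counts

# Hypothetical data: 3 frames, 2 keypoints
data = np.array([
    [1.0, np.nan],
    [np.nan, 2.0],
    [3.0, np.nan],
])
report = report_nan_values(data, ["snout", "tail_base"])
# {'snout': 1, 'tail_base': 2}
```

Calling such a report before and after each filtering/interpolation step gives the "overview of the change in NaNs" described above.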

I will merge this as is now.

I think we should add some more extensive testing to cover some edge cases, but I will open a separate issue for that.

Development

Successfully merging this pull request may close these issues.

Provide function for dropping values with low confidence and interpolating over them
3 participants