Implement functions to drop and interpolate over low-confidence datapoints #116
Conversation
Codecov Report: All modified and coverable lines are covered by tests ✅

```
@@ Coverage Diff @@
##             main     #116      +/-   ##
==========================================
+ Coverage   99.21%   99.28%   +0.06%
==========================================
  Files           8        9       +1
  Lines         509      556      +47
==========================================
+ Hits          505      552      +47
  Misses          4        4
==========================================
```
Some particular items I would like to receive some feedback on:
To-Do:
Thanks a lot @b-peri! You did a great job, and opening this draft PR was very effective at laying out the problems and the solutions. Here are my high-level thoughts (details can be found in the individual comments):

- I like the logging, but I think we can generalise it to any filtering function via an appropriate decorator (see example decorator in the respective comment).
- Filtering should only operate on `pose_tracks`, and in the future perhaps on `velocity`, `acceleration`, etc., but definitely not on `confidence`. Accordingly, I propose renaming the functions to `interpolate_over_time()` and `filter_by_confidence()`.
- I think we should completely scrap the "inplace" idea (for reasons given in the comment). Make all functions return a new dataset.
- I think it's enough to have a text report of dropped values; plots can come later. But the current implementation of this feature is incorrect and should be fixed.
- For now, I would put the module in `movement/filtering.py`; I'm not sure it counts as "analysis". When it grows, we can consider making it into a subfolder.
- The interpolation makes use of the `bottleneck` library, which is not installed by default. The dependency needs to change to `xarray[accel]` instead of "vanilla" `xarray` (see explanation).

Let me know your thoughts on the above.
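A decorator along the lines suggested above might look like the following sketch. The name `log_to_attrs` and the exact log structure are illustrative, not the final implementation; it works on any object with a dict-like `attrs` (such as an `xarray.Dataset`):

```python
import functools
from datetime import datetime, timezone

def log_to_attrs(func):
    """Append a record of the operation to the result's ``attrs["log"]``.

    Hypothetical sketch: works on any object carrying a dict-like
    ``attrs`` attribute, e.g. an xarray.Dataset.
    """
    @functools.wraps(func)
    def wrapper(ds, *args, **kwargs):
        result = func(ds, *args, **kwargs)
        entry = {
            "operation": func.__name__,
            "datetime": str(datetime.now(timezone.utc)),
            "args": [repr(a) for a in args],
            "kwargs": {k: repr(v) for k, v in kwargs.items()},
        }
        # Copy-then-append so a log list shared with the input dataset
        # is not mutated in place.
        result.attrs["log"] = list(result.attrs.get("log", [])) + [entry]
        return result
    return wrapper
```

Any filtering function decorated this way would then record its own name and arguments in the returned dataset's attributes.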
movement/analysis/filtering.py
```python
ds_thresholded = ds.where(ds.confidence >= threshold)

# Diagnostics
print("\nDatapoints Filtered:\n")
```
We should log this as "info" instead of printing it:

```python
logger = logging.getLogger(__name__)
logger.info("text goes here")
```
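A fuller sketch of the suggested pattern, with the diagnostics message built from the NaN counts (the helper name `report_filtered` and its signature are hypothetical, for illustration only):

```python
import logging

# Module-level logger, named after the module (e.g. movement.filtering).
logger = logging.getLogger(__name__)

def report_filtered(n_total, n_filtered, threshold):
    # Hypothetical helper: emit the filtering diagnostics at INFO level
    # rather than printing them to stdout.
    message = (
        f"Filtered {n_filtered} of {n_total} datapoints "
        f"below confidence threshold {threshold:.2f}."
    )
    logger.info(message)
    return message
```

Routing diagnostics through `logging` lets users control verbosity and destination (console, file) without touching the filtering code.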
@b-peri takeaways from today's chat on this:
Many thanks for your extensive feedback @niksirbi! I've now implemented all of your suggestions and added testing for the new functions!
I just had a few minor points that I think would be useful to get some feedback on, which I'll put below.
* Check for expected `dims` and `data_vars` in dataset
* Fix `missing_dim_dataset` fixture
* Rename `poses` accessor to `move`
* Rename `PoseAccessor` class to `MoveAccessor`
* Rename `poses_accessor.py` to `move_accessor.py`
* Move `move_accessor.py` to the top level
* Fix accessor docstring formatting
* Draft compute velocity
* Add test for displacement
* Fix confidence values in `valid_pose_dataset` fixture
* Refactor kinematics test and functions
* Vectorise kinematic functions
* Refactor repeated calls to compute magnitude + direction
* Displacement to return 0 instead of NaN
* Return x y components in kinematic functions
* Refactor kinematics tests
* Remove unnecessary instantiations
* Improve time diff calculation
* Prefix kinematics methods with `compute_`
* Add kinematic properties to `PosesAccessor`
* Update `test_property` docstring
* Refactor `_vector` methods and kinematics tests
* Update `expected_dataset` docstring
* Rename `poses` to `move` in `PosesAccessor`
* Refactor properties in `PosesAccessor`
* Remove vector util functions and tests
* Update `not_a_dataset` fixture description
* Validate dataset upon accessor property access
* Update `poses_accessor` test description
* Validate input data in kinematic functions
* Remove unused fixture
* Parametrise kinematics tests
* Set `compute_derivative` as internal function
* Update `kinematics.py` docstrings
* Add new modules to API docs
* Update `move_accessor` docstrings
* Rename `test_move_accessor` filename
…atics-unit#125)
* also test on macOS 14 M1 runner
* conda install hdf5
Bumps [actions/cache](https://github.com/actions/cache) from 3 to 4.
- [Release notes](https://github.com/actions/cache/releases)
- [Changelog](https://github.com/actions/cache/blob/main/RELEASES.md)
- [Commits](actions/cache@v3...v4)

updated-dependencies:
- dependency-name: actions/cache
  dependency-type: direct:production
  update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.2.0 → v0.3.0](astral-sh/ruff-pre-commit@v0.2.0...v0.3.0)
- [github.com/psf/black: 24.1.1 → 24.2.0](psf/black@24.1.1...24.2.0)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* Add codecov token to test_and_deploy.yml
* use test action from main branch
* switch back to using v2 of the test action
Hey @b-peri, thanks a lot for implementing the suggestions!
I made some final tweaks:

- I merged the main branch to resolve conflicts.
- I renamed the `filter_diagnostics` function to `report_nan_values`, to more explicitly reflect its purpose. I also changed the way it's invoked by filtering/interpolation functions, so that the user gets an overview of the change in NaNs (made by the operation).
- I switched to using `xarray`'s built-in `isnull()` and `copy()` methods (instead of `numpy.isnan` and `copy.copy`).
- I rephrased some of the docstrings.
- I added `fill_value="extrapolate"` to the `interpolate_na` method, to also take care of NaN values at the start and end of the array (if they are smaller than the `max_gap`).
- I added the new functions to the API index in the docs.
- I wrote a new example for the docs, showcasing how the new functions can be used to clean data.

I will merge this as is now.
I think we should add some more extensive testing to cover some edge cases, but I will open a separate issue for that.
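What `fill_value="extrapolate"` buys can be pictured with a plain numpy sketch (this is not movement's implementation, just an illustration of the behaviour; it assumes at least two valid samples):

```python
import numpy as np

def interpolate_with_extrapolation(values):
    """Linearly interpolate interior NaNs and linearly extrapolate
    leading/trailing NaNs, mimicking what xarray's
    interpolate_na(dim="time", fill_value="extrapolate") achieves."""
    values = np.asarray(values, dtype=float)
    idx = np.arange(values.size)
    valid = ~np.isnan(values)
    xs, ys = idx[valid], values[valid]
    # np.interp fills interior gaps but clamps to edge values ...
    out = np.interp(idx, xs, ys)
    # ... so extend the first/last segment's slope beyond the edges.
    if xs[0] > 0:
        slope = (ys[1] - ys[0]) / (xs[1] - xs[0])
        out[: xs[0]] = ys[0] + slope * (idx[: xs[0]] - xs[0])
    if xs[-1] < idx[-1]:
        slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
        out[xs[-1] + 1 :] = ys[-1] + slope * (idx[xs[-1] + 1 :] - xs[-1])
    return out
```

Without the extrapolation step, NaNs before the first valid sample and after the last one would simply be clamped to the edge values.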
Description

This PR introduces the `filtering` module, and adds two new major user-facing functions to `movement`. The first, `filter_by_confidence()`, drops pose track datapoints where the associated confidence value falls below a user-defined threshold, and replaces these with NaNs. The second, `interpolate_over_time()`, interpolates over such NaN values across the `time` axis using the xarray built-in `interpolate_na()`.

Edit 11/03/2024:
This PR also introduces two new convenience functions for the `filtering` module:
- The `log_to_attrs()` decorator adds a log of any operations performed (in the form of a dictionary) to the affected `xarray.Dataset`'s attributes, under `["log"]`.
- The `filter_diagnostics()` function counts and logs the number of datapoints that have been filtered from every individual keypoint.

References
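The behaviour of `filter_by_confidence()` described above can be pictured with a minimal numpy sketch (the real function operates on `xarray.Dataset` objects, and the argument names and default threshold here are assumptions for illustration):

```python
import numpy as np

def filter_by_confidence(tracks, confidence, threshold=0.6):
    # Replace positions whose confidence falls below `threshold` with NaN,
    # leaving all other datapoints untouched.
    tracks = np.asarray(tracks, dtype=float)
    keep = np.asarray(confidence, dtype=float) >= threshold
    return np.where(keep, tracks, np.nan)
```

The resulting NaNs are exactly what `interpolate_over_time()` then fills in along the time axis.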
Closes #97
Checklist: