Communicate information about filtered data points #493
Comments
The scale.unknown option can be used to this effect; see the examples.
This would now happen, I guess, in the default filter (line 291 in 9d9ba91).
As an additional twist on this, it would be great if we could provide informative error messages for two seemingly common cases:
Likely candidates for capitalization errors could be found by comparing the provided key to all the keys in the input object while ignoring case (i.e., converting both to lowercase before comparing). Misspellings are more complex; detecting them might use Levenshtein distance with a threshold (or find the closest match and suggest that). The latter is an expensive operation, but it would only have to run when there's an error (or a presumed error), so it would mostly delay the error message rather than interfere with normal Plot operation.
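To make the suggestion above concrete, here is a minimal sketch in plain JavaScript. It is not Plot's code: `suggestKey`, its `maxDistance` threshold of 2, and the sample keys are all hypothetical names chosen for illustration. It tries the cheap case-insensitive comparison first and only falls back to Levenshtein distance when that fails.

```javascript
// Levenshtein edit distance between two strings (classic dynamic programming,
// keeping only the previous row of the table in memory).
function levenshtein(a, b) {
  let prev = Array.from({length: b.length + 1}, (_, j) => j);
  for (let i = 1; i <= a.length; ++i) {
    const curr = [i];
    for (let j = 1; j <= b.length; ++j) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical helper: given a key that was not found and the keys that do
// exist, return the most plausible intended key, or undefined if none is close.
function suggestKey(key, keys, maxDistance = 2) {
  const lower = key.toLowerCase();
  // Cheap check first: the same key, ignoring case.
  const caseMatch = keys.find((k) => k.toLowerCase() === lower);
  if (caseMatch !== undefined) return caseMatch;
  // Otherwise fall back to the closest key within the (assumed) threshold.
  let best;
  let bestDistance = Infinity;
  for (const k of keys) {
    const d = levenshtein(lower, k.toLowerCase());
    if (d < bestDistance) {
      best = k;
      bestDistance = d;
    }
  }
  return bestDistance <= maxDistance ? best : undefined;
}
```

An error message could then read something like `unknown field "Weight"; did you mean "weight"?`. Since this only runs on the error path, the quadratic cost of the distance computation should not affect normal rendering.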
* Guard against formatDefault returning undefined (it always returns a string except when the value is NaN). Closes #1334, related to #493
* coalesce null to empty string
* DRY

Co-authored-by: Mike Bostock
It would be useful to also generate a warning when the given data as a whole is nullish, e.g. Not a fault of Plot that my data is broken of course, but a message like The documentation does state that
It would be useful if exploratory plots came with a visual indicator of “discarded data”.
This would improve Plot's capacity for exploratory data analysis by enabling users to become aware of anomalous values that violate their assumptions about the data.
For example, I changed a scale from log to symlog and discovered a bunch of negative values where I wasn’t expecting any.
The data was supposed to be strictly positive, and the negative values indicated a processing error. Because the default log scale filtered those data points out, I only noticed because I went out of my way to do additional spot checks.
Plot could have made it evident immediately, e.g. with a legend saying something like “100 data points not shown”. Even more useful (maybe) would be being able to see a “data pipeline” and how many points are filtered out at each stage.
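As a rough illustration of the "data pipeline" idea, the sketch below runs values through a sequence of named filter stages and records how many points each stage drops. This is plain JavaScript, not Plot's internals: `filterWithReport`, the stage names, and the sample values are all assumptions made up for the example.

```javascript
// Run data through named filter stages, recording how many points each
// stage removes. The stage names and predicates are illustrative only.
function filterWithReport(data, stages) {
  const report = [];
  let current = data;
  for (const {name, keep} of stages) {
    const next = current.filter(keep);
    report.push({stage: name, dropped: current.length - next.length});
    current = next;
  }
  const total = data.length - current.length;
  const message = total > 0 ? `${total} data points not shown` : "";
  return {data: current, report, message};
}

// Made-up sample: strictly positive values were expected.
const values = [12, 0.5, -3, null, NaN, 40, 0, 7];
const result = filterWithReport(values, [
  {name: "defined", keep: (v) => v != null && !Number.isNaN(v)},
  {name: "positive (log scale)", keep: (v) => v > 0}
]);

console.log(result.report);  // per-stage drop counts
console.log(result.message); // summary suitable for a legend or warning
```

The per-stage report is what would let a reader tell an expected filter (e.g. a transform that discards by design) apart from a surprising one, which is the subtlety raised below.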
@Fil observes that some filters use the discarding as a basic mechanism to do their work as intended, so there are subtle questions about what to communicate for this to be a useful signal.
For the exploratory use case, I think it makes sense for this to be on by default, since spot-checking every individual assumption manually can get onerous (e.g. checking for null/undefined, zeros where there shouldn’t be any, negative numbers where there shouldn’t be any, values outside the x/y/color domain, NaN, etc.).
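For comparison, this is roughly what the manual spot check looks like when written out by hand. It is a sketch only: `summarizeAnomalies` and its category names are hypothetical, it assumes a single numeric channel, and the domain is passed in as an assumed `[min, max]` pair.

```javascript
// Count the kinds of suspicious values listed above in one numeric channel.
// A value may fall into several categories (e.g. negative and out of domain).
function summarizeAnomalies(values, [min, max] = [-Infinity, Infinity]) {
  const counts = {nullish: 0, nan: 0, zero: 0, negative: 0, outOfDomain: 0};
  for (const v of values) {
    if (v == null) {
      counts.nullish++; // null or undefined
    } else if (Number.isNaN(v)) {
      counts.nan++;
    } else {
      if (v === 0) counts.zero++;
      if (v < 0) counts.negative++;
      if (v < min || v > max) counts.outOfDomain++;
    }
  }
  return counts;
}

// Made-up sample checked against an assumed domain of [0, 100].
console.log(summarizeAnomalies([1, 0, -2, null, undefined, NaN, 500], [0, 100]));
```

Having to repeat this for every channel and every plot is the overhead that a built-in indicator would remove.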
A separate tool such as a summary table could be used to learn about missing/pathological data in a dataset, but it would still be useful for Plot to flag these issues since they can creep in during downstream processing and plot transformations.