-
Notifications
You must be signed in to change notification settings - Fork 477
#684: add tag checking for invalid characters #691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #691 +/- ##
============================================
+ Coverage 88.26% 88.29% +0.02%
- Complexity 730 738 +8
============================================
Files 69 70 +1
Lines 2540 2554 +14
Branches 268 273 +5
============================================
+ Hits 2242 2255 +13
Misses 210 210
- Partials 88 89 +1
Continue to review full report at Codecov.
|
Sorry about the commit history. Never used Codecov before, but I learned a few things. |
testBufferLimitGreaterThanActions is still failing for unkown reasons reverting prinout checkstyle doesn't like my comment let's remove that, too cleaning up an unused exception class i made attempting better code coverage attempting to raise code coverage. refactor fixing checkstyle adding another case
Anybody available to review this one? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these changes have been made.
TBH, 21 commits for a single change makes me nervous, and i am really unsure if i want to merge this PR. |
You can squash the commits on merging. I tried to, but it didn't work as intended. And the only reason there are this many is due to fighting with CodeCov |
I know that i can squash them, this is not my point. |
Then why else would what I've done not be worth merging? |
Thats why i asked @fmachado for additional review, its not useless. |
@majst01 I'll take a look, thanks. @BrentonPoke could you please share with us what InfluxDB reference documentation (or source code) you used to decide for that regexp? I couldn't find anything in your comments. :( |
The first one just takes care of the issue brought up, which was that carriage return and newline are not properly ignored. The second one just makes sure that all characters are ascii values 0-127, which covers all of English and a number of symbols. There isn't anything in the documentation about any other restrictions on tags. There's nothing in the documentation regarding legal field characters, either. but I guess the restrictions can be removed for the values if that's too much. |
Apologies for not being clear with my first message: it's important for us to know that new code (1) is compliant with the line protocol (e.g. how InfluxDB handles what you name here "invalid characters") and (2) we are backward compatible with existing implementation (e.g. no unchecked exception, changes on public APIs and internal behavior like filtering non ASCII if they were accepted before). We have to guide our implementation here based on how InfluxDB works. Could you please link here the InfluxDB documentation you used to support your implementation? |
The documentation here says no newline characters, but doesn't restrict ascii values for fields as far as I can tell. I can remove the value regular expression entirely, so is that ok? |
@BrentonPoke yes, as it's written there:
Would be OK for you to submit a PR just with the relevant code and changes required for this fix? |
You mean throw away this commit history? I can change it on this PR right now. I guess I can transfer what's needed from this branch manually to a new one and make a new pull request, but I'm not sure what that solves that a squash merge couldn't. |
Since I already did it, I'll create the new pr from the other branch I have. |
This is to address a concern brought up in #684 where strings like "mad\nrid" with a \n character would bork a tag. I approached this through a set of two regular expressions:
nameRegex = "([a-zA-Z0-9-_]+)"
valueRegex = "[\x00-\x7F]+"
Names of tags I interpret as being more strict than values, which should probably be allowed the full range of ASCII characters. Current unknowns are weird preferences others might want to report in values. I personally think ASCII codes 0-127 should be sufficient for everybody's needs, but comment below.
These are applied through a utils subpackage created in the dto package. It's not intended to be used by the developer, but I wanted something slightly portable that can be used in dto classes in the future.
I also added a new exception to theThere is a new exception, but to pass the unit tests, I've removed the throws for them.InfluxDBException
wrapper so the modified tag methods in thePoint
andBatchPoints
classes can through it to the developer to let them know what's wrong.I am new to this codebase, but am familiar with timeseries databases. Hope this is a good starting point.