Skip to content

#684: add tag checking for invalid characters #691

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

BrentonPoke
Copy link

@BrentonPoke BrentonPoke commented Aug 1, 2020

This is to address a concern brought up in #684 where strings like "mad\nrid" with a \n character would bork a tag. I approached this through a set of two regular expressions:
nameRegex = "([a-zA-Z0-9-_]+)"
valueRegex = "[\x00-\x7F]+"

Names of tags I interpret as being more strict than values, which should probably be allowed the full range of ASCII characters. Current unknowns are weird preferences others might want to report in values. I personally think ASCII codes 0-127 should be sufficient for everybody's needs, but comment below.

These are applied through a utils subpackage created in the dto package. It's not intended to be used by the developer, but I wanted something slightly portable that can be used in dto classes in the future. I also added a new exception to the InfluxDBException wrapper so the modified tag methods in the Point and BatchPoints classes can through it to the developer to let them know what's wrong. There is a new exception, but to pass the unit tests, I've removed the throws for them.

I am new to this codebase, but am familiar with timeseries databases. Hope this is a good starting point.

@codecov-commenter
Copy link

codecov-commenter commented Aug 9, 2020

Codecov Report

Merging #691 into master will increase coverage by 0.02%.
The diff coverage is 94.11%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master     #691      +/-   ##
============================================
+ Coverage     88.26%   88.29%   +0.02%     
- Complexity      730      738       +8     
============================================
  Files            69       70       +1     
  Lines          2540     2554      +14     
  Branches        268      273       +5     
============================================
+ Hits           2242     2255      +13     
  Misses          210      210              
- Partials         88       89       +1     
Impacted Files Coverage Δ Complexity Δ
src/main/java/org/influxdb/dto/BatchPoints.java 87.87% <75.00%> (-0.67%) 24.00 <0.00> (ø)
src/main/java/org/influxdb/dto/Point.java 93.38% <100.00%> (+0.02%) 53.00 <0.00> (ø)
...rc/main/java/org/influxdb/dto/utils/CheckTags.java 100.00% <100.00%> (ø) 8.00 <8.00> (?)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 125a9ca...f34b73e. Read the comment docs.

@BrentonPoke BrentonPoke requested a review from majst01 August 14, 2020 16:47
@BrentonPoke
Copy link
Author

Sorry about the commit history. Never used Codecov before, but I learned a few things.

testBufferLimitGreaterThanActions is still failing for unkown reasons

reverting prinout

checkstyle doesn't like my comment

let's remove that, too

cleaning up an unused exception class i made

attempting better code coverage

attempting to raise code coverage.

refactor

fixing checkstyle

adding another case
@BrentonPoke
Copy link
Author

Anybody available to review this one?

Copy link
Author

@BrentonPoke BrentonPoke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these changes have been made.

@majst01
Copy link
Collaborator

majst01 commented Sep 17, 2020

TBH, 21 commits for a single change makes me nervous, and i am really unsure if i want to merge this PR.
@fmachado should at least check if this is worth merging.

@majst01 majst01 requested a review from fmachado September 17, 2020 06:51
@BrentonPoke
Copy link
Author

BrentonPoke commented Sep 17, 2020

You can squash the commits on merging. I tried to, but it didn't work as intended. And the only reason there are this many is due to fighting with CodeCov

@majst01
Copy link
Collaborator

majst01 commented Sep 17, 2020

I know that i can squash them, this is not my point.

@BrentonPoke
Copy link
Author

Then why else would what I've done not be worth merging?

@majst01
Copy link
Collaborator

majst01 commented Sep 17, 2020

Thats why i asked @fmachado for additional review, its not useless.

@fmachado
Copy link
Contributor

@majst01 I'll take a look, thanks.

@BrentonPoke could you please share with us what InfluxDB reference documentation (or source code) you used to decide for that regexp? I couldn't find anything in your comments. :(

@BrentonPoke
Copy link
Author

BrentonPoke commented Sep 18, 2020

@BrentonPoke could you please share with us what InfluxDB reference documentation (or source code) you used to decide for that regexp? I couldn't find anything in your comments. :(

The first one just takes care of the issue brought up, which was that carriage return and newline are not properly ignored. The second one just makes sure that all characters are ascii values 0-127, which covers all of English and a number of symbols. There isn't anything in the documentation about any other restrictions on tags. There's nothing in the documentation regarding legal field characters, either. but I guess the restrictions can be removed for the values if that's too much.
ASCII chart for values 0 through 127

@fmachado fmachado self-assigned this Sep 18, 2020
@fmachado
Copy link
Contributor

The first one just takes care of the issue brought up, which was that carriage return and newline are not properly ignored. The second one just makes sure that all characters are ascii values 0-127, which covers all of English and a number of symbols.

Apologies for not being clear with my first message: it's important for us to know that new code (1) is compliant with the line protocol (e.g. how InfluxDB handles what you name here "invalid characters") and (2) we are backward compatible with existing implementation (e.g. no unchecked exception, changes on public APIs and internal behavior like filtering non ASCII if they were accepted before).

We have to guide our implementation here based on how InfluxDB works. Could you please link here the InfluxDB documentation you used to support your implementation?

@BrentonPoke
Copy link
Author

BrentonPoke commented Sep 18, 2020

The documentation here says no newline characters, but doesn't restrict ascii values for fields as far as I can tell. I can remove the value regular expression entirely, so is that ok?

@fmachado
Copy link
Contributor

@BrentonPoke yes, as it's written there:

Line protocol does not support the newline character \n in tag values or field values.

Would be OK for you to submit a PR just with the relevant code and changes required for this fix?

@BrentonPoke
Copy link
Author

You mean throw away this commit history? I can change it on this PR right now. I guess I can transfer what's needed from this branch manually to a new one and make a new pull request, but I'm not sure what that solves that a squash merge couldn't.

@BrentonPoke
Copy link
Author

Since I already did it, I'll create the new pr from the other branch I have.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants