#684: add tag checking for invalid characters #691

BrentonPoke · 2020-08-01T04:43:17Z

This is to address a concern brought up in #684 where strings like "mad\nrid" with a \n character would bork a tag. I approached this through a set of two regular expressions:
nameRegex = "([a-zA-Z0-9-_]+)"
valueRegex = "[\x00-\x7F]+"

I interpret tag names as being stricter than values, which should probably be allowed the full range of ASCII characters. The open question is whether others want to report unusual characters in values. I personally think ASCII codes 0-127 should be sufficient for everybody's needs, but comment below.
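For readers who want to see how the two patterns behave, here is a minimal sketch. It assumes whole-string matching with `Pattern.matcher(...).matches()`; the class and method names are hypothetical and are not the PR's actual `CheckTags` API. Under that assumption, the name pattern rejects "mad\nrid", while the value pattern accepts it, because the line feed (0x0A) falls inside the 0x00-0x7F range.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not the CheckTags class from this PR.
class TagRegexSketch {
    // Patterns copied from the PR description (backslashes doubled for Java source).
    static final Pattern NAME = Pattern.compile("([a-zA-Z0-9-_]+)");
    static final Pattern VALUE = Pattern.compile("[\\x00-\\x7F]+");

    // Assumption: the whole string must match, so matches() is used rather than find().
    static boolean isNameAccepted(String name) {
        return name != null && NAME.matcher(name).matches();
    }

    static boolean isValueAccepted(String value) {
        return value != null && VALUE.matcher(value).matches();
    }

    public static void main(String[] args) {
        System.out.println(isNameAccepted("host_01"));      // true
        System.out.println(isNameAccepted("mad\nrid"));     // false: newline is outside the name class
        System.out.println(isValueAccepted("mad\nrid"));    // true: 0x0A lies within 0x00-0x7F
        System.out.println(isValueAccepted("caf\u00e9"));   // false: U+00E9 is outside ASCII
    }
}
```

If the goal is to keep newlines out of values as well, the value pattern as written would not do that on its own.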

These are applied through a utils subpackage created in the dto package. It's not intended to be used by the developer, but I wanted something slightly portable that can be used in dto classes in the future. ~~I also added a new exception to the InfluxDBException wrapper so the modified tag methods in the Point and BatchPoints classes can throw it to the developer to let them know what's wrong.~~ There is a new exception, but to pass the unit tests, I've removed the throws for them.

I am new to this codebase, but am familiar with timeseries databases. Hope this is a good starting point.

codecov-commenter · 2020-08-09T23:53:16Z

Codecov Report

Merging #691 into master will increase coverage by 0.02%.
The diff coverage is 94.11%.

@@             Coverage Diff              @@
##             master     #691      +/-   ##
============================================
+ Coverage     88.26%   88.29%   +0.02%     
- Complexity      730      738       +8     
============================================
  Files            69       70       +1     
  Lines          2540     2554      +14     
  Branches        268      273       +5     
============================================
+ Hits           2242     2255      +13     
  Misses          210      210              
- Partials         88       89       +1

| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| src/main/java/org/influxdb/dto/BatchPoints.java | `87.87% <75.00%> (-0.67%)` | `24.00 <0.00> (ø)` |
| src/main/java/org/influxdb/dto/Point.java | `93.38% <100.00%> (+0.02%)` | `53.00 <0.00> (ø)` |
| ...rc/main/java/org/influxdb/dto/utils/CheckTags.java | `100.00% <100.00%> (ø)` | `8.00 <8.00> (?)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Last update 125a9ca...f34b73e.

src/main/java/org/influxdb/dto/utils/CheckTags.java

src/main/java/org/influxdb/dto/Point.java

BrentonPoke · 2020-08-14T16:48:30Z

Sorry about the commit history. Never used Codecov before, but I learned a few things.

testBufferLimitGreaterThanActions is still failing for unkown reasons
reverting prinout
checkstyle doesn't like my comment
let's remove that, too
cleaning up an unused exception class i made
attempting better code coverage
attempting to raise code coverage.
refactor
fixing checkstyle
adding another case

BrentonPoke · 2020-09-04T14:36:09Z

Anybody available to review this one?

BrentonPoke

All these changes have been made.

majst01 · 2020-09-17T06:50:59Z

TBH, 21 commits for a single change makes me nervous, and I am really unsure if I want to merge this PR.
@fmachado should at least check if this is worth merging.

BrentonPoke · 2020-09-17T11:17:27Z

You can squash the commits on merging. I tried to, but it didn't work as intended. And the only reason there are this many is due to fighting with Codecov.

majst01 · 2020-09-17T11:34:34Z

I know that I can squash them; this is not my point.

BrentonPoke · 2020-09-17T12:02:38Z

Then why else would what I've done not be worth merging?

majst01 · 2020-09-17T13:17:31Z

That's why I asked @fmachado for additional review; it's not useless.

fmachado · 2020-09-17T14:22:34Z

@majst01 I'll take a look, thanks.

@BrentonPoke could you please share with us what InfluxDB reference documentation (or source code) you used to decide for that regexp? I couldn't find anything in your comments. :(

BrentonPoke · 2020-09-18T01:04:24Z

> @BrentonPoke could you please share with us what InfluxDB reference documentation (or source code) you used to decide for that regexp? I couldn't find anything in your comments. :(

The first one just takes care of the issue brought up, which was that carriage return and newline are not properly ignored. The second one just makes sure that all characters are ASCII values 0-127, which covers all of English and a number of symbols. There isn't anything in the documentation about any other restrictions on tags, and there's nothing regarding legal field characters, either. I guess the restrictions can be removed for the values if that's too much.

fmachado · 2020-09-18T08:27:07Z

> The first one just takes care of the issue brought up, which was that carriage return and newline are not properly ignored. The second one just makes sure that all characters are ASCII values 0-127, which covers all of English and a number of symbols.

Apologies for not being clear with my first message: it's important for us to know that new code (1) is compliant with the line protocol (e.g. how InfluxDB handles what you name here "invalid characters") and (2) is backward compatible with the existing implementation (e.g. no unchecked exceptions, no changes to public APIs, and no changes to internal behavior such as filtering non-ASCII characters that were accepted before).

We have to guide our implementation here based on how InfluxDB works. Could you please link here the InfluxDB documentation you used to support your implementation?

BrentonPoke · 2020-09-18T13:47:40Z

The documentation here says no newline characters, but doesn't restrict ASCII values for fields as far as I can tell. I can remove the value regular expression entirely, so is that ok?

fmachado · 2020-09-23T16:32:51Z

@BrentonPoke yes, as it's written there:

> Line protocol does not support the newline character \n in tag values or field values.

Would it be OK for you to submit a PR just with the relevant code and changes required for this fix?
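As a rough illustration of what a narrower fix could look at, the sketch below checks only for the newline character named in the quoted documentation. The class and method names are hypothetical and not part of influxdb-java, and what the caller should do with a positive result (reject, strip, or something else) is not settled in this thread, so the sketch only reports it and throws nothing.

```java
// Hypothetical helper for illustration only; not part of influxdb-java.
class NewlineCheckSketch {
    // Returns true when the value contains a line feed, the character the
    // quoted documentation says line protocol does not support in tag or
    // field values. Null is treated as "no newline" (an assumption).
    static boolean containsNewline(String value) {
        return value != null && value.indexOf('\n') >= 0;
    }

    public static void main(String[] args) {
        System.out.println(containsNewline("mad\nrid")); // true
        System.out.println(containsNewline("madrid"));   // false
    }
}
```

Returning a boolean instead of throwing keeps the sketch neutral on the backward-compatibility concern raised above.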

BrentonPoke · 2020-09-24T00:24:45Z

You mean throw away this commit history? I can change it on this PR right now. I guess I can transfer what's needed from this branch manually to a new one and make a new pull request, but I'm not sure what that solves that a squash merge couldn't.

BrentonPoke · 2020-09-25T02:25:06Z

Since I already did it, I'll create the new PR from the other branch I have.

BrentonPoke added 5 commits August 1, 2020 00:27

influxdata#684: add tag checking for invalid characters

1d4016b

influxdata#684: forgot that posix-styled stuff doesn't work here

ef277d0

cleaning out throwing of exception.

f5aac7a

attempting to fix checkstyle errors

226eece

attempting to fix checkstyle errors pt. 2

04e1a21

BrentonPoke added 2 commits August 9, 2020 22:07

Adding unit tests for coverage

507fbb2

removing unused exception

f259449

majst01 requested changes Aug 14, 2020

View reviewed changes

BrentonPoke added 11 commits August 14, 2020 04:42

revised per feedback on pullrequst

4b9157d

missing a carriage return test

a684bd9

testBufferLimitGreaterThanActions is still failing for unkown reasons

6d475c5

reverting prinout

3622a59

checkstyle doesn't like my comment

37309e1

let's remove that, too

fced434

cleaning up an unused exception class i made

89c3ece

attempting better code coverage

b38fd60

attempting to raise code coverage.

0725aef

refactor

735ffb4

fixing checkstyle

78725e3

BrentonPoke requested a review from majst01 August 14, 2020 16:47

BrentonPoke added 3 commits August 15, 2020 02:07

adding another case

0e8c7aa

Merge branch 'master' of github.com:BrentonPoke/influxdb-java

f34b73e

BrentonPoke commented Sep 17, 2020

View reviewed changes

majst01 requested a review from fmachado September 17, 2020 06:51

fmachado self-assigned this Sep 18, 2020

BrentonPoke closed this Sep 25, 2020
