This repository was archived by the owner on May 17, 2024. It is now read-only.
682e434 to caf766c
dlawin reviewed Oct 31, 2023
dlawin reviewed Oct 31, 2023
nolar previously requested changes Nov 1, 2023
c351010 to 4a3f703
dlawin reviewed Nov 13, 2023
83ce259 to 842481f
dlawin approved these changes Nov 14, 2023
dismissing "changes requested" per previous convo
Some databases, for example Teradata, have a short limit on string length and require the number of symbols to be set explicitly for CHAR/VARCHAR, i.e. the type has to be written as VARCHAR(n), where n must be set and has some maximum allowed value.

To calculate an md5 hash and convert it to an integer value, the column values of one row are cast to strings and then concatenated. For example, two columns are cast to VARCHAR(n1) and VARCHAR(n2), and the results are concatenated.

The question is what n1 and n2 should be. I am sure they should be the maximum allowed value for the specific type, say N. This is needed to keep all customer information without loss. However, such a concatenation leads to a type overflow, because we are trying to get VARCHAR(N + N), which is not allowed.

To avoid this overflow, we should shorten the string values without losing information. I see one possible solution: taking a hash of each item of the concat operation.
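The idea above can be sketched outside SQL. The following is a minimal Python illustration, not the code from this pull request: the helper names (md5_hex, concat_columns, row_checksum) and the prevent_overflow parameter are hypothetical, and using the full md5 digest as the integer is an assumption made only for the example. It shows that hashing each item first makes the length of the concatenated string depend on the number of columns rather than on their declared width.

```python
import hashlib

MD5_HEX_LEN = 32  # an MD5 digest rendered as hex is always 32 characters


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a 32-character hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def concat_columns(values, prevent_overflow=False):
    """Build the string that is hashed for one row.

    With prevent_overflow=True (a hypothetical stand-in for the flag
    discussed in this PR), every value is replaced by its own MD5 hex
    digest before concatenation, so each item contributes exactly
    32 characters regardless of the original column width.
    """
    if prevent_overflow:
        values = [md5_hex(v) for v in values]
    return "".join(values)


def row_checksum(values, prevent_overflow=False):
    """Hash the concatenated row and convert the digest to an integer."""
    return int(md5_hex(concat_columns(values, prevent_overflow)), 16)


# Two columns, each filled up to an assumed limit of N = 32000 symbols.
row = ["a" * 32000, "b" * 32000]
print(len(concat_columns(row)))                         # 64000, i.e. N + N
print(len(concat_columns(row, prevent_overflow=True)))  # 64, i.e. 2 * 32
```

The trade-off is one extra hash computation per column, in exchange for a concatenated length that no longer grows with the column contents.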
Benefits:
Drawbacks:
PREVENT_OVERFLOW_WHEN_CONCAT flag.
n of VARCHAR(n) is low. I do not think this is a big problem, because a large N is typically 32000 symbols or more, so a customer would need more than 1000 columns in a diff to reach it.
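The figure of 1000 columns follows from simple arithmetic, sketched below under the assumption that each hashed item is a 32-character MD5 hex digest and that nothing else (such as separators) is added between items. The function name max_columns is hypothetical.

```python
MD5_HEX_LEN = 32  # length of one MD5 hex digest


def max_columns(varchar_limit: int, item_len: int = MD5_HEX_LEN) -> int:
    """Number of hashed items that fit into VARCHAR(varchar_limit)."""
    return varchar_limit // item_len


print(max_columns(32000))  # 1000
```

If separators or other padding are placed between the hashed items, the real limit would be somewhat lower than this estimate.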