New validation: 3bytes characters filter (4 bytes characters cannot be stored using UTF8) #12253
Why should there be any restriction on the Unicode characters used, instead of proper serialization?
Well... it's actually a validation; I guess you can use it or not.
I mean: how hard is it to fix the root cause of the problem so that such characters can be stored properly? Also, I really doubt that among all Unicode characters only emojis cause trouble.
Since the table quote_item_option is set as utf8 / utf8_general_ci (3 bytes) and emoji require 4 bytes there are two possible options:
Both are solutions, yet they would have a massive impact.
Also, this applies to every single input we have in Magento, since all our tables are utf8-based. PS: As for "I really doubt that among all Unicode characters only emojis cause trouble" ... indeed they are or, at least, they should be. Emoji require 4 bytes, and most (if not all) language characters are covered by 3 bytes (utf8).
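To make the 3-byte versus 4-byte distinction concrete, here is a minimal ES5 sketch of how many bytes UTF-8 needs per code point. The helper name `utf8ByteLength` is hypothetical and is not part of Magento; it only illustrates why a 3-byte `utf8` column covers code points up to U+FFFF but not emoji such as 😍 (U+1F60D).

```javascript
// Hypothetical helper (not part of Magento): number of bytes UTF-8 needs
// to encode a single Unicode code point.
function utf8ByteLength(codePoint) {
    if (codePoint <= 0x7F) {
        return 1; // ASCII
    }

    if (codePoint <= 0x7FF) {
        return 2; // e.g. Latin letters with diacritics, Greek, Cyrillic
    }

    if (codePoint <= 0xFFFF) {
        return 3; // rest of the Basic Multilingual Plane
    }

    return 4; // supplementary planes, where most emoji live
}

utf8ByteLength(0x41);    // 'A'  -> 1
utf8ByteLength(0xE9);    // 'é'  -> 2
utf8ByteLength(0x20AC);  // '€'  -> 3
utf8ByteLength(0x1F60D); // '😍' -> 4
```

Anything for which this returns 4 is what the proposed validation is meant to reject.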
So, for instance, Reviews will have the very same issue. They are not emoji-filtered, and the text will be cut off at the emoji when it is stored in the database. It's way faster to prevent users from adding emojis to the fields than to apply a drastic change to the data storage / sanitization methods across the whole platform. In the end, I just created a JS validation method that can be called by adding a validation type to the field. Knowing now that this applies to every single input on the site, I wonder whether we should validate it by default or keep using the antiemoji tag to do so.
Thanks for the clarification. Can we disallow all characters requiring 4 bytes to store, then? And make the validation message something like "Invalid characters found, please remove them: 😍" when you enter "123 😍".
Can do this.
@orlangur Done as requested. Validation created with the tag 'charbytes'. The input text value "test 🤠 😏 abc 😒" will return the validation error: Please remove invalid characters: 🤠, 😏, 😒.
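The behaviour described above can be sketched as a standalone ES5 function. This is only an illustration under assumptions: the function names `findFourByteCharacters` and `validateNoFourByteCharacters` and the returned object shape are hypothetical, and the actual rule in lib/web/mage/validation.js builds its message through `$.mage.__` rather than plain string concatenation.

```javascript
// Hypothetical standalone version of the rule, for illustration only.
function findFourByteCharacters(value) {
    // A high surrogate followed by a low surrogate encodes one code point
    // above U+FFFF, which takes 4 bytes in UTF-8.
    return value.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g) || [];
}

function validateNoFourByteCharacters(value) {
    var matches = findFourByteCharacters(value);

    if (matches.length === 0) {
        return { valid: true, message: '' };
    }

    return {
        valid: false,
        message: 'Please remove invalid characters: ' + matches.join(', ') + '.'
    };
}

var result = validateNoFourByteCharacters('test 🤠 😏 abc 😒');
// result.valid   -> false
// result.message -> 'Please remove invalid characters: 🤠, 😏, 😒.'
```

Listing the offending characters in the message lets the user see exactly what to delete instead of guessing which part of the input was rejected.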
W: http://dl.hhvm.com/ubuntu/dists/trusty/InRelease: Signature by key 36AEF64D0207E7EEE352D4875A16E7281BE7A449 uses weak digest algorithm (SHA1)
Can anyone make Travis rebuild it without the need for a phantom commit?
@KarlDeux looks like the build has passed. Actually, to trigger a new build you could close and reopen the PR.
@orlangur are you OK with the development now? I was thinking about adding unit tests but, given test-validation.js, I was wondering whether it would be worthwhile.
Nvm, added unit tests for the validation.
@magento-engcom-team this is ready now; no plans to touch anything else unless you need something added.
lib/web/mage/validation.js
Outdated
@@ -396,6 +396,24 @@
$.mage.__('Please enter at least {0} characters')
],

/* detect chars that would require more than 3 bytes */
'charbytes': [
What does charbytes stand for? I would call it something like validate-no-utf8mb4-characters for consistency with other rules.
Good?
lib/web/mage/validation.js
Outdated
function (value) {
var validator = this,
message = $.mage.__('Please remove invalid characters: {0}.'),
matches = value.match(/(?:[\uD800-\uDBFF][\uDC00-\uDFFF])/g),
How is that different from https://stackoverflow.com/a/16496799? Could you provide a source for the current regexp, maybe?
https://stackoverflow.com/a/16346705
https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF
The "future reference" solution won't work, as it's ES6-only and we're currently on ES5.
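As a rough illustration of why the surrogate-pair regexp is enough in ES5: a JavaScript string is a sequence of UTF-16 code units, and every code point from U+10000 to U+10FFFF is stored as one high surrogate (0xD800 to 0xDBFF) followed by one low surrogate (0xDC00 to 0xDFFF). The helper below, `decodeSurrogatePair`, is a hypothetical name used only for this sketch; it applies the decoding arithmetic from the UTF-16 article linked above.

```javascript
// Hypothetical helper: turn a UTF-16 surrogate pair back into the
// code point it encodes.
function decodeSurrogatePair(high, low) {
    return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

var emoji = '😍';

// In ES5 the emoji is two code units long.
emoji.length; // 2

decodeSurrogatePair(emoji.charCodeAt(0), emoji.charCodeAt(1)).toString(16);
// -> '1f60d'

// The extremes of both surrogate ranges map to the extremes of the
// supplementary planes, so the regexp covers exactly U+10000..U+10FFFF.
decodeSurrogatePair(0xD800, 0xDC00).toString(16); // -> '10000'
decodeSurrogatePair(0xDBFF, 0xDFFF).toString(16); // -> '10ffff'
```

This is why matching `[\uD800-\uDBFF][\uDC00-\uDFFF]` finds the same characters a code-point-aware ES6 pattern would, without requiring ES6 support.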
Thanks, looks good to me then :)
@KarlDeux great! Could you also squash all changes into a single commit? Just to keep the history cleaner.
Squashed!
😂 yessir, sometimes the force helps.
Just worried about the merge commit) Which is totally fine here, as it is empty. Usually a force push gives a single commit.
@magento-team what update shall I include or consider?
…acters cannot be stored using UTF8) #12253
Fixes #12058
Description
Added a new validation for regular emojis.
Fixed Issues (if relevant)
Manual testing scenarios
Contribution checklist