Skip to content

New validation: 3bytes characters filter (4 bytes characters cannot be stored using UTF8) #12253

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 28, 2017

Conversation

KarlDeux
Copy link

@KarlDeux KarlDeux commented Nov 14, 2017

Fixes #12058

Description

Added a new validation for regular emojis.

Fixed Issues (if relevant)

  1. Can't save emoji in custom product options #12058 : Can't save emoji in custom product options

Manual testing scenarios

  1. Add a Customizable Option to a product (text / textarea).
  2. Try to set an emoji into the text / textarea field in the frontend.
  3. There should pop a validation error asking to remove emoji from field.

Contribution checklist

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (if applicable)
  • All automated tests passed successfully (all builds on Travis CI are green)

captura de pantalla 2017-11-16 a las 16 32 35

@magento-cicd2
Copy link
Contributor

magento-cicd2 commented Nov 14, 2017

CLA assistant check
All committers have signed the CLA.

@magento-engcom-team magento-engcom-team added Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release labels Nov 14, 2017
@orlangur
Copy link
Contributor

Why there should be any restriction on unicode characters used instead of proper serialization?

@KarlDeux
Copy link
Author

Well... it's actually a validation, guess that you can use it or not.
Still can be serialized tho, dunno if that adds any value into the product configuration.

@orlangur
Copy link
Contributor

I mean: how hard is it to fix the root cause of the problem so that such characters can be stored properly?

Also, I really doubt that among all Unicode characters only emodjis cause troubles.

@KarlDeux
Copy link
Author

Since the table quote_item_option is set as utf8 / utf8_general_ci (3 bytes) and emoji require 4 bytes there are two possible options:

  1. We change the db / table encoding to utf8mb4 / utf8mb4_general_ci (4 bytes).
  2. We encode the value of the option for storage and then perform the opposite action to decode it on read.

Both are solutions yet they would have massive impact.
What would be the workaround?

@KarlDeux
Copy link
Author

Also, this is applicable on every single input that we've in Magento since all our tables are utf8 based.

PS: As for the "I really doubt that among all Unicode characters only emodjis cause troubles" ... indeed they are or, at least, they should. Emoji are require 4bytes and most (if not all of them) language characters are covered with 3bytes (utf8).

@KarlDeux
Copy link
Author

So, for instance, Reviews will present the very same issue. They are not emoji-filtered and will break the sentence at the level of the emoji while on trying to store it into the database.

It's way faster to prevent users adding emojis into the fields than applying a drastic change on data storage / sanitize methods in all the platform. In the end, I just created a validate JS method that can be called with a validate type into the field.

Knowing now that this applies to every single input on the site I wonder if we should validate it by default or keep using the antiemoji tag to do so.

@orlangur
Copy link
Contributor

Emoji are require 4bytes and most (if not all of them) language characters are covered with 3bytes (utf8).

Thanks for clarification. Can we disallow all characters requiring 4bytes to store then? And make validation message something like "Invalid characters found, please remove them: 😍" when you enter "123 😍".

@KarlDeux
Copy link
Author

Can do this.
Do you want this validation to be standard on any imput or still be associated to a validation type like “email” or the one I stated “antiemoji”?

@KarlDeux
Copy link
Author

@orlangur Done as requested. Validation created with tag 'charbytes'.
It will be triggered if any character requires more than 3 bytes to be stored.

Input text value "test 🤠 😏 abc 😒"

Will return validation error as:

Please remove invalid characters: 🤠, 😏, 😒.

@KarlDeux KarlDeux changed the title New validation: anti-emoji. New validation: 3bytes characters filter (4 bytes characters cannot be stored using UTF8) Nov 16, 2017
@KarlDeux
Copy link
Author

W: http://dl.hhvm.com/ubuntu/dists/trusty/InRelease: Signature by key 36AEF64D0207E7EEE352D4875A16E7281BE7A449 uses weak digest algorithm (SHA1)
W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest algorithm (SHA1)
E: Failed to fetch http://dl.hhvm.com/ubuntu/dists/trusty/main/binary-i386/Packages 404 Not Found [IP: 52.84.14.228 80]
E: Some index files failed to download. They have been ignored, or old ones used instead.
Error: Command sudo apt-get update on line 39 failed with exit code 100
The command "./dev/travis/before_install.sh" failed and exited with 100 during .

Can anyone make Travis rebuild it without the need of making a phantom commit?

@ihor-sviziev
Copy link
Contributor

@KarlDeux looks like build is passed. Actually to trigger new build you cool close and reopen PR

@KarlDeux
Copy link
Author

@orlangur are you ok with the development now? I was thinking about making the unit testing but, since test-validation.js I was wondering if it would be worthy or not.

@KarlDeux
Copy link
Author

Nvm, added unit testing to the validation.
Will perform the refactor of the JS unit testing anyway on a separated branch for #12342.

@KarlDeux KarlDeux closed this Nov 21, 2017
@KarlDeux KarlDeux reopened this Nov 21, 2017
@KarlDeux KarlDeux closed this Nov 21, 2017
@KarlDeux KarlDeux reopened this Nov 21, 2017
@KarlDeux
Copy link
Author

@magento-engcom-team this is ready now, no plans to touch anything else unless you have to add anything.

@okorshenko okorshenko self-assigned this Nov 22, 2017
@okorshenko okorshenko added this to the November 2017 milestone Nov 22, 2017
@orlangur orlangur self-assigned this Nov 22, 2017
@@ -396,6 +396,24 @@
$.mage.__('Please enter at least {0} characters')
],

/* detect chars that would require more than 3 bytes */
'charbytes': [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What charbytes stands for? I would call it something like validate-no-utf8mb4-characters for consistency with other rules.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good?

function (value) {
var validator = this,
message = $.mage.__('Please remove invalid characters: {0}.'),
matches = value.match(/(?:[\uD800-\uDBFF][\uDC00-\uDFFF])/g),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is that different from https://stackoverflow.com/a/16496799? Could you provide a source for current regexp maybe?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://stackoverflow.com/a/16346705
https://en.wikipedia.org/wiki/UTF-16#Code_points_U.2B10000_to_U.2B10FFFF

Future reference solution won't work as it's only for ES6 and we're currently ES5.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good to me then :)

@orlangur
Copy link
Contributor

@KarlDeux great! Could you squash all changes into single commit also? Just to keep history cleaner.

@KarlDeux
Copy link
Author

Squashed!

@orlangur
Copy link
Contributor

@KarlDeux thanks!

Merge pull request #2 from magento/2.2-develop

Interesting technique, did you do it as squashed merge on your own fork and then force pushed?

@KarlDeux
Copy link
Author

😂 yessir sometimes the force helps.

@orlangur
Copy link
Contributor

Just worried about merge commit) Which is totally fine here as it is empty. Usually force push gives a single commit.

@KarlDeux
Copy link
Author

@magento-team what update shall I include or consider?

@okorshenko okorshenko merged commit c21229c into magento:2.2-develop Nov 28, 2017
okorshenko pushed a commit that referenced this pull request Nov 28, 2017
@magento-engcom-team magento-engcom-team added the Fixed in 2.2.x The issue has been fixed in 2.2 release line label Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Fixed in 2.2.x The issue has been fixed in 2.2 release line Progress: accept Release Line: 2.2 Reproduced on 2.1.x The issue has been reproduced on latest 2.1 release Reproduced on 2.2.x The issue has been reproduced on latest 2.2 release Reproduced on 2.3.x The issue has been reproduced on latest 2.3 release
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants