Skip to content

Add a check for the leading and trailing spaces #2504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

amieiro
Copy link

@amieiro amieiro commented Nov 4, 2024

As I described in #2501, at translate.w.org, sometimes we get translations that start or end with one or more empty spaces, and we want to avoid this.

This PR adds a new check to detect if a translatable string has leading or trailing spaces.

Fixes #2501.

Copy link
Member

@jrfnl jrfnl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @amieiro, thanks for this PR.

I haven't tested it yet, but based on a code review alone, this is looking good.

I wonder if the tests can be made a little more varied ? I.e. not just use leading/trailing spaces, but also have some leading/trailing tabs, new lines and combinations of spaces, tabs, new lines.

What do you think ?

@jrfnl
Copy link
Member

jrfnl commented Dec 24, 2024

@amieiro Just checking in as you never responded to the question I posed in my initial review.... Do you still intend to finish off this PR ?

@amieiro
Copy link
Author

amieiro commented May 27, 2025

Sorry, I don't have the time to finish it at this time.

array( $param_info['clean'] )
);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

array( $param_info['clean'] )
);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

array( $param_info['clean'] )
);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

array( $param_info['clean'] )
);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

array( $param_info['clean'] )
);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_leading_tabs = '/^\x09+/u';
$pattern_trailing_tabs = '/\x09+$/u';
$pattern_leading_vtabs = '/^\x0B+/u';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_leading_vtabs = '/^\x0B+/u';
$pattern_leading_vtabs = '/^\x0B+/u';

$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_leading_tabs = '/^\x09+/u';
$pattern_trailing_tabs = '/\x09+$/u';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_trailing_tabs = '/\x09+$/u';
$pattern_trailing_tabs = '/\x09+$/u';

// Define regex patterns.
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_leading_tabs = '/^\x09+/u';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_leading_tabs = '/^\x09+/u';
$pattern_leading_tabs = '/^\x09+/u';


// Define regex patterns.
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_trailing_spaces = '/[\x20]+$/u';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_trailing_spaces = '/[\x20]+$/u';
$pattern_trailing_spaces. = '/[\x20]+$/u';

$first_non_empty = $this->phpcsFile->findNext( Tokens::$emptyTokens, $param_info['start'], ( $param_info['end'] + 1 ), true );

// Define regex patterns.
$pattern_leading_spaces = '/^[\x20]+/u';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pattern_leading_spaces = '/^[\x20]+/u';
$pattern_leading_spaces = '/^[\x20]+/u';

@jrfnl
Copy link
Member

jrfnl commented Jun 15, 2025

@amieiro Seeing as you've been working on this again - could you please explain why you have set things up to have separate errors for spaces vs tabs vs vtabs vs newlines ?

My previous question was only about having tests in place with these and the sniff handling all types of whitespace, not about splitting that up into multiple errors.

I cannot think of any situation where one type of (leading/trailing) whitespace would be acceptable, but another would not be, so having different error codes doesn't make sense to me.

Is there a reason for this distinction based on your experiences in the Polyglots team ? I'd be interested to hear your reasoning for this.

@amieiro
Copy link
Author

amieiro commented Jun 16, 2025

@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR.

The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here.
And we have another thing to check: the different whitespace characters. The regular space characters is U+0020, but there are a lot of different whitespace characters: U+00A0, U+2000, U+2001, U+2002,...

The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:

  • Contemplate only the U+0020 case and add the other cases to the tests.
  • Add all the commented situations.

Which approach do you prefer?

@jrfnl jrfnl marked this pull request as draft June 16, 2025 14:29
@jrfnl
Copy link
Member

jrfnl commented Jun 16, 2025

@jrfnl Sorry, this PR is not yet ready for review and I forgot to convert it to draft. As we return to contribute with WordPress.org, I resumed work on this PR.

Okay, thanks for letting me know. I've converted it to draft now and won't do an in-detail review until it has been marked as "Ready to review".

The horizontal tabulations are not often seen at the beginning or end of sentences, but they are seen in the middle, so I wanted to prevent its use, since I don't think it is correct to use it. You can see a use case inside a string in the core here. The same applies to the new lines. You can see an example in the core here.

I think that this PR should probably focus on the original problem it was trying to solve - leading and trailing whitespace.

Expanding the scope to mid-string whitespace characters feels like scope-creep and may have different implications (i.e., would involve a different decision point, potentially delaying the merging of the code related to the original problem).

I suggest opening a follow-up issue for that to discuss if this needs flagging and if so, what would need flagging. (I mean, looking at the examples, these seem like perfectly valid use-cases for mid string vertical whitespace, especially if you also consider RTL and BTT languages).

And we have another thing to check: the different whitespace characters. The regular space characters is U+0020, but there are a lot of different whitespace characters: U+00A0, U+2000, U+2001, U+2002,...

The most common problem is the U+0020 space problem (in fact it is the only one I usually see), so the others are really edge situations. I can continue this PR in two different approaches:

  • Contemplate only the U+0020 case and add the other cases to the tests.
  • Add all the commented situations.

Which approach do you prefer?

For leading and trailing whitespace, IMO the new errors should flag all whitespace characters, but without differentiating in the error message or code.
What I mean by that is:

  • Change the error message(s) to use the term "whitespace" instead of explicitly saying what type of whitespace you're flagging.
  • Change the error code(s) to also use the term "whitespace" - this would mean you'd end up with two error codes: LeadingWhiteSpace and TrailingWhiteSpace.

Does that help ?

@jrfnl jrfnl removed this from the 3.2.0 milestone Jun 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

I18n: Report the translatable strings with leading and trailing spaces
3 participants