
Char type as a special length one string type #4334

Closed
yegle opened this issue Dec 7, 2017 · 4 comments

Comments

@yegle

yegle commented Dec 7, 2017

Cross-posted from the typing repo: python/typing#510

The original issue contains some additional context, but in short: is it feasible to implement a Char type as a special case of the Text type?

@elazarg
Contributor

elazarg commented Dec 7, 2017

I think this is similar (in its challenges and special-casing) to having a singleton type. (Imagine a unary alphabet; for most purposes mypy doesn't know the alphabet isn't unary.) But it's less general, and it requires an understanding of len for flow-dependent checking. If we can do this, we might as well add singleton types and a special "AnyChar" value. Then Char is Singleton[AnyChar].

@JukkaL
Collaborator

JukkaL commented Dec 7, 2017

I've sometimes wondered about having a character type, but haven't yet found very convincing use cases.

One option would be to implement this as a separate string-like type with a promotion from Char to Text (similar to the promotion from int to float that we already have). Both would be represented at runtime by the same type. A single-character string literal would have type Char (maybe only in a Char type context, though).
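The promotion idea can be loosely approximated today with `NewType`. This is only a sketch under stated assumptions, not the proposal itself: the names `Char`, `to_char`, and `repeat` are hypothetical, the length check happens at runtime rather than in the type checker, and a single-character literal does not become a `Char` automatically.

```python
from typing import NewType

# Hypothetical Char type: a distinct static type that is a plain str at runtime.
Char = NewType("Char", str)


def to_char(s: str) -> Char:
    """Convert a length-one string to Char, checking the length at runtime."""
    if len(s) != 1:
        raise ValueError(f"expected a string of length 1, got length {len(s)}")
    return Char(s)


def repeat(c: Char, n: int) -> str:
    """Accept only Char statically; the result is an ordinary str."""
    return c * n


# A Char can be used wherever a str is expected, which mimics the
# Char -> Text promotion described above.
line: str = repeat(to_char("-"), 10)
```

Because `Char` is a subtype of `str` to the type checker, values flow from `Char` to `str` freely, but a plain `str` should be rejected where `Char` is required.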

However, there are probably simpler ways to catch accidentally using str when Iterable[str] or Sequence[str] is expected. For example, we could have a strictness option that would flag these. If we had such an option, I'd probably turn it on in the codebases that I work on.
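For readers unfamiliar with the accident being described, here is a minimal sketch. The function `shout_all` is a hypothetical example, not something from mypy or the original discussion. Because `str` is itself an `Iterable[str]`, the second call passes type checking and runs, but it iterates over characters instead of names.

```python
from typing import Iterable, List


def shout_all(names: Iterable[str]) -> List[str]:
    """Upper-case every name in the iterable (hypothetical helper)."""
    return [name.upper() for name in names]


# Intended use: an iterable of names.
intended = shout_all(["alice", "bob"])

# Accidental use: a bare str is also an Iterable[str], so this is accepted
# and silently iterates over single characters.
accidental = shout_all("alice")

print(intended)
print(accidental)
```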

@JukkaL
Collaborator

JukkaL commented Jan 29, 2020

Literal['x'] can sometimes be used for this use case. More generally, there is no concrete proposal that seems practical, so I'm closing the issue.
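A minimal sketch of the `Literal` approach, assuming Python 3.8+ where `typing.Literal` is available. The names `Separator` and `split_fields` are hypothetical. Note that this restricts the argument to specific listed characters; it does not express "any string of length one".

```python
from typing import List, Literal

# Only these exact single-character strings are accepted statically.
Separator = Literal[",", ";"]


def split_fields(line: str, sep: Separator = ",") -> List[str]:
    """Split a line on one of the allowed separator characters."""
    return line.split(sep)


fields = split_fields("a;b;c", ";")
```

A type checker should reject a call such as `split_fields("a b", " ")`, since `" "` is not one of the listed literals; at runtime no check is performed.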

JukkaL closed this as completed Jan 29, 2020

@Hibou57

Hibou57 commented Sep 3, 2020

There is also a common misconception about what a character is. There are code points and there are graphemes. What one sees in a text editor are graphemes; what a computer sees in a file are code points. Some graphemes are made of two code points: one for a base glyph and one for an overlaid (combining) glyph.

As far as I have experienced, Python deals with code points, so a grapheme made of two code points will be a string of length 2.
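A short sketch of that behavior, using only the standard library. The example string (an "e" followed by a combining acute accent) is my own choice for illustration.

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one grapheme, two code points.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))  # 2
print(len(composed))    # 1
```

Both strings render as the same grapheme, yet `len` reports different values because it counts code points.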

The best way to deal with code points individually is to work with their integer values, which can be retrieved with the ord() function.

I would rather say the issue is that Python does not provide a built-in iterator to iterate over a string as a sequence of code point integer values, but one can define such an iterator easily. I also believe there is no built-in Python function to split a string into graphemes, by the way (unless I'm wrong).
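One way such an iterator might look, as a minimal sketch. The function name `code_points` is hypothetical; it is a thin wrapper around `ord()`, and `map(ord, text)` would give the same values.

```python
from typing import Iterator


def code_points(text: str) -> Iterator[int]:
    """Yield the integer value of each code point in text."""
    for ch in text:
        yield ord(ch)


# 'e' plus a combining acute accent yields two integers, one per code point.
values = list(code_points("e\u0301"))
print([hex(v) for v in values])  # ['0x65', '0x301']
```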


4 participants