Skip to content

Unicode character issue with unicode units (e.g. cm²) #133

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pelson opened this issue Jan 17, 2019 · 6 comments
Closed

Unicode character issue with unicode units (e.g. cm²) #133

pelson opened this issue Jan 17, 2019 · 6 comments

Comments

@pelson
Copy link
Member

pelson commented Jan 17, 2019

According to the udunits grammar, superscript characters are supported (who knew!).

$ udunits2 -H m*m -W cm²
    1 m*m = 10000 cm²
    x/cm² = 10000*(x/m*m)

Looks like this isn't supported in cf_units though:

>>> cf_units.Unit('cm²').symbol
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pelson/dev/scitools/cf-units/cf_units/__init__.py", line 844, in __init__
    ut_unit = _ud.parse(_ud_system, unit.encode('ascii'), UT_ASCII)
UnicodeEncodeError: 'ascii' codec can't encode character '\xb2' in position 2: ordinal not in range(128)

That encode('ascii') is looking pretty shaky here...

FWIW, this is quite an esoteric issue - I only found this by studying the UDUNITS grammar carefully - I haven't found this in the wild.

@DPeterK
Copy link
Member

DPeterK commented Jan 17, 2019

Unicode all the things!

That said, my worry is that this ascii encoding is actually for compatibility with the underlying udunits2 library. Perhaps there's a UT_UNICODE we can use instead for parsing the unit?

@pelson
Copy link
Member Author

pelson commented Jan 21, 2019

In addition, when adding a unicode test to try this out, pytest is causing a SegFault. It isn't clear that this is exactly cf-units' fault (pytest-dev/pytest#2875), but it does relate to the fact that a Cython exception is being raised that contains a ut_status status (I think):

Example test:

def test_unicode_invalid(self):
    # Not all unicode characters are allowed! (but some are, but aren't documented [π])
    u = Unit('ø')
PYTHONFAULTHANDLER=1 pytest cf_units/tests/test_unit.py  -sv -k 'test_unicode_invalid'
Coverage.py warning: --include is ignored because --source is set (include-ignored)
======================================================================== test session starts ========================================================================
platform darwin -- Python 3.7.1, pytest-4.1.1, py-1.7.0, pluggy-0.8.1 -- /Users/pelson/dev/scitools/cf-units/env/bin/python
cachedir: .pytest_cache
rootdir: /Users/pelson/dev/scitools/cf-units, inifile: setup.cfg
plugins: faulthandler-1.5.0, cov-2.6.1
collected 161 items / 160 deselected                                                                                                                                

cf_units/tests/test_unit.py::Test_unit__creation::test_unicode_invalid Fatal Python error: Segmentation fault

Current thread 0x000000010dda85c0 (most recent call first):
  File "/Users/pelson/dev/scitools/cf-units/cf_units/__init__.py", line 1211 in format
  File "/Users/pelson/dev/scitools/cf-units/cf_units/__init__.py", line 1262 in symbol
  File "/Users/pelson/dev/scitools/cf-units/cf_units/__init__.py", line 1451 in __str__
  File "/Users/pelson/dev/scitools/cf-units/cf_units/__init__.py", line 1470 in __repr__
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/py/_io/saferepr.py", line 38 in _callhelper
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/py/_io/saferepr.py", line 33 in repr_instance
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/reprlib.py", line 62 in repr1
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/reprlib.py", line 52 in repr
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/py/_io/saferepr.py", line 38 in _callhelper
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/py/_io/saferepr.py", line 13 in repr
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/py/_io/saferepr.py", line 71 in saferepr
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/_code/code.py", line 623 in _saferepr
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/_code/code.py", line 629 in repr_args
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/_code/code.py", line 711 in repr_traceback_entry
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/_code/code.py", line 752 in repr_traceback
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/_code/code.py", line 807 in repr_excinfo
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/_code/code.py", line 557 in getrepr
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/nodes.py", line 284 in _repr_failure_py
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/python.py", line 734 in repr_failure
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/runner.py", line 268 in pytest_runtest_makereport
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 62 in <lambda>
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 68 in _hookexec
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/hooks.py", line 284 in __call__
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/runner.py", line 175 in call_and_report
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/runner.py", line 93 in runtestprotocol
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/runner.py", line 78 in pytest_runtest_protocol
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 62 in <lambda>
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 68 in _hookexec
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/hooks.py", line 284 in __call__
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/main.py", line 264 in pytest_runtestloop
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 62 in <lambda>
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 68 in _hookexec
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/hooks.py", line 284 in __call__
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/main.py", line 243 in _main
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/main.py", line 203 in wrap_session
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/main.py", line 236 in pytest_cmdline_main
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/callers.py", line 187 in _multicall
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 62 in <lambda>
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/manager.py", line 68 in _hookexec
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/pluggy/hooks.py", line 284 in __call__
  File "/Users/pelson/dev/scitools/cf-units/env/lib/python3.7/site-packages/_pytest/config/__init__.py", line 80 in main
  File "/Users/pelson/dev/scitools/cf-units/env/bin/pytest", line 11 in <module>
Segmentation fault: 11

Installing pytest-faulthandler did not solve the issue for me (as was suggested in the linked issue).

@pelson
Copy link
Member Author

pelson commented Jan 21, 2019

On my travels I found some pretty scary SegFaults from the python interpreter...

class Failure(object):
    def __getattr__(self, attr):
        return (self, None)

issubclass(Failure(), int)  # *BOOM!*

Ref: pytest-dev/pytest#2470

There are 19 other issues on PyTest that reference SegFaults... 😱time to track down the issue.

@pelson
Copy link
Member Author

pelson commented Jan 21, 2019

Workaround, as suggested in pytest-dev/pytest#4406 (comment) is to use --assert=plain when running pytest. This does NOT work.

[EDIT]: I got the conclusion wrong, it does not work.

@pelson
Copy link
Member Author

pelson commented Jan 21, 2019

So if I think about it a little bit, this is because pytest tries giving us context about what objects went into each of the calls. The problem is that the constructor of cf_units.Unit is the thing that is creating the exception (rightly so). When pytest is giving the context of what it has seen it does some magic (I didn't find the code yet) that allows it to snaffle a reference to the object that was partially created by the exception.

The __repr__ of cf_units.Unit assumes that its constructor was successful (quite reasonably) and does some low-level C stuff that would raise a SegmentationFault had it not... herein lies the problem. A simple way to reproduce this:

class SegRepr:
    """
    The most horrific class... it raises a SegFault if you try to repr me.

    """
    def __init__(self):
        raise ValueError("You really don't want me.")

    def __repr__(self):
        print("You can't get here, which is lucky because I'm "
              "about to SegFault.")
        import ctypes
        ctypes.string_at(0)

On the whole, this is a pretty safe class - you can't construct one unless you are really not playing ball:

print(SegRepr.__new__(SegRepr))

However, it isn't as safe as you think when using pytest, because it is trying to be helpful by repr-ing the objects in the stack trace for a failing test:

def test_bad():
    assert SegRepr() == 1
$ pytest seg.py -sv
Coverage.py warning: --include is ignored because --source is set (include-ignored)
======================================================================== test session starts ========================================================================
platform darwin -- Python 3.7.1, pytest-4.1.1, py-1.7.0, pluggy-0.8.1 -- /Users/pelson/dev/scitools/cf-units/env/bin/python
cachedir: .pytest_cache
rootdir: /Users/pelson/dev/scitools/cf-units, inifile: setup.cfg
plugins: cov-2.6.1
collecting ... You really don't want me.
collected 1 item                                                                                                                                                    

seg.py::test_bad You can't get here, which is lucky because I'm about to SegFault.
Segmentation fault: 11

I will raise this in the pytest issue tracker, but I suspect that the only way for them to avoid this would be to avoid repr-ing the object in the stack trace. This would really hamper the quality of the error log, and as a user of pytest I'd be really disappointed to lose that feature. I'll raise it there in-case there is a solution which would allow some middle-ground (given I don't know the pytest codebase, it may be that there is enough context to know whether the thing was constructed correctly).

Given the above paragraph, I conclude that cf-units should implement a safe __repr__ that does not assume that the object has been constructed correctly.

pelson added a commit to pelson/cf_units that referenced this issue Jan 21, 2019
Also, allow a file encoding in the coding standards test, so that we can have some literal unicode characters for testing with.
@pelson pelson mentioned this issue Jan 21, 2019
pelson added a commit to pelson/cf_units that referenced this issue Jan 22, 2019
Also, allow a file encoding in the coding standards test, so that we can have some literal unicode characters for testing with.
bjlittle pushed a commit that referenced this issue Jan 22, 2019
* Always treat units as unicode. Closes #133.

Also, allow a file encoding in the coding standards test, so that we can have some literal unicode characters for testing with.

* Fix date2num test which is incorrectly using repr when str was intended.

* Tidy up the Unit constructor for the py2 case, so that it is easier to see that the code can be deleted when the codebase becomes py3 only.

* Handle unicode object in py2 specially, and always ensure that py2k returns a non-unicode for __str__ (unless sys.getdefaultencoding says otherwise).

* Ensure that the error raised in Unit constructor handles unicode too.
@pelson
Copy link
Member Author

pelson commented Jan 23, 2019

🎉 π m²

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants