
Can't OCR anything with 2.6.2 #337

@starenka

(tmp-42dc3f1969e972a) starenka /data/.envs/tmp-42dc3f1969e972a % ipython
Python 3.11.6 (main, Oct  8 2023, 05:06:43) [GCC 13.2.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.11.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tesserocr

In [2]: tesserocr.file_to_text('/tmp/test.jpg')
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[2], line 1
----> 1 tesserocr.file_to_text('/tmp/test.jpg')

File tesserocr.pyx:2621, in tesserocr.file_to_text()

RuntimeError: Failed to read picture

In [3]: !pip list | grep tesserocr
tesserocr         2.6.2

In [5]: print(tesserocr.tesseract_version())
tesseract 5.3.3
 leptonica-1.83.1
  libpng 1.6.34 : zlib 1.2.11
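As far as the traceback shows, "Failed to read picture" doesn't say whether /tmp/test.jpg is missing or whether this build simply can't decode it. A quick way to rule out the first case is to check that the file exists and really starts with JPEG magic bytes. This is a minimal sketch; sniff_format and check_image are hypothetical helpers, not part of tesserocr.

```python
from pathlib import Path

# Leading "magic" bytes for a few common image formats.
MAGIC = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_format(head: bytes) -> str:
    """Return the image format implied by the first bytes, or 'unknown'."""
    for magic, name in MAGIC.items():
        if head.startswith(magic):
            return name
    return "unknown"


def check_image(path: str) -> str:
    """Hypothetical pre-flight check to run before tesserocr.file_to_text()."""
    p = Path(path)
    if not p.is_file():
        return "missing"
    with p.open("rb") as fh:
        # 8 bytes is enough to cover the longest signature above (PNG).
        return sniff_format(fh.read(8))
```

If check_image("/tmp/test.jpg") returns "jpeg", the path and the file itself are fine, which points at the decoding side rather than at the input.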

It works fine with < 2.6:

In [1]: import tesserocr

In [2]: tesserocr.file_to_text('/tmp/test.jpg')
Out[2]: ">&EoASCADE\n\nEn\n\nEmoji Meaning Emoji Designs Technical Information\n\nRobot\n\nThe head of a classic robot. Commonly depicted as a\nvintage, tin toy robot with circular eyes, a triangular\nnose, knobs for ears, a light and/or antennae atop its\n\nLearn More About This Emoji\n\nGoes Great With\n@®e. =\n¢ b&\n\nUpcoming Events\n\n@ Thanksgiving ff Black Friday Emoji List\n\nHanukkah\n\n2. Christmas\n\nLatest News\n\n@9RF MOV OSREA\n\nShow More\n\nShow More\n\nMicrosoft Windows Samsung One UI What's New in\n\n11 23H2 Emoji 6.0 Emoji Unicode 15.1 &\nChangelog Changelog Emoji 15.1\nMicrosoft have begun Samsung has begun The latest list of emoji\nto roll out their latest rolling out the latest recommendations\n\nversion of its Android\nsoftware layer, One UI\n6.0. This update\n\nintroduces a brand new\nvisual style for the va...\n\nupdate to Windows 11,\nadding Emoji 15.0\nsupport and debuting\nthe glossy 3D Fluent\ndesigns in select appl...\n\ndrafted by the Unicode\nConsortium - Emoji\n15.1 - has been\nformally approved. This\nmeans that 118 new\n\nemojis s...\n\nVendors & Platforms Emojipedia Updates & Releases\n\nAbout Emojipedia Latest Approved Emojis\n\nle Noto Color Emoji Contact Emaji Kitchen\n\nLatest Draft Emolis\n\nsung Emoji Wr\n\nEmojipedia Shop All Emoji Version\n\nFacebook Licensing All Unicode Ve\nTwitter / x ings Emoji Prope\nWhatsApp Information Emoji Reau\nJoyPixels Privacy Palioy\n\nSnapchat Terms of Service\n\nTikTok How To Change Language\n\nAll Vendors & Platforms AL Art Master\n\n‘All emoji names are official Unicode: Character Database or CLDR names. Gode points fisted\n\nare part of the Lnicode Standard.\n\nAdditional emoji descriptions and definitions are copyright © Emojipedia. 
Emoji images\ndisplayed on Emojipedia are copyright © their respective creators, unless otherwise noted.\n\nEmojipedia® is a member of the Unicode Consortium,\n\nZEDGE’\n\n(reer ar\n\nFacts, Figures & Guides\n\na Emojipedia is brought to you by Zedge, the\n\nEmoji Statistics ‘world's #1 phone personalization app\nEmoji Sequence\n\nGoogle Play\n\n© App st\n\nGender Neutral\n\nCan 1 Email?\n\nEmojipedia® is a registered trademark of Zedye, Inc; Apple® is a registered trademark of\nApple Inc; Microsoft® and Windows® are registered trademarks of Microsoft Corporation;\nGoogle® and Android™ are registered trademarks or trademarks of Google Inc in the United\nStates and/or other countries.\n\nFollow Emojipedia on Twitter, Facebook, Instagram, Mastodon, or TikTok. Da Not Sell My\nPersonal Information. Change Consent. Read our Terms of Service here.\n\nRun a retail store? Check out the NRSPlus.com Point of Sale (POS) system, and low-rate\nNRSPay.com credit card processing from our partner, National Retail Solutions (NFS).\n"

In [3]: !pip list | grep tesserocr
/home/starenka/.local/lib/python3.11/site-packages/IPython/core/interactiveshell.py:2559: UserWarning: You executed the system command !pip which may not work as expected. Try the IPython magic %pip instead.
  warnings.warn(
tesserocr         2.5.2

In [4]: print(tesserocr.tesseract_version())
tesseract 5.3.0
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.2) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0
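Comparing the two tesseract_version() outputs: the 2.6.2 build lists only libpng and zlib under leptonica, while the 2.5.2 build also lists libjpeg, libgif, libtiff, libwebp and libopenjp2. That would be consistent with 2.6.2 being unable to decode a .jpg, though nothing above confirms it. A minimal sketch that diffs the two lists; image_libs is a hypothetical helper and assumes the " : "-separated format shown in the output above.

```python
def image_libs(version_text: str) -> dict[str, str]:
    """Map library name -> version from tesseract_version() output."""
    libs: dict[str, str] = {}
    for line in version_text.splitlines():
        for chunk in line.split(" : "):
            parts = chunk.strip().split(maxsplit=1)
            # Keep only entries that look like libraries (libpng, zlib, ...).
            if len(parts) == 2 and (parts[0].startswith("lib") or parts[0] == "zlib"):
                libs[parts[0]] = parts[1]
    return libs


# Version strings copied from the two sessions in this issue.
FAILING_262 = """tesseract 5.3.3
 leptonica-1.83.1
  libpng 1.6.34 : zlib 1.2.11
"""

WORKING_252 = """tesseract 5.3.0
 leptonica-1.82.0
  libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.2) : libpng 1.6.40 : libtiff 4.5.1 : zlib 1.2.13 : libwebp 1.3.2 : libopenjp2 2.5.0
"""

missing = sorted(image_libs(WORKING_252).keys() - image_libs(FAILING_262).keys())
print(missing)  # → ['libgif', 'libjpeg', 'libopenjp2', 'libtiff', 'libwebp']
```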

Edit:

I guess it needs a newer tesseract-ocr? If that's the case, it might be worth warning about it.
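One possible shape for such a warning, sketched under the assumption that the image libraries listed in tesseract_version() reflect which formats leptonica can decode. warn_if_unsupported and REQUIRED_LIB are hypothetical names, not tesserocr API; the check is a plain substring match on the version text.

```python
import warnings
from pathlib import Path

# Hypothetical mapping: file extension -> library name leptonica lists
# in tesseract_version() when that format is supported.
REQUIRED_LIB = {
    ".jpg": "libjpeg",
    ".jpeg": "libjpeg",
    ".png": "libpng",
    ".gif": "libgif",
    ".tif": "libtiff",
    ".tiff": "libtiff",
    ".webp": "libwebp",
    ".jp2": "libopenjp2",
}


def warn_if_unsupported(path: str, version_text: str) -> bool:
    """Warn and return False if the build apparently lacks the decoder for path."""
    lib = REQUIRED_LIB.get(Path(path).suffix.lower())
    if lib is None or lib in version_text:
        # Unknown extension, or the library is listed: nothing to warn about.
        return True
    warnings.warn(
        f"{path}: {lib} is not listed in tesseract_version(); "
        "reading this file may fail with 'Failed to read picture'",
        RuntimeWarning,
        stacklevel=2,
    )
    return False
```

Called as warn_if_unsupported("/tmp/test.jpg", tesserocr.tesseract_version()), this would warn in the 2.6.2 session and stay silent in the 2.5.2 one.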

root ~ # apt show tesseract-ocr
Package: tesseract-ocr
Version: 5.3.0-2
Priority: optional
Section: graphics
Source: tesseract
Maintainer: Alexander Pozdnyakov <[email protected]>
Installed-Size: 2,186 kB
Depends: libarchive13 (>= 3.2.1), libc6 (>= 2.34), libcairo2 (>= 1.2.4), libcurl4 (>= 7.16.2), libfontconfig1 (>= 2.12.6), libgcc-s1 (>= 3.0), libglib2.0-0 (>= 2.12.0), libharfbuzz0b (>= 1.2.6), libicu72 (>= 72.1~rc-1~), liblept5 (>= 1.75.3), libpango-1.0-0 (>= 1.44.3), libpangocairo-1.0-0 (>= 1.22.0), libstdc++6 (>= 11), libtesseract5 (= 5.3.0-2), tesseract-ocr-eng (>= 4.0.9~), tesseract-ocr-osd (>= 4.0.9~)
Replaces: tesseract-ocr-data
Homepage: https://github.com/tesseract-ocr/
Tag: accessibility::ocr, implemented-in::c++, interface::commandline,
 role::program
Download-Size: 402 kB
APT-Manual-Installed: yes
APT-Sources: https://ftp.debian.org/debian trixie/main amd64 Packages
Description: Tesseract command line OCR tool
 Tesseract is an open source Optical Character Recognition (OCR)
 Engine. It can be used directly, or (for programmers) using an API to
 extract printed text from images. It supports a wide variety of
 languages. This package includes the command line tool.
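Note that the system package is 5.3.0-2, which matches what the working 2.5.2 session reports, while the failing 2.6.2 session reports tesseract 5.3.3 and leptonica 1.83.1. That suggests (but doesn't prove) that 2.6.2 isn't linked against the system libtesseract/leptonica at all; the issue doesn't say how it was installed. A minimal sketch for comparing the two; reported_versions and matches_apt are hypothetical helpers, and the Debian version handling only strips an epoch and the trailing revision.

```python
import re


def reported_versions(version_text: str) -> dict[str, str]:
    """Extract tesseract and leptonica versions from tesseract_version() output."""
    found: dict[str, str] = {}
    tess = re.search(r"^tesseract\s+(\S+)", version_text, re.MULTILINE)
    lept = re.search(r"leptonica-(\S+)", version_text)
    if tess:
        found["tesseract"] = tess.group(1)
    if lept:
        found["leptonica"] = lept.group(1)
    return found


def matches_apt(version_text: str, apt_version: str) -> bool:
    """True if the linked tesseract equals the Debian upstream version."""
    # "1:5.3.0-2" -> "5.3.0": drop an optional epoch, then the Debian revision.
    upstream = apt_version.split(":", 1)[-1].rsplit("-", 1)[0]
    return reported_versions(version_text).get("tesseract") == upstream


print(matches_apt("tesseract 5.3.3\n leptonica-1.83.1\n", "5.3.0-2"))  # → False
```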
