[PYTHON] About WARNING when packaging with pyinstaller using pyocr

Introduction

Hello, nice to meet you, good night. I'm sorry.

This article covers:

- How to remove the WARNING that appears when you use pyocr and package with pyinstaller
- How to distribute an exe that uses Tesseract-OCR / pyocr

Reproduction

  1. Install Tesseract-OCR separately (Teacher! I think this is the number one cause!)

  2. Write a *.py script that uses pyocr

  3. Run pyinstaller hoge.py

  4. When you start the hoge.exe produced in dist, the following log is output ...

Running from container, but no tessdata (C:\Users\{user}\AppData\Local\Temp\_MEI{tempnumber}\data) found !
Running from container, but no tessdata (C:\Users\{user}\AppData\Local\Temp\_MEI{tempnumber}\data) found !

Even when this appears, the program still runs as long as Tesseract-OCR is installed separately on the target PC, so few people are seriously troubled by it ...

Still, it looks messy, so I will get rid of it.

Cause

If you glance inside pyocr, you will find the following around line 120 of tesseract.py.

This is the cause.

    if getattr(sys, 'frozen', False):  # pragma: no cover
        # Pyinstaller support
        path = os.environ["PATH"]
        if sys._MEIPASS in path:
            # already changed
            return

        tesspath = os.path.join(sys._MEIPASS, "tesseract")
        tessprefix = os.path.join(sys._MEIPASS, "data")
        logger.info("Running in packaged environment")

        if not os.path.exists(os.path.join(tessprefix, "tessdata")):
            logger.warning(
                "Running from container, but no tessdata ({}) found !".format(
                    tessprefix
                )
            )
        else:
            logger.info("TESSDATA_PREFIX set to [{}]".format(tessprefix))
            os.environ['TESSDATA_PREFIX'] = tessprefix
        if not os.path.exists(tesspath):
            logger.warning(
                "Running from container, but no tesseract ({}) found !".format(
                    tesspath
                )
            )
        else:
            logger.info("[{}] added to PATH".format(tesspath))
            os.environ['PATH'] = (
                tesspath + os.pathsep + os.environ['PATH']
            )

It does various things, but the user has no way of knowing that (crying)

Apparently all that is needed is for sys._MEIPASS to contain tesseract and data.
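
As a rough illustration of what the excerpt above checks, here is a minimal sketch that reports which of the two locations are missing. The function name missing_pyocr_paths is hypothetical and not part of pyocr, and base_dir simply stands in for sys._MEIPASS.

```python
import os
import sys


def missing_pyocr_paths(base_dir):
    """Return the paths the pyocr excerpt would warn about.

    Hypothetical helper; base_dir stands in for sys._MEIPASS.
    """
    expected = [
        # pyocr sets TESSDATA_PREFIX to <base>/data only if <base>/data/tessdata exists
        os.path.join(base_dir, "data", "tessdata"),
        # pyocr prepends <base>/tesseract to PATH only if it exists
        os.path.join(base_dir, "tesseract"),
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    # sys._MEIPASS exists only inside a PyInstaller bundle,
    # so fall back to the current directory when run as a plain script.
    base = getattr(sys, "_MEIPASS", os.getcwd())
    for path in missing_pyocr_paths(base):
        print("missing:", path)
```

Calling this at the start of the packaged program should list the same locations that pyocr complains about, which makes it easier to see what the bundle still lacks.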

Then just add tesseract and data to sys._MEIPASS! Problem solved! Full of enthusiasm! READY to FIGHT!

Add the following to *.spec

          Tree('{Tesseract-OCR path}',prefix='tesseract'),
          Tree('{Tesseract-OCR path}',prefix='data'),

Run!

Running from container, but no tessdata (C:\Users\{user}\AppData\Local\Temp\_MEI{tempnumber}\data) found !
Running from container, but no tessdata (C:\Users\{user}\AppData\Local\Temp\_MEI{tempnumber}\data) found !

It seems my fighting power is still insufficient.

Method

First, the directory structure

Eventually it became like this.

The data directory has to contain a further tessdata directory, because pyocr checks for data\tessdata. I don't really see the point, since the contents end up duplicated. (Maybe there is a better solution ...)
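
One way to prepare such a layout is sketched below. This is only an assumption about how the folders could be staged, inferred from the pyocr check and the Tree prefixes in the spec file. The helper name stage_for_pyinstaller is hypothetical, and it assumes the Tesseract-OCR install directory contains a tessdata subfolder.

```python
import os
import shutil


def stage_for_pyinstaller(tesseract_dir, out_dir):
    """Copy a Tesseract-OCR install into the two folders bundled by the spec.

    Hypothetical helper. Assumes tesseract_dir contains a tessdata subfolder.
    Produces:
        out_dir/tesseract      (copy of the whole install)
        out_dir/data/tessdata  (second copy of the language data)
    """
    tesseract_dst = os.path.join(out_dir, "tesseract")
    data_dst = os.path.join(out_dir, "data")

    # copytree creates missing parent directories of the destination
    shutil.copytree(tesseract_dir, tesseract_dst)
    shutil.copytree(
        os.path.join(tesseract_dir, "tessdata"),
        os.path.join(data_dst, "tessdata"),
    )
    # These two paths are what {tesseract} and {data} would point to in the spec
    return tesseract_dst, data_dst
```

The two returned paths correspond to the {tesseract} and {data} placeholders used in the spec file. The language data is copied twice on purpose, which matches the duplication mentioned above.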

Addition to *.spec

# -*- mode: python ; coding: utf-8 -*-

block_cipher = None


a = Analysis(['{main}.py'],
             pathex=['{src_dir}'],
             binaries=[],
             datas=[],
             hiddenimports=[],
             hookspath=[],
             runtime_hooks=[],
             excludes=[],
             win_no_prefer_redirects=False,
             win_private_assemblies=False,
             cipher=block_cipher,
             noarchive=False)
pyz = PYZ(a.pure, a.zipped_data,
             cipher=block_cipher)
exe = EXE(pyz,
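          # {data} must contain a tessdata subdirectory (pyocr checks data\tessdata)
          Tree('{data}',prefix='data'),
          # {tesseract} is the folder with the Tesseract-OCR files; pyocr adds it to PATH
          Tree('{tesseract}',prefix='tesseract'),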
          a.scripts,
          a.binaries,
          a.zipfiles,
          a.datas,
          [],
          name='{appname}',
          debug=False,
          bootloader_ignore_signals=False,
          strip=False,
          upx=True,
          upx_exclude=[],
          runtime_tmpdir=None,
          console=True, icon='icon.ico')

{main}.py

There is no change here. No special code is required in {main}.py.

Now that you're ready, run pyinstaller!

$> pyinstaller --onefile --clean --icon={icon}.ico -n {appname} {main}.py

Then edit the generated {appname}.spec as shown above

$> pyinstaller {appname}.spec

Then distribute the {appname}.exe under dist and you're done!

This method is for Windows only, but with it you of course do not need to install Tesseract-OCR separately on the target PC.

If there is a better way, I would appreciate it if you could let me know in a reply on Twitter or in the comments here!

Then Otonoshi
