Overview

This article summarizes how to recognize handwritten characters with the OCR (Optical Character Recognition) feature available on GCP (Google Cloud Platform). It is aimed at GCP beginners and at anyone who is about to start using GCP.

Introduction

Goal

The goal is to recognize the handwritten characters in an image using GCP's OCR feature.

Execution environment

- macOS Catalina 10.15.6
- Python 3.8.1

Table of contents

- Before you begin
- Preparing the input data
- Implementation
- Running the code
- Execution result
- Code explanation
- Impressions

Before you begin

To use any GCP service, you need a Google account. If you do not have one, create one before continuing.

After creating a Google account, go to the GCP Console and set up your Cloud project and authentication credentials as described [here](https://cloud.google.com/vision/docs/before-you-begin).
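If you authenticate with a service-account key file, the client library usually locates it through the GOOGLE_APPLICATION_CREDENTIALS environment variable. The following is a minimal sketch of a pre-flight check under that assumption. The helper name check_credentials is hypothetical and not part of any Google library, and it only verifies that the variable points at an existing file, not that the key itself is valid.

```python
import os


def check_credentials(env=None):
    """Return (ok, message) describing the service-account key setup.

    Hypothetical helper: it only inspects GOOGLE_APPLICATION_CREDENTIALS
    and does not contact Google or validate the key contents.
    """
    env = os.environ if env is None else env
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not key_path:
        return False, "GOOGLE_APPLICATION_CREDENTIALS is not set"
    if not os.path.isfile(key_path):
        return False, "key file not found: {}".format(key_path)
    return True, "using key file: {}".format(key_path)


if __name__ == "__main__":
    ok, message = check_credentials()
    print(message)
```

Running this before detect.py can make a missing or mistyped key path easier to spot than the error raised by the client.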

Preparing the input data

Before starting the implementation, prepare the handwritten image you want to recognize. I prepared a handwritten sample image and saved it as sample.png.
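Before sending the image anywhere, it can help to confirm that the file is where the script expects it and to see how large it is. This is a small sketch for that purpose; describe_image is a hypothetical helper name, and the size is reported in decimal megabytes.

```python
import os


def describe_image(path):
    """Return the absolute path and size in MB of an image file.

    Hypothetical helper for a quick sanity check before calling the API.
    """
    full_path = os.path.abspath(path)
    if not os.path.isfile(full_path):
        raise FileNotFoundError("image not found: {}".format(full_path))
    size_mb = os.path.getsize(full_path) / 1_000_000  # decimal megabytes
    return full_path, size_mb
```

For example, `describe_image('sample.png')` returns the resolved path together with the file size, or raises FileNotFoundError if the image is missing.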

Implementation

I wrote the code by following the [tutorial](https://cloud.google.com/vision/docs/handwriting). The resulting code is shown below. The file is named detect.py.

import os
import io

from google.cloud import vision

def detect_document(path):
    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # google-cloud-vision 2.0 and later; 1.x releases used vision.types.Image
    image = vision.Image(content=content)

    response = client.document_text_detection(image=image)

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

                    for symbol in word.symbols:
                        print('\tSymbol: {} (confidence: {})'.format(
                            symbol.text, symbol.confidence))

    if response.error.message:
        raise Exception(
            '{}\nFor more info on error messages, check: '
            'https://cloud.google.com/apis/design/errors'.format(
                response.error.message))


if __name__ == "__main__":
    path = 'sample.png'
    detect_document(os.path.abspath(path))

Running the code

Run the script with the following command.

python3 detect.py

Execution result


Block confidence: 0.8999999761581421

Paragraph confidence: 0.8999999761581421
Word text:ich(confidence: 0.9800000190734863)
	Symbol:ich(confidence: 0.9800000190734863)
Word text:von(confidence: 0.9900000095367432)
	Symbol:von(confidence: 0.9900000095367432)
Word text:Name(confidence: 0.9300000071525574)
	Symbol:Name(confidence: 0.8600000143051147)
	Symbol:Bisherige(confidence: 1.0)
Word text:Ist(confidence: 0.9900000095367432)
	Symbol:Ist(confidence: 0.9900000095367432)
Word text: KOTARO (confidence: 0.8299999833106995)
	Symbol: K (confidence: 0.4099999964237213)
	Symbol: O (confidence: 0.8299999833106995)
	Symbol: T (confidence: 0.8600000143051147)
	Symbol: A (confidence: 0.9900000095367432)
	Symbol: R (confidence: 0.9900000095367432)
	Symbol: O (confidence: 0.949999988079071)
Word text:ist(confidence: 0.9399999976158142)
	Symbol:damit(confidence: 0.9399999976158142)
	Symbol:Su(confidence: 0.949999988079071)
Word text: 。 (confidence: 0.9900000095367432)
	Symbol: 。 (confidence: 0.9900000095367432)

Block confidence: 0.9200000166893005

Paragraph confidence: 0.9200000166893005
Word text:von(confidence: 0.9200000166893005)
	Symbol:von(confidence: 0.9200000166893005)

Block confidence: 0.9300000071525574

Paragraph confidence: 0.9300000071525574
Word text: Python (confidence: 0.9700000286102295)
	Symbol: P (confidence: 0.9800000190734863)
	Symbol: y (confidence: 0.9800000190734863)
	Symbol: t (confidence: 0.9100000262260437)
	Symbol: h (confidence: 0.9900000095367432)
	Symbol: o (confidence: 0.9900000095367432)
	Symbol: n (confidence: 0.9900000095367432)
Word text:Aber(confidence: 0.9700000286102295)
	Symbol:Aber(confidence: 0.9700000286102295)
Word text:Mögen(confidence: 0.8999999761581421)
	Symbol:Gut(confidence: 0.9399999976158142)
	Symbol:Ki(confidence: 0.8600000143051147)
Word text:ist(confidence: 0.8500000238418579)
	Symbol:damit(confidence: 0.7799999713897705)
	Symbol:Su(confidence: 0.9300000071525574)
Word text: 。 (confidence: 0.8799999952316284)
	Symbol: 。 (confidence: 0.8799999952316284)

Block confidence: 0.949999988079071

Paragraph confidence: 0.949999988079071
Word text:Jedermann(confidence: 0.9900000095367432)
	Symbol:Nur(confidence: 0.9900000095367432)
	Symbol:Hmm(confidence: 1.0)
	Symbol:Nana(confidence: 1.0)
Word text: 、 (confidence: 0.699999988079071)
	Symbol: 、 (confidence: 0.699999988079071)
Word text:Folgen(confidence: 0.9300000071525574)
	Symbol:Fu(confidence: 0.8899999856948853)
	Symbol:Oh(confidence: 0.9200000166893005)
	Symbol:B.(confidence: 0.9399999976158142)
	Symbol:- -(confidence: 1.0)
Word text:Shi(confidence: 1.0)
	Symbol:Shi(confidence: 1.0)
Word text:Hand(confidence: 1.0)
	Symbol:Hand(confidence: 1.0)
Word text:Hallo(confidence: 0.9900000095367432)
	Symbol:Hallo(confidence: 0.9900000095367432)
Word text: 。 (confidence: 0.9900000095367432)
	Symbol: 。 (confidence: 0.9900000095367432)
python3 detect.py  0.82s user 0.42s system 2% cpu 57.861 total

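The image file was 8.7 MB. The run used 0.82 seconds of user CPU time, while the total elapsed time reported by the shell was 57.861 seconds, most of which was presumably spent uploading the image and waiting for the API. I found the result considerably more accurate than the model I had trained myself. As you would expect from Google.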
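The last line of the output is the shell's timing report, which lists user CPU time (0.82s) separately from total elapsed time (57.861). The same distinction can be measured from inside Python. The following is a minimal sketch: timed is a hypothetical helper, and time.sleep merely stands in for a call that mostly waits, such as a network request.

```python
import time


def timed(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()  # elapsed real time
    cpu_start = time.process_time()   # CPU time used by this process
    result = func(*args, **kwargs)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return result, wall, cpu


if __name__ == "__main__":
    # time.sleep stands in for a call that waits rather than computes.
    _, wall, cpu = timed(time.sleep, 0.2)
    print("wall: {:.2f}s, cpu: {:.2f}s".format(wall, cpu))
```

Wrapping the call to detect_document in timed would show how much of the run is waiting rather than local computation.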

Code explanation

Let's take a brief look at the code inside the detect_document function.

    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # google-cloud-vision 2.0 and later; 1.x releases used vision.types.Image
    image = vision.Image(content=content)

This part handles authentication and loads the image. If the credentials are not configured correctly, an error is raised on the first line. Next comes the recognition part.

response = client.document_text_detection(image=image)

This is the only line that actually performs recognition. The image passed as image is run through a model that Google has trained in advance, and the result is returned as response.
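In detect.py, response.error.message is checked only after the results have been printed. One option is to check it immediately after the call, before touching the annotations. The sketch below is duck-typed and uses stand-in objects rather than real API responses; ensure_success is a hypothetical helper name.

```python
from types import SimpleNamespace


def ensure_success(response):
    """Raise RuntimeError if the response carries an error message.

    Works on any object shaped like the Vision response, that is, one
    with a response.error.message attribute (empty when successful).
    """
    message = response.error.message
    if message:
        raise RuntimeError("Vision API error: {}".format(message))
    return response


if __name__ == "__main__":
    # Stand-in objects, not real API responses.
    succeeded = SimpleNamespace(error=SimpleNamespace(message=""))
    failed = SimpleNamespace(error=SimpleNamespace(message="Bad image data."))
    ensure_success(succeeded)
    try:
        ensure_success(failed)
    except RuntimeError as exc:
        print(exc)
```

In detect_document this could be used as `response = ensure_success(client.document_text_detection(image=image))`.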

    for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nBlock confidence: {}\n'.format(block.confidence))

            for paragraph in block.paragraphs:
                print('Paragraph confidence: {}'.format(
                    paragraph.confidence))

                for word in paragraph.words:
                    word_text = ''.join([
                        symbol.text for symbol in word.symbols
                    ])
                    print('Word text: {} (confidence: {})'.format(
                        word_text, word.confidence))

The results are printed in this part. A block is a group of paragraphs, and block.confidence gives the confidence for the block as a whole. Use block.paragraphs to access what was recognized as paragraphs within a block, paragraph.words for the words within a paragraph, and word.symbols for the individual characters (symbols) within a word.

If you want to do something with the recognition results, this part shows how to reach each of them.

Impressions

As expected, both the accuracy and the processing speed were impressive. I would like to try various other features as well.

Thank you for reading to the end. I am still inexperienced, so please feel free to contact me if you have any suggestions or questions about the article.

[PYTHON] Trying handwritten character recognition (OCR) with GCP