Overview

This article downloads the logo image from the top page of Google Search, converts it to text with OCR, and displays the resulting text in an HTML page.

Application example

You can use this method to convert English books that are published online as images into HTML, then use Chrome's page translation feature to read them in Japanese.
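
As a rough sketch of that use case, the snippet below combines OCR text from several page images into one HTML document. It assumes the per-page text has already been recognized. The function name `build_book_html` and the sample strings are hypothetical, and only the standard library is used.

```python
from html import escape
from typing import List


def build_book_html(title: str, pages: List[str]) -> str:
    """Combine OCR text from several page images into one HTML document.

    `pages` is assumed to hold one already-recognized string per image.
    """
    sections = []
    for number, text in enumerate(pages, start=1):
        # Treat blank lines as paragraph breaks and drop empty paragraphs.
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        # Escape OCR output so a stray '<' or '&' cannot break the markup.
        body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
        sections.append(f'  <section id="page-{number}">\n{body}\n  </section>')
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n" + "\n".join(sections) + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical OCR results for two page images.
    sample_pages = ["Chapter 1\n\nIt was a bright day.", "Tom & Jerry <2>"]
    print(build_book_html("Sample book", sample_pages))
```

Declaring `lang="en"` on the `<html>` element tells the browser what the source language is, which may help its translation feature.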

Execution steps

1. Scrape the top page of Google Search to get the URL of the Google logo image, then download the image.
2. Apply OCR to the logo image to convert it into text.
3. Display this text in HTML.

Install the libraries in advance

`bash`


# For step 1
pip install requests beautifulsoup4

# For step 2 (brew is the Homebrew package manager on macOS)
brew install tesseract
pip install pyocr pillow

# For step 3
pip install jinja2
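
Before running the steps, it may be worth confirming that everything installed correctly. The following is a minimal sketch using only the standard library. The helper name `check_prerequisites` is hypothetical, and it only checks that the modules and the `tesseract` executable can be found, not that they work.

```python
import importlib.util
import shutil


def check_prerequisites(modules, binaries):
    """Return the module names and executables that cannot be found."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_binaries = [b for b in binaries if shutil.which(b) is None]
    return missing_modules, missing_binaries


if __name__ == "__main__":
    # Import names differ from pip package names:
    # beautifulsoup4 is imported as bs4, pillow as PIL.
    mods, bins = check_prerequisites(
        ["requests", "bs4", "pyocr", "PIL", "jinja2"], ["tesseract"]
    )
    print("missing modules:", mods or "none")
    print("missing executables:", bins or "none")
```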

Run

**Step 1: Download the logo image**

`python`


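from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Fetch the HTML of the Google top page
url = 'https://www.google.com'
res = requests.get(url, timeout=10)
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')

# Find the logo <img>; the 'hplogo' id depends on Google's current markup
img = soup.find('img', {'id': 'hplogo'})
if img is None:
    raise RuntimeError('Logo <img id="hplogo"> not found; the page markup may have changed')

# Build an absolute URL (works for both relative and absolute src values)
img_url = urljoin(url, img['src'])

# Download the image
r = requests.get(img_url, timeout=10)
r.raise_for_status()

# Save the image (Pillow detects the real format from the file contents, not the extension)
with open('hplogo.jpg', 'wb') as file:
    file.write(r.content)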
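
Two details of this step can be isolated as small helpers: turning the `src` attribute into an absolute URL, and choosing a file extension that matches the response's `Content-Type` header instead of always using `.jpg`. This is only a sketch. The helper names, the extension table, and the example paths are assumptions for illustration, not Google's actual URLs.

```python
from urllib.parse import urljoin

# Small illustrative mapping; extend it for other formats as needed.
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/svg+xml": ".svg",
}


def resolve_image_url(page_url, src):
    """Resolve an <img> src (relative, protocol-relative, or absolute)."""
    return urljoin(page_url, src)


def extension_for(content_type, default=".bin"):
    """Pick a file extension from a Content-Type header value."""
    # Drop parameters such as "; charset=..." before the lookup.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(media_type, default)


if __name__ == "__main__":
    # Hypothetical src values, used only to show the resolution rules.
    print(resolve_image_url("https://www.google.com", "/images/logo.png"))
    print(resolve_image_url("https://www.google.com", "//example.com/logo.png"))
    print(extension_for("image/png"))
```

With `requests`, the header value would be available as `r.headers.get('Content-Type', '')`.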

**Step 2: Convert the logo image to text with OCR**

`python`


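from PIL import Image
import pyocr
import pyocr.builders

# Pick the first available OCR tool (typically Tesseract, if it is installed)
tools = pyocr.get_available_tools()
if not tools:
    raise RuntimeError('No OCR tool found; check that tesseract is installed')
tool = tools[0]

# Use a builder that returns the recognized text as a plain string
builder = pyocr.builders.TextBuilder()

# Load the image
img = Image.open('hplogo.jpg')

# Run OCR
result = tool.image_to_string(img, builder=builder)
print(result)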
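
OCR output often contains stray spaces or blank lines, so it may help to normalize the text before embedding it in HTML. The sketch below is one simple approach using only the standard library. The function name `clean_ocr_text` is hypothetical, and joining all lines with single spaces is an assumption that suits short text such as a logo but not multi-paragraph pages.

```python
import re


def clean_ocr_text(raw):
    """Collapse runs of spaces and tabs, drop empty lines, and join the rest."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in raw.splitlines()]
    return " ".join(line for line in lines if line)


if __name__ == "__main__":
    # Hypothetical raw OCR output with extra whitespace.
    print(repr(clean_ocr_text("  Google \n\n")))
```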

**Step 3: Display the text in HTML**

`python`


from jinja2 import Template

# Define the HTML template; {{ result }} is replaced at render time
html = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <title>The Farther Reaches Of Human Nature</title>
</head>
<body>

    <h1>{{ result }}</h1>

</body>
</html>
'''
template = Template(html)

# `result` is the OCR text produced in Step 2
data = { 'result': result }
view = template.render(data)

# Save the rendered HTML
with open('hplogo.html', 'w', encoding='utf-8') as f:
    f.write(view)

When you open the generated hplogo.html in your browser, you should see the text "Google".
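
To check the result without opening a browser, one option is to parse the generated file and read back the heading text. This is a minimal sketch using the standard library's `html.parser`. The class name `HeadingExtractor` and the function name `extract_h1` are hypothetical.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collect the text content of every <h1> element."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        # Only keep text that appears inside an open <h1>.
        if self._in_h1:
            self.headings[-1] += data


def extract_h1(html_text):
    """Return the stripped text of each <h1> in the given HTML string."""
    parser = HeadingExtractor()
    parser.feed(html_text)
    parser.close()
    return [h.strip() for h in parser.headings]


if __name__ == "__main__":
    # Sample markup shaped like the template above.
    print(extract_h1("<html><body>\n<h1>Google</h1>\n</body></html>"))
```

For the real file, the input would be the contents of hplogo.html read with `encoding='utf-8'`.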


[PYTHON] Download Google logo → Convert to text with OCR → Display on HTML
