[PYTHON] Download Google logo → Convert to text with OCR → Display on HTML

Overview

As shown below, the logo on the top page of Google Search is converted to text with OCR, and that text is displayed in an HTML page.

[Image: Screenshot 2020-04-02 8.25.59 PM.png]

[Image: Screenshot 2020-04-02 8.11.10 PM.png]

Application example

This method can be used to convert English books that are published online as images into HTML, so that they can be read in Japanese with Chrome's page translation feature.
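
For a book, the same OCR step would run once per page image. The following is a minimal sketch under assumptions that are not in the original: the page images carry page numbers in their file names (for example `page1.png`), and `ocr` is a hypothetical callable that takes a path and returns the recognized text (Step 2 below could be wrapped into such a function). The helper sorts the files in natural order, so that page 10 comes after page 2, and collects the text of each page.

```python
import re
from pathlib import Path


def natural_key(name):
    # Split "page10.png" into ["page", 10, ".png"] so numbers compare as numbers.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def ocr_pages(paths, ocr):
    # `ocr` is a hypothetical callable: path -> recognized text.
    ordered = sorted(paths, key=lambda p: natural_key(Path(p).name))
    return [ocr(p) for p in ordered]
```

The resulting list of page texts could then be passed to a template in the same way as the single `result` string in Step 3.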

Execution steps

  1. Scrape the top page of Google Search to get the URL of the Google logo image, then download the image.
  2. Apply OCR to the logo image to convert it into text.
  3. Display this text in HTML.

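Install the libraries in advance

bash


#For step 1
pip install requests beautifulsoup4

#For step 2 (brew is for macOS; on other systems install Tesseract with your package manager)
brew install tesseract
pip install pyocr pillow

#For step 3
pip install jinja2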

Run

** Step 1: Download logo image **

python


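import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup

#Get html
url = 'https://www.google.com'
res = requests.get(url)
res.raise_for_status()
soup = BeautifulSoup(res.text, 'html.parser')

#Extract image (the id "hplogo" depends on Google's markup and may change)
img = soup.find('img', {'id': 'hplogo'})
if img is None:
    raise RuntimeError('Logo image (id="hplogo") not found')

#Create URL for image (handles both relative and absolute src values)
img_url = urljoin(url, img['src'])

#Download image
r = requests.get(img_url)
r.raise_for_status()

#Save image
with open('hplogo.jpg', 'wb') as file:
    file.write(r.content)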
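
Step 1 always saves the download as `hplogo.jpg`, whatever format the server actually returns. Pillow identifies the format from the file contents, so Step 2 still works, but the name can be misleading. As a sketch, the hypothetical helper `filename_for` below derives the extension from the response's `Content-Type` header; the mapping and the `.jpg` fallback are assumptions for illustration, not part of the original code.

```python
# Illustrative mapping from common image MIME types to file extensions.
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/webp": ".webp",
}


def filename_for(content_type, stem="hplogo", default=".jpg"):
    # Drop parameters such as "; charset=..." and normalize the case.
    mime = content_type.split(";", 1)[0].strip().lower()
    return stem + EXTENSIONS.get(mime, default)


# Possible use with the response object `r` from Step 1 (not executed here):
# path = filename_for(r.headers.get("Content-Type", ""))
```

If the file name changes, the `Image.open(...)` call in Step 2 has to use the same name.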

** Step 2: Convert logo image to text with OCR **

python


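from PIL import Image
import pyocr
import pyocr.builders

#Preset 1: pick the first available OCR tool (e.g. Tesseract)
tools = pyocr.get_available_tools()
if not tools:
    raise RuntimeError('No OCR tool found. Install Tesseract first.')
tool = tools[0]

#Preset 2: builder that returns the recognized text as a plain string
builder = pyocr.builders.TextBuilder()

#Load image
img = Image.open('hplogo.jpg')

#Run OCR
result = tool.image_to_string(img, builder=builder)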
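
The call above leaves the OCR language to the tool's default. pyocr tools also expose `get_available_languages()`, and `image_to_string` accepts a `lang` argument, so the language can be chosen explicitly. The hypothetical helper `choose_ocr_language` below is a sketch of that choice; the preference for `"eng"` (Tesseract's code for English) is an assumption that fits an English logo or book.

```python
def choose_ocr_language(available, preferred=("eng",)):
    # Return the first preferred language code that is actually installed.
    for lang in preferred:
        if lang in available:
            return lang
    raise RuntimeError(
        "None of %s is installed; available: %s"
        % (list(preferred), sorted(available))
    )


# Possible use with the variables from Step 2 (not executed here):
# lang = choose_ocr_language(tool.get_available_languages())
# result = tool.image_to_string(img, lang=lang, builder=builder)
```

Failing early with the list of installed languages makes a missing language pack easier to diagnose than an empty OCR result.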

** Step 3: Display the text in HTML **

python


from jinja2 import Template

#Generate view
html = '''
<!DOCTYPE html>
<html lang="en">
<head>
    <title>The Farther Reaches Of Human Nature</title>
</head>
<body>

    <h1>{{ result }}</h1>

</body>
</html>
'''
template = Template(html)
data = { 'result': result }
view = template.render(data)

#Save
with open('hplogo.html', 'w', encoding='utf-8') as f:
    f.write(view)
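
One caveat: a jinja2 `Template` created this way does not escape HTML by default, so OCR output that happens to contain `<` or `&` is written into the page as raw markup. jinja2 accepts `autoescape=True` when constructing the `Template` to change that. To show the effect without jinja2, the sketch below swaps in the standard library's `string.Template` and `html.escape`; the shortened page and the name `render_escaped` are illustrative only.

```python
import string
from html import escape

# Shortened stand-in for the page above; string.Template uses $name placeholders.
PAGE = string.Template("""<!DOCTYPE html>
<html lang="en">
<body>
    <h1>$result</h1>
</body>
</html>
""")


def render_escaped(result):
    # Escape &, <, > and quotes before inserting the OCR text into the page.
    return PAGE.substitute(result=escape(result))
```

With this, text such as `<b>` is displayed literally in the browser instead of being interpreted as a tag.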

When you open the generated hplogo.html in your browser, you should see the text "Google", as follows (the same image as in the overview):

[Image: Screenshot 2020-04-02 8.11.10 PM.png]
