[Python3] Take a screenshot of a web page on the server and crop it further

Introduction

What I wanted to do: I wanted to take a screenshot of a web page on Heroku and crop it to a specific HTML element.

Problem: When running PhantomJS through Selenium, there is no method that reliably returns the position of an element on the page (see the Stack Overflow question under Reference).

Solution: Execute JavaScript with the execute_script function provided by the selenium.webdriver.PhantomJS class and read the element's position from the browser directly.

Environment

Python library

Minimal code

screenshot_crop.py


from PIL import Image
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("https://www.yahoo.co.jp")  # (1)
driver.save_screenshot("screenshot.png")  # (2)


element_type = "Id"  # (3)
element_name = "topicsboxbd"  # (4)

before_script = """
                var element = document.getElementBy""" + element_type + "('" + element_name + """');
                var rect = element.getBoundingClientRect(); 
                """  # (5)
left = driver.execute_script(before_script + "return rect.left;")  # (6)
top = driver.execute_script(before_script + "return rect.top;")  # (6)

right = driver.execute_script(before_script + "return rect.width;") + left  # (7)
bottom = driver.execute_script(before_script + "return rect.height;") + top  # (7)

im = Image.open("screenshot.png")  # (8)
im = im.crop((left, top, right, bottom))  # (9)
im.save("screenshot_crop.png")  # (10)
im.close()

Commentary

(1) Specify the URL of the page to capture.
(2) Save a screenshot of the entire page.
(3) Set element_type to the part of the JavaScript method name that follows getElementBy, so the string must start with an uppercase letter. Note that Id (document.getElementById) is the only standard method with this prefix. Lookups by class, tag, or name are spelled getElementsBy... and return a collection rather than a single element, so they do not work with this script as written.
(4) Set element_name to the value of the attribute chosen in (3), for example main for id="main".
(5) The common part of the JavaScript code to be executed.
(6) Run the JavaScript with driver.execute_script to get the coordinates of the element's upper-left corner.
(7) Get the element's width and height the same way and add them to left and top to obtain the lower-right corner.
(8) Open the screenshot saved in (2).
(9) Crop the original screenshot using the coordinates obtained in (6) and (7).
(10) Save the cropped screenshot.
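
Steps (5) to (7) run four separate scripts against the page. As a minimal sketch, the same values can be collected in a single execute_script call by returning a plain object from JavaScript, which Selenium hands back as a Python dict. The helper names build_rect_script and rect_to_box are hypothetical and not part of the original code. json.dumps is used only to quote the id safely, and adding the page scroll offsets assumes the screenshot covers the whole page rather than just the visible viewport.

```python
import json


def build_rect_script(element_id: str) -> str:
    """Build JavaScript that returns the element's page rectangle as one object."""
    # json.dumps yields a valid, escaped JavaScript string literal
    quoted_id = json.dumps(element_id)
    return (
        "var rect = document.getElementById(" + quoted_id + ").getBoundingClientRect();"
        "return {left: rect.left + window.pageXOffset,"
        " top: rect.top + window.pageYOffset,"
        " width: rect.width, height: rect.height};"
    )


def rect_to_box(rect: dict) -> tuple:
    """Convert the returned rectangle to a (left, top, right, bottom) crop box."""
    left, top = rect["left"], rect["top"]
    return (left, top, left + rect["width"], top + rect["height"])


# With a live driver and an opened image (not run here):
#   rect = driver.execute_script(build_rect_script("topicsboxbd"))
#   im = im.crop(rect_to_box(rect))
print(rect_to_box({"left": 10, "top": 20, "width": 300, "height": 150}))
# (10, 20, 310, 170)
```

Fetching everything in one call also means all four values come from the same layout state of the page.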

Execution result

screenshot.png: screenshot of the entire page

screenshot_crop.png: screenshot.png cropped to the element with id="topicsboxbd"

When running on heroku

When PhantomJS runs on Heroku, Japanese text is not rendered in the saved screenshot by default. Creating a .fonts directory in the application's root directory and placing a TTF or OTF font that supports Japanese in it makes the text display correctly.
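
Before deploying, it can help to confirm that the font files are actually in the directory described above. The following is a minimal sketch using only the standard library. The function name list_font_files is hypothetical, and the directory path is passed in explicitly rather than assumed.

```python
from pathlib import Path


def list_font_files(font_dir: Path) -> list:
    """Return the names of the .ttf and .otf files found directly in font_dir."""
    if not font_dir.is_dir():
        return []
    return sorted(
        p.name
        for p in font_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".ttf", ".otf"}
    )


# Example (the path is a placeholder for the font directory in your app root):
#   print(list_font_files(Path("path/to/font_dir")))
```

An empty list means either the directory is missing or it contains no TTF or OTF files, which would explain missing Japanese text in the screenshot.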

Using phantomjs on Heroku | Program Memo

I created my own module

exphantom.py


from PIL import Image
from selenium import webdriver


class ScreenShot:
    def __init__(self, file_name_: str = "screenshot.png"):
        """
        :type file_name_: str
        """
        self._filename = file_name_
        self._driver = webdriver.PhantomJS()
        self._driver.set_window_size(1024, 768)
        self._crop_margin = 0

    def screen_shot(self, url_: str) -> bool:
        """
        Take a screenshot of the specified URL.
        :return: True on success, False on failure
        :param url_: URL of the web page to capture
        """
        try:
            self._driver.get(url_)
            self._driver.save_screenshot(self._filename)
        except Exception as e:
            print(e)
            return False
        return True

    def screen_shot_crop(self, url_: str, search_element_name: str, search_element_type: str = "Id") -> bool:
        """
        Take a screenshot of the specified URL and crop it to the specified element.
        :return: True on success, False on failure
        :param url_: URL of the web page to capture
        :param search_element_name: attribute value used to look up the element
        :param search_element_type: suffix of the getElementBy... method (e.g. "Id")
        """
        # Stop here if the page could not be loaded or saved
        if not self.screen_shot(url_):
            return False
        before_script = """
                        var element = document.getElementBy""" + search_element_type + "('" + search_element_name + """');
                        var rect = element.getBoundingClientRect();
                        """
        try:
            rect_left = self._driver.execute_script(before_script + "return rect.left;")
            rect_top = self._driver.execute_script(before_script + "return rect.top;")
            rect_width = self._driver.execute_script(before_script + "return rect.width;")
            rect_height = self._driver.execute_script(before_script + "return rect.height;")
        except Exception as e:
            print(e)
            return False
        # Expand the element's rectangle by the crop margin on every side
        left = rect_left - self._crop_margin
        top = rect_top - self._crop_margin
        right = rect_left + rect_width + self._crop_margin
        bottom = rect_top + rect_height + self._crop_margin
        im = Image.open(self._filename)
        im = im.crop((left, top, right, bottom))
        im.save(self._filename)
        im.close()
        return True

    def set_file_name(self, filename_: str):
        self._filename = filename_

    def set_window_size(self, width_: int, height_: int):
        self._driver.set_window_size(width=width_, height=height_)

    def get_window_size(self) -> dict:
        return self._driver.get_window_size()

    def set_crop_margin(self, crop_margin_: int):
        self._crop_margin = crop_margin_

    def get_crop_margin(self) -> int:
        return self._crop_margin

    def __del__(self):
        # quit() also ends the PhantomJS process; close() only closes the window
        self._driver.quit()


if __name__ == "__main__":
    # Specify the URL to take a screenshot of
    screen_url = "https://www.yahoo.co.jp"
    # Specify the lookup type of the element to crop to
    element_type = "Id"
    # Specify the attribute value of the element to crop to
    element_name = "topicsboxbd"
    # Specify the output file name when creating the instance
    ss = ScreenShot("screenshot.png")
    # Save a screenshot of screen_url
    ss.screen_shot(screen_url)
    # Change the output file name
    ss.set_file_name("screenshot_crop.png")
    # Save a screenshot of the element in screen_url whose element_type attribute is element_name
    ss.screen_shot_crop(screen_url, element_name, element_type)
    # Delete the instance
    del ss
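
One thing to watch with the crop margin: once a margin is applied, the crop box can extend past the edges of the screenshot, and Pillow pads the out-of-range area with blank pixels rather than raising an error. A minimal sketch of clamping the box to the image size before calling crop follows. clamp_box is a hypothetical helper and not part of exphantom.py.

```python
def clamp_box(box: tuple, image_size: tuple) -> tuple:
    """Limit a (left, top, right, bottom) box to an image of (width, height)."""
    left, top, right, bottom = box
    width, height = image_size
    return (
        max(0, left),
        max(0, top),
        min(width, right),
        min(height, bottom),
    )


# Example: a 10 px margin around an element that touches the top-left corner
print(clamp_box((-10, -10, 210, 110), (1024, 768)))  # (0, 0, 210, 110)
# With Pillow (not run here): im = im.crop(clamp_box(box, im.size))
```

Clamping keeps the cropped image free of padding, at the cost of a smaller margin on the sides where the element sits near the edge of the page.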

The module is available on GitHub.

Actual use example: [Unofficial] Miyazaki University Support Division Notice BOT

Reference

python selenium phantomJS element.location returns wrong location - Stack Overflow
