[Python] Download all the images attached to pull request bodies on GitHub

The example screenshots are borrowed from DroidKaigi/conference-app-2020.

Why

We develop apps for our own services. When an app update is released, we send the planning and operations departments a preliminary report, and I wanted to partially automate sharing screenshots of newly created or changed screens with them. Having the screenshots already collected in one place also makes it efficient to write them up as a document later. My development team habitually puts screenshots and GIF animations in the pull request body when requesting a review, so I share those images with the planning and operations departments.

Bulk download using Python

Preparation

```sh
mkdir Downloads
pip install PyGithub beautifulsoup4 requests Markdown
```

Python source code to run

main.py

```python
import os
# Get a file extension from a MIME type
import mimetypes
from datetime import datetime, timezone

# Convert Markdown to HTML
import markdown
# HTTP client
import requests
# HTML parser
from bs4 import BeautifulSoup
# GitHub API client
from github import Github

# Destination directory
save_dir = "Downloads"
os.makedirs(save_dir, exist_ok=True)
# Object responsible for converting Markdown to HTML
md = markdown.Markdown()
# Create an access token here:
# https://github.com/settings/tokens
access_token = os.environ['GITHUB_API_ACCESS_TOKEN']
# Create a GitHub client authenticated with the access token
g = Github(access_token)
# Get the repository
gr = g.get_organization('DroidKaigi').get_repo('conference-app-2020')
# For a personal repository:
# gr = g.get_user().get_repo('repository_name')
# Download range (based on the pull request creation date, in UTC)
start_datetime = datetime(2020, 1, 1, tzinfo=timezone.utc)
end_datetime = datetime(2020, 1, 13, tzinfo=timezone.utc)
# Get closed pull requests, newest first ("close" is not a valid state)
prs = gr.get_pulls(state='closed', base='master',
                   sort='created', direction='desc')
for pr in prs:
    created_at = pr.created_at
    # Older PyGithub versions return naive datetimes that represent UTC
    if created_at.tzinfo is None:
        created_at = created_at.replace(tzinfo=timezone.utc)
    # Pull requests come in descending order of creation date
    if created_at < start_datetime:
        # No need to go back any further
        break
    if created_at >= end_datetime:
        # Still newer than the download range
        continue
    # Only merged pull requests
    if not pr.merged:
        continue
    print("#%d %s" % (pr.number, pr.title))
    # Convert the Markdown in the pull request body to HTML
    # (body is None when the description is empty)
    html = md.convert(pr.body or "")
    # Parse the HTML
    soup = BeautifulSoup(html, features='html.parser')
    # For every img tag
    for img_index, img in enumerate(soup.find_all('img')):
        # Use the value of the src attribute as the image URL
        url = img.get('src')
        # Skip img tags whose src attribute is missing or empty
        if not url:
            continue
        # Download the URL
        r = requests.get(url)
        if not r.ok:
            # Download failed
            message = "Image download error\n%s\n%d %s" % (
                url, r.status_code, r.reason)
            raise RuntimeError(message)
        # Get the MIME type, dropping parameters such as "; charset=..."
        content_type = r.headers.get('Content-Type', '').split(';')[0].strip()
        # Get the extension from the MIME type.
        # guess_extension already includes the leading dot, e.g. ".png"
        ext = mimetypes.guess_extension(content_type) or ''
        # I want to include the pull request title in the file name.
        # A slash in the file name would be treated as a directory separator,
        # so convert it to a full-width slash.
        title = pr.title.replace('/', '／')
        # Build the destination path
        path = os.path.join(save_dir, "%04d_%s_%02d%s" % (
            pr.number, title, img_index, ext))
        # Save the image
        with open(path, 'wb') as f:
            f.write(r.content)
```
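The file-naming step of the script can be pulled out into a small function that is easy to check without calling the GitHub API. This is a minimal sketch, and `build_image_path` is a hypothetical helper name, not part of any library. It strips parameters such as `; charset=...` from the Content-Type value, relies on `mimetypes.guess_extension` returning the extension with its leading dot, and replaces `/` in the title with the full-width `／`, as the comment in main.py describes.

```python
import mimetypes
import os


def build_image_path(save_dir, pr_number, title, img_index, content_type):
    """Hypothetical helper: build the destination path for one downloaded image.

    Returns None when no extension can be guessed from the MIME type.
    """
    # Drop parameters such as "; charset=binary" before looking up the type
    mime = content_type.split(";")[0].strip().lower()
    # guess_extension returns the extension *with* its leading dot, e.g. ".png"
    ext = mimetypes.guess_extension(mime)
    if ext is None:
        return None
    # A "/" in the title would be treated as a directory separator,
    # so replace it with the full-width slash "／"
    safe_title = title.replace("/", "／")
    file_name = "%04d_%s_%02d%s" % (pr_number, safe_title, img_index, ext)
    return os.path.join(save_dir, file_name)


# Usage example: a PR title containing a slash and a PNG response
print(build_image_path("Downloads", 42, "Fix a/b layout", 1, "image/png"))
```

Returning None for an unknown MIME type lets the caller decide whether to skip the image or save it without an extension.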

Customizing for your situation

For now, I run the script above on my local PC and download everything into a local folder in one go. Depending on your organization or project, customizations like these would be worthwhile:

- Run it in a continuous delivery workflow
- Turn it into a Slack bot
- Upload the images to Google Drive

Downloading only the images for changes since the previous version

In the release workflow of my own app, I run a Python script that gets the list of PRs merged since the previous version was released and automatically creates the [Release Notes](https://github.com/tfandkusu/quickecho/releases). If you take the body text from that PR list and apply the same processing as above, you can download only the images for the changes in the new version.

Link to the relevant source code
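The "only the changes since the previous version" step can be sketched as a filter over merged PRs. This is not the author's release-notes script; it assumes you already have the previous release's timestamp as a timezone-aware datetime (with PyGithub, one option might be `repo.get_latest_release().created_at`, depending on your release process). `bodies_merged_since` is a hypothetical name, and it accepts any objects with `merged_at` and `body` attributes, such as PyGithub `PullRequest` objects, so plain stand-ins work for testing.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def bodies_merged_since(pulls, cutoff):
    """Return the bodies of PRs merged strictly after `cutoff` (hypothetical helper).

    `cutoff` must be timezone-aware. `pulls` can be any iterable of objects
    with `merged_at` and `body` attributes, e.g. PyGithub PullRequest objects
    from repo.get_pulls(state="closed").
    """
    bodies = []
    for pr in pulls:
        merged_at = pr.merged_at
        if merged_at is None:
            # Closed without being merged
            continue
        if merged_at.tzinfo is None:
            # Older PyGithub versions return naive datetimes that represent UTC
            merged_at = merged_at.replace(tzinfo=timezone.utc)
        if merged_at > cutoff:
            # An empty PR description comes back as None
            bodies.append(pr.body or "")
    return bodies


# Usage example with stand-in objects instead of real API results
cutoff = datetime(2020, 1, 10, tzinfo=timezone.utc)
pulls = [
    SimpleNamespace(merged_at=datetime(2020, 1, 12, tzinfo=timezone.utc),
                    body="![a](https://example.com/a.png)"),
    SimpleNamespace(merged_at=None, body="closed without merge"),
    SimpleNamespace(merged_at=datetime(2020, 1, 5), body="before the cutoff"),
]
print(bodies_merged_since(pulls, cutoff))
```

Each returned body can then go through the same Markdown-to-HTML and `img` extraction steps as main.py.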

Important note

Please make sure that the screenshots you share with other departments do not contain sensitive information such as AWS access keys.
