Use Python's pixivpy to download all the works of a specific user from pixiv at once (including ugoira)

Overview

Downloading illustrations one by one from pixiv is a hassle, isn't it? I wrote a Python program that automatically downloads the works of a specific user. It supports not only single images but also manga and ugoira (animated illustrations). For ugoira, it downloads each frame and assembles them into a gif. You can filter by tag, bookmark count, and view count, and set an upper limit on the number of downloads.

Referenced sites

I searched for software that downloads works automatically, but could not easily find one. The articles I did find were:

Batch DL of pixiv followers' manga and illustrations
Collect an infinite number of illustrations from pixiv [python]
Download pixiv's Ugoira
Up to pixiv tag search and illustration saving using Python
I wanted to get an infinite number of illustrations from any illustrator from pixiv

This program was created mainly with reference to "Batch DL of pixiv followers' manga and illustrations".

Note

pixivpy is an unofficial library, apparently maintained by Chinese volunteers. It is therefore safest to log in with a throwaway account, on the assumption that the ID and password could leak. Abuse is strictly prohibited. There is an unwritten rule of scraping that you should send at most one request per second; without the waits this would amount to a DoS attack and could be legally actionable. Comment out the sleep calls at your own risk. Whatever happens, use this program at your own risk.
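As a minimal sketch of what "one request per second" means in code (the helper below is my own illustration, not part of pixivpy):

from time import monotonic, sleep

# Minimal rate-limiter sketch: guarantees at least `interval` seconds
# between consecutive requests. Illustrative only, not pixivpy API.
class RateLimiter:
    def __init__(self, interval=1.0):
        self.interval = interval
        self._last = 0.0

    def wait(self):
        elapsed = monotonic() - self._last
        if elapsed < self.interval:
            sleep(self.interval - elapsed)
        self._last = monotonic()

limiter = RateLimiter(1.0)
for url in ["a", "b", "c"]:  # stand-ins for real requests
    limiter.wait()           # never fires more than once per second
    print("request", url)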


Operating environment

This program was developed under Windows 10. I do not know whether it works on Linux or macOS. It works with the latest version of Anaconda as of December 30, 2020.

Directory

Prepare the following layout:

.
├── pixiv_downloader.py
├── img
│   └── (user folders are created here)
└── client.json

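If you would rather create the layout from Python than by hand, a tiny sketch:

import os

# Create the img/ folder next to pixiv_downloader.py if it is missing
os.makedirs("./img", exist_ok=True)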

Change into this directory (with cd) before running Python.

Library installation

pip install pixivpy

If installation fails, open a command prompt as administrator and try again.

Library used: pixivpy
Original (Chinese): https://github.com/upbit/pixivpy
Japanese: https://github.com/tsubasa/pixiv.py

Create client.json

This file holds the information you use to log in. Copy the snippet below into Notepad or similar, fill in your own credentials, and save the file as client.json (make sure the extension is .json, not .txt).

client.json


{
  "pixiv_id": "pixiv_id",
  "password": "password"
}

"pixiv_id" and "password" are the strings you normally type into the login form. Keep each value enclosed in double quotes, e.g. "pixiv_id": "12345".
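To confirm the file is valid JSON before running the downloader, a quick check (my own sanity test, not part of the script):

import json

# Fails with a JSONDecodeError if client.json is malformed
# (for example a trailing comma, or the file saved as .txt)
with open("client.json", "r") as f:
    info = json.load(f)
print("Keys found:", sorted(info))  # expect ['password', 'pixiv_id']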

Code

Copy the following into Notepad or similar and save it as pixiv_downloader.py.

pixiv_downloader.py



from pixivpy3 import *
import json
import os
from PIL import Image
import glob
from time import sleep


#Settings you should adjust yourself

#ID of the user whose works you want to download (the number at the end of the URL on the user's page)
id_search = 11
#Maximum number of works to download per user; to download everything, set it larger than the author's total number of works
#Works are downloaded newest first
works=10
#Filter by number of bookmarks; set the minimum, 0 for no filter
score=0
#Filter by number of views; set the minimum, 0 for no filter
view=0

#Filter by tag, e.g. target_tag = ["Fate/GrandOrder","FGO","FateGO","Fate/staynight"]
target_tag = [] #download if the work has at least one of the tags in target_tag
target_tag2 = []#if set, the work must also have at least one of the tags in target_tag2
extag = ["R-18"]#skip the work if it has even one tag from extag

#Directory to save images
main_saving_direcory_path = "./img/"




#Preparation

#Read the account information file created above
with open("client.json", "r") as f:
    client_info = json.load(f)

#pixivpy login
api = PixivAPI()
api.login(client_info["pixiv_id"], client_info["password"])
aapi = AppPixivAPI()
aapi.login(client_info["pixiv_id"], client_info["password"])




#Download logic starts here

#Fetch the works of the specified user id
illustrator_id = api.users_works(id_search, per_page=works)


#Skip accounts that have not posted any works
if not illustrator_id.count == 0:
    total_works = illustrator_id.pagination.total
    if works < total_works:
        total_works = works

    #Get the data of the first work to obtain the user's information
    illust = illustrator_id.response[0]

    #The folder is named after the user, so replace characters that are not
    #allowed in Windows folder names with full-width or harmless substitutes
    username = illust.user.name.translate(str.maketrans({'/': '_', ':': '：', ',': '_', ';': '；', '*': '_', '?': '？', '"': "'", '>': ')', '<': '(', '|': '｜'}))
    username = username.rstrip(".")
    username = username.lstrip(".")
    username = username.rstrip(" ")
    username = username.rstrip("　")  #strip full-width spaces too

    #Create a folder named "username(id)"

    saving_direcory_path = main_saving_direcory_path + username + "(" + str(illust.user.id) + ")" + "/"

    #If the user has changed their name, rename the existing folder to match
    saving_direcory_name = saving_direcory_path[:-1]
    present_folder_list = glob.glob(main_saving_direcory_path + "*")

    for present_dir in present_folder_list:
        num = present_dir.rsplit("(", 1)[-1][:-1]
        name = present_dir.rsplit("\\", 1)[-1]
        name = name.rsplit("(", 1)[0]
        if num == str(illust.user.id) and username != name:
            print(present_dir + " will be renamed to " + saving_direcory_name)
            print("--------------------------------------")
            os.rename(present_dir, saving_direcory_name)





    #Create the folder if it does not exist
    if not os.path.exists(saving_direcory_path):
        os.mkdir(saving_direcory_path)
    separator = "------------------------------------------------------------"

    #Display the illustrator's name and total number of works
    print("Illustrator: {}".format(illust.user.name))
    print("Works number: {}".format(illustrator_id.pagination.total))
    print(separator)




    #Download loop
    for work_no in range(0, total_works):

        illust = illustrator_id.response[work_no]

        #Filters

        #Filter by tag
        if len(list(set(target_tag)&set(illust.tags))) == 0 and target_tag != []:
            continue
        if len(list(set(target_tag2)&set(illust.tags))) == 0 and target_tag2 != []:
            continue
        #Skip if the work has even one tag from extag
        if len(list(set(extag)&set(illust.tags))) > 0 :
            continue
        #Download only works with at least `score` bookmarks
        if illust.stats.favorited_count.private + illust.stats.favorited_count.public < score :
            continue
        #Download only works with at least `view` views
        if illust.stats.views_count < view :
            continue



        #Skip works that have already been downloaded
        #pixiv works can be png, jpg, gif, or ugoira
        #Only the first page is checked, so if a download was interrupted,
        #missing second and later pages will not be detected

        if os.path.exists(saving_direcory_path+str(illust.id)+"_p0.png") or os.path.exists(saving_direcory_path+str(illust.id)+"_p0.jpg") or os.path.exists(saving_direcory_path+str(illust.id)+'_ugoira') or os.path.exists(saving_direcory_path+str(illust.id)+"_p0.gif"):
            print("Title:"+str(illust.title)+" has already been downloaded.")
            print(separator)
            continue

        #Sleep before download
        sleep(1)
        print("Now: {0}/{1}".format(work_no + 1, total_works))
        print("Title: {}".format(illust.title))


        #Ugoira
        if illust.type == "ugoira":
            #Get the frame URL template and metadata for this ugoira
            illust_id = illust.id
            ugoira_url = aapi.illust_detail(illust_id).illust.meta_single_page.original_image_url.rsplit('0', 1)
            ugoira = aapi.ugoira_metadata(illust_id)
            ugoira_frames = len(ugoira.ugoira_metadata.frames)
            ugoira_delay = ugoira.ugoira_metadata.frames[0].delay
            dir_name = saving_direcory_path + str(illust_id)+'_ugoira'


            #Create a folder for the ugoira frames
            if not os.path.isdir(dir_name):
                os.mkdir(dir_name)

            #Download all images used in Ugoira
            for frame in range(ugoira_frames):
                #Sleep during download
                sleep(1)
                frame_url = ugoira_url[0] + str(frame) + ugoira_url[1]
                aapi.download(frame_url, path=dir_name)


            #Build a gif from the saved frames
            #Image quality degrades considerably when converting to gif
            #I looked for a Python gif library that lets you set the compression level, but could not find one
            frames = glob.glob(f'{dir_name}/*')
            frames.sort(key=os.path.getmtime, reverse=False)
            ims = []
            for frame in frames:
                ims.append(Image.open(frame))
            ims[0].save(f'{dir_name}/{illust_id}.gif', save_all=True, append_images=ims[1:], optimize=False, duration=ugoira_delay, loop=0)

        # illustrations with more than one picture
        elif illust.is_manga:
            work_info = api.works(illust.id)
            for page_no in range(0, work_info.response[0].page_count):
                #Sleep during download
                sleep(1)
                page_info = work_info.response[0].metadata.pages[page_no]
                aapi.download(page_info.image_urls.large, saving_direcory_path)

        # illustrations with only one picture
        else:
            aapi.download(illust.image_urls.large, saving_direcory_path)

        print(separator)


    print("Download complete! Thanks to {0}{1}!!".format(illust.user.id, illust.user.name))

How to use

Near the beginning of pixiv_downloader.py:


#Settings you should adjust yourself
#ID of the user whose works you want to download (the number at the end of the URL on the user's page)
id_search = 11
#Maximum number of works to download per user; to download everything, set it larger than the author's total number of works
#Works are downloaded newest first
works=10
#Filter by number of bookmarks; set the minimum, 0 for no filter
score=0
#Filter by number of views; set the minimum, 0 for no filter
view=0
#Filter by tag, e.g. target_tag = ["Fate/GrandOrder","FGO","FateGO","Fate/staynight"]
target_tag = [] #download if the work has at least one of the tags in target_tag
target_tag2 = []#if set, the work must also have at least one of the tags in target_tag2
extag = ["R-18"]#skip the work if it has even one tag from extag

Set each value as described in its comment. With the settings above, the 10 newest works of the official pixiv account are downloaded, excluding works tagged R-18. Each setting is explained below.

id_search

id_search is the ID of the user you want to download from. Go to the user's page; it is the last number in the URL. For https://www.pixiv.net/users/11, it is 11. When you run the script, a separate folder is created for each user inside the img folder and the images are downloaded into it. In this example the works are saved in a folder named pixiv Secretariat (11).
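If you prefer to paste the profile URL and let the script pull out the number, a small helper sketch (a hypothetical convenience, not in the original code):

# Hypothetical helper: extract the user id from a pixiv profile URL
def user_id_from_url(url: str) -> int:
    # "https://www.pixiv.net/users/11" -> 11
    return int(url.rstrip("/").rsplit("/", 1)[-1])

id_search = user_id_from_url("https://www.pixiv.net/users/11")
print(id_search)  # 11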

works

With works = 10, the 10 newest works are downloaded (a manga is counted as one work). Works that have already been downloaded also count toward the limit. If you do not want to limit the number of downloads, use a huge value such as works = 10000000000.

score

score is the minimum number of bookmarks. With score = 0, all works are downloaded. With score = 100, for example, works with fewer than 100 bookmarks are skipped.

view

view is the minimum number of views. It behaves like score: only works at or above the value are downloaded.

tag

The tag behaviour is a little fiddly. With target_tag = ["Fate/GrandOrder", "FGO", "FateGO", "Fate/staynight"], a work is downloaded if its tags contain Fate/GrandOrder or FGO or FateGO or Fate/staynight. Furthermore, with target_tag = ["Fate/GrandOrder"] and target_tag2 = ["FGO"], only works whose tags include both Fate/GrandOrder and FGO are downloaded.

extag

A work is not downloaded if it contains even one tag from extag. With extag = ["R-18", "R-15"], works tagged R-18 or R-15 are skipped.
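The filters boil down to set intersections, just like in the script. A standalone demo with made-up tags:

# Standalone demo of the tag logic with made-up tags
illust_tags = ["Fate/GrandOrder", "FGO", "scenery"]

target_tag = ["Fate/GrandOrder", "FateGO"]  # OR within the list
target_tag2 = ["FGO"]                       # AND between the two lists
extag = ["R-18"]                            # any match excludes the work

download = True
if target_tag and not set(target_tag) & set(illust_tags):
    download = False
if target_tag2 and not set(target_tag2) & set(illust_tags):
    download = False
if set(extag) & set(illust_tags):
    download = False

print(download)  # True: matches both target lists, no excluded tag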

About errors

If you see something like

PixivError: requests GET https://public-api.secure.pixiv.net/v1/works/58652797.json error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

during execution, you have probably downloaded too much in a short time. Try again later. So far my account has never been banned.
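One way to soften such connection errors is to retry with a growing wait. A sketch (PixivError is exported by pixivpy3; the wrapper itself is my own illustration):

from time import sleep
from pixivpy3 import PixivError

# Retry sketch: wait longer after each failure, give up after max_tries
def with_retry(func, *args, max_tries=3, base_wait=10, **kwargs):
    for attempt in range(1, max_tries + 1):
        try:
            return func(*args, **kwargs)
        except PixivError as e:
            if attempt == max_tries:
                raise
            print(f"Attempt {attempt} failed ({e}); retrying...")
            sleep(base_wait * attempt)

# Usage: work_info = with_retry(api.works, illust.id)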

Summary

In this article I explained how to filter and download a user's works with Python. I originally started writing this because, a few years ago, a painter I liked deleted all of their works. I had downloaded everything at the time, but realized doing it by hand would never scale, so I wrote this program. I have been using it myself for two years, fixing bugs and adding gif downloads along the way. In the early days, if a user changed their name the entire body of work could be downloaded again, which was painful. I recently considered it complete and released it. Please comment if you find bugs or have features you would like added. Thank you very much.

Outlook

I would like to explain more about pixivpy. ~~I also have a program that downloads all the works of the users I follow, and will write it up if I have time~~ I wrote it: "Download all the works of the users you follow from pixiv using python's pixivpy (including Ugoira) (Part 2)". If I find the time, I will also write up how to detect damaged files and how to find works that could not be fully downloaded.
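For the damaged-file detection mentioned above, one common approach is Pillow's verify(), which parses a file without fully decoding it. A sketch of such a check:

import glob
from PIL import Image

# Flag image files that Pillow cannot parse (likely truncated downloads)
for path in glob.glob("./img/**/*", recursive=True):
    if not path.lower().endswith((".png", ".jpg", ".gif")):
        continue
    try:
        with Image.open(path) as im:
            im.verify()
    except Exception:
        print("Possibly damaged:", path)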

Miscellaneous notes:

pixiv Downloader API: I found something like this; it also uses pixivpy.
There seems to be a function for writing exif (image metadata).
gif as mp4: was that an option? Apparently possible with ffmpeg.
You can apparently set a folder's thumbnail on Windows by placing an image named folder.jpg in it: https://xtech.nikkei.com/it/pc/article/NPC/20060420/235865/ This program downloads the author's images with that in mind.
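On the gif-as-mp4 idea: a hedged sketch of encoding saved ugoira frames to mp4 by calling ffmpeg from Python. It assumes ffmpeg is on the PATH and that the frames follow the <id>_ugoira0.jpg, <id>_ugoira1.jpg, ... naming used by the script; the paths here are illustrative.

import subprocess

delay_ms = 100              # per-frame delay from the ugoira metadata
fps = str(1000 / delay_ms)  # delay in ms -> frames per second

# Illustrative paths; %d matches the sequential frame numbers
subprocess.run([
    "ffmpeg", "-y",
    "-framerate", fps,
    "-i", "./img/example(11)/11_ugoira/11_ugoira%d.jpg",
    "-pix_fmt", "yuv420p",  # broad player compatibility
    "output.mp4",
], check=True)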

Important edit history

2021/1/4 postscript: Programs posted before 2021/1/5 were wrong and did not work. I'm sorry. It has been fixed, so please try again.
