[PYTHON] I tried to automatically collect erotic images from Twitter using GCP's Cloud Vision API

Introduction

This is my first post and I've only just started learning programming, so the writing and code are rough in places, but I hope you'll read on.

Motivation

When browsing images on Twitter, I was frustrated by the many text-only tweets and by images outside the genre I was looking for, so I wanted a way to extract only the images I wanted. (Summary: I want erotic images)

Preparation

Get an API key to use the Cloud Vision API. This article was helpful.

To use the Twitter API, apply for access and obtain an API key and token. This takes some time and effort because you have to describe your intended use in English. This article was helpful.

The following three third-party libraries are used. All can be installed with pip.

  1. schedule
  2. tweepy
  3. requests
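
To check up front that the three libraries are installed, something like the following can be used. This is a minimal sketch using only the standard library; the helper name `missing_packages` is my own and is not part of the script in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["schedule", "tweepy", "requests"])
    if missing:
        # All three are published on PyPI under the same names
        print("Install first: pip install " + " ".join(missing))
    else:
        print("All libraries are available")
```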

Rough processing flow

  1. Get tweets from the Twitter timeline
  2. Save the image from the URL if the tweet contains an image
  3. Send the image to the Cloud Vision API for analysis
  4. Delete all but those judged to be adult

Implementation

main.py


import base64
import json
import os
import pickle
import time

import schedule
import tweepy
import requests

Import the libraries.

main.py


API_KEY        = 'Twitter API key'
API_SECRET_KEY = 'Twitter API secret key'
ACCESS_TOKEN        = 'Twitter Access token'
ACCESS_TOKEN_SECRET = 'Twitter Access token secret'

CVA_API_KEY = "Cloud Vision API key"

Store each of the keys you obtained.
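
Writing keys directly into main.py makes them easy to leak when sharing the file. As an alternative, here is a sketch that reads them from environment variables. It assumes the variables were exported under the same names as the constants above; `load_keys` and `KEY_NAMES` are hypothetical names introduced only for this example.

```python
import os


def load_keys(names, env=os.environ):
    """Read the named settings from the environment and fail early if any is missing."""
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Assumption: the environment variables use the same names as the constants
KEY_NAMES = [
    "API_KEY",
    "API_SECRET_KEY",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
    "CVA_API_KEY",
]

# Usage (after exporting the variables in the shell):
# keys = load_keys(KEY_NAMES)
# API_KEY = keys["API_KEY"]
```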

Extract necessary data from Twitter

First, get the timeline (TL) that the tweets come from. This time I use list_timeline because I want to pull tweets from the accounts added to a list, but you could also narrow it down to a specific account with user_timeline or similar.

main.py


auth = tweepy.OAuthHandler(API_KEY, API_SECRET_KEY)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)


api = tweepy.API(auth, wait_on_rate_limit=True)


#Get tweets from the timeline
def main():
    #On the first run there is no saved timeline yet
    if os.path.exists('before_tl.pickle'):
        with open('before_tl.pickle', 'rb') as f:
            before_tl = pickle.load(f)
    else:
        before_tl = []
    tl = api.list_timeline(owner_screen_name="List administrator's Twitter ID", slug="The name of the list you want to get")
    with open('before_tl.pickle', 'wb') as f:
        pickle.dump(tl, f)
    for tweet in reversed(tl):    #Reversed to process tweets and RTs in chronological order
        if tweet not in before_tl:
            media_getter(tweet)

The timeline is saved with pickle to avoid calling the pay-as-you-go GCP API more than necessary. Each fetched timeline is compared with the previous one, and only new tweets are processed.
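
The same "only process new tweets" check can also be done by remembering tweet IDs instead of pickling whole tweet objects. This is a sketch of that alternative, not what main.py does; the function names and the JSON file are assumptions for illustration. It relies on tweet IDs increasing over time.

```python
import json
import os


def load_seen_ids(path):
    """Load previously processed tweet IDs; a missing file means nothing was seen yet."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_seen_ids(path, ids):
    """Write the processed tweet IDs as a JSON list."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(ids), f)


def pick_new(tweet_ids, seen_ids):
    """Return the IDs that were not processed before, oldest first."""
    return sorted(i for i in tweet_ids if i not in seen_ids)
```

With tweepy, the IDs would come from something like `[tweet.id for tweet in tl]`.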

main.py


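#Get the screen name (ID) of the user who posted the tweet
def username_geter(tweet):
    #Retweets carry the original tweet in retweeted_status
    if hasattr(tweet, 'retweeted_status'):
        return tweet.retweeted_status.user.screen_name
    return tweet.user.screen_name


#Get the URL list of images
def media_getter(tweet):
    #extended_entities only exists when the tweet has media attached
    entities = getattr(tweet, 'extended_entities', None)
    if not entities:
        print('Text Only')
        return
    medialist = [d.get('media_url') for d in entities["media"]]
    name = username_geter(tweet)
    for media in medialist:
        img_save(media,name)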

The user's screen name is used as part of the file name when saving the image.

This completes the process of getting the image URL from Twitter.
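
To make the shape of the data clearer, here is a sketch that extracts image URLs from a plain dict laid out like `extended_entities`. The sample data is made up, and unlike media_getter this version keeps only items whose type is "photo" and prefers `media_url_https`; both choices are my own assumptions, not the article's behavior.

```python
def image_urls(extended_entities):
    """Return the URLs of photo attachments from an extended_entities dict."""
    if not extended_entities:
        return []
    urls = []
    for item in extended_entities.get("media", []):
        # Videos and GIFs also appear in "media"; their media_url is only a thumbnail
        if item.get("type") == "photo":
            urls.append(item.get("media_url_https") or item.get("media_url"))
    return [u for u in urls if u]


# Made-up sample in the same layout as a tweet's extended_entities
sample = {
    "media": [
        {
            "type": "photo",
            "media_url": "http://pbs.twimg.com/media/a.jpg",
            "media_url_https": "https://pbs.twimg.com/media/a.jpg",
        },
        {"type": "video", "media_url_https": "https://pbs.twimg.com/thumb.jpg"},
    ]
}
print(image_urls(sample))
```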

Image storage process

From here, the image is saved and passed to Cloud Vision for analysis.

main.py


#Save the image from the url, then keep or delete it according to the judgment
def img_save(media,name):
    url_path = media.split("/")[-1]
    file_name = "adult/" + name + url_path

    #Create the save directory if it does not exist yet
    os.makedirs("adult", exist_ok=True)

    response = requests.get(media)
    image = response.content

    with open(file_name, "wb") as f:
        f.write(image)
    identify = img_sort(file_name)

    if identify == "adult":
        print('---saved image---')
    else:
        #Not judged to be adult: delete the downloaded file
        os.remove(file_name)


#Returns a judgment according to the result
def img_sort(img_path):
    res_json = img_judge(img_path)
    judgement = res_json['responses'][0]['safeSearchAnnotation']['adult']
    print(judgement)

    if judgement == "POSSIBLE":
        return "possible"
    elif judgement in ("LIKELY", "VERY_LIKELY"):
        return "adult"
    else:
        return "other"


#Send the image to the Cloud Vision API and receive the result
def img_judge(image_path):
    api_url = 'https://vision.googleapis.com/v1/images:annotate?key={}'.format(CVA_API_KEY)
    with open(image_path, "rb") as img:
        image_content = base64.b64encode(img.read())
        req_body = json.dumps({
            'requests': [{
                'image': {
                    'content': image_content.decode('utf-8')
                },
                'features': [{
                    'type': 'SAFE_SEARCH_DETECTION'
                }]
            }]
        })
        res = requests.post(api_url, data=req_body)
        return res.json()

The save path is built by splitting the URL on "/", taking the last element (the image file name), and prefixing it with the directory and the screen name.
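
The path construction can be tried on its own as below. This is a sketch: `file_name_for` is a hypothetical helper, the URL is made up, and the underscore between the screen name and the file name is my addition (img_save joins them directly).

```python
import posixpath
from urllib.parse import urlparse


def file_name_for(url, screen_name, directory="adult"):
    """Build '<directory>/<screen_name>_<last path segment of url>'."""
    # Taking the path from urlparse leaves out any query string such as '?name=large'
    base = posixpath.basename(urlparse(url).path)
    return "{}/{}_{}".format(directory, screen_name, base)


print(file_name_for("http://pbs.twimg.com/media/abc123.jpg", "someone"))
```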

The saved image is passed to the API, and processing branches on the returned result. The possible values are listed in the SafeSearchAnnotation reference (https://cloud.google.com/vision/docs/reference/rpc/google.cloud.vision.v1?hl=ja#google.cloud.vision.v1.SafeSearchAnnotation).

By design, images judged LIKELY or higher are saved and the rest are deleted, but this time I changed the save destination according to the judgment so that I could check the accuracy of Cloud Vision.
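
The "LIKELY or higher" rule can be written as a comparison against the order of the likelihood values. The following is a sketch with hypothetical helper names. It also returns None instead of raising when the response has no safeSearchAnnotation (for example when the API returned an error object), which the script above does not handle.

```python
# Likelihood values in increasing order, as listed in the Cloud Vision reference
LIKELIHOOD_ORDER = [
    "UNKNOWN",
    "VERY_UNLIKELY",
    "UNLIKELY",
    "POSSIBLE",
    "LIKELY",
    "VERY_LIKELY",
]


def adult_likelihood(res_json):
    """Pull the 'adult' likelihood out of an images:annotate response, or None."""
    responses = res_json.get("responses") or [{}]
    annotation = responses[0].get("safeSearchAnnotation")
    if annotation is None:
        return None  # e.g. the response carried an 'error' object instead
    return annotation.get("adult")


def verdict(likelihood, threshold="LIKELY"):
    """Map a likelihood string to 'adult', 'possible' or 'other'."""
    if likelihood not in LIKELIHOOD_ORDER:
        return "other"
    if LIKELIHOOD_ORDER.index(likelihood) >= LIKELIHOOD_ORDER.index(threshold):
        return "adult"
    if likelihood == "POSSIBLE":
        return "possible"
    return "other"
```

Raising or lowering `threshold` changes how strict the filter is without touching the rest of the code.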

main.py


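import shutil    #Add this to the imports at the top of main.py

    #In img_save, replace the final else branch with the following
    #(the possible/ and other/ directories must already exist)
    elif identify == "possible":
        new_file_name = "possible/" + name + url_path
        shutil.move(file_name, new_file_name)
        print('---saved possibleimage---')
    else:
        new_file_name = "other/" + name + url_path
        shutil.move(file_name, new_file_name)
        print('---saved otherimage---')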
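
The two branches differ only in the directory name, so the move could also be written as one helper. This is a sketch rather than the code I ran; `move_by_verdict` and its `root` parameter are hypothetical, and it assumes the verdict string is the directory name ("adult", "possible" or "other").

```python
import os
import shutil


def move_by_verdict(file_path, verdict, root="."):
    """Move file_path into <root>/<verdict>/ and return the new path."""
    target_dir = os.path.join(root, verdict)
    # Create adult/, possible/ or other/ on demand
    os.makedirs(target_dir, exist_ok=True)
    new_path = os.path.join(target_dir, os.path.basename(file_path))
    shutil.move(file_path, new_path)
    return new_path
```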

Run

Let's run it. Processing is performed every 8 seconds using schedule.

main.py


if __name__ == "__main__":
    schedule.every(8).seconds.do(main)
    while True:
        schedule.run_pending()
        time.sleep(1)
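
One thing to be aware of: schedule does not catch exceptions raised inside a job, so a single network error in main would end the loop. A small wrapper like the one below keeps it running. This is a sketch; the name `keep_running` is my own.

```python
import functools
import traceback


def keep_running(job):
    """Wrap a job so an exception is printed instead of ending the scheduler loop."""
    @functools.wraps(job)
    def wrapper(*args, **kwargs):
        try:
            return job(*args, **kwargs)
        except Exception:
            # Print the traceback and carry on with the next scheduled run
            traceback.print_exc()
            return None
    return wrapper


# Usage with the loop above:
# schedule.every(8).seconds.do(keep_running(main))
```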

Execution result

(The images belong to third parties, so the screenshots are blurred.)

I was able to successfully extract and save only the erotic images. It is quite a sight to watch the images pile up. Comparing the results with the images judged as POSSIBLE confirmed that the accuracy is quite high. Every image in which something was clearly visible was judged LIKELY or higher.

Afterword

This time I used SAFE_SEARCH_DETECTION (which determines whether an image contains harmful content), but the Cloud Vision API has many other features. Used well, they can help with all kinds of image collection and classification.
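
For example, the request body built in img_judge can ask for several features at once by listing more than one type. Here is a sketch of that; `build_request` is a hypothetical helper, and LABEL_DETECTION is just one of the other feature types.

```python
import base64
import json


def build_request(image_bytes, feature_types):
    """Build an images:annotate request body asking for several features at once."""
    return json.dumps({
        "requests": [{
            "image": {"content": base64.b64encode(image_bytes).decode("utf-8")},
            "features": [{"type": t} for t in feature_types],
        }]
    })


# Usage: pass the result as the data argument of requests.post, as in img_judge
body = build_request(b"abc", ["SAFE_SEARCH_DETECTION", "LABEL_DETECTION"])
```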

References

Try Google Cloud Vision API TEXT_DETECTION in Python
I tried using Google Cloud Vision API
How to use Tweety ~ Part 1 ~ [Getting Tweet]
